ML 3.0 Smoothed Aggregation User's Guide - Trilinos

SAND2004–2195 Unlimited Release Printed May 2004

ML 3.0 Smoothed Aggregation User’s Guide

Marzio Sala
Computational Math & Algorithms, Sandia National Laboratories, P.O. Box 5800, Albuquerque, NM 87185-1110

Jonathan J. Hu and Ray S. Tuminaro
Computational Math & Algorithms, Sandia National Laboratories, P.O. Box 0969, Livermore, CA 94551-0969

Abstract

ML is a multigrid preconditioning package intended to solve linear systems of equations Ax = b, where A is a user-supplied n × n sparse matrix, b is a user-supplied vector of length n, and x is a vector of length n to be computed. ML should be used on large sparse linear systems arising from partial differential equation (PDE) discretizations. While technically any linear system can be considered, ML should be used on linear systems for which multigrid methods work well (e.g., elliptic PDEs). ML can be used as a stand-alone package or to generate preconditioners for a traditional iterative solver package (e.g., Krylov methods). We have supplied support for working with the Aztec 2.1 and AztecOO iterative packages [15]. However, other solvers can be used by supplying a few functions.

This document describes one specific algebraic multigrid approach: smoothed aggregation. This approach is used within several specialized multigrid methods: one for the eddy current formulation of Maxwell's equations, and a multilevel and domain decomposition method for symmetric and nonsymmetric systems of equations (such as elliptic equations, or compressible and incompressible fluid dynamics problems). Other methods exist within ML but are not described in this document. Examples are given illustrating the problem definition and exercising multigrid options.


Contents

1 Notational Conventions
2 Overview
3 Multigrid Background
4 Configuring and Building ML
    4.1 Building in Standalone Mode
    4.2 Building with Aztec 2.1 Support
    4.3 Building with Trilinos Support (RECOMMENDED)
        4.3.1 Enabling Third Party Library Support
        4.3.2 Enabling Profiling
5 ML and Epetra: Getting Started with the MultiLevelPreconditioner Class
    5.1 Example 1: ml_example_epetra_preconditioner.cpp
    5.2 Example 2: ml_example_epetra_preconditioner_2level.cpp
6 Parameters for the ML_Epetra::MultiLevelPreconditioner Class
    6.1 Setting Options on a Specific Level
    6.2 General Usage of the Parameter List
    6.3 Default Parameter Settings for Common Problem Types
    6.4 Commonly Used Parameters
    6.5 List of All Parameters for MultiLevelPreconditioner Class
        6.5.1 General Options
        6.5.2 Aggregation Parameters
        6.5.3 Smoothing Parameters
        6.5.4 Coarsest Grid Parameters
7 Advanced Usage of ML
8 Multigrid & Smoothing Options
9 Smoothed Aggregation Options
    9.1 Aggregation Options
    9.2 Interpolation Options
10 Advanced Usage of ML and Epetra
11 Using ML without Epetra
    11.1 Creating a ML matrix: Single Processor
    11.2 Creating a ML matrix: Multiple Processors
12 Visualization Capabilities
13 ML Functions
    AZ_ML_Set_Amat
    AZ_set_ML_preconditioner
    ML_Aggregate_Create
    ML_Aggregate_Destroy
    ML_Aggregate_Set_CoarsenScheme_Coupled
    ML_Aggregate_Set_CoarsenScheme_MIS
    ML_Aggregate_Set_CoarsenScheme_Uncoupled
    ML_Aggregate_Set_CoarsenScheme_METIS
    ML_Aggregate_Set_CoarsenScheme_ParMETIS
    ML_Aggregate_Set_DampingFactor
    ML_Aggregate_Set_MaxCoarseSize
    ML_Aggregate_Set_NullSpace
    ML_Aggregate_Set_SpectralNormScheme_Calc
    ML_Aggregate_Set_SpectralNormScheme_Anorm
    ML_Aggregate_Set_Threshold
    ML_Create
    ML_Destroy
    ML_Gen_Blocks_Aggregates
    ML_Gen_Blocks_Metis
    ML_Gen_CoarseSolverSuperLU
    ML_Gen_MGHierarchy_UsingAggregation
    ML_Gen_SmootherAmesos
    ML_Gen_SmootherAztec
    ML_Gen_Smoother_BlockGaussSeidel
    ML_Gen_Smoother_GaussSeidel
    ML_Gen_Smoother_Jacobi
    ML_Gen_Smoother_SymGaussSeidel
    ML_Gen_Smoother_VBlockJacobi
    ML_Gen_Smoother_VBlockSymGaussSeidel
    ML_Gen_Solver
    ML_Get_Amatrix
    ML_Get_MyGetrowData
    ML_Get_MyMatvecData
    ML_Get_MySmootherData
    ML_Init_Amatrix
    ML_Iterate
    ML_Operator_Apply
    ML_Operator_Get_Diag
    ML_Operator_Getrow
    ML_Set_Amatrix_Getrow
    ML_Set_Amatrix_Matvec
    ML_Set_ResidualOutputFrequency
    ML_Set_Smoother
    ML_Set_Tolerance
    ML_Solve_MGV

1 Notational Conventions

In this guide, we show typed commands in this font:

    % a_really_long_command

The character % indicates any shell prompt (see footnote 1). Function names are shown as ML_Gen_Solver. Names of packages or libraries are reported in small caps, as Epetra. Mathematical entities are shown in italics.

2 Overview

This guide describes the use of an algebraic multigrid method within the ML package. The algebraic multigrid method can be used to solve linear systems of the form

    Ax = b                                                        (1)

where A is a user-supplied n × n sparse matrix, b is a user-supplied vector of length n, and x is a vector of length n to be computed. ML is intended to be used on (distributed) large sparse linear systems arising from partial differential equation (PDE) discretizations. While technically any linear system can be considered, ML should be used on linear systems for which multigrid methods work well (e.g., elliptic PDEs).

The ML package is used by creating an ML object and then associating with it a matrix, A, and a set of multigrid parameters that describe the specifics of the solver. Once created and initialized, the ML object can be used to solve linear systems.

This manual is structured as follows. Multigrid and multilevel methods are briefly recalled in Section 3. The process of configuring and building ML is outlined in Section 4. Section 5 shows the basic usage of ML as a black-box preconditioner for Epetra matrices. The definition of (parallel) preconditioners using ML_Epetra::MultiLevelPreconditioner is detailed. This class only requires the linear system matrix and a list of options. Available parameters for ML_Epetra::MultiLevelPreconditioner are reported in Section 6. More advanced uses of ML are presented in Section 7. Here, we present how to define and fine-tune smoothers, the coarse grid solver, and the multilevel hierarchy. Multigrid and smoothing options are reported in Section 8, where we also present how to construct a user-defined smoother. Smoothed aggregation options are reported in Section 9. Advanced usage of ML with Epetra objects is reported in Section 10. Section 11 reports how to define matrices in ML format without depending on Epetra. Section 12 details the (limited) visualization capabilities of ML.

Footnote 1: For simplicity, commands are shown as they would be issued in a Linux or Unix environment. Note, however, that ML has been, and can be, built successfully in a Windows environment.

3 Multigrid Background

A brief multigrid description is given (see [1], [6], or [7] for more information). A multigrid solver tries to approximate the original PDE problem of interest on a hierarchy of grids and use 'solutions' from coarse grids to accelerate the convergence on the finest grid. A simple multilevel iteration is illustrated in Figure 1.

    /* Solve A_k u = b (k is current grid level) */
    proc multilevel(A_k, b, u, k)
        u = S_k^1(A_k, b, u);
        if (k ≠ Nlevel − 1)
            P_k = determine_interpolant(A_k);
            r̂ = P_k^T (b − A_k u);
            Â_{k+1} = P_k^T A_k P_k;
            v = 0;
            multilevel(Â_{k+1}, r̂, v, k + 1);
            u = u + P_k v;
            u = S_k^2(A_k, b, u);

Figure 1: High level multigrid V cycle consisting of 'Nlevel' grids to solve (1), with A_0 = A.

In this method, the S_k^1()'s and S_k^2()'s are approximate solvers corresponding to k steps of pre and post smoothing, respectively. These smoothers are discussed in Section 8. For now, it suffices to view them as basic iterative methods (e.g., Gauss-Seidel) which effectively smooth out the error associated with the current approximate solution. The P_k's (interpolation operators that transfer solutions from coarse grids to finer grids) are the key ingredient that is determined automatically by the algebraic multigrid method (see footnote 2). For the purposes of this guide, it is important to understand that when the multigrid method is used, a hierarchy of grids, grid transfer operators (P_k), and coarse grid discretizations (A_k) are created. To complete the specification of the multigrid method, smoothers must be supplied on each level. Several smoothers are available within ML; alternatively, an iterative solver package can be used, or users can write their own smoother (see Section 8).

Footnote 2: The P_k's are usually determined as a preprocessing step and not computed within the iteration.

4 Configuring and Building ML

ML is configured and built using the GNU autoconf [4] and automake [5] tools. It can be configured and built as a standalone package with or without Aztec 2.1 support (as detailed in Sections 4.1 and 4.2), or as a part of the Trilinos framework [8] (as described in Section 4.3). Even though ML can be compiled and used as a standalone package, the recommended approach is to build ML as part of the Trilinos framework, as a richer set of features is then available. ML has been configured and built successfully on a wide variety of operating systems, and with a variety of compilers (as reported in Table 1).

    Operating System                         Compiler(s)
    ---------------------------------------  -------------------------
    Linux                                    GNU and Intel
    IRIX N32, IRIX 64, HPUX, Solaris, DEC    Native
    ASCI Red                                 Native and Portland Group
    CPlant                                   Native
    Windows                                  Microsoft

Table 1: Main operating systems and corresponding compilers supported by ML.

Although it is possible to configure directly in the ML home directory, we strongly advise against this. Instead, we suggest working in an independent directory and configuring and building there.

4.1 Building in Standalone Mode

To configure and build ML as a standalone package without any Aztec support, do the following. It is assumed that the shell variable $ML_HOME identifies the ML directory.

    % cd $ML_HOME
    % mkdir standalone
    % cd standalone
    % $ML_HOME/configure --disable-epetra --disable-aztecoo \
          --prefix=$ML_HOME/standalone
    % make
    % make install

The ML library file libml.a and the header files will be installed in the directory specified in --prefix.

4.2 Building with Aztec 2.1 Support

To enable support for Aztec 2.1, ML must be configured with the options reported in the previous section, plus --with-ml_aztec2_1 (defaulted to no).

All of the Aztec 2.1 functionality that ML accesses is contained in the file ml_aztec_utils.c. In principle, by creating a similar file, other solver packages could work with ML in the same way. For Aztec users there are essentially three functions that are important. The first is AZ_ML_Set_Amat, which converts Aztec matrices into ML matrices by making appropriate ML calls (see Section 11.1 and Section 11.2). It is important to note that when creating ML matrices from Aztec matrices, information is not copied. Instead, wrapper functions are made so that ML can access the same information as Aztec. The second is ML_Gen_SmootherAztec, which is used for defining Aztec iterative methods as smoothers (discussed in Section 8 and Section 13). The third function, AZ_set_ML_preconditioner, can be invoked to set the Aztec preconditioner to use the multilevel 'V' cycle constructed in ML. Thus, it is possible to invoke several instances of Aztec within one solve: as a smoother on different multigrid levels and/or as the outer iterative solver.

4.3 Building with Trilinos Support (RECOMMENDED)

We recommend configuring and building ML as part of the standard Trilinos build and configure process. In fact, ML is built by default if you follow the standard Trilinos configure and build directions. Please refer to the Trilinos documentation for information about the configuration and building of other Trilinos packages.

To configure and build ML through Trilinos, you may need to do the following (actual configuration options may vary depending on the specific architecture, installation, and user's needs). It is assumed that the shell variable $TRILINOS_HOME identifies the Trilinos directory and, for example, that we are compiling under Linux with MPI.

    % cd $TRILINOS_HOME
    % mkdir LINUX_MPI
    % cd LINUX_MPI
    % $TRILINOS_HOME/configure --with-mpi-compilers \
          --prefix=$TRILINOS_HOME/LINUX_MPI
    % make
    % make install

If required, other Trilinos and ML options can be specified in the configure line. A complete list of ML options is given in Sections 4.3.1 and 4.3.2. You can also find a complete list and explanations by typing ./configure --help in the ML home directory.

4.3.1 Enabling Third Party Library Support

ML can be configured with the following third party libraries (TPLs): SuperLU, SuperLU_dist, Metis, and ParMetis. It can take advantage of the following Trilinos packages: Ifpack, Teuchos, Triutils, and Amesos. Through Amesos, ML can interface with the direct solvers Klu, Umfpack, SuperLU, SuperLU_dist (see footnote 3), and Mumps. It is assumed that you have already built the appropriate libraries (e.g., libsuperlu.a) and have the header files. To configure ML with one of the above TPLs, you must enable the particular TPL interface in ML. All of the options below are disabled by default.

The same configure options that one uses to enable certain other Trilinos packages also enable the interfaces to those packages within ML:

    --enable-epetra
        Enables support for the Epetra package.

    --enable-aztecoo
        Enables support for the AztecOO package.

    --enable-amesos
        Enables support for the Amesos package. Amesos is an interface to several direct solvers. ML supports Umfpack [2], Klu, SuperLU_dist (1.0 and 2.0), and Mumps [14]. This package is used only in function ML_Gen_SmootherAmesos.

    --enable-teuchos
        Enables support for the Teuchos package. This package is used only in the definition of class ML_Epetra::MultiLevelPreconditioner (see Section 5) and by the Amesos smoother.

    --enable-triutils
        Enables support for the Triutils package. ML uses Triutils only in some examples, to create the linear system matrix.

    --enable-ifpack
        Enables support for the Ifpack package [9]. Ifpack is used only to create smoothers via ML_Gen_SmootherIfpack.

    --enable-anasazi
        Enables support for the Anasazi package. Anasazi is a high level interface package for various eigenvalue computations.

Footnote 3: Currently, ML can support SuperLU_dist directly (without Amesos support), or through Amesos.

The following configure line options enable interfaces in ML to certain TPLs.

    --with-ml_metis
        Enables interface for Metis [12].

    --with-ml_parmetis2x
        Enables interface for ParMetis, version 2.x.

    --with-ml_parmetis3x
        Enables interface for ParMetis [11], version 3.x.

    --with-ml_superlu
        Enables ML interface for serial SuperLU [3]. The ML interface to SuperLU is deprecated in favor of the Amesos interface.

    --with-ml_superlu_dist
        Enables ML interface for SuperLU_dist [3]. The ML interface to SuperLU_dist is deprecated in favor of the Amesos interface.

For Metis, ParMetis, and the ML interface to SuperLU and SuperLU_dist, the user must specify the location of the header files, with the option

    --with-incdirs=include-locations

(Header files for Trilinos libraries are automatically located if ML is built through the Trilinos configure.) In order to link the ML examples, the user must indicate the location of all the enabled packages' libraries (see footnote 4), with the option

    --with-ldflags=lib-locations

The user might find useful the option

    --disable-examples

which turns off compilation and linking of the examples.

More details about the installation of Trilinos can be found at the Trilinos web site, http://software.sandia.gov/Trilinos, and in [10, Chapter 1].

4.3.2 Enabling Profiling

All of the options below are disabled by default.

    --enable-ml_timing
        Prints out timing of key ML routines.

    --enable-ml_flops
        Enables printing of flop counts.

Timing and flop counts are printed when the associated object is destroyed.

Footnote 4: An example of a configuration line that enables Metis and ParMetis might be as follows:

    ./configure --with-mpi-compilers --with-ml_metis --with-ml_parmetis3x \
        --with-cflags="-I$HOME/include" --with-cppflags="-I$HOME/include" \
        --with-ldflags="-L$HOME/lib/LINUX_MPI -lparmetis-3.1 -lmetis-4.0"

5 ML and Epetra: Getting Started with the MultiLevelPreconditioner Class

In this section we show how to use ML as a preconditioner to Epetra and AztecOO through the MultiLevelPreconditioner class (see footnote 5) in the ML_Epetra namespace (see footnote 6). Although limited to algebraic multilevel preconditioners, this allows the use of ML as a black-box preconditioner. The MultiLevelPreconditioner class automatically constructs all the components of the preconditioner, using the parameters specified in a Teuchos parameter list. The constructor of this class takes as input an Epetra_RowMatrix pointer and a Teuchos parameter list (see footnote 7).

In order to compile, it may also be necessary to include the following files: ml_config.h (as first ML include), Epetra_ConfigDefs.h (as first Epetra include), Epetra_RowMatrix.h, Epetra_MultiVector.h, Epetra_LinearProblem.h, and AztecOO.h. Check the Epetra and AztecOO documentation for more details. Additionally, the user must include the header file "ml_epetra_preconditioner.h". Also note that the macro HAVE_CONFIG_H must be defined either in the user's code or as a compiler flag.

5.1 Example 1: ml_example_epetra_preconditioner.cpp

We now give a very simple fragment of code that uses the MultiLevelPreconditioner. For the complete code, see $ML_HOME/examples/ml_example_epetra_preconditioner.cpp. (To compile, this example requires ML to be configured with the option --enable-triutils; see Section 4.) The linear operator A is derived from an Epetra_RowMatrix, Solver is an AztecOO object, and Problem is an Epetra_LinearProblem object.

    #include "ml_include.h"
    #include "ml_epetra_preconditioner.h"
    #include "Teuchos_ParameterList.hpp"
    ...
    Teuchos::ParameterList MLList;
    // set default values for smoothed aggregation in MLList
    ML_Epetra::SetDefaults("SA",MLList);
    // overwrite with user's defined parameters
    MLList.set("max levels",6);
    MLList.set("increasing or decreasing","decreasing");
    MLList.set("aggregation: type", "MIS");
    MLList.set("coarse: type","Amesos-KLU");
    // create the preconditioner
    ML_Epetra::MultiLevelPreconditioner * MLPrec =
      new ML_Epetra::MultiLevelPreconditioner(A, MLList, true);
    // create an AztecOO solver
    AztecOO Solver(Problem);
    // set preconditioner and solve
    Solver.SetPrecOperator(MLPrec);
    Solver.SetAztecOption(AZ_solver, AZ_gmres);
    Solver.Iterate(Niters, 1e-12);
    ...
    delete MLPrec;

We now detail the general procedure to define the MultiLevelPreconditioner. First, the user defines a Teuchos parameter list (see footnote 8). Table 2 briefly reports the most important methods of this class.

    set(Name,Value)
        Add entry Name with value and type specified by Value. Any C++ type (like int, double, a pointer, etc.) is valid.

    get(Name,DefValue)
        Get value (whose type is automatically specified by DefValue). If not present, return DefValue.

    sublist(Name)
        Get a reference to sublist Name. If not present, create the sublist.

Table 2: Some methods of Teuchos::ParameterList class.

Footnote 5: The MultiLevelPreconditioner class is derived from the Epetra_RowMatrix class.

Footnote 6: ML does not rely on any particular matrix format or iterative solver. Examples of using ML as a preconditioner for user-defined matrices (i.e., non-Epetra matrices) are reported in Sections 11.1 and 11.2.

Footnote 7: In order to use the MultiLevelPreconditioner class, ML must be configured with options --enable-epetra --enable-teuchos.

Input parameters are set via method set(Name,Value), where Name is a string defining the parameter, and Value is the specified parameter, which can be any C++ object or pointer. A complete list of parameters available for class MultiLevelPreconditioner is reported in Section 6.

The parameter list is passed to the constructor, together with a pointer to the matrix, and a boolean flag. If this flag is set to false, the constructor will not create the multilevel hierarchy until MLPrec->ComputePreconditioner() is called. The hierarchy can be destroyed using MLPrec->Destroy() (see footnote 9). For instance, the user may define a code like:

    // A is still not filled with numerical values
    ML_Epetra::MultiLevelPreconditioner * MLPrec =
      new ML_Epetra::MultiLevelPreconditioner(A, MLList, false);
    // compute the elements of A
    ...
    // now compute the preconditioner
    MLPrec->ComputePreconditioner();
    // solve the linear system
    ...
    // destroy the previously defined preconditioner, and build a new one
    MLPrec->Destroy();
    // re-compute the elements of A
    // now re-compute (if required) the preconditioner
    MLPrec->ComputePreconditioner();
    // re-solve the linear system

In this fragment of code, the user defines the ML preconditioner, but the preconditioner is created only with the call ComputePreconditioner(). This may be useful, for example, when ML is used in conjunction with nonlinear solvers (like Nox [13]).

Footnote 8: See the Teuchos documentation for a detailed overview of this class.

Footnote 9: We suggest creating the preconditioning object with new and freeing memory with delete. Some MPI calls occur in Destroy(), so the user should not call MPI_Finalize() or delete the communicator used by ML before the preconditioning object is destroyed.

5.2 Example 2: ml_example_epetra_preconditioner_2level.cpp

As a second example, we explain in some detail the construction of a 2-level domain decomposition preconditioner, with a coarse space defined using aggregation. File $ML_HOME/examples/ml_example_epetra_preconditioner_2level.cpp reports the entire code. In the example, the linear system matrix A, coded as an Epetra_CrsMatrix, corresponds to the discretization of a 2D Laplacian on a Cartesian grid. x and b are the solution vector and the right-hand side, respectively. The AztecOO linear problem is defined as

    Epetra_LinearProblem problem(&A, &x, &b);
    AztecOO solver(problem);

We create the Teuchos parameter list as follows:

    ParameterList MLList;
    ML_Epetra::SetDefaults("DD", MLList);
    MLList.set("max levels",2);
    MLList.set("increasing or decreasing","increasing");
    MLList.set("aggregation: type", "METIS");
    MLList.set("aggregation: nodes per aggregate", 16);
    MLList.set("smoother: pre or post", "both");
    MLList.set("coarse: type","Amesos-KLU");
    MLList.set("smoother: type", "Aztec");

The last option tells ML to use the Aztec preconditioning function as a smoother. All Aztec preconditioning options can be used as ML smoothers. Aztec requires an integer vector options and a double vector params. Those can be defined as follows:

    int options[AZ_OPTIONS_SIZE];
    double params[AZ_PARAMS_SIZE];
    AZ_defaults(options,params);
    options[AZ_precond] = AZ_dom_decomp;
    options[AZ_subdomain_solve] = AZ_icc;
    MLList.set("smoother: Aztec options", options);
    MLList.set("smoother: Aztec params", params);

The last two commands set the pointer to options and params in the parameter list (see footnote 10). The ML preconditioner is created as in the previous example,

    ML_Epetra::MultiLevelPreconditioner * MLPrec =
      new ML_Epetra::MultiLevelPreconditioner(A, MLList, true);

and we can check that no options have been misspelled, using

    MLPrec->PrintUnused();

The AztecOO solver is called using, for instance,

    solver.SetPrecOperator(MLPrec);
    solver.SetAztecOption(AZ_solver, AZ_cg_condnum);
    solver.SetAztecOption(AZ_kspace, 160);
    solver.Iterate(1550, 1e-12);

Finally, some (limited) information about the preconditioning phase is obtained using

    cout << MLPrec->GetOutputList();

Note that the input parameter list is copied in the construction phase, hence later changes to MLList will not affect the preconditioner. Should the user need to modify parameters in MLPrec's internally stored parameter list, they can get a reference to the internally stored list:

    ParameterList & List = MLPrec->GetList();

and then directly modify List.

Footnote 10: Only the pointer is copied in the parameter list, not the array itself. Therefore, options and params should not go out of scope before the destruction of the preconditioner.

6 Parameters for the ML_Epetra::MultiLevelPreconditioner Class

In this section we give general guidelines for using the MultiLevelPreconditioner class effectively. The complete list of input parameters is also reported.

6.1 Setting Options on a Specific Level

Some of the parameters that affect MultiLevelPreconditioner can in principle be different from level to level. By default, the set method for the MultiLevelPreconditioner class affects all levels in the multigrid hierarchy. In order to change a setting on a particular level (say, d), the string "(level d)" is appended to the option string (note that a space must separate the option and the level specification). For instance, assuming decreasing levels starting from 4, one could set the aggregation schemes as follows:

    MLList.set("aggregation: type","Uncoupled");
    MLList.set("aggregation: type (level 1)","METIS");
    MLList.set("aggregation: type (level 3)","MIS");

If the finest level is 0, and one has 5 levels, the code will use Uncoupled for level 0, METIS for levels 1 and 2, then MIS for levels 3 and 4. In Section 6.5, parameters that can be set differently on individual levels are denoted with the symbol ? (which is not part of the parameter name). Note that some parameters (e.g., Uncoupled-MIS aggregation) correspond to quantities that must be the same at all levels.

6.2 General Usage of the Parameter List

All ML options can have a common prefix, specified by the user in the construction phase. For example, suppose that we require ML: (in this case with a trailing space) to be the prefix. The constructor will be

    char Prefix[] = "ML: ";
    ML_Epetra::MultiLevelPreconditioner * MLPrec =
      new ML_Epetra::MultiLevelPreconditioner(*A, MLList, true, Prefix);

A generic parameter, say aggregation: type, will now be defined as

    MLList.set("ML: aggregation: type", "METIS");

It is important to point out that some options can be effectively used only if ML has been properly configured. In particular:

• The Metis aggregation scheme requires --with-ml_metis; otherwise the code will include all nodes on the calling processor in a single aggregate;
• The ParMetis aggregation scheme requires --with-ml_metis --enable-epetra and --with-ml_parmetis2x or --with-ml_parmetis3x;
• Amesos coarse solvers require --enable-amesos. Moreover, Amesos must have been configured to support the requested coarse solver. Please refer to the Amesos documentation for more details;
• The Ifpack smoother requires --enable-ifpack.


6.3 Default Parameter Settings for Common Problem Types

The MultiLevelPreconditioner class provides default values for four different preconditioner types:

1. Linear elasticity
2. Classical 2-level domain decomposition for the advection diffusion operator
3. 3-level algebraic domain decomposition for the advection diffusion operator
4. Eddy current formulation of Maxwell's equations

Default values are listed in Table 3. In the table, SA refers to "classical" smoothed aggregation (with small aggregates and a relatively large number of levels), DD and DD-ML to domain decomposition methods (whose coarse matrix is defined using aggressive coarsening and a limited number of levels). Maxwell refers to the solution of Maxwell's equations.

Default values for the parameter list can be set by ML_Epetra::SetDefaults(). The user can easily put the desired default values in a given parameter list as follows:

    Teuchos::ParameterList MLList;
    ML_Epetra::SetDefaults(ProblemType, MLList);

or as

    Teuchos::ParameterList MLList;
    ML_Epetra::SetDefaults(ProblemType, MLList, Prefix);

Prefix (defaulted to an empty string) is the prefix to assign to each entry in the parameter list.

For DD and DD-ML, the default smoother is Aztec, with an incomplete factorization ILUT, and minimal overlap. Memory for the two Aztec vectors is allocated using new, and the user is responsible for freeing this memory, for instance as follows:

    int * options = NULL;
    options = MLList.get("smoother: Aztec options", options);
    double * params = NULL;
    params = MLList.get("smoother: Aztec params", params);
    ...
    // Make sure solve is completed before deleting options & params!!
    delete [] options;
    delete [] params;

The rationale behind this is that the parameter list stores a pointer to those vectors, not the content itself. (As a general rule, the vectors stored in the parameter list should not be prematurely destroyed or permitted to go out of scope.)


Option Name                                     Type      SA            DD          DD-ML       Maxwell
max levels                                      int       16            2           3           5
output                                          int       8             8           8           10
increasing or decreasing                        string    increasing    increasing  increasing  decreasing
PDE equations                                   int       1             1           1           –
null space dimension                            int       1             1           1           –
null space vectors                              double *  NULL          NULL        NULL        NULL
aggregation: type                               string    Uncoupled     METIS       METIS       Uncoupled-MIS
aggregation: type (level 1)                     string    –             –           ParMETIS    –
aggregation: type (level 8)                     string    MIS           –           –           –
aggregation: local aggregates                   int       –             1           –           –
aggregation: nodes per aggregate                int       –             –           512         –
aggregation: damping factor                     double    4/3           4/3         4/3         0.0
eigen-analysis: type                            string    Anorm         Anorm       Anorm       Anorm
coarse: max size                                int       128           128         128         128
aggregation: threshold                          double    0.0           0.0         0.0         0.0
aggregation: next-level aggregates per process  int       –             –           128         –
smoother: sweeps                                int       2             2           2           2
smoother: damping factor                        double    0.67          –           –           0.67
smoother: pre or post                           string    both          both        both        both
smoother: type                                  string    Gauss-Seidel  Aztec       Aztec       –
smoother: Aztec as solver                       bool      –             false       false       –
smoother: MLS polynomial order                  int       –             –           –           3
smoother: MLS alpha                             double    –             –           –           30.0
coarse: type                                    string    Amesos-KLU    Amesos-KLU  Amesos-KLU  SuperLU
coarse: sweeps                                  int       1             1           1           1
coarse: damping factor                          double    1.0           1.0         1.0         1.0
coarse: max processes                           int       16            16          16          –
print unused                                    int       0             0           0           0

Table 3: Default values for ML_Epetra::MultiLevelPreconditioner for the 4 currently supported problem types SA, DD, DD-ML, Maxwell. “–” means not set.

Uncoupled       Attempts to construct aggregates of optimal size (3^d nodes in d dimensions). Each process works independently, and aggregates cannot span processes.
Coupled         As Uncoupled, but aggregates can span processes (deprecated).
MIS             Uses a maximal independent set technique to define the aggregates. Aggregates can span processes. May provide better quality aggregates than either Coupled or Uncoupled. Computationally more expensive than either because it requires a matrix-matrix product.
Uncoupled-MIS   Uses Uncoupled for all levels until there is 1 aggregate per processor, then switches over to MIS. The coarsening scheme on a given level cannot be specified with this option.
METIS           Uses a graph partitioning algorithm to create the aggregates, working process-wise. The number of nodes in each aggregate is specified with the option aggregation: nodes per aggregate. Requires ML to be configured with --with-ml_metis.
ParMETIS        As METIS, but partitions the global graph. Requires --with-ml_parmetis2x or --with-ml_parmetis3x. Aggregates can span an arbitrary number of processes. The global number of aggregates can be specified with the option aggregation: global aggregates.

Table 4: ML_Epetra::MultiLevelPreconditioner: Available coarsening schemes.

6.4 Commonly Used Parameters

Table 4 lists parameters for changing aggregation schemes. Table 5 lists common choices for smoothing options. Table 6 lists common choices affecting the coarse grid solve. Note that spaces matter in parameter names: do not include leading or trailing spaces that are not part of the name, and separate words by exactly one space. Misspelled parameters will not be detected. It can be useful to print the unused parameters by calling PrintUnused() after the construction of the multilevel hierarchy.
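Because a misspelled or badly spaced name is silently ignored, it can help to check names before they are placed in the parameter list. The following is a minimal sketch of such a check in plain C++; the helper names has_clean_spacing and normalize_spacing are hypothetical and are not part of ML or Teuchos.

```cpp
#include <cassert>
#include <string>

// Returns true if `name` has no leading or trailing space and no two
// consecutive spaces, i.e. words are separated by exactly one space.
bool has_clean_spacing(const std::string& name) {
    if (name.empty()) return false;
    if (name.front() == ' ' || name.back() == ' ') return false;
    return name.find("  ") == std::string::npos;
}

// Collapses runs of spaces to a single space and trims both ends.
std::string normalize_spacing(const std::string& name) {
    std::string out;
    for (char c : name) {
        // Skip a space that would start the string or repeat a space.
        if (c == ' ' && (out.empty() || out.back() == ' ')) continue;
        out.push_back(c);
    }
    if (!out.empty() && out.back() == ' ') out.pop_back();
    assert(out.empty() || has_clean_spacing(out));
    return out;
}
```

Such a check only catches spacing problems; a name that is well spaced but misspelled still has to be found by printing the unused parameters.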

6.5 List of All Parameters for MultiLevelPreconditioner Class

6.5.1 General Options

output

Output level, from 0 to 10 (10 being verbose).

print unused

If non-negative, will print all the unused parameters on the specified processor.

max levels

Maximum number of levels.

increasing or decreasing

If set to increasing, level 0 will correspond to the finest level. If set to decreasing, max levels - 1 will correspond to the finest level.

Jacobi         Point-Jacobi. Damping factor is specified using smoother: damping factor, and the number of sweeps with smoother: sweeps.
Gauss-Seidel   Point Gauss-Seidel. Damping factor is specified using smoother: damping factor, and the number of sweeps with smoother: sweeps.
Aztec          Use AztecOO’s built-in preconditioning functions as smoothers. Or, if smoother: Aztec as solver is true, use approximate solutions with AztecOO (with smoother: sweeps iterations) as smoothers. The AztecOO vectors options and params can be set using smoother: Aztec options and smoother: Aztec params.
MLS            Use MLS smoother. The polynomial order is specified by smoother: MLS polynomial order, and the alpha value by smoother: MLS alpha.

Table 5: ML_Epetra::MultiLevelPreconditioner: Commonly used smoothers.

Jacobi             Use coarse: sweeps steps of Jacobi (with damping parameter coarse: damping factor) as a solver.
Gauss-Seidel       Use coarse: sweeps steps of Gauss-Seidel (with damping parameter coarse: damping factor) as a solver.
Amesos-KLU         Use KLU through Amesos. Coarse grid problem is shipped to proc 0, solved, and solution is broadcast.
Amesos-UMFPACK     Use UMFPACK through Amesos. Coarse grid problem is shipped to proc 0, solved, and solution is broadcast.
Amesos-Superludist Use SuperLU_DIST through Amesos.
Amesos-MUMPS       Use double precision version of MUMPS through Amesos.
Amesos-ScaLAPACK   Use double precision version of ScaLAPACK through Amesos.
SuperLU            Use ML interface to SuperLU.

Table 6: ML_Epetra::MultiLevelPreconditioner: Some of the available coarse matrix solvers. Note: Amesos solvers require ML to be configured with --with-ml_amesos, and Amesos to be properly configured to support the specified solver.

PDE equations

Number of PDE equations for each grid node. This value is not considered for Epetra_VbrMatrix objects, as in this case it is obtained from the block map used to construct the object. Note that only block maps with constant element size can be considered.

null space dimension

Dimension of the null space.

null space vectors

Pointer to the null space vectors. If NULL, ML will use the default null space.


6.5.2 Aggregation Parameters

aggregation: type

Defines the aggregation scheme. Can be: Uncoupled, Coupled, MIS, METIS, ParMETIS. See Table 4.

aggregation: global aggregates

Defines the global number of aggregates (only for METIS and ParMETIS aggregation schemes).

aggregation: local aggregates

Defines the number of aggregates of the calling processor (only for METIS and ParMETIS aggregation schemes). Note: this value overrides aggregation: global aggregates.

aggregation: nodes per aggregate

Defines the number of nodes to be assigned to each aggregate (only for METIS and ParMETIS aggregation schemes). Note: this value overrides aggregation: local aggregates. If none among aggregation: global aggregates, aggregation: local aggregates and aggregation: nodes per aggregate is specified, the default value is 1 aggregate per process.

aggregation: damping factor

Damping factor for smoothed aggregation.

eigen-analysis: type

Defines the numerical scheme to be used to compute an estimate of the maximum eigenvalue of $D^{-1}A$, where $D = \mathrm{diag}(A)$ (for smoothed aggregation only). It can be: cg (use 10 steps of the conjugate gradient method), Anorm (use the infinity norm of the matrix), Anasazi (use the Anasazi package; the problem is assumed to be nonsymmetric), or power-method.

aggregation: threshold

Threshold in aggregation.

aggregation: next-level aggregates per process

Defines the maximum number of next-level matrix rows per process (only for ParMETIS aggregation scheme).
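To illustrate the trade-off behind the eigen-analysis: type choices, the sketch below compares a norm-based bound with a power-method estimate of the largest eigenvalue of D^{-1}A on a small 1D Poisson matrix. It is a standalone illustration, not ML's implementation: the dense storage, the function names, and the choice to take the infinity norm of D^{-1}A (rather than of A) are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Dense 1D Poisson matrix tridiag(-1, 2, -1) of order n.
Matrix poisson_1d(std::size_t n) {
    Matrix A(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        A[i][i] = 2.0;
        if (i > 0) A[i][i - 1] = -1.0;
        if (i + 1 < n) A[i][i + 1] = -1.0;
    }
    return A;
}

// Infinity norm of D^{-1} A: the largest row sum of |a_ij| / |a_ii|.
// Cheap, and always an upper bound on the largest eigenvalue magnitude.
double anorm_estimate(const Matrix& A) {
    double best = 0.0;
    for (std::size_t i = 0; i < A.size(); ++i) {
        assert(A[i][i] != 0.0);
        double row_sum = 0.0;
        for (double a : A[i]) row_sum += std::fabs(a);
        best = std::max(best, row_sum / std::fabs(A[i][i]));
    }
    return best;
}

// Power iteration on D^{-1} A, returning ||D^{-1} A x|| / ||x|| at the last step.
double power_method_estimate(const Matrix& A, int iterations) {
    const std::size_t n = A.size();
    assert(n > 0 && iterations > 0);
    std::vector<double> x(n), y(n);
    for (std::size_t i = 0; i < n; ++i) x[i] = 1.0 / (i + 1.0);  // generic start
    double lambda = 0.0;
    for (int k = 0; k < iterations; ++k) {
        double norm_x = 0.0, norm_y = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            double sum = 0.0;
            for (std::size_t j = 0; j < n; ++j) sum += A[i][j] * x[j];
            y[i] = sum / A[i][i];  // apply D^{-1}
            norm_x += x[i] * x[i];
            norm_y += y[i] * y[i];
        }
        norm_x = std::sqrt(norm_x);
        norm_y = std::sqrt(norm_y);
        if (norm_y == 0.0) return 0.0;
        lambda = norm_y / norm_x;
        for (std::size_t i = 0; i < n; ++i) x[i] = y[i] / norm_y;
    }
    return lambda;
}
```

For the 5 × 5 Poisson matrix the norm-based bound is 2, while the power method approaches the exact value 1 + cos(π/6) ≈ 1.866. The bound costs a single pass over the matrix; the iterative estimate is sharper but needs repeated matrix-vector products.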

6.5.3 Smoothing Parameters

smoother: sweeps

Number of sweeps of smoother.

smoother: damping factor

Smoother damping factor.

smoother: pre or post

If set to pre, only pre-smoothing will be used. If set to post, only post-smoothing will be used. If set to both, pre- and post-smoothing will be used.

smoother: type

Type of the smoother. It can be: Jacobi, Gauss-Seidel, sym Gauss-Seidel, Aztec, IFPACK. See Table 5.

smoother: Aztec options

Pointer to Aztec’s options vector (only for Aztec smoother).

smoother: Aztec params

Pointer to Aztec’s params vector (only for Aztec smoother).

smoother: Aztec as solver

If true, smoother: sweeps iterations of Aztec solvers will be used as smoothers. If false, only Aztec’s preconditioner function will be used as smoother (only for Aztec smoother).

smoother: MLS polynomial order

Polynomial order for MLS smoothers.

smoother: MLS alpha

Alpha value for MLS smoothers.

6.5.4 Coarsest Grid Parameters

coarse: max size

Maximum dimension of the coarse grid. ML will not coarsen further if the size of the current level is less than this value.

coarse: type

Coarse solver. It can be: Jacobi, Gauss-Seidel, Amesos-KLU, Amesos-UMFPACK, Amesos-Superludist, Amesos-MUMPS. See Table 6.

coarse: sweeps

(only for Jacobi and Gauss-Seidel) Number of sweeps in the coarse solver.

coarse: damping factor

(only for Jacobi and Gauss-Seidel) Damping factor in the coarse solver.

coarse: max processes

Maximum number of processes to be used in the coarse grid solution (only for Amesos-Superludist, Amesos-MUMPS, Amesos-ScaLAPACK).

7 Advanced Usage of ML

Sections 5 and 6 have detailed the use of ML as a black box preconditioner. In some cases, instead, the user may need to explicitly construct the ML hierarchy. This is reported in the following sections. A brief sample program is given in Figure 2.

    ML_Create(&ml_object, N_grids);

    ML_Init_Amatrix(ml_object, 0, nlocal, nlocal, (void *) A_data);
    ML_Set_Amatrix_Getrow(ml_object, 0, user_getrow, NULL, nlocal_allcolumns);
    ML_Set_Amatrix_Matvec(ml_object, 0, user_matvec);

    N_levels = ML_Gen_MGHierarchy_UsingAggregation(ml_object, 0, ML_INCREASING, NULL);
    ML_Gen_Smoother_Jacobi(ml_object, ML_ALL_LEVELS, ML_PRESMOOTHER, 1, ML_DEFAULT);
    ML_Gen_Solver(ml_object, ML_MGV, 0, N_levels-1);
    ML_Iterate(ml_object, sol, rhs);
    ML_Destroy(&ml_object);

Figure 2: High level multigrid sample code.

The function ML_Create creates a multilevel solver object that is used to define the preconditioner. It requires that the maximum number of multigrid levels be specified. In almost all cases, N_grids = 20 is more than adequate. The three ‘Amatrix’ statements are used to define the discretization matrix, A, that is solved. This is discussed in greater detail in Section 11.1. The multigrid hierarchy is generated via ML_Gen_MGHierarchy_UsingAggregation. Controlling the behavior of this function is discussed in Section 9. For now, it is important to understand that this function takes the matrix A and sets up relevant multigrid operators corresponding to the smoothed aggregation multigrid method [18] [17]. In particular, it generates a graph associated with A, coarsens this graph, builds functions to transfer vector data between the original graph and the coarsened graph, and then builds an approximation to A on the coarser graph. Once this second multigrid level is completed, the same operations are repeated on the second-level approximation to A, generating a third level. This process continues until the current graph is sufficiently coarse. The function ML_Gen_Smoother_Jacobi indicates that a Jacobi smoother should be used on all levels. Smoothers are discussed further in Section 8. Finally, ML_Gen_Solver is invoked when the multigrid preconditioner is fully specified. This function performs any needed initialization and checks for inconsistent options. After ML_Gen_Solver completes, ML_Iterate can be used to solve the problem with an initial guess of sol (which will be overwritten with the solution) and a right hand side of rhs. At the present time, the external interface to vectors is just arrays. That is, rhs and sol are simple one-dimensional arrays of the same length as the number of rows in A. In addition to ML_Iterate, the function ML_Solve_MGV can be used to perform one multigrid ‘V’ cycle as a preconditioner.

8 Multigrid & Smoothing Options

Several options can be set to tune the multigrid behavior. In this section, smoothing and high level multigrid choices are discussed. In the next section, the more specialized topic of the grid transfer operator is considered. The details of the functions described in these next two sections are given in Section 13.

For most applications, smoothing choices are important to the overall performance of the multigrid method. Unfortunately, there is no simple advice as to what smoother will be best and systematic experimentation is often necessary. ML offers a variety of standard smoothers. Additionally, user-defined smoothers can be supplied and it is possible to use Aztec as a smoother. A list of ML functions that can be invoked to use built-in smoothers is given below along with a few general comments.

ML_Gen_Smoother_Jacobi

Typically, not the fastest smoother. Should be used with damping. For Poisson problems, the recommended damping values are 2/3 (1D), 4/5 (2D), and 5/7 (3D). In general, smaller damping numbers are more conservative.

ML_Gen_Smoother_GaussSeidel

Probably the most popular smoother. Typically faster than Jacobi, and damping is often neither necessary nor advantageous.

ML_Gen_Smoother_SymGaussSeidel

Symmetric version of Gauss-Seidel. When using multigrid preconditioned conjugate gradient, the multigrid operator must be symmetrizable. This can be achieved by using a symmetric smoother with the same number of pre and post sweeps on each level.

ML_Gen_Smoother_BlockGaussSeidel

Block Gauss-Seidel with a fixed block size. Often used for PDE systems where the block size is the number of degrees of freedom (DOFs) per grid point.

ML_Gen_Smoother_VBlockJacobi

Variable block Jacobi smoother. This allows users to specify unknowns to be grouped into different blocks when doing block Jacobi.

ML_Gen_Smoother_VBlockSymGaussSeidel

Symmetric variable block Gauss-Seidel smoothing. This allows users to specify unknowns to be grouped into different blocks when doing symmetric block Gauss-Seidel.

It should be noted that the parallel Gauss-Seidel smoothers are not true Gauss-Seidel. In particular, each processor does a Gauss-Seidel iteration using off-processor information from the previous iteration.

Aztec users [15] can invoke ML_Gen_SmootherAztec to use either Aztec solvers or Aztec preconditioners as smoothers on any grid level. Thus, for example, it is possible to use preconditioned conjugate-gradient (where the preconditioner might be an incomplete Cholesky factorization) as a smoother within the multigrid method. Using Krylov methods as smoothers could potentially be more robust than using the simpler schemes provided directly by ML. However, one must be careful when multigrid is a preconditioner to an outer Krylov iteration. Embedding an inner Krylov method within a preconditioner to an outer Krylov method may not converge, because the preconditioner can no longer be represented by a simple matrix.

Finally, it is possible to pass user-defined smoothing functions into ML via ML_Set_Smoother. The signature of the user-defined smoother function is

    int user_smoothing(ML_Smoother *smoother, int x_length, double x[],
                       int rhs_length, double rhs[])

where smoother is an internal ML object, x is a vector (of length x_length) that corresponds to the initial guess on input and is the improved solution estimate on output, and rhs is the right hand side vector of length rhs_length. The function ML_Get_MySmootherData(smoother) can be used to get a pointer back to the user’s data (i.e. the data pointer given with the ML_Set_Smoother invocation).
A simple (and suboptimal) damped Jacobi smoother for the finest grid of our example is given below:

    int user_smoothing(ML_Smoother *smoother, int x_length, double x[],
                       int rhs_length, double rhs[])
    {
      int i;
      double ap[5], omega = .5; /* temp vector and damping factor */

      Poisson_matvec(ML_Get_MySmootherData(smoother), x_length, x, rhs_length, ap);
      for (i = 0; i < x_length; i++)
        x[i] = x[i] + omega*(rhs[i] - ap[i])/2.;

      return 0;
    }
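The same update can be tried outside ML. The sketch below is a standalone C++ version of one damped Jacobi sweep for the 1D Poisson matrix tridiag(-1, 2, -1); it does not use the ML_Smoother interface, and the function names are chosen for this illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// y = A x for the 1D Poisson matrix tridiag(-1, 2, -1).
std::vector<double> poisson_matvec(const std::vector<double>& x) {
    const std::size_t n = x.size();
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = 2.0 * x[i];
        if (i > 0) y[i] -= x[i - 1];
        if (i + 1 < n) y[i] -= x[i + 1];
    }
    return y;
}

// One damped Jacobi sweep: x <- x + omega * D^{-1} (rhs - A x), with D = 2 I.
void damped_jacobi_sweep(std::vector<double>& x, const std::vector<double>& rhs,
                         double omega) {
    assert(x.size() == rhs.size());
    const std::vector<double> ap = poisson_matvec(x);
    for (std::size_t i = 0; i < x.size(); ++i)
        x[i] += omega * (rhs[i] - ap[i]) / 2.0;
}

// Euclidean norm of the residual rhs - A x.
double residual_norm(const std::vector<double>& x, const std::vector<double>& rhs) {
    assert(x.size() == rhs.size());
    const std::vector<double> ap = poisson_matvec(x);
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        sum += (rhs[i] - ap[i]) * (rhs[i] - ap[i]);
    return std::sqrt(sum);
}
```

Unlike the listing above, the temporary vector is sized from x rather than fixed at five entries, so the sweep works for any problem size. Starting from a zero guess, each sweep with a damping factor between 0 and 1 reduces the residual norm for this matrix, and the exact solution is left unchanged.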

A more complete smoothing example that operates on all multigrid levels is given in the file mlguide.c. This routine uses the functions ML_Operator_Apply, ML_Operator_Get_Diag, and ML_Get_Amatrix to access coarse grid matrices constructed during the algebraic multigrid process. By writing these user-defined smoothers, it is possible to tailor smoothers to a particular application or to use methods provided by other packages. In fact, the Aztec methods within ML have been implemented by writing wrappers to existing Aztec functions and passing them into ML via ML_Set_Smoother.

At the present time there are only a few supported general parameters that may be altered by users. However, we expect that this list will grow in the future. When using ML_Iterate, the convergence tolerance (ML_Set_Tolerance) and the frequency with which residual information is output (ML_Set_ResidualOutputFrequency) can both be set. Additionally, the level of diagnostic output from either ML_Iterate or ML_Solve_MGV can be set via ML_Set_OutputLevel. The maximum number of multigrid levels can be set via ML_Create or ML_Set_MaxLevels. Otherwise, ML continues coarsening until the size of the coarsest grid is less than or equal to a specified size (by default 10 degrees of freedom). This size can be set via ML_Aggregate_Set_MaxCoarseSize.
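The interplay between the level limit and the coarse size limit can be sketched with a small level-counting routine. The example assumes, purely for illustration, that every coarsening step divides the number of unknowns by a fixed ratio; real aggregation produces ratios that vary from level to level, and count_levels is a hypothetical helper, not an ML function.

```cpp
#include <cassert>

// Number of levels obtained when each coarsening step divides the number of
// unknowns by `ratio`, stopping once the current grid has at most
// `max_coarse_size` unknowns or `max_levels` levels exist.
int count_levels(long fine_size, long ratio, long max_coarse_size, int max_levels) {
    assert(fine_size > 0 && ratio > 1 && max_levels >= 1);
    int levels = 1;  // the finest level always exists
    long size = fine_size;
    while (levels < max_levels && size > max_coarse_size) {
        size = (size + ratio - 1) / ratio;  // ceiling division
        ++levels;
    }
    return levels;
}
```

With one million unknowns, a ratio of 27 and a coarse size limit of 10, the sizes are 1000000, 37038, 1372, 51 and 2, giving five levels; a level limit of 3 would stop the process at the 1372-unknown grid.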

9 Smoothed Aggregation Options

When performing smoothed aggregation, the matrix graph is first coarsened (actually vertices are aggregated together) and then a grid transfer operator is constructed. A number of parameters can be altered to change the behavior of these phases.

9.1 Aggregation Options

A graph of the matrix is usually constructed by associating a vertex with each equation and adding an edge between two vertices i and j if there is a nonzero in the (i, j)th or (j, i)th entry. It is this matrix graph whose vertices are aggregated together that effectively determines the next coarser mesh. The above graph generation procedure can be altered in two ways.

First, a block matrix graph can be constructed instead of a point matrix graph. In particular, all the degrees of freedom (DOFs) at a grid point can be collapsed into a single vertex of the matrix graph. This situation arises when a PDE system is being solved where each grid point has the same number of DOFs. The resulting block matrix graph is significantly smaller than the point matrix graph and by aggregating the block matrix graph, all unknowns at a grid point are kept together. This usually results in better convergence rates (and the coarsening is actually less expensive to compute). To indicate the number of DOFs per node, the function ML_Aggregate_Set_NullSpace is used.

The second way in which the matrix graph can be altered is by ignoring small values. In particular, it is often preferable to ignore weak coupling during coarsening. The error between weakly coupled points is generally hard to smooth and so it is best not to coarsen in this direction. For example, when applying a Gauss-Seidel smoother to a standard discretization of

    $u_{xx} + \epsilon u_{yy} = f$

(with $0 \le \epsilon \le 10^{-6}$), there is almost no coupling in the y direction. Consequently, simple smoothers like Gauss-Seidel do not effectively smooth the error in this direction. If we apply a standard coarsening algorithm, convergence rates suffer due to this lack of y-direction smoothing. There are two principal ways to fix this: use a more sophisticated smoother or coarsen the graph only in the x direction. By ignoring the y-direction coupling in the matrix graph, the aggregation phase effectively coarsens in only the x-direction (the direction for which the errors are smooth), yielding significantly better multigrid convergence rates. In general, a drop tolerance, $tol_d$, can be set such that an individual matrix entry, A(i, j), is dropped in the coarsening phase if

    $|A(i,j)| \le tol_d \sqrt{|A(i,i) A(j,j)|}$.

This drop tolerance (whose default value is zero) is set by ML_Aggregate_Set_Threshold.

There are two different groups of graph coarsening algorithms in ML:
• schemes with a fixed ratio of coarsening between levels: uncoupled aggregation, coupled aggregation, and MIS aggregation. A description of those three schemes along with some numerical results is given in [16]. As the default, the Uncoupled-MIS scheme is used, which does uncoupled aggregation on finer grids and switches to the more expensive MIS aggregation on coarser grids;
• schemes with a variable ratio of coarsening between levels: Metis and ParMetis aggregation. Those schemes use the graph decomposition algorithms provided by Metis and ParMetis to create the aggregates.

Poorly done aggregation can adversely affect the multigrid convergence and the time per iteration. In particular, if the scheme coarsens too rapidly, multigrid convergence may suffer. However, if coarsening is too slow, the number of multigrid levels increases and the number of nonzeros per row in the coarse grid discretization matrix may grow rapidly. We refer the reader to [16] and indicate that users might try experimenting with the different schemes via ML_Aggregate_Set_CoarsenScheme_Uncoupled, ML_Aggregate_Set_CoarsenScheme_Coupled, ML_Aggregate_Set_CoarsenScheme_MIS, ML_Aggregate_Set_CoarsenScheme_METIS, and ML_Aggregate_Set_CoarsenScheme_ParMETIS.

9.2 Interpolation Options

An interpolation operator is built using coarsening information, seed vectors, and a damping factor. We refer the reader to [17] for details on the algorithm and the theory. In this section, we explain a few essential features to help users direct the interpolation process.

Coarsening or aggregation information is first used to create a tentative interpolation operator. This process takes a seed vector or seed vectors and builds a grid transfer operator. The details of this process are not discussed in this document. It is, however, important to understand that only a few seed vectors are needed (often but not always equal to the number of DOFs at each grid point) and that these seed vectors should correspond to components that are difficult to smooth. The tentative interpolation that results from these seed vectors will interpolate the seed vectors perfectly. It does this by ensuring that all seed vectors are in the range of the interpolation operator. This means that each seed vector can be recovered by interpolating the appropriate coarse grid vector. The general idea of smoothed aggregation (actually all multigrid methods) is that errors not eliminated by the smoother must be removed by the coarse grid solution process. If the error after several smoothing iterations were known, it would be possible to pick this error vector as the seed vector. However, since this is not the case, we look at vectors associated with small eigenvalues (or singular values in the nonsymmetric case) of the discretization operator. Errors in the direction of these eigenvectors are typically difficult to smooth as they appear much smaller in the residual (r = Ae, where r is the residual, A is the discretization matrix, and e is the error). For most scalar PDEs, a single seed vector is sufficient and so we seek some approximation to the eigenvector associated with the lowest eigenvalue.

It is well known that a scalar Poisson operator with Neumann boundary conditions is singular and that the null space is the constant vector. Thus, when applying smoothed aggregation to Poisson operators, it is quite natural to choose the constant vector as the seed vector. In many cases, this constant vector is a good choice as all spatial derivatives within the operator are zero and so it is often associated with small singular values. Within ML the default is to choose the number of seed vectors to be equal to the number of DOFs at each node (given via ML_Aggregate_Set_NullSpace). Each seed vector corresponds to a constant vector for that DOF component. Specifically, if we have a PDE system with two DOFs per node, then one seed vector is one at the first DOF and zero at the other DOF throughout the graph. The second seed vector is zero at the first DOF and one at the other DOF throughout the graph.

In some cases, however, information is known as to what components will be difficult for the smoother or what null space is associated with an operator. In elasticity, for example, it is well known that a floating structure has six rigid body modes (three translational vectors and three rotation vectors) that correspond to the null space of the operator. In this case, the logical choice is to take these six vectors as the seed vectors in smoothed aggregation. When this type of information is known, it should be given to ML via the command ML_Aggregate_Set_NullSpace.

Once the tentative prolongator is created, it is smoothed via a damped Jacobi iteration. The reasons for this smoothing are related to the theory, where the interpolation basis functions must have a certain degree of smoothness (see [17]). However, the smoothing stage can be omitted by setting the damping to zero using the function ML_Aggregate_Set_DampingFactor. Though theoretically poorer, unsmoothed aggregation can have considerably less setup time and less cost per iteration than smoothed aggregation. When smoothing, ML has two ways to determine the Jacobi damping parameter, and each requires some estimate of the largest eigenvalue of the discretization operator. The current default is to use a few iterations of a conjugate-gradient method to estimate this value. However, if the matrix is nonsymmetric, the infinity norm of the matrix should be used instead via ML_Aggregate_Set_SpectralNormScheme_Anorm.
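The default seed vectors described above (one constant vector per DOF component) can be written down directly. The sketch below assumes that unknowns are ordered node by node, with the DOFs of a node stored consecutively; this ordering and the helper name default_seed_vectors are assumptions for illustration, not a description of ML's internal storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds `num_dofs` seed vectors of length num_nodes * num_dofs. Seed vector k
// is one at DOF component k of every node and zero at all other components.
std::vector<std::vector<double>> default_seed_vectors(std::size_t num_nodes,
                                                      std::size_t num_dofs) {
    assert(num_dofs > 0);
    std::vector<std::vector<double>> seeds(
        num_dofs, std::vector<double>(num_nodes * num_dofs, 0.0));
    for (std::size_t k = 0; k < num_dofs; ++k)
        for (std::size_t node = 0; node < num_nodes; ++node)
            seeds[k][node * num_dofs + k] = 1.0;
    return seeds;
}
```

For three nodes with two DOFs each, this yields (1, 0, 1, 0, 1, 0) and (0, 1, 0, 1, 0, 1), matching the two-DOF case described in the text. Problem-specific vectors such as rigid body modes would replace these defaults.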
There are several other internal parameters that have not been discussed in this document. In the future, it is anticipated that some of these will be made available to users.
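Returning to the drop tolerance of Section 9.1, the effect of the test on the matrix graph can be seen on a very small anisotropic problem. The sketch below builds a 2 × 2 grid stencil for the anisotropic operator (up to sign and mesh scaling) and decides which entries remain as graph edges. The dense storage and the names anisotropic_2x2 and keeps_edge are assumptions for this illustration and do not reflect how ML stores or filters matrices.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Five-point stencil on a 2 x 2 grid (nodes 0,1 in the first row, 2,3 in the
// second): x-neighbours are coupled by -1, y-neighbours by -eps.
Matrix anisotropic_2x2(double eps) {
    const double d = 2.0 + 2.0 * eps;
    return {
        {d, -1.0, -eps, 0.0},
        {-1.0, d, 0.0, -eps},
        {-eps, 0.0, d, -1.0},
        {0.0, -eps, -1.0, d},
    };
}

// True if entry (i, j) is kept as an edge of the matrix graph: it is an
// off-diagonal nonzero and is NOT dropped by the test
// |A(i,j)| <= tol * sqrt(|A(i,i) * A(j,j)|).
bool keeps_edge(const Matrix& A, std::size_t i, std::size_t j, double tol) {
    assert(i < A.size() && j < A.size());
    if (i == j || A[i][j] == 0.0) return false;
    return std::fabs(A[i][j]) > tol * std::sqrt(std::fabs(A[i][i] * A[j][j]));
}
```

With eps = 1e-6 and a tolerance of 0.01, the x-direction edges (0,1) and (2,3) are kept while the y-direction edges (0,2) and (1,3) are dropped, so aggregation proceeds only along x. With the default tolerance of zero, every nonzero off-diagonal entry is kept.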

10 Advanced Usage of ML and Epetra

Class ML_Epetra::MultiLevelOperator is defined in a header file that must be included as

    #include "ml_epetra_operator.h"

Users may also need to include ml_config.h, Epetra_Operator.h, Epetra_MultiVector.h, Epetra_LinearProblem.h, AztecOO.h. Check the Epetra and AztecOO documentation for more details.

Let A be an Epetra_RowMatrix for which we aim to construct a preconditioner, and let ml_handle be the structure ML requires to store internal data (see Section 7), created with the instruction

    ML_Create(&ml_handle,N_levels);

where N_levels is the specified (maximum) number of levels. As already pointed out, ML accepts very general matrices as input. Basically, the user has to specify the number of local rows, and provide a function to update the ghost nodes (that is, nodes required in the matrix-vector product, but assigned to another process). For Epetra matrices, this is done by the following function

    EpetraMatrix2MLMatrix(ml_handle, 0, &A);

and it is important to note that A is not converted to ML format. Instead, EpetraMatrix2MLMatrix defines a suitable getrow function (and other minor data structures) that allows ML to work with A. Let agg_object be a ML_Aggregate pointer, created using

    ML_Aggregate_Create(&agg_object);

At this point, users have to create the multilevel hierarchy, define the aggregation schemes, the smoothers, the coarse solver, and create the solver. Then, we can finally create the ML_Epetra::MultiLevelOperator object

    ML_Epetra::MultiLevelOperator MLop(ml_handle,comm,map,map);

(map being the Epetra_Map used to create the matrix) and set the preconditioning operator of our AztecOO solver,

    Epetra_LinearProblem Problem(&A,&x,&b);
    AztecOO Solver(Problem);
    Solver.SetPrecOperator(&MLop);

where x and b are Epetra_MultiVector’s defining solution and right-hand side. The linear problem can now be solved as, for instance,

    Solver.SetAztecOption( AZ_solver, AZ_gmres );
    Solver.Iterate(Niters, 1e-12);

11 Using ML without Epetra

11.1 Creating a ML matrix: Single Processor

Matrices are created by defining some size information, a matrix-vector product and a getrow function (which is used to extract matrix information). We note that Epetra and Aztec users do not need to read this (or the next) section as there are special functions to convert Epetra objects and Aztec matrices to ML matrices (see Section 4.2). Further, functions for some common matrix storage formats (CSR & MSR) already exist within ML and do not need to be rewritten (the functions CSR_matvec, CSR_getrows, MSR_matvec and MSR_getrows can be used).

Size information is indicated via ML_Init_Amatrix. The third parameter in the Figure 2 invocation indicates that a matrix with nlocal rows is being defined. The fourth parameter gives the vector length of vectors that can be multiplied with this matrix. Additionally, a data pointer, A_data, is associated with the matrix. This pointer is passed back into the matrix-vector product and getrow functions that are supplied by the user. Finally, the number ‘0’ indicates at what level within the multigrid hierarchy the matrix is to be stored. For discussions within this document, this is always ‘0’. It should be noted that there appears to be some redundant information. In particular, the number of rows and the vector length in ML_Init_Amatrix should be the same number as the discretization matrices are square. Cases where these ‘apparently’ redundant parameters might be set differently are not discussed in this document.

The function ML_Set_Amatrix_Matvec associates a matrix-vector product with the discretization matrix. The invocation in Figure 2 indicates that the matrix-vector product function user_matvec is associated with the matrix located at level ‘0’ of the multigrid hierarchy. The signature of user_matvec is

    int user_matvec(ML_Operator *Amat, int in_length, double p[],
                    int out_length, double ap[])

where Amat is an internal ML object, p is the vector to apply to the matrix, in_length is the length of this vector, ap is the result after multiplying the discretization matrix by the vector p, and out_length is the length of ap. The function ML_Get_MyMatvecData(Amat) can be used to get a pointer back to the user’s data (i.e. the data pointer given with the ML_Init_Amatrix invocation).

Finally, ML_Set_Amatrix_Getrow associates a getrow function with the discretization matrix. This getrow function returns nonzero information corresponding to specific rows. The invocation in Figure 2 indicates that a user supplied function user_getrow is associated with the matrix located at level ‘0’ of the multigrid hierarchy, that this matrix contains nlocal_allcolumns columns, and that no communication (NULL) is used (discussed in the next section). It again appears that some redundant information is being asked for, as the number of columns was already given. However, when running in parallel this number will include ghost node information and is usually different from the number of rows. The signature of user_getrow is

    int user_getrow(ML_Operator *Amat, int N_requested_rows, int requested_rows[],
                    int allocated_space, int columns[], double values[],
                    int row_lengths[])

where Amat is an internal ML object, N_requested_rows is the number of matrix rows for which information is returned, requested_rows are the specific rows for which information will be returned, and allocated_space indicates how much space has been allocated in columns and values for nonzero information.
The function ML_Get_MyGetrowData(Amat) can be used to get a pointer back to the user’s data (i.e. the data pointer given with the ML_Init_Amatrix invocation). On return, the user’s function should take each row in order within requested_rows and place the column numbers and the values corresponding to nonzeros in the arrays columns and values. The length of the ith requested row should appear in row_lengths[i]. If there is not enough allocated space in columns or values, this routine simply returns a ‘0’, otherwise it returns a ‘1’. To clarify these functions, one concrete example is given corresponding to the matrix:

    $$\begin{pmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & -1 & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{pmatrix} \tag{2}$$

To implement this matrix, the following functions are defined:

    int Poisson_getrow(ML_Operator *Amat, int N_requested_rows, int requested_rows[],
                       int allocated_space, int columns[], double values[],
                       int row_lengths[])
    {
      int count = 0, i, start, row;

      for (i = 0; i < N_requested_rows; i++) {
        if (allocated_space < count+3) return(0);
        start = count;
        row = requested_rows[i];
        if ( (row >= 0) || (row