A PARALLEL MULTILEVEL ILU FACTORIZATION BASED ON A HIERARCHICAL GRAPH DECOMPOSITION∗

PASCAL HÉNON† AND YOUSEF SAAD‡
Abstract. PHIDAL (Parallel Hierarchical Interface Decomposition ALgorithm) is a parallel incomplete factorization method which exploits a hierarchical interface decomposition of the adjacency graph of the coefficient matrix. The idea of the decomposition is similar to the well-known wirebasket techniques used in domain decomposition. However, the method is devised for general, irregularly structured, sparse linear systems. This paper describes a few algorithms for obtaining good quality hierarchical graph decompositions, and discusses the parallel implementation of the factorization procedure. Numerical experiments are reported to illustrate the scalability of the algorithm and its effectiveness as a general purpose parallel linear system solver.

Key words. Parallel Incomplete LU factorization, ILU, Sparse Gaussian Elimination, Wirebasket decomposition, Interface decomposition, Preconditioning, ILU with threshold, ILUT, Iterative methods, Sparse linear systems.

AMS subject classifications. 65F10, 65N06.
1. Introduction. In recent years, a few incomplete LU factorization techniques were developed with the goal of combining some of the features of standard ILU preconditioners with the good scalability features of multilevel methods; see, for example, ILUM [10], the Algebraic Recursive Multilevel Solver (ARMS) [12] and its parallel version pARMS [9], MLILU [2], and MRILU [3]. The key feature of these techniques is to reorder the system to enable a parallel (incomplete) elimination of the unknowns. A class of techniques which implement these ideas within domain decomposition has also been developed and can be found, for example, in the pARMS [9] package.

This paper presents a new technique in this category. The underlying ordering on which the method is based is reminiscent of the 'wirebasket' techniques of domain decomposition methods [13, 16, 14, 15]. The method can also be viewed as a variation and an extension of the pARMS algorithm, in which the independent sets and levels are defined statically, rather than dynamically, from a hierarchical decomposition of the interface structure of the graph. Finally, it can also be regarded as a form of an ILU decomposition based on a nested-dissection-type ordering, in which cross points in the separators play a special role. In this regard, the method bears some resemblance to the FETI method [4].

The algorithm is based on defining a 'hierarchical interface structure'. The hierarchy consists of levels with the property that level k vertices, with k > 0, are separators for level k − 1 vertices. In each level, vertices are grouped in independent sets. Level 0 vertices are simply interior vertices of a domain in the graph partitioning of the problem. These are naturally grouped in group-independent sets, in which the blocks (groups) are the interior points of each domain. Vertices that are adjacent to more subdomains will be part of the higher levels. The vertices are globally ordered by level, in increasing order of level number, so that the incomplete factorization process proceeds from the lower to the higher levels. The dropping strategies utilized in the elimination process attempt to maintain the independent set structure extracted from the interface decomposition.

A major difference with existing methods of this type, such as pARMS, is that the preordering used for parallelism is performed at the outset instead of dynamically. This has both advantages and disadvantages. Because the ordering is static, the factorization is rather fast. On the other hand, the ordering may face difficulties when, for example, zero or small pivots are encountered. In pARMS, some form of diagonal pivoting is performed to bypass this difficulty.

The main ingredient of the ILU factorization is the hierarchical interface decomposition of the graph. It is fairly easy to find definitions which will achieve the desired goal of yielding a hierarchy of subgraphs which separate each other. For example, just assigning to each node a label defined as the number of subdomains to which the node belongs yields, under certain conditions, a decomposition into levels of independent sets, given by the labels. This is the case for simple graphs such as the one of a 5-point finite difference matrix. If not, then a simple algorithm allows the labels to be modified to yield the required

∗ This work was supported in part by the NSF under grants NSF/ACI-0305120 and NSF/INT-0003274, and by the Minnesota Supercomputer Institute.
† INRIA, Scalapplix project - LaBRI, 351 cours de la Libération, 33405 Talence, France. email: [email protected]
‡ University of Minnesota, 200 Union Street S.E., Minneapolis, MN 55455. email: [email protected]
Fig. 2.1. Partition of an 8 × 8 5-point mesh into 9 subdomains and the corresponding hierarchical interface structure. (Legend: interior points, cross-points, domain-edges.)
decomposition. However, simple decompositions of this type are usually suboptimal for "rich" graphs, in that they may yield a large number of levels. The next section discusses the hierarchical decomposition of a graph. Section 3 presents the details of the parallel elimination process. Section 4 reports on some experiments. The paper ends with some concluding remarks and tentative ideas for future work.

2. Hierarchical Interface Decomposition. Let G = (V, E) be the adjacency graph of a sparse matrix A that has a symmetric pattern, so that G is undirected. Assume that G is partitioned into p subgraphs. An edge-based partitioning is used, in the sense that edges are primarily assigned to subdomains, and therefore vertices at the interfaces overlap two or more domains. This is the common situation which arises, for example, when elements in finite element methods are assigned to processors.

We begin by illustrating the hierarchical decomposition process on a matrix A which arises from a simple 5-point finite difference discretization of a Laplacean in 2-dimensional space. The vertex set V consists of three types of vertices representing the common "interior vertices", "domain-edges", and "cross-points" of the partition; see Figure 2.1 for an illustration. These three types of points are grouped in sets which will be referred to as "connectors". Thus, the interior points of a given domain form one connector, and each domain-edge, i.e., each set of interface points linking two given subdomains (excluding cross-points), also forms a connector. In this case, levels of connectors are also naturally defined: level 1 connectors are the sets of interior points, level 2 connectors are the domain-edges, and level 3 connectors are the cross-points. This decomposition of the vertices has the interesting property that different connectors of the same level are not coupled.

In the above example, a domain-edge can be defined as a subgraph of G that is adjacent to 2 subdomains and is not adjacent to any other domain-edge. Two sets of vertices are adjacent if there is at least one edge linking a vertex from one set to a vertex of the other set. Similarly, a "cross-point" can be defined as a subgraph of G consisting of points that are adjacent to 4 subdomains and to some domain-edges, but not adjacent to any other cross-point. The set of "domain-edges" and "cross-points" constitutes the "wirebasket", to use the terminology of domain decomposition techniques [13, 16, 14, 15]. Note that the domain decomposition literature uses the term "vertex" for a cross-point and "edge" for a domain-edge, which can lead to confusion in the context of sparse matrix techniques since these terms are already used when discussing adjacency graphs of sparse matrices. There are several ways in which one can define connectors for general irregularly structured graphs.
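To make the keys and connector types of the 5-point example concrete, the following small Python sketch is purely illustrative (it is not from the paper; the function names and the edge_domain data layout are assumptions). It labels each vertex of an edge-based partition with the set of subdomains owning one of its incident edges, and classifies it as an interior point, a domain-edge point, or a cross-point in the 5-point-mesh setting.

```python
from collections import defaultdict

def vertex_keys(edges, edge_domain):
    """edges: list of (u, v) pairs; edge_domain: dict mapping each edge to the
       subdomain it is assigned to. Returns, for each vertex, the set of
       subdomains that overlap it."""
    key = defaultdict(set)
    for (u, v) in edges:
        d = edge_domain[(u, v)]
        key[u].add(d)
        key[v].add(d)
    return key

def vertex_type(key_u):
    """Classification valid for the simple 5-point mesh example."""
    if len(key_u) == 1:
        return "interior"
    if len(key_u) == 2:
        return "domain-edge"
    return "cross-point"
```

For the 5-point mesh this classification directly gives the three connector levels; the rest of the section deals with the general case, where it does not.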
In the above example, the number of domains to which a given node is assigned was used as a sort of level number. We refer to this number as the class of a vertex. However, as can be seen from a simple example with more complex connectivity, a levelization based on vertex classes may yield connectors of the same level that are coupled. We first discuss this case, which gives rise to what we term "class-based" connectors.

Definition 2.1. (Class-based connectors) A connector C^(l) of level l is a connected subgraph of G such that:
1. C^(l) is a subset of vertices which are adjacent to l different subdomains;
2. C^(l) is not adjacent to any other connector of level l.

Fig. 2.2. An example of a hierarchical decomposition of a graph. The circles are connectors and the links show the adjacencies between them. The connectors are grouped into 4 levels indicated by the dashed concentric circles.
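As a companion to Definition 2.1, the following sketch is illustrative only (adj and key are assumed data structures, with key[u] the set of subdomains overlapping u). It groups vertices of equal class into connected components. As the next paragraph explains, this is only a starting point: components of the same class obtained in this way may still be adjacent.

```python
from collections import deque

def class_based_connectors(adj, key):
    """adj: dict vertex -> set of neighbours; key: dict vertex -> set of subdomains.
       Returns (class, vertex set) pairs: connected components of equal-class vertices."""
    seen, connectors = set(), []
    for start in adj:
        if start in seen:
            continue
        cls = len(key[start])           # class = number of overlapping subdomains
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen and len(key[v]) == cls:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        connectors.append((cls, comp))
    return connectors
```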
Note that this definition is not constructive. In order to obtain a decomposition into levels of connectors which complies with the definition, one can first use the class numbers. This will likely yield connectors of the same class that are adjacent. Then a post-processing is required in which certain nodes are pushed to higher levels to uncouple the connectors.

The above definition yields an adequate hierarchical decomposition for the 5-point example discussed above. It uses classes of vertices as the definition of the level of a connector. For the 5-point example this is sufficient to define a good level partitioning. For complex irregular domains and partitions, the situation is more subtle and the definition given above yields a poor decomposition. Figure 2.3 shows an irregular domain partition (the figure only shows the connectors, not the graph vertices and edges). In this case, there are five levels. Clearly, a better level structure of the connectors is achieved by grouping all the connectors represented as dots together, thereby obtaining only 3 sets of independent connectors.

Here we should state what is meant by "good level partitioning". What is wanted is essentially a decomposition of the graph into a few subgraphs, each defining a different level, with some specific recursive properties of separation between the levels. The subgraph at each level k is a set of independent subsets C_i^(k) (separated by connectors of higher level). Each C_i^(k) is adjacent only to connectors C_j^(l) with levels l ≠ k. Using the number of nearest neighbors as a count for the level number provides an admissible decomposition which is far from being the best for irregular graphs. A general definition of the hierarchical level-decomposition invokes connectors of all levels.

Definition 2.2. (Hierarchical Interface Decomposition) A hierarchical interface decomposition of a graph G is a partitioning of G into subsets of connected subgraphs C_i^(k), k = 1, ..., lev, i = 1, ..., ν_k, termed connectors. The level k group (or "k-th level") is the subset of all connectors C_i^(k), i = 1, ..., ν_k, of level k. The groups satisfy the following requirements:
• Two connectors of the same level cannot be adjacent.
• Two adjacent connectors C_i^(k) and C_j^(l) with k < l verify class(C_i^(k)) < class(C_j^(l)), where class(C) is the level of the connector C as defined in 2.1.

Note that the first condition required of the groups simply means that the connectors C_i^(k) form a (block) independent set. The second condition ensures that the highly connected connectors are in the upper levels. In domain decomposition parlance, this can be viewed as a condition ensuring that a "cross-point" is never in a lower level than an adjacent "domain-edge". A small checker for these two requirements is sketched after Figure 2.3.

A few more clarifications are needed. The term 'interface' is used only because of the nature of the algorithms that will produce the decomposition. The term 'hierarchical' implies only that the connectors of high level are separators for lower level connectors. A pictorial illustration is given in Figure 2.2. Another observation is that the above definition is not tied to any graph partitioning. A partitioning can serve to define a hierarchical decomposition but it is not essential for the definition. In particular, a reasonable partitioning will give rise to a hierarchical decomposition by using Definition 2.1.

Fig. 2.3. Connector levels resulting from an irregular domain partition. The solid lines are the whole domain boundaries. The dashed connectors constitute the level-2 connector set. The dotted connectors are the level-i connectors with i > 2; their levels are written beside them.
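As announced above, the two requirements of Definition 2.2 can be verified mechanically. The helper below is an illustrative sketch (the names conn_adj, level and cls are assumptions, not from the paper): conn_adj is the connector adjacency graph, while level and cls give the level and the class of each connector.

```python
def is_hierarchical(conn_adj, level, cls):
    """Check the two requirements of Definition 2.2 on a connector decomposition."""
    for c, neighbours in conn_adj.items():
        for d in neighbours:
            if level[c] == level[d]:
                return False      # same-level connectors must not be adjacent
            if level[c] < level[d] and not cls[c] < cls[d]:
                return False      # the lower-level connector must have the smaller class
    return True
```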
In the remainder of this section, we first discuss an algorithm for obtaining an initial decomposition of the graph into connectors. Then we discuss the algorithm to obtain levels of independent connectors.

2.1. Algorithms for building connectors. We begin with some definitions and notation that will be used in the remainder of the paper. The heuristics developed in this section require that we associate with each vertex u a unique key

Key_u = {i_1, ..., i_m}

which lists all subdomains i_k to which u belongs. We denote by C_{i_1,...,i_m} the set of vertices which belong to the intersection of subdomains i_1, i_2, ..., i_m. In particular, note that Key_u = {i_1, ..., i_m} ⇔ u ∈ C_{i_1,...,i_m}. The notation Key_C = {i_1, ..., i_m} will also be used for a connector C_{i_1,...,i_m}. Recall that we introduced earlier the term class of a vertex u, which is simply defined as |Key_u|.

We now describe how connectors can be built from these keys. Figure 2.5 shows that level-2 connectors defined directly from the vertex classes |Key_u| may be adjacent. One possible remedy in this case is simply to define Key_u differently. For example, we can define Key_u as containing the list of all the domains of adj(u), i.e., the set of all subdomains of the nearest neighbors of u. This fixes the problem but leads to interfaces that are unnecessarily 'thick'. For example, in this 9-point example, the domain interfaces will be three lines wide instead of just one line wide. A better solution for obtaining independent connectors is to start with a decomposition based on the vertex classes and then refine it by modifying, in a certain way, the keys Key_u of some overlapped nodes at the
Fig. 2.4. (a) Non-consistent interface decomposition. (b) Consistent interface decomposition.
Fig. 2.5. Partition of a 9-point grid into 4 subdomains. The connectors induced by the domain partitioning are not independent.
interface of the subdomains. Figure 2.6 shows several such modifications that lead to independent connectors in each level. It is important to note that when modifying the key of a vertex u we are implicitly changing the original partitioning by reassigning the vertex u. Several possibilities arise to comply with the requirements of a hierarchical interface decomposition, and some solutions appear to be better than others. In Figure 2.6, solution (a) produces 3 levels, including the zero-th level, in the overlap of the interface, whereas solutions (b) and (c) produce more levels and/or have to modify a greater number of vertex keys. This example illustrates the fact that some post-processings of the keys are more desirable than others.

We impose another important constraint on the connector decomposition, which we call consistency. The definition of a consistent connector decomposition is:

Definition 2.3. A hierarchical interface decomposition is said to be consistent if any pair of connectors C_1, C_2 verifies:
• C_1 is adjacent to C_2 ⇒ Key_{C_1} ⊂ Key_{C_2} or Key_{C_2} ⊂ Key_{C_1}.

The consistency requirement ensures that if a subdomain has some "domain-edges" (i.e., low level connectors) connected to a "cross-point" (i.e., a high level connector), then the "cross-point" belongs to the local interface of the subdomain. Such a case is illustrated in Figure 2.4. This condition can be viewed as an improvement to the numerical robustness of the restriction operator in a multilevel resolution. One can notice that the consistency requirement contains the second requirement of Definition 2.2.

In the previous examples, we discussed two important criteria to use in irregular graphs to find an optimal interface partitioning. The first is to limit the number of vertices whose key must be expanded to obtain independent connector level sets and a consistent connector set. The second is to obtain as small a number of levels as possible. Thus, the ideal interface decomposition should yield a small number of levels by modifying as few keys as possible. A decomposition which satisfies these constraints will be said
Fig. 2.6. Some of the possible solutions to obtain independent connectors in each level by modifying some vertex keys. (a) 2 keys modified, 3 levels obtained. (b) 4 keys modified, 3 levels obtained. (c) 2 keys modified, 4 levels obtained.
Fig. 2.7. Partition of a 9-point grid into 4 subdomains. The connectors and the values of the keys are shown at the beginning and at completion of the algorithm.
to be optimal. In the general case, the problem of finding an optimal interface partitioning into connectors is NP-complete, because just the problem of finding the minimum vertex set to be modified in order to obtain independent level connectors is similar to a Minimum Set Cover problem [5]. Therefore, to define the connectors and levels as described before, we use two heuristics. The first algorithm aims at modifying the minimum number of vertex keys to disconnect all the connectors in the same level. The second algorithm aims at building the levels so as to minimize the number of levels.

The principle of the first algorithm is to "expand" iteratively, level by level, some vertex keys in order to disconnect the connectors of the current level. Note that expanding a vertex key means that this vertex will be overlapped by additional subdomains beyond those given by the initial partition. At the initialization of Algorithm 2.1, the key of each vertex u is set to the list of subdomains that contain vertex u. In the main loop, the key of each vertex u of level 2 that is connected to another vertex in level 2 with a different key is expanded by the union of all the keys from the vertices in V^l(u). Then the vertex u is set in a higher level. This is continued iteratively over all the levels. Lines 13-15 of Algorithm 2.1 ensure the consistency property (Definition 2.3).

In this algorithm, the traversal order of the vertices in a level is important, since two different traversals can give very different results, with a different number of vertices moved to an upper level. A heuristic to find a good traversal is to define Kdeg_u as the number of neighbors in V^l(u) which have a key different from Key_u, and then choose the next vertex u in the traversal as the one that has the
maximum Kdeg_u (line 18). For this we need to update on the fly the Kdeg of the neighbors when a vertex expands its key (lines 23, 24, 25). An implementation issue is to store the vertices in a max-heap ordered by Kdeg; the cost to obtain the maximum vertex is then constant and the cost to update the rank of a neighbor vertex is bounded by O(log(n_i)), where n_i is the number of vertices in the interfaces. If we denote by d_max the maximum degree of a vertex in the interface subgraph, then a pessimistic bound on the complexity of Algorithm 2.1 is O(d_max · n_i · log(n_i)). At the end of the algorithm, the key of a vertex u indicates the connector to which it belongs, i.e.,

C_{i_1,...,i_m} = {u ∈ V | Key_u = {i_1, ..., i_m}}.

Algorithm 2.1 uses the following notation:
• p is the number of subdomains,
• L^l is the set of vertices whose keys have a cardinality equal to l,
• V^l(i) is the set of vertices belonging to L^l and adjacent to vertex i,
• Kdeg_u is the number of neighbors in V^l(u) that have a key different from Key_u.

Algorithm 2.1. Independent Connector Computation.
1.  Initialization:
2.  For each vertex u ∈ V Do:
3.    Key_u := list of subdomains containing vertex u
4.  end
5.  For each vertex u ∈ V Do:
6.    Kdeg_u := |{ v ∈ V^l(u) | Key_v ≠ Key_u }|
7.  end
8.  For l = 1 to p Do:
9.    L^l := {u ∈ V | |Key_u| = l}
10. end
11. Main loop:
12. For l = 2 to p Do:
13.   For each vertex u ∈ L^l Do:
14.     Key_u := Key_u ∪ ( ∪_{k=1..l-1} ∪_{v ∈ V^k(u)} Key_v )
15.   end
16.   While all vertices in L^l have not been processed Do:
17.     Get u, the vertex in L^l such that Kdeg_u is maximum
18.     Key_u := Key_u ∪ ( ∪_{v ∈ V^l(u)} Key_v )
19.     m := |Key_u|
20.     if m > l then
21.       L^l := L^l \ {u}
22.       L^m := L^m ∪ {u}
23.       For each v ∈ V^l(u) Do:
24.         Kdeg_v := Kdeg_v − 1
25.       end
26.     end
27.   end
28. end

Figure 2.7 illustrates the results of this algorithm on the 9-point grid example.
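For readers who prefer an executable form, the following is an illustrative Python transcription of Algorithm 2.1. It is a sketch under several assumptions (the adj/key dictionaries, frozenset keys, and a lazy max-heap in place of an explicit heap update are ours, not the paper's), but it follows the same structure: the consistency absorption of lines 13-15, the max-Kdeg traversal, and the neighbor updates of lines 23-25.

```python
import heapq

def independent_connectors(adj, key):
    """Sketch of Algorithm 2.1. adj: vertex -> set of neighbours;
       key: vertex -> iterable of subdomain indices (initial partition).
       Returns the expanded keys; vertices sharing a key form a connector."""
    key = {u: frozenset(k) for u, k in key.items()}
    level = {u: len(key[u]) for u in key}
    p = max(level.values())
    for l in range(2, p + 1):
        Ll = [u for u in key if level[u] == l]
        # lines 13-15: absorb the keys of lower-level neighbours (consistency, Def. 2.3)
        for u in Ll:
            for v in adj[u]:
                if level[v] < l:
                    key[u] = key[u] | key[v]
        # Kdeg_u = number of same-level neighbours carrying a different key
        kdeg = {u: sum(1 for v in adj[u] if level[v] == l and key[v] != key[u])
                for u in Ll}
        heap = [(-kdeg[u], u) for u in Ll]      # lazy max-heap on Kdeg
        heapq.heapify(heap)
        processed = set()
        while heap:
            d, u = heapq.heappop(heap)
            if u in processed or -d != kdeg[u]:
                continue                         # stale or already handled entry
            processed.add(u)
            for v in adj[u]:                     # expand by same-level neighbour keys
                if level[v] == l:
                    key[u] = key[u] | key[v]
            m = len(key[u])
            if m > l:                            # u moves to the higher level L^m
                level[u] = m
                for v in adj[u]:                 # lines 23-25: update neighbour Kdeg
                    if level[v] == l:
                        kdeg[v] -= 1
                        heapq.heappush(heap, (-kdeg[v], v))
    return key
```

The returned dictionary can then be inverted to list the connectors C_{i_1,...,i_m} as groups of vertices sharing the same key.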
2.2. Reducing the number of levels. We have seen earlier that defining the levels from classes may yield a poor interface decomposition, because an irregular domain partition may produce a large number of levels. A better definition of the levels from the keys is required to alleviate this problem. This definition requires an order relation >^c for pairs of adjacent connectors.

Definition 2.4. For two adjacent connectors C_1 and C_2:

C_1 >^c C_2  ⟺  Key_{C_2} ⊂ Key_{C_1}.
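The relation >^c and the consistency requirement of Definition 2.3 both reduce to set inclusion on the keys. The tiny helpers below are an illustrative sketch (names and data layout assumed), with connector keys stored as Python frozensets.

```python
def dominates(key_c1, key_c2):
    """C1 >^c C2  <=>  Key_C2 is strictly included in Key_C1."""
    return key_c2 < key_c1                       # strict-subset test on frozensets

def is_consistent(conn_adj, key):
    """Definition 2.3: adjacent connectors must have inclusion-comparable keys."""
    return all(key[c] <= key[d] or key[d] <= key[c]
               for c, nbrs in conn_adj.items() for d in nbrs)
```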
Note that for consistent decompositions, two adjacent connectors are always comparable with the relation >^c. The idea of the level computation algorithm is to compute iteratively each level as the set of local minima for >^c among the unclassified connectors. By classifying a connector, we simply mean assigning a level to it.

On irregular graphs, the connector decomposition can sometimes produce connectors that do not help in the main goal of separating the graph into levels of independent sets. For example, Figure 2.8 shows a decomposition which produces two connectors C_{1,2,3} and C_{1,2,3,4} which should clearly be merged into one. Indeed, it is meaningless to differentiate between these two connectors because connector C_{1,2,3,4} does not separate two connectors of the level immediately above it. A connector such as C_{1,2,3,4} in Figure 2.8 will be termed a singular connector. A singular connector is always merged with a connector of a level immediately below it. The merge operation takes place just after a new level has been computed. The principle, described in Algorithm 2.3, is to merge any neighbor of the current level connectors such that the new merged connector is still a local minimum for >^c in the set of all unclassified connectors.
Fig. 2.8. On the left, connector C_{1,2,3,4} is singular. On the right, the singular connector has been merged.
In Algorithms 2.2 and 2.3, C_unclass. is the set of all unclassified connectors and C_i is the level i.

Algorithm 2.2. Level Computation Algorithm.
1. i := 0
2. While C_unclass. ≠ ∅ Do:
3.   i := i + 1
4.   C_i := the set of all minimum connectors in C_unclass. for >^c
5.   C_unclass. := C_unclass. \ C_i
6.   Merge all the singular connectors for level i
7. end

Algorithm 2.3. Merge singular connectors for level i.
1. For each connector U ∈ C_i Do:
2.   For each connector V ∈ V(U) ∩ C_unclass. Do:
3.     if U ∪ V is a minimum for >^c in C_unclass. ∪ C_i then
4.       U := U ∪ V
5.       Key_U := Key_U ∪ Key_V
6.       C_unclass. := C_unclass. \ {V}
7.     End
8.   End
9. End
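The sketch below is an illustrative Python rendering of Algorithms 2.2 and 2.3 combined (the data layout is assumed: conn_adj maps a connector id to the set of adjacent connector ids and key maps it to a frozenset of subdomains). It simplifies the merge step: when a connector V is absorbed into U, the adjacency lists of the other connectors are not rewritten, which a real implementation would have to do.

```python
def compute_levels(conn_adj, key):
    """Return the list of levels; each level is a set of connector ids."""
    conn_adj = {c: set(nbrs) for c, nbrs in conn_adj.items()}
    key = dict(key)
    unclass = set(conn_adj)
    levels = []
    while unclass:
        # Algorithm 2.2, line 4: local minima for >^c among unclassified connectors
        Ci = {c for c in unclass
              if not any(key[d] < key[c] for d in conn_adj[c] if d in unclass)}
        unclass -= Ci
        # Algorithm 2.3: merge singular neighbours into the current level
        for u in list(Ci):
            for v in list(conn_adj[u] & unclass):
                merged = key[u] | key[v]
                others = (conn_adj[u] | conn_adj[v]) - {u, v}
                still_min = not any(key[d] < merged
                                    for d in others if d in unclass or d in Ci)
                if still_min:
                    key[u] = merged
                    conn_adj[u] |= conn_adj[v] - {u}
                    unclass.discard(v)
        levels.append(Ci)
    return levels
```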
One can notice that Algorithm 2.2 will always produce a number of levels that is less than or equal to the number of levels that a class-based levelization, as defined in 2.1, would produce. If we denote by D_max the maximum degree of a connector in the connector graph, by l the number of connector levels and by c the number of connectors, then a pessimistic bound on the complexity is O(l · c · D_max).

3. Parallel incomplete factorizations based on the hierarchical interface decomposition. The previous section defined what we termed a "hierarchical interface structure" related to a partitioning
Fig. 3.1. Domain partition into 6 subdomains. The connectors and their keys are represented.
of the graph G. This hierarchical interface structure consists of several levels of connectors which have the property of forming an independent set at each level. If the global matrix is (implicitly) reordered by level, starting with the nodes of level 0, followed by those of level 1, and so on, the independent sets will lead naturally to parallel ILU-type factorizations if careful dropping is used. This technique bears some similarity with the Algebraic Recursive Multilevel Solver (ARMS), discussed in [12, 8], with a major difference being that in ARMS the independent sets are determined dynamically, whereas in the new algorithm the ordering is set in advance. In addition, dropping is especially adapted to the partitioning, as is explained in the next section.

3.1. Consistent dropping strategies. The interface decomposition and the keys associated with each node are intimately related to the incomplete LU factorization process. In particular, this decomposition and the keys will help define a suitable fill pattern. The goal is to retain as much parallelism as possible in the ILU factorization. The first principle of the elimination is to proceed by levels: nodes of the zero-th level are eliminated first, followed by those of the first level, etc. All nodes of the zero-th level can be eliminated independently since there is no fill-in between nodes i and j of two different connectors of level 0 (interiors of the subdomains). In contrast, fill-ins may appear between connectors from level 1 up to the last level of the interface connectors. These fill-ins will cause some uncoupled connectors to become coupled, so that they will no longer form an independent set. This will result in reduced parallelism.

Recall that in the standard static ILU factorization a nonzero pattern is selected in advance based on the sparsity pattern of A. For example, the nonzero pattern for the ILU(0) factorization is simply the set of all pairs (i, j) such that a_ij ≠ 0, i.e., it is equal to the nonzero pattern of the original matrix. For details and extensions, see a standard text on iterative methods, e.g., [11, 1]. As was mentioned above, certain nonzero patterns will allow more parallelism than others during the factorization. Certain choices of nonzero patterns may also lead to simple implementations.

We consider two possible options to handle this problem. The first is not to allow any fill-in between two uncoupled connectors of the same level. In this fashion, the elimination of any connector at a certain level can always proceed in parallel. The second option, which is less restrictive, is to allow fill-ins but to restrict them to occur only when i and j are on the same processor (i.e., in the same subdomain). In this case, elimination should be done in a certain order and parallel execution can be maintained by exploiting independent sets. Figure 3.2 shows the block structure of the global matrix reordered according to the HID decomposition of the domain illustrated in Figure 3.1. Figure 3.2 illustrates the two dropping strategies.

3.2. Strictly consistent nonzero patterns. A nonzero pattern P is said to be strictly consistent with the HID decomposition if it does not allow any fill-in between any two connectors of the same level. This implies that an entry (i, j) will be in the nonzero pattern P only if i and j have identical keys, i.e., we must have

P ⊂ {(i, j) | Key_i = Key_j}.

Note that fill-ins between connectors of different levels are allowed.
Indeed, there is already a sequential loop corresponding to the levels, and parallelism is not hampered by introducing additional fill-ins between these levels. Not allowing fill-ins between levels is too restrictive and may yield an ILU factorization that does not scale too well in terms of iteration counts.
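The two dropping options of Section 3.1 can be summarized by a simple admissibility test on the levels and keys. The helper below is a hypothetical sketch (the name and this particular reading are assumptions, not the paper's implementation): the strict variant keeps a same-level entry only between unknowns of the same connector, while the relaxed variant keeps an entry whenever the two unknowns share at least one subdomain (a common processor).

```python
def allow_fill(level_i, level_j, key_i, key_j, same_level_only=True):
    """Admissibility of a fill-in between unknowns i and j.
       Keys are frozensets of subdomain indices; levels are connector levels."""
    if same_level_only:
        # option 1: no fill-in between two distinct connectors of the same level
        return level_i != level_j or key_i == key_j
    # option 2 (less restrictive): keep the fill-in only if i and j share a subdomain
    return bool(key_i & key_j)
```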
Fig. 3.2. Block decomposition of the matrix according to the HID reordering. For convenience, all blocks have been drawn at the same size; at true scale the blocks in the higher levels would be significantly smaller.
(Figure legend: levels 1 and 2 of the block structure; non-empty sparse blocks in A; non-empty sparse blocks in L·U.)
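As a final illustration of the reordering underlying Figure 3.2, the short sketch below (data layout assumed, not from the paper) builds the global permutation by listing the unknowns connector by connector, with the connectors sorted by increasing level, which is the ordering by level that the elimination follows.

```python
def hid_permutation(connectors, level):
    """connectors: dict connector id -> list of vertex indices;
       level: dict connector id -> int. Returns the HID ordering of the vertices."""
    perm = []
    for cid in sorted(connectors, key=lambda c: (level[c], c)):
        perm.extend(connectors[cid])
    return perm
```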
å æ å æå æ åf å æåf fæ f æågfæfgfæåg gfæå f g g g í g í gîí î î íí îíí îí î îí óó î óó îôóó ô ô ô ô ôó ó ôó ô ôôó ómlþýô óm ô lþým lþ ml ý mlþýmlþý þý ýý þýý þý þýþ þþ eeª fe feef ««ªª e fe f fe ª ª ª f6 6f 6 6fe676fe ««««ªªªª ª 6 ª e f 76fe 6 7 ª77 «ª 7 7 7 7 m ¬ nm n nmm ¬¬¬ ¬ ¬ m nm n nnm ¬¬ ¬ ¬ mn m n ¬ ¬ t s ® ts tsts ¯¯¯®®® ® ss t ss ® t ® ® t t ts=