電気学会論文誌●(●●●●●●●部門誌) IEEJ Transactions on ●●●●●●●●●●●●●●● Vol.●● No.● pp.●-● DOI: ●.●●/ieejeiss.●●.●
Paper

Proposal of a Genetic Algorithm-applied GMRES(m) Method with Automatic Subspace Parameter Optimization

Nobutoshi Sagawa*a) Non-member, Norihisa Komoda* Fellow, Ken Naono** Non-member

(Manuscript received Sep. 16, 2014, revised Dec. 20, 2014)

This paper presents an approach for improving the efficiency of solving linear systems by applying a genetic algorithm (GA) to the GMRES(m) method. For every restart process in GMRES(m), the initial vectors are regarded as chromosomes. When the restart process stagnates, the GA performs a crossover on the chromosomes to create new chromosomes for the next restart stage, in which a weighted-average algorithm is used to perform the crossover effectively. To further enhance performance, the concept of "chromosome-wide stagnation" is introduced, enabling on-the-fly detection of a slowdown in the convergence of the GA. A way to adjust the value of m automatically at the onset of such stagnation is also proposed. The proposed method was tested on several sample matrices and showed satisfactory improvements in execution time.

Keywords: numerical simulation, sparse matrix, iterative method, GMRES, genetic algorithm
1. Introduction

Matrix computation is at the heart of the numerical simulations that are important for industrial product design optimization. The importance of such simulations is increasing because they can often replace physical experiments, which tend to be expensive and time consuming. However, as computer performance continues to improve, the matrices treated in simulations become more complex, i.e., larger, sparser, and often more ill-conditioned. To cope with this situation, the GMRES(m) method was developed as a promising way to solve large-scale sparse systems with better convergence and lower memory consumption(1). In the GMRES(m) method, an extension of the GMRES (Generalized Minimal Residual) method, the GMRES process is terminated after examining an m-dimensional Krylov subspace using m iterations of the GMRES process, and the resulting intermediate solution is used as the initial vector for the next restart stage. By choosing the value of m, users can control the complexity of the calculation. The GMRES(m) process comprises the following two phases:

I. Solve Ax = b by m iterations of GMRES calculations with the initial guess x0(l) to get an approximate solution xm(l).
II. Update the initial guess x0(l+1) := xm(l) for the next restart cycle.

Several approaches have been proposed to improve the efficiency of phase I, such as the introduction of deflation techniques(2) and techniques based on augmented Krylov subspaces(3). In addition, at phase II it is advantageous to use a better intermediate initial vector for each restart stage to accelerate the convergence of a GMRES(m) calculation. Instead of directly using the solution from the previous stage as the next initial vector, several alternatives have been proposed. One such method is the "look-back" type, in which the initial vector is improved by referring back to the results at multiple intermediate stages rather than using the single result from the immediately previous stage(4),(5),(6). Alternatively, the recent advent of cloud computing is paving the way for using abundant computational resources to accelerate simulations. One promising approach for utilizing such resources is the use of a GA (genetic algorithm), because of its compatibility with parallelism; i.e., chromosomes belonging to the same generation of a GA calculation can readily be distributed and processed in parallel. Application of a GA to sparse matrix solvers was initially proposed for combining calculation parameters such as the types of solvers and preconditioners. Primary examples include on-the-fly optimization of simulator performance by dynamically adjusting the simulation parameters(7) and application of supervised learning to find the optimum combination of a sparse solver and its parameters(8). However, most of these efforts have concentrated only on the optimum choice of matrix computation parameters, such as the type of solver and preconditioner, the Krylov subspace dimension, and the number of execution threads. In references (9),(10), a GA was applied for the first time to the combination of the calculation parameters and the initial vectors, thereby achieving successful convergence of the GMRES(m) method with a much smaller m than in cases without GAs. However, since both the parameters and the initial vectors were handled simultaneously in the GA process, the impact of applying the GA only to the initial vectors was obscured. Furthermore, a simple one-point crossover algorithm was used for crossing the chromosomes (including the initial vectors), which led us to believe that there would be further opportunities for improving performance.

This paper applies a GA only to the choice of initial vectors, and not to the calculation parameters, to clarify the impact of the GA; in doing so, we try to narrow down the effective algorithms for the crossover of chromosomes in the GA. We also aim at choosing an appropriate Krylov subspace parameter m and stagnation-detection threshold automatically, to further improve the convergence speed and to alleviate the burden of selecting the additional parameters that derive from applying a GA to the GMRES(m) method.

a) Correspondence to: E-mail: [email protected]
* Graduate School of Information Science and Technology, Osaka University, 1-1 Yamadaoka, Suita, Osaka, Japan 565-0871
** Central Research Laboratory, Hitachi Ltd., 1-280 Higashi-Koigakubo, Kokubunji, Tokyo, Japan 185-8601
© 2014 The Institute of Electrical Engineers of Japan.
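The two-phase restart structure of phases I and II can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: for brevity, the inner m-iteration GMRES solve is replaced by m steps of the simple minimal-residual iteration (equivalent to repeated GMRES(1)), and the function names (restarted_solve, inner_solve) are ours.

```python
# Sketch of the GMRES(m) restart skeleton (phases I and II).
# The inner m-iteration GMRES solve is replaced by m minimal-residual
# steps; a real implementation would run Arnoldi + a small least-squares
# solve here. Matrix/vector helpers use plain Python lists.

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def inner_solve(A, b, x, m):
    """Stand-in for 'm iterations of GMRES': m minimal-residual steps."""
    for _ in range(m):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        Ar = matvec(A, r)
        denom = dot(Ar, Ar)
        if denom == 0.0:
            break
        alpha = dot(r, Ar) / denom          # minimizes ||r - alpha*A*r||
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x

def restarted_solve(A, b, m=3, tol=1e-10, max_restarts=200):
    x = [0.0] * len(b)                      # initial guess x0(1)
    for _ in range(max_restarts):           # restart loop
        x = inner_solve(A, b, x, m)         # phase I: m inner iterations
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if dot(r, r) ** 0.5 < tol:          # converged
            break
        # phase II: x is carried over as the next initial guess
    return x
```

The restart loop only ever keeps the current iterate, which is what bounds the per-stage cost and memory in GMRES(m).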
2. GMRES Method and Look-back Modification

2.1 GMRES(m) method
Since the advent of the GMRES method in 1986, a series of improvements has been proposed, making it one of the most promising techniques for solving large-scale, non-symmetric, sparse matrices. A drawback of the method is that the calculation cost grows on the order of n², where n is the matrix size, and the memory use increases accordingly. The GMRES(m) method was introduced as a restarting technique to alleviate this problem(1). With this technique, the GMRES calculation is restarted after every m iterations, and the intermediate result is used as the initial vector for the next restart stage. The algorithm is formulated as follows.

(a) Choose an initial guess x0(1)
(b) For l = 1, 2, … until convergence Do
(c)   Solve Ax = b using m iterations of the GMRES method with initial guess x0(l) := xm(l − 1)
(d) End Do

The cost of each restart stage is limited to O(m²), where m can be adjusted by the user to balance convergence speed against memory consumption. However, the introduction of the restart process gives rise to the problem of finding a value of m appropriate for the characteristics of the matrix to be solved.

2.2 Look-back GMRES(m) method
To accelerate convergence of the GMRES(m) method, a look-back version of GMRES(m) has been proposed(4),(5). This method attempts to improve the initial vector for each restart stage using multiple intermediate results, rather than only the intermediate result from the just-concluded stage. The simplest algorithm uses only the two previous results and is obtained by replacing Step (c) of the GMRES(m) method with the following Step (c)':

(c)' Solve Ax = b using m iterations of the GMRES method with initial guess x0(l) = xm(l − 1) + y(l − 1), where
     y(l) = α(l)Δx(l),
     α(l) = arg min ||rm(l) − αAΔx(l)|| = (rm(l), AΔx(l)) / (AΔx(l), AΔx(l))

We will later take a cue from this look-back approach and improve the crossover algorithm in the GA by adopting (c)' as the weighting function between the pairing vectors.

3. GA Application to the GMRES(m) Method

3.1 Algorithm for applying GA
In this paper we aim to clarify the impact of the GA on the creation of better initial vectors for the restart process and to propose an effective crossover algorithm. In applying a GA to the GMRES(m) process, the initial vectors of each restart step are regarded as chromosomes. Nc sets of initial chromosomes are generated randomly as the starting points. GMRES processes are executed until the solution vectors stagnate, and then new initial vectors are generated by crossing the resultant vectors of the previous stage. The basic idea of this approach is shown schematically in Fig. 1. The solution process of a linear equation Ax = b can be viewed alternatively as the minimization of the functional f(x) = (x, Ax)/2 − (x, b), where (x, y) denotes the inner product of two vectors x and y. The contour f(x) = constant forms a hyper-elliptic surface in an n-dimensional space. In Fig. 1, the n-dimensional space collapses to a two-dimensional plane, and the number of chromosomes is limited to two (i.e., x[1]0(1,1) and x[2]0(1,1)) for the sake of simplicity. Here x[d]a(b,c) denotes an intermediate solution vector x at the ath GMRES iteration in the bth restart stage of the cth GA generation with chromosome number d. Suppose the intermediate results derived from x[1]0(1,1) and x[2]0(1,1) develop into x[1]i(1,1) and x[2]j(1,1) after the ith and jth GMRES iterations, respectively, and then stagnate; this is the point at which the GA comes into play, crossing x[1]i(1,1) and x[2]j(1,1) to generate a new chromosome x[1]0(1,2). Using x[1]0(1,2) and x[2]0(1,2) (generated from another pair of chromosomes) as the initial vectors for the next generation, the calculation continues until the x that minimizes f(x) is reached.

Fig. 1. Schematic of crossover process

This algorithm is summarized below as pseudocode.

(a) Populate initial guess vectors x[k]0(1,1) with random numbers, where k = 1 … Nc
(b) For i = 1, 2, … until convergence Do
(c)   For k = 1 … Nc Do
(d)     For l = 1, 2, … until stagnation Do
(e)       Solve Ax = b using m iterations of the GMRES method with initial guess x[k]0(l, i)
(f)     End Do
(g)   End Do
(h)   Cross x[k]m(l, i) [k = 1 … Nc] to obtain new x[k]0(1, i + 1)
(i) End Do

Loop (b) pushes the chromosome generation forward. Loop (c) scans the chromosomes. Loop (d) corresponds to the restart loop of the GMRES(m) calculation. A schematic of the algorithm is provided in Fig. 2.

Fig. 2. Schematic of the GA-applied GMRES(m) method

3.2 Treatment of crossover
To obtain satisfactory results from the GA, the choice of an appropriate crossover algorithm at Step (h) in Section 3.1 plays a crucial role. We made a preliminary study of four different crossover algorithms by using them to solve a matrix. The configuration of the numerical experiment was as follows. The test matrix, torso1, was taken from the University of Florida sparse matrix collection(12). The GMRES solver was the Xabclib library developed by the University of Tokyo(13). We used a Dell Inspiron with a Core i5 2.5 GHz CPU equipped with 8 GB of memory, and the operating system was Ubuntu 10.4 with the Intel Fortran compiler. The preconditioner was set to ILU(0). To make the impact of the GA application distinctive, the auto-tuning feature for m (i.e., increasing m upon stagnation) in Xabclib was switched off, and the value of m was controlled outside the Xabclib library. The right-hand-side vector b of Ax = b was determined such that x is the n-dimensional vector (1.0 … 1.0) and b is the product of A and x. The components of the randomly generated initial vectors were evenly spread between −1.0E+1 and 1.0E+1. For each trial, five randomly generated initial vectors were prepared and GMRES(m) calculations were performed. The following four cases were examined.

(A) Simple average: use the average of the two pairing vectors.
(B) Single-point crossover: take the upper and lower halves of the pairing vectors and combine them.
(C) Mutated simple average: use the same simple-average algorithm as in (A), with an additional kind of mutation. As the means of mutation, a few of the newly generated chromosomes are chosen at each crossover, and a randomly generated disturbance is added to their components. The magnitude of the random disturbance is set to 1/10th of the residual ||b − Ax||.
(D) Weighted average: taking a cue from the look-back GMRES(m) algorithm, calculate a new vector from the paired vectors as in Step (c)' of Section 2.2, where Δx(l − 1) is interpreted as the subtraction of one of the paired vectors from the other at the (l − 1)th restart stage. Note that the weighted average becomes identical to the simple average when α is fixed to 0.5.

In this study, the GA used four chromosomes of randomly generated initial vectors. The chromosomes were numbered from 1 to Nc (i.e., Nc = 4) in order of their corresponding residuals (with 1 being the smallest), and then (1, 2), (1, 3), (1, 4), and (2, 3) were paired to create the chromosomes for the next generation. The restart parameter m was set to 50. Special attention should be drawn to the fact that this parameter is quite small, considering that successful convergence for torso1 typically occurs only when m exceeds 180, even with a good preconditioner(9).

Fig. 3 shows typical convergence curves from the numerical experiment using crossover algorithms (A), (B), (C), and (D). The curve denoted "S.A." reveals that the simple-average case (A) ended in stagnation, i.e., the residual did not decrease below 1.0E−4; this trend was almost the same as the behavior of the conventional non-GA calculation. The algorithm with the single-point crossover (B) did not show much improvement either, as plotted as "S.P.C." in Fig. 3: the convergence curve oscillates upward when a crossover takes place, and the residual remains above 1.0E−4 throughout the calculation. However, the curve "Mut." in Fig. 3 shows that convergence did occur when mutation was introduced in addition to the simple-average algorithm. This is an interesting observation, because introducing a disturbance in the crossover process, thereby injecting more diversity into the chromosome pool, appears to have a meaningful impact on the convergence process. An obvious drawback of this algorithm is that the speed of convergence depends on the "serendipity" of the random mutation; it is difficult to predict at which point of the iteration the calculation starts to converge. In our trials, the residual reached the 1.0E−8 criterion after 40 iterations in the best case, but the worst case required more than twice that number of iterations. Among the four algorithms examined, the most promising was the weighted-average algorithm (D), whose convergence curve is shown as "W.A." in Fig. 3. This approach demonstrated not only satisfactory convergence with m = 50, which could not be achieved by the conventional non-GA algorithm, but also much faster and more stable convergence than the mutated-average algorithm (C). In our multiple trials, convergence to 1.0E−8 was reached after 14 iterations in the best case, and only 18 iterations were required even in the worst case. The comparison of the crossover algorithms was also conducted on other matrices, such as xenon1, sme3Da, and comsol from the same collection(11), and the weighted-average algorithm turned out to be the only one of the four that obtained the converged solution in a stable manner.
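Under our reading of algorithm (D), the weighted-average crossover applies the one-dimensional minimization of Step (c)' to the difference of the paired vectors. The following is a minimal sketch under stated assumptions: the helper names and the choice of the better parent x1 as the base point are ours, not taken from the paper's code.

```python
# Sketch of the weighted-average crossover (D):
#   x_new = x1 + alpha * (x2 - x1),
# with alpha chosen to minimize ||r1 - alpha * A (x2 - x1)||, the same
# 1-D minimization as look-back step (c)'. alpha = 0.5 recovers the
# simple average (A).

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weighted_average_crossover(A, b, x1, x2):
    r1 = [bi - axi for bi, axi in zip(b, matvec(A, x1))]   # residual of parent 1
    dx = [u - v for u, v in zip(x2, x1)]                   # pairing direction
    Adx = matvec(A, dx)
    denom = dot(Adx, Adx)
    if denom == 0.0:
        return list(x1)                                    # parents identical
    alpha = dot(r1, Adx) / denom                           # optimal weight
    return [u + alpha * d for u, d in zip(x1, dx)]
```

Because alpha minimizes the residual along the line through the two parents (and alpha = 0 returns parent 1 unchanged), the child's residual norm never exceeds that of the better parent, which is one way to understand the stability observed for algorithm (D).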
Fig. 3. Convergence curves for various crossover algorithms (S.A.: Simple Average, S.P.C.: Single Point Crossover, Mut.: Mutated, W.A.: Weighted Average)

4. Numerical Experiments with Sample Matrices

4.1 Experimental settings
Using four sample matrices from the University of Florida sparse matrix collection, we conducted more extensive numerical experiments on the proposed GA-based GMRES(m) method with the weighted-average crossover algorithm. The configuration of the experiment was the same as in Section 3.2. The matrices torso1, sme3Da, xenon1, and comsol were selected, and Nc was set to 1, 4, and 8. Three different values of the restart parameter m were selected for each matrix. The iterations in every GA generation were executed until stagnation, defined as the point at which the decrease in the residual became less than 3% of that at the previous iteration. After all chromosomes stagnated, the crossover algorithm was activated to create the chromosomes for the next generation. For each combination of m and Nc, five trials were performed using randomly generated initial chromosomes. A trial was deemed to have converged when the criterion ||Ax − b||/||b|| < 10⁻⁸ was met; when the criterion was not satisfied after 500 restart iterations, the execution was deemed to have failed to converge.

4.2 Results
The results of the experiments are shown in Table 1. The number in the "Iteration" column indicates the total restart iteration count required to obtain a converged solution, i.e., for the normalized residual to become less than 10⁻⁸. This number corresponds to the product of the iteration counts of Steps (b) and (d) in the algorithm of Section 3.1. The number displayed is the rounded average of the five trials that began with randomly generated initial vectors (chromosomes). In this column, "NCV" stands for "no convergence." The e-time column shows the execution time in seconds (I/O excluded) for each trial until convergence. The execution time corresponding to the best performance for each matrix is indicated in underlined boldface. Comparing the Iteration and e-time columns for Nc = 1 with those for Nc = 4 and Nc = 8 for the same matrix, we notice that applying the GA is effective in decreasing the iteration count and, thus, in reducing the execution time. Recall that the result for Nc = 1 corresponds to conventional non-GA execution. However, the impact of applying the GA was affected by the nature of the matrix; different convergence patterns occurred when the value of m was changed. There are two groups of matrices.

Table 1. Impact of Applying GA to GMRES(m)
(entries give Iteration count and e-time in seconds)

torso1 (n = 116,158; 8.5M non-zeros)
  m = 30:   Nc = 1: NCV (-)        Nc = 4: 30 (2.5E+1)    Nc = 8: 27 (2.3E+1)
  m = 50:   Nc = 1: NCV (-)        Nc = 4: 17 (2.0E+1)    Nc = 8: 19 (2.3E+1)
  m = 200:  Nc = 1: 19 (1.2E+2)    Nc = 4: 14 (8.7E+1)    Nc = 8: 13 (8.1E+1)

sme3Da (n = 12,504; 0.8M non-zeros)
  m = 30:   Nc = 1: NCV (-)        Nc = 4: 110 (1.0E+2)   Nc = 8: 65 (6.2E+1)
  m = 50:   Nc = 1: 159 (2.6E+2)   Nc = 4: 58 (9.8E+1)    Nc = 8: 36 (6.1E+1)
  m = 200:  Nc = 1: 18 (1.4E+2)    Nc = 4: 13 (9.9E+1)    Nc = 8: 12 (9.2E+1)

xenon1 (n = 48,600; 1.2M non-zeros)
  m = 200:  Nc = 1: NCV (-)        Nc = 4: 254 (4.1E+2)   Nc = 8: 219 (3.5E+2)
  m = 250:  Nc = 1: 326 (7.7E+2)   Nc = 4: 153 (3.6E+2)   Nc = 8: 153 (3.6E+2)
  m = 300:  Nc = 1: 159 (5.1E+2)   Nc = 4: 69 (2.2E+2)    Nc = 8: 72 (2.3E+2)

comsol (n = 1,500; 0.1M non-zeros)
  m = 10:   Nc = 1: 565 (1.5E+1)   Nc = 4: 63 (1.6E0)     Nc = 8: 68 (1.8E0)
  m = 30:   Nc = 1: 63 (5.2E0)     Nc = 4: 32 (2.6E0)     Nc = 8: 29 (2.4E0)
  m = 60:   Nc = 1: 22 (3.9E0)     Nc = 4: 10 (1.8E0)     Nc = 8: 10 (1.9E0)
The first two matrices in Table 1, torso1 and sme3Da, form the first group, in which convergence occurs at a much smaller value of m than in the non-GA case. With the GA, the execution time is six times shorter for torso1 (2.0E+1 vs. 1.2E+2) and less than half for sme3Da (6.1E+1 vs. 1.4E+2). This tendency is desirable, as it contributes not only to shortening the execution time but also to reducing memory usage because of the smaller value of m; the memory reduction is approximately 75% for both torso1 and sme3Da.
The other group of matrices comprises xenon1 and comsol. For these, there is little improvement in convergence with changes in the value of m, but performance still improves because the iteration count decreases for the same m. The performance improvement was more than a factor of two for both xenon1 at m = 300 (2.2E+2 vs. 5.1E+2) and comsol at m = 60 (1.8E0 vs. 3.9E0). No memory reduction can be expected in this group because m does not change. In contrast, the impact of increasing the number of chromosomes is not as clear in either group: the iteration counts for Nc = 4 and Nc = 8 do not differ substantially for most of the matrices, even when the value of m is changed.

4.3 Combination of GA-GMRES(m) with the Look-back method
One of the virtues of the proposed GA-GMRES(m) method is that it is orthogonal to most of the existing acceleration techniques designed for use within one thread (i.e., one chromosome), such as the look-back types of GMRES(m) method(4),(5),(6), thereby allowing our GA-based approach to be coupled with those techniques for synergistic effects. To demonstrate this orthogonality, we conducted additional numerical experiments in which GA-GMRES(m) and a look-back algorithm were coupled. For each crossover process, the modifications of the intermediate initial vectors were doubly applied: two intermediate results from the current and previous stages were referred back to in order to improve the next initial vector within each chromosome (as in the look-back algorithm of Step (c)' in Section 2.2), and then the pairs of the improved initial vectors across chromosomes were further refined using the weighted-average algorithm. The experimental results are listed in Table 2. The number in each cell indicates the restart count (five-trial average) until convergence. The columns "non-GA" and "non-GA + Look-back" show the results for the conventional GMRES(m) method without and with the look-back modification; likewise, the columns "GA" and "GA + Look-back" are the results when the GA is applied, without and with the look-back modification. The results show that the effects of the GA application and the look-back modification are orthogonal, as expected; thus, "GA + Look-back" yields the combined performance of the two algorithms. The acceleration ratio of the look-back modification is given in the "(a)/(b)" and "(c)/(d)" columns of Table 2. A comparison of these two columns shows that combining the look-back modification with the GA is even more effective than introducing the same technique in the conventional case. Among the four tested matrices, torso1 exhibited slightly different behavior, in which the impact of the look-back modification was not so visible; there, the major contribution to the performance improvement appeared to derive from the GA alone.

Table 2. Combination of GA-GMRES(m) with the Look-back Method

Matrix             non-GA (a)   non-GA + Look-back (b)   Ratio (a)/(b)   GA (c)   GA + Look-back (d)   Ratio (c)/(d)
torso1 (m = 30)    NCV          NCV                      -               30       29                   1.0
sme3Da (m = 50)    159          120                      1.3             58       36                   1.6
xenon1 (m = 300)   159          93                       1.7             69       37                   2.0
comsol (m = 30)    63           40                       1.6             32       18                   1.8

5. Optimization of the Stagnation Threshold Parameter and the Krylov Subspace Parameter m

In the GA-based GMRES(m) method described above, the stagnation threshold parameter was fixed at a certain value: the calculation was deemed stagnated if the residual decrease became less than 3% of that at the previous iteration of each generation. A fixed Krylov subspace value m was also used throughout the calculation. However, a closer observation of the convergence curves revealed that the convergence trend differed markedly depending on the type of matrix. In the sme3Da case, the convergence curve flattened at an early stage, while in the xenon1 case it tended to maintain a degree of steepness. This suggests that adjusting the timing of the GA intervention and the value of the GMRES subspace parameter m may result in better convergence.

5.1 Influence of the stagnation threshold parameter
We conducted a numerical experiment to understand the influence of the stagnation threshold parameter on convergence. The configuration was the same as in Section 3.2. For each trial, five randomly generated initial vectors were prepared and GMRES(m) calculations were performed. The test matrices were torso1, sme3Da, xenon1, and comsol, and the stagnation threshold parameter was scanned from 0.01 to 1.0. The results are shown in Table 3. The figures in the table are the five-trial averages of the number of restart stages in each calculation, which is proportional to the computational time required for each matrix; "NCV" stands for "no convergence." Table 3 shows that although the stagnation threshold parameter influences the convergence speed, the degree of influence is not sensitive to the parameter value: the iteration count until final convergence stays relatively stable when the threshold is varied between 0.03 and 1.0. This result suggests that paying the extra cost involved in searching for the optimum threshold parameter is unlikely to be rewarded.
Table 3. Influence of the stagnation parameter
(entries: five-trial average of restart stages; columns 0.01 to 1.0 are values of the stagnation threshold parameter)

Matrix             non-GA   0.01   0.03   0.1   1.0
torso1 (m = 30)    NCV      50     30     30    30
sme3Da (m = 50)    159      129    58     46    51
xenon1 (m = 300)   159      80     69     69    70
comsol (m = 30)    63       75     32     27    27
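One plausible reading of the local stagnation rule above (a residual decrease of less than a threshold fraction, e.g., 3%, of the previous residual) can be expressed as a small predicate. This is a sketch; the function name and the exact form of the test are our assumptions, not taken from Xabclib.

```python
# Sketch of the local stagnation test used to trigger a crossover:
# a chromosome is considered stagnated when the decrease of its residual
# norm falls below a threshold fraction of the previous residual.
# The default 0.03 corresponds to the 3% rule used in Section 4.1.

def is_locally_stagnated(res_prev, res_curr, threshold=0.03):
    """True if the residual decreased by less than threshold * res_prev."""
    return (res_prev - res_curr) < threshold * res_prev
```

With threshold = 1.0 (the value Table 3 favors), almost any step that fails to nearly eliminate the residual counts as stagnation, i.e., crossovers are triggered aggressively.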
5.2 Chromosome-wide stagnation
So far, stagnation in the calculation process has been described as the state in which the decrease of the residual becomes less than a fixed value for each chromosome at each stage of a GA generation; hereafter, this state is defined as "local stagnation." As mentioned previously, when local stagnation is detected across all the chromosomes, the crossover process is initiated to resolve it. However, there are cases in the convergence process in which the residual of GMRES(m) shows little progress even though frequent GA intervention is taking place. This status is
hereafter defined as "chromosome-wide stagnation." Fig. 4 exemplifies a typical convergence curve for sme3Da at m = 50. A cross marker on the curve corresponds to an intervention of a crossover process as a result of local stagnation. Several areas can be observed in which the convergence shows little progress despite heavy crossover intervention; these areas correspond to the chromosome-wide stagnation mentioned above. In this figure, three such stagnated areas can be identified on the convergence curve.

Fig. 4. Chromosome-wide stagnation appearing on the convergence curve

5.3 Stagnation detection
To decide the best timing and the most appropriate way for the crossover process to occur, it is desirable first to detect whether the calculation has fallen into a state of chromosome-wide stagnation. One method is to compare the value of the residual between the current and previous stages within one chromosome and then compare the differences in residuals across all the chromosomes. However, there is a simpler and cheaper way to detect chromosome-wide stagnation. When all the chromosomes have come to a state of stagnation and the decrease in the residual remains stuck after the crossover process, the value of the residual itself (rather than its difference from the previous stage) becomes almost uniform across all the chromosomes. This implies that all of the chromosomes have become similar in nature, resulting in less effective crossing of the chromosomes. Figs. 5(a) and (b) show the development of the residual in the numerical experiment, for the case plotted in Fig. 4. The sequence in Fig. 5(a) is the residual convergence when no chromosome-wide stagnation is observed (in this case, Generation #3); the pattern of residual reduction is unique to each chromosome. In contrast, Fig. 5(b) shows the same sequence when chromosome-wide stagnation is taking place (Generation #8); the residuals of all chromosomes become almost identical, implying that the characteristics of the chromosomes resemble each other.

Fig. 5. Residual convergence progress
(a) Without chromosome-wide stagnation (Generation #3)
  Chromosome No.1: 0.2689048E−06 0.2638042E−06
  Chromosome No.2: 0.3029526E−06 0.2902925E−06 0.2810679E−06 0.2693219E−06 0.2623717E−06
  Chromosome No.3: 0.2959425E−06 0.2900377E−06
  Chromosome No.4: 0.7687083E−06 0.6848289E−06 0.6389537E−06 0.6380865E−06
(b) With chromosome-wide stagnation (Generation #8)
  Chromosome No.1: 0.2461232E−06 0.2455979E−06
  Chromosome No.2: 0.2461134E−06 0.2455717E−06
  Chromosome No.3: 0.2460692E−06 0.2456790E−06
  Chromosome No.4: 0.2462488E−06 0.2457450E−06

5.4 Adjusting m at chromosome-wide stagnation
Taking measures to minimize the impact of chromosome-wide stagnation is expected to further accelerate convergence: it allows a cheap calculation to be used while convergence is progressing, intervening with a more expensive calculation only when chromosome-wide stagnation occurs. To this end, an approach of automatically adjusting the subspace value m is examined; i.e., m is increased when chromosome-wide stagnation is encountered and reverted to its original value when the stagnation has been resolved. We conducted another numerical experiment to investigate the efficiency of this approach. The configuration was the same as in Section 3.2, with sme3Da, xenon1, and comsol chosen as the test matrices. This time, torso1 was excluded because GA-GMRES(m) already performed so well on it that no chromosome-wide stagnation took place. The starting subspace parameter m was set to 30 (sme3Da), 10 (comsol), and 200 (xenon1) and then increased by 30, 10, and 100, respectively, while chromosome-wide stagnation was detected, up to a maximum m of 120 (sme3Da), 40 (comsol), and 300 (xenon1). The local stagnation threshold was set to 1.0 for all the test matrices, following the findings of Section 5.1. Fig. 6 shows the convergence curves obtained for the sme3Da matrix. The top and bottom curves correspond to the cases in which m is fixed to 30 and 120, respectively, and the curve in between is the result of the dynamic intervention, with m increasing gradually from 30 to 60, 90, and eventually 120 while chromosome-wide stagnation persisted. The black marks on the graph indicate the points at which the m value increased. It should be noted that m increased only when chromosome-wide stagnation was recognized, thus enabling a smaller m to be used whenever convergence progressed without stagnation. It is also evident from a comparison of the three curves that increasing m tends to accelerate convergence: the steepness of the curve for the dynamic-intervention case approaches that of the case with m fixed to 120.
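The uniformity test of Section 5.3 can be sketched as follows. The relative-spread tolerance rtol is an illustrative parameter that the paper does not specify; the function name is ours.

```python
# Sketch of the chromosome-wide stagnation test (Section 5.3): when the
# residual norms of all chromosomes become almost identical, the pool has
# lost diversity and further crossovers are unlikely to help. The check
# compares the spread of the residuals against a relative tolerance.

def is_chromosome_wide_stagnated(residuals, rtol=1e-2):
    """True if all chromosome residuals agree to within a relative spread."""
    lo, hi = min(residuals), max(residuals)
    return (hi - lo) <= rtol * hi
```

Applied to the final residuals listed in Fig. 5, the test is false for Generation #3 (residual patterns differ per chromosome) and true for Generation #8 (residuals nearly identical), matching the qualitative observation in the text.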
the dynamic variation of m within 200–300 yields only an intermediate result compared with the results for m = 200 and 300. However, even in this case, introduction of the dynamic adjustment of m caused no harm in convergence because the e-time was above the average of m=200 and 300. To further clarify the advantage of introducing the concept of the chromosome-wide stagnation, we conducted an experiment for the comparison of our proposed approach against the conventional m adjustment based on the local stagnation(11), where the conventional method calculates the ratio between the residuals of the previous and current restart stages, i.e., p(l) = ||b(l) − Ax(l)|| / ||b(l-1) − Ax(l-1)|| and recognizes stagnation if p(l) becomes close to 1.0 within a given threshold. In Table 5, the comparison on the restart iteration counts and execution time obtained via the conventional local stagnation detection and the chromosome-wide detection is shown. Table 5.
Fig. 6. Convergence curves for sme3Da for various m values
Tables 4(a) to 4(c) show the performance results of the numerical experiment using the above approach for all three tested matrices. In the sme3Da case (Table 4(a), column 30–120), the value of m starts from 30 and is increased to 60, 90, and 120 while chromosome-wide stagnation persists; it is then pushed back to 30 when the stagnation is resolved. The results for fixed m values (30, 60, and 120) are provided in the same table. These results show that increasing the value of m with appropriate timing is effective: the adaptive case outperforms the fixed m = 120 case by 30%. Likewise, the m-adjustment strategy works effectively for the comsol1 matrix, improving performance by approximately 30% compared with the execution time for the fixed m = 40 case.
Table 4. Convergence performance for various m values

(a) sme3Da
  m                        |   30    |   60    |  120    | 30–120
  e-time (sec)             | 1.0E+2  | 9.8E+1  | 7.6E+1  | 5.8E+1
  Number of restart stages |  101    |   45    |   19    |   30

(b) comsol1
  m                        |   10    |   20    |   40    | 10–40
  e-time (sec)             | 1.6E0   | 1.4E0   | 1.3E0   | 1.0E0
  Number of restart stages |   92    |   42    |   20    |   36

(c) xenon1
  m                        |  200    |  300    | 200–300
  e-time (sec)             | 4.0E+2  | 1.8E+2  | 2.2E+2
  Number of restart stages |  254    |   70    |  123
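The adaptive policy behind the 30–120 column can be sketched as a simple controller. This is a hedged illustration with our own function name, assuming the stagnation flag comes from the chromosome-wide detector; the step size and bounds are the values used for sme3Da:

```python
def adjust_m(m, stagnated, m_min=30, m_max=120, step=30):
    """Restart-parameter controller: while chromosome-wide stagnation
    persists, grow m by a fixed step (30 -> 60 -> 90 -> 120 for
    sme3Da); once stagnation is resolved, push m back to the minimum
    so the cheaper small-m restarts are used whenever convergence
    progresses without stagnation."""
    if stagnated:
        return min(m + step, m_max)
    return m_min

# m climbs while stagnation persists, then falls back to 30.
m, history = 30, []
for stagnated in (True, True, True, False):
    m = adjust_m(m, stagnated)
    history.append(m)
print(history)  # [60, 90, 120, 30]
```

The same controller covers the comsol1 (10–40, step 10) and xenon1 (200–300) ranges by changing only m_min, m_max, and step.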
Table 5. Comparison of adjusting m for local and chromosome-wide stagnation

  matrix (m range)             | sme3Da 30–120  | xenon1 200–300 | comsol1 10–40
  algorithm                    | local | c-wide | local | c-wide | local | c-wide
  e-time (sec)                 | 9.2E+1| 5.8E+1 | 2.6E+2| 2.2E+2 | 1.3E0 | 1.0E0
  Number of restart stages (a) |  30   |  30    |  115  |  123   |  27   |  36
  Number of m adjustments (b)  |  26   |  14    |  102  |  32    |  20   |  12
  Adjustment Ratio (b)/(a)     | 0.87  | 0.47   | 0.89  | 0.26   | 0.74  | 0.33

  (local: local stagnation, c-wide: chromosome-wide stagnation)
Although we followed the recommended threshold value for detecting local stagnation, the conventional stagnation detection, which looks only at in-chromosome convergence, recognized stagnation too sensitively because it tended to pick up local fluctuations in the convergence curve. The "Adjustment Ratio" row of Table 5 reveals that the conventional m adjustment was activated at approximately 80% of the restart stages. It contributed to lessening the number of restart stages, but the calculation ended up with a longer execution time because of too much intervention. Conversely, in our approach of detecting chromosome-wide stagnation, the intervention of an increased m was more moderate and timely, and it gave better performance by choosing a more appropriate value of m in accordance with the progress of the calculation.

6. Conclusion

This paper presented an approach for improving the efficiency of solving linear systems by applying a GA to the GMRES(m) method. For each restart process in GMRES(m), the initial vectors are regarded as chromosomes. When the restart process stagnates, the GA performs a crossover on the chromosomes to create new chromosomes for the next restart stage, in which a weighted-average algorithm is used to perform the crossover effectively. The proposed GA-based GMRES(m) method achieved accelerations in calculation time ranging from 15% to 600% on the sample test matrices. To further enhance the convergence performance, the concept of "chromosome-wide stagnation" was introduced, enabling on-the-fly detection of a slowdown in the convergence of the GA process. A way to adjust the m value automatically at the onset of such stagnation was proposed, which contributes a further 20% improvement in convergence speed. Future work includes the following: (a) application of the same GA technique to other matrix solution methods, especially BiCGStab (Bi-Conjugate Gradient Stabilized method); (b) implementation of the GA-GMRES(m) method in an actual cloud environment such as AWS(14) to examine the impact of increasing the number of chromosomes to a much larger value; and (c) provision of an easy-to-use API for the proposed GA-GMRES(m) method so that it can readily be linked to end users' programs.
Nobutoshi Sagawa (Non-member) He received his master's degree in engineering from the University of Tokyo in 1985. He joined the Central Research Laboratory, Hitachi, Ltd. in 1985, where he was engaged in the development of a numerical simulation language for physical problems. Between 2010 and 2012 he took up an assignment as the CTO of Hitachi Asia Ltd. in Singapore to oversee cutting-edge projects in Southeast Asia. He is presently a postgraduate student at Osaka University. He has also been serving as executive officer and general manager of the R&D division of Hitachi Systems, Ltd. since 2013. His technical interests include numerical simulation, control systems, database systems, cloud computing, and corporate technology management. He is a member of IPSJ.

Norihisa Komoda (Fellow) He has been a professor at the Department of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University since 2002. He received his bachelor's and master's degrees in electrical engineering in 1972 and 1974, respectively, and his doctorate in 1982, all from Osaka University. He worked at the Systems Development Laboratory, Hitachi, Ltd. from 1974 to 1991, and at the Department of Engineering Systems, UCLA, as visiting faculty from 1981 to 1982. In 1991, he moved to the Faculty of Engineering, Osaka University. His research interests include business information systems, e-commerce systems, and knowledge information processing. He received the IEEJ Technical Development Award in 2000 and 2006, the IEEJ Distinguished Paper Award in 1998, the Award for Outstanding Technology, SICE, in 1987, and the Award for Outstanding Paper, SICE, in 1986.
References
(1) Y. Saad: "Iterative Methods for Sparse Linear Systems", SIAM (1996).
(2) J. Erhel, K. Burrage, and B. Pohl: "Restarted GMRES preconditioned by deflation", Journal of Computational and Applied Mathematics, Vol.69, pp.303-318 (1996).
(3) J. Baglama and L. Reichel: "Augmented GMRES-type methods", Numerical Linear Algebra with Applications, Vol.14, pp.337-350 (2007).
(4) A. Imakura, T. Sogabe, and S.L. Zhang: "A look-back technique of restart for the GMRES(m) method", Applied Linear Algebra - in honor of Hans Schneider, Novi Sad (2010).
(5) A. Imakura, T. Sogabe, and S.L. Zhang: "A Look-Back-Type Restart for the Restarted Krylov Subspace Methods to Solve Non-Hermitian Linear Systems", Technical Report of Department of Computer Science, University of Tsukuba (CS-TR), CS-TR-13-22 (2013).
(6) M.A. Gomes-Ruggiero, V.L. Rocha Lopes, and J.V. Toledo-Benavides: "A safeguard approach to detect stagnation of GMRES(m) with applications in Newton-Krylov methods", Computational & Applied Mathematics, Vol.27, No.2, pp.175-199 (2008).
(7) I.D. Mishev, N. Fedorova, S. Terekhov, B.L. Beckner, A.K. Usadi, M.B. Ray, and O. Diyankov: "Adaptive control for solver performance optimization in reservoir simulation", 11th European Conference on the Mathematics of Oil Recovery, Bergen, Norway, September 8-11 (2008).
(8) V.Y. Voronov and N.N. Popova: "Automatic performance tuning approach for parallel applications based on sparse linear solvers", Parallel Computing: from Multicores and GPU's to Petascale, B. Chapman et al. (Eds.), IOS Press, pp.415-422 (2010).
(9) K. Naono, N. Zakaria, T. Sakurai, A.J. Pal, and N. Sagawa: "Genetic algorithm-based sparse matrix iterative solvers for efficient simulation platforms", The 1st Asian Conference on Information Systems, December (2012).
(10) N. Sagawa, N. Komoda, and K. Naono: "Improvement in performance of GMRES(m) method by applying a genetic algorithm to the restart process", Proc. CICS, pp.466-471 (2014).
(11) A.H. Baker, E.R. Jessup, and Tz.V. Kolev: "A simple strategy for varying the restart parameter in GMRES(m)", Journal of Computational and Applied Mathematics, Vol.230, pp.751-761 (2009).
(12) University of Florida sparse matrix collection homepage. http://www.cise.ufl.edu/research/ /matrixes/index.html
(13) Xabclib homepage. http://www.abc-lib.org/Xabclib/index.html
(14) Amazon Web Services homepage. http://aws.amazon.com
Ken Naono (Non-member) He received his master's degree from Kyoto University in 1994 and his doctorate from the University of Electro-Communications in 2006. He joined the Central Research Laboratory, Hitachi, Ltd. in 1994, where he has been engaged in the research and development of matrix libraries for parallel supercomputers. From 2012 to 2014, he was Chief Researcher in the R&D department of Hitachi Asia Malaysia Sdn. Bhd. and worked as visiting faculty at Universiti Teknologi Petronas. He has also been engaged in the research and development of auto-tuning and business monitoring systems. He received the 2000 Joint Symposium on Parallel Processing (JSPP) Best Paper Award, the 2006 Japan Society for Computational Engineering and Science (JSCES) Young Researcher Award, and the 2009 Kanto Local Commendation for Invention. He is a member of IPSJ, JSIAM, JAFEE, and JSCES.