ISSN 0005-1179, Automation and Remote Control, 2012, Vol. 73, No. 2, pp. 276–290. © Pleiades Publishing, Ltd., 2012. Original Russian Text © A.V. Panyukov, V.V. Gorbik, 2012, published in Avtomatika i Telemekhanika, 2012, No. 2, pp. 73–88.

LINEAR AND NONLINEAR PROGRAMMING PROBLEMS

Using Massively Parallel Computations for Absolutely Precise Solution of the Linear Programming Problems

A. V. Panyukov and V. V. Gorbik

South-Ural State University, Chelyabinsk, Russia

Received June 6, 2011

Abstract—Consideration was given to approaches to solving the linear programming problems with absolute precision, attained through rational roundoff-free computations in the algorithms of the simplex method. Realization of the modified simplex method with the use of the inverse matrix was shown to have the least spatial complexity: the main memory area sufficient to solve the linear programming problem with the use of rational roundoff-free computations is at most $4lm^4 + O(lm^3)$ bits, where $m$ is the minimal problem size and $l$ is the number of bits sufficient to represent one element of the source data matrix. The efficiency of parallelization, that is, the ratio of the acceleration to the number of processors, was shown to be asymptotically close to 100 %. Computing experiments on practical problems with sparse matrices corroborated the high efficiency of parallelization and demonstrated the advantage of the parallel method of inverse matrix.

DOI: 10.1134/S0005117912020063

1. INTRODUCTION

Broad experience in solving linear programming problems has been gained to date. There are algorithms oriented to sparse matrices and adapted to the least accumulation of errors in floating-point computations. Apart from the programs solving the linear programming problems directly, there are modeling systems such as the free products Zimpl [1] and IPAT-S [2]. They assist in formulating the problem in convenient and intuitively understandable terms, representing it in a special format, usually MPS [3], for subsequent transfer to a program solving the problem directly (this program can run on another, more efficient special platform), and analyzing the results obtained.

The list of professional commercial software products for solving linear programming problems is rather long. The most popular of them are presented in [4]. The description of the Netlib library [5] lists linear programming problems for which the popular products, in particular CPLEX [6] and MINOS [7], either provide appreciably discrepant results or fail to find a solution at all. In addition to the commercial software, there are some open codes for solving linear programming problems [4]. The majority of them, however, also carry out computations over floating-point data. For example, GLPK [8] solves many problems efficiently and rapidly, but there are examples of incorrect solutions.

All computing paradoxes of the aforementioned popular packages result from two widespread unsubstantiated prejudices generating the computational errors: (1) extension of the associativity of addition and multiplication in the field of real numbers to the finite set of the "real" computer numbers, and (2) extension of the property of continuous dependence of the solution on the parameters from the system obtained through "equivalent" transformations to the original system.
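The first prejudice is easy to demonstrate. The sketch below is ours (Python; the built-in fractions module stands in for an exact rational library such as GNU MP): floating-point addition loses associativity once operand magnitudes differ enough, while exact rational addition does not.

```python
from fractions import Fraction

# Floating-point addition is not associative: grouping changes the result.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)   # 1.0
print(a + (b + c))   # 0.0 -- the 1.0 was absorbed by rounding

# Exact rational arithmetic keeps associativity, at the price of operands
# whose bit size may grow from operation to operation.
p, q, r = Fraction(10**16), Fraction(-10**16), Fraction(1)
print((p + q) + r == p + (q + r))   # True
```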


Programs based on symbolic computations are the exceptions. They include, in particular, the popular commercial products MAPLE and MathCad and two open products, EXLP [9] and QSopt-Ex [10]. EXLP appeared in 2002, and to date there exists an open optimized code in C using the exact computation library GNU MP [11]. QSopt-Ex is a modified version of QSopt [12] in which the floating-point operations were replaced by the exact computations of the GNU MP library. This program is part of the studies of AT&T Labs [13, 14] on faultless methods of computation for linear programming problems. However, the capabilities of the popular packages supporting symbolic computations do not allow one to solve real problems of mathematical and simulation modeling, because of the great spatial and computing complexity of such support.

Parallel and distributed computations allow one to extend the size and precision of the problems solved. Analysis of the parallelization means demonstrated the efficiency of "information-based" parallelization. With this technology, the computing process is based on a single program run on all processors of the computing system or on multiple stations of the local-area network. The program copies can process data subsets on different branches of the algorithm. Synchronization in time and in processing the shared data is a must. This philosophy underlies the Message Passing Interface (MPI) standard. This technology of parallel programming gave rise to the development of corresponding methods, and this methodological approach opens the road to the solution of high-dimensionality problems and efficient parallel operation of multiple processors.

There are a number of programs for solving linear programming problems that rely on the advantages of parallelization [4]. They all use computations over standard floating-point data. Moreover, computations with different numbers of processors often provide essentially different results, which is indicative of the need for evidential computations (see [15–19]). The well-known open library GMP [11] enables arbitrary-precision computations in user programs, but the user has to develop his own interface to organize distributed and parallel computations [20]. The "Exact Computational" library [21] of the present authors, which is an extension of the GMP library, offers facilities for using its objects in parallel and distributed computations.

The present authors previously developed algorithms and software for absolutely precise solution of systems of linear algebraic equations [22] and for computation of the generalized inverse Moore–Penrose matrix [23] on multiprocessor computing systems, with the use of the long-integer and rational classes from the "Exact Computational" library [21]. Theoretical and practical study of this software demonstrated the high efficiency of multiprocessor computing systems.

The present paper aims to develop software for faultless rational computations on parallel and distributed computing systems handling linear programming problems and to analyze the efficiency of various realizations of the simplex method.

2. TECHNIQUE TO REALIZE THE SIMPLEX METHOD

Despite the recently developed polynomial algorithms, for practical linear programming problems the simplex method remains beyond comparison. Two techniques of its realization are used today: the method of simplex tables and the method of inverse matrix (the modified simplex method) (see, for example, [24]).
To keep the presentation self-contained, we demonstrate their distinctions by way of example of the linear programming problem

$$\max\left\{ c^T x : Ax = b \ge 0,\ x \ge 0 \right\}; \qquad c, x \in \mathbb{R}^n;\quad b \in \mathbb{R}^m. \eqno(1)$$

The passage from the general linear programming problem to representation (1) is possible in a time depending linearly on the volume of the source data [24].

2.1. Method of Simplex Tables

At the $k$th iteration this method recalculates the simplex table

$$
S^{(k)} =
\begin{pmatrix}
Z^{(k)} = c_{B^{(k)}}^T \left(B^{(k)}\right)^{-1} b & z^{(k)} = -c^T + c_{B^{(k)}}^T \left(B^{(k)}\right)^{-1} A \\
X_{B^{(k)}} = \left(B^{(k)}\right)^{-1} b & \left(B^{(k)}\right)^{-1} A
\end{pmatrix},
$$

where $B^{(k)}$ is the basic matrix comprising all columns of the matrix $A$ relating to the basic variables of the $k$th iteration (basic columns), and $c_{B^{(k)}}$ is the vector of the objective function coefficients relating to the basic variables of the $k$th iteration. The value of the upper left cell $S_{00}^{(k)} = Z^{(k)} = c_{B^{(k)}}^T (B^{(k)})^{-1} b$ is equal to the value of the objective function on the current basic solution. The remaining part of the left column of the simplex table

$$\mathrm{col}\left(S_{i0}^{(k)} : i = 1, 2, \ldots, m\right) = \left(B^{(k)}\right)^{-1} b = X_{B^{(k)}}$$

contains the vector of values of the basic variables of the $k$th iteration. The remaining part of the upper row

$$\mathrm{row}\left(S_{0j}^{(k)} : j = 1, 2, \ldots, n\right) = z^{(k)} = -c^T + c_{B^{(k)}}^T \left(B^{(k)}\right)^{-1} A$$

comprises the residue vector of the dual problem. The columns of the matrix $S^{(k)}$ corresponding to the basic variables are the unit vectors, that is, each of them comprises a single nonzero element equal to one.

Nonnegativity of the elements of the row $z^{(k)}$ is the criterion for optimality of the current basic solution. If there exists a nonbasic variable $x_i$ with $z_i^{(k)} < 0$, then the condition $L = \{l : S_{li}^{(k)} > 0\} = \varnothing$ is the unboundedness attribute of the objective function. If $L = \{l : S_{li}^{(k)} > 0\} \neq \varnothing$, then addition of $x_i$ to the basic variables results in an increase in the objective function by

$$\Delta_i = -\frac{\left(X_{B^{(k)}}\right)_{l^*} z_i^{(k)}}{S_{l^* i}^{(k)}}, \quad \text{where} \quad l^* = \operatorname{lex\,min} \operatorname{Arg} \min_{l \in L} \frac{\left(X_{B^{(k)}}\right)_l}{S_{li}^{(k)}}. \eqno(2)$$

This method of selecting $l^*$ retains nonnegativity of the basic solution at the next iteration and rules out cycling if $\Delta_i = 0$ [24]. The passage from the table of the $k$th iteration to that of the $(k+1)$st iteration is done with $i$ as the master column and $l^*$ as the master row, using the Jordan–Gauss elimination procedure:

$$
S_{lj}^{(k+1)} =
\begin{cases}
S_{lj}^{(k)} - \dfrac{S_{l^* j}^{(k)}}{S_{l^* i}^{(k)}}\, S_{li}^{(k)}, & \text{if } l \neq l^*;\\[2mm]
\dfrac{S_{l^* j}^{(k)}}{S_{l^* i}^{(k)}}, & \text{if } l = l^*;
\end{cases}
\qquad l = 0, 1, 2, \ldots, m;\ \ j = 0, 1, 2, \ldots, n. \eqno(3)
$$

One can readily see that execution of the iteration, including recalculation of the simplex table, requires at most $m + n$ divisions and comparisons, as well as at least $m(n+1)$ additions and multiplications. The spatial algebraic complexity of the tabular simplex method is defined mostly by the number of operands in the simplex table, that is, it is equal to $2mn + O(m)$.
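For concreteness, recalculation (3) in exact rational arithmetic can be sketched as follows (Python, with fractions.Fraction standing in for the GNU MP rationals of the actual implementation; the function and the toy table are illustrative, not the authors' code):

```python
from fractions import Fraction

def pivot(S, l_star, i):
    """Jordan-Gauss recalculation (3) of the simplex table S, a list of
    (n+1)-element rows with row 0 holding (Z | z); the pivot is the
    master row l_star and master column i.  All entries are Fractions,
    so no roundoff ever occurs."""
    p = S[l_star][i]                             # pivot element S_{l*,i}
    S[l_star] = [s / p for s in S[l_star]]       # master row: divide by pivot
    for l in range(len(S)):
        if l != l_star and S[l][i] != 0:
            f = S[l][i]                          # S_{l,i}
            S[l] = [a - f * b for a, b in zip(S[l], S[l_star])]
    return S

# Toy table: column 0 is (Z | X_B), row 0 is (Z | z), constraints in rows 1-2.
S = [[Fraction(v) for v in row] for row in
     [[0, -3, -5, 0],
      [4,  1,  1, 1],
      [6,  3,  2, 0]]]
pivot(S, 2, 1)   # variable 1 enters; ratio test (2): min(4/1, 6/3) -> row 2
```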

2.2. Method of the Inverse Matrix

In distinction to the above method, it is the matrix $\left(B^{(k)}\right)^{-1}$ that is recalculated here at each iteration instead of the recalculation of the simplex table. The inverse matrix enables an easy determination of the corresponding dual solution $y^{(k)T} = c_{B^{(k)}}^T \left(B^{(k)}\right)^{-1}$ for the current basis. The current basis $B^{(k)}$ is optimal if the corresponding dual solution is permissible: $y^{(k)T} A \ge c^T$. If in the matrix $A$ there is a column $A_{i(k)}$ with $y^{(k)T} A_{i(k)} < c_{i(k)}$, then its addition to the basic columns increases the objective function. The vector $g = \left(B^{(k)}\right)^{-1} A_{i(k)}$ is the image of the column $A_{i(k)}$ in the simplex table $S^{(k)}$. Therefore,

$$r = \operatorname{lex\,min} \operatorname{Arg} \min_{l:\, g_l > 0} \frac{\left(X_{B^{(k)}}\right)_l}{g_l} \eqno(4)$$

is the master row defining the column outgoing from the basis, and the new values of the basic variables are as follows:

$$
\left(X_{B^{(k+1)}}\right)_l =
\begin{cases}
\left(X_{B^{(k)}}\right)_l - \dfrac{g_l}{g_r} \left(X_{B^{(k)}}\right)_r, & \text{if } l \neq r;\\[2mm]
\dfrac{\left(X_{B^{(k)}}\right)_l}{g_r}, & \text{if } l = r;
\end{cases}
\qquad l = 1, 2, \ldots, m. \eqno(5)
$$

The basic matrices $B^{(k)} = \left(b_{lj}^{(k)}\right)_{l,j=1}^{m}$ and $B^{(k+1)} = \left(b_{lj}^{(k+1)}\right)_{l,j=1}^{m}$ differ only in the $r$th column. Therefore, the elements of the inverse matrix $\left(B^{(k+1)}\right)^{-1} = \left(\beta_{lj}^{(k+1)}\right)_{l,j=1}^{m}$ are calculated from the elements of the matrix $\left(B^{(k)}\right)^{-1} = \left(\beta_{lj}^{(k)}\right)_{l,j=1}^{m}$ in the following manner:

$$
\beta_{lj}^{(k+1)} =
\begin{cases}
\beta_{lj}^{(k)} - \dfrac{g_l}{g_r}\, \beta_{rj}^{(k)}, & \text{if } l \neq r;\\[2mm]
\dfrac{1}{g_r}\, \beta_{rj}^{(k)}, & \text{if } l = r.
\end{cases}
\eqno(6)
$$
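Updates (5) and (6) share the multiplier $g_l/g_r$ and can be sketched together (again an illustration over fractions.Fraction, not the authors' code; note that row $r$ must be scaled only after the other rows have used it):

```python
from fractions import Fraction

def update_basis(Binv, XB, g, r):
    """Recalculate the basic variables by (5) and the inverse basic matrix
    by (6) when the entering column's image g = B^{-1} A_i is known and
    row r (0-based here) leaves the basis.  Rows l != r consume the old
    row r, so row r itself is scaled last."""
    gr = g[r]
    for l in range(len(Binv)):
        if l != r:
            f = g[l] / gr
            Binv[l] = [b - f * br for b, br in zip(Binv[l], Binv[r])]
            XB[l] = XB[l] - f * XB[r]
    Binv[r] = [b / gr for b in Binv[r]]
    XB[r] = XB[r] / gr
    return Binv, XB
```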

We estimate the algebraic computing complexity of the method under consideration. It is evident that in the general case all $n$ constraints of the dual problem must be verified at the final iteration, which requires $mn$ multiplications and $m(n-1)$ additions. For ascertaining impermissibility of the dual solution at an intermediate iteration, this cost is usually much smaller. Recalculation of the inverse matrix requires at most $m$ divisions, $m^2$ multiplications, and $m^2 - 1$ additions/subtractions. Since $m < n$, the computing resources for recalculation of the inverse matrix may be much lower than for recalculation of the simplex table. The spatial algebraic complexity of the method of inverse matrix is defined by the number of elements in the source data and in the inverse matrix, that is, it is equal to $mn + m^2 + O(m)$; stated differently, for $m < n$ it is smaller than that of the tabular simplex method.

It deserves noting that the algebraic complexity adequately estimates the computing resources used only in the cases where all operands need the same memory area to represent them, for example, when the standard floating-point data types are used. The memory area required by the operands of rational computations varies dynamically (it usually grows). Therefore, in this case the adequate estimates of the computing complexity are the memory size sufficient to solve the problem and the number of elementary operations over bits. These values are called the spatial and computing bit complexity, respectively.

2.3. Bit Complexity of the Absolutely Precise Realization of the Simplex Method

Practical realizability of roundoff-free computations, in particular, of the faultless solution of the linear programming problem, is defined by the resources required for the computation: the number of main memory bits and the number of operations over bits. We denote by $sc(\lambda)$ the number of bits required to represent the object $\lambda$ and by $cc(\lambda)$ the number of bit operations executed in determining the representation of $\lambda$. For example, the number of bits required to represent an integer is $sc(0) = 1$ and $sc(z) = \lceil \log_2 |z| \rceil$ for $z \in \mathbb{Z} \setminus \{0\}$. Here and below $\lceil r \rceil$ denotes the least integer not smaller than $r \in \mathbb{R}$. The number of bits required to represent a rational number $r = p/q$, $p, q \in \mathbb{Z}$, is estimated from above as $sc(r) \le O(sc(p) + sc(q))$. It is easy to make sure that if $sc(p)$, $sc(q)$ are the memory volumes required to represent the numbers $p$, $q$, then the memory size for representing the result of an arithmetic operation $\circ \in \{+, -, \times, /\}$ over these numbers is $sc(p \circ q) \le sc(p) + sc(q)$. For the bit computing complexity of executing the operation $\circ \in \{+, -, \times, /\}$ with the use of the classical arithmetic algorithms (column multiplication/division), the estimate $cc(p \circ q) \le O\left(sc(p)\, sc(q)\right)$ is valid. The fast multiplication algorithms provide the estimate $cc(p \circ q) \le \tilde{O}\left(sc(p) + sc(q)\right)$, which will be used in what follows.

We estimate the number of main memory bits sufficient to solve the linear programming problem using roundoff-free computations. Since both the elements of the simplex table and the elements of the inverse matrix are solutions of systems of linear algebraic equations, we first determine the number of bits required to represent the determinant of a matrix with elements having a predefined upper estimate of the spatial complexity.

Proposition 1. Let $B = (b_{ij})$ be an $m \times m$ integer matrix and let $l = \max_{i,j=1,2,\ldots,m} sc(b_{ij})$. Then $sc(\det B) \le m(\log_2 m + l)$.

Proposition 2. Let $B = (b_{ij})$ be an $m \times m$ rational matrix and let $l = \max_{i,j=1,2,\ldots,m} sc(b_{ij})$. Then $sc(\det B) \le m\left(\log_2 m + (2m+1)l\right)$.

Propositions 1 and 2 are proved in the Appendix.

Since at all iterations of the simplex method the basic matrix $B$ is a submatrix of the original $m \times n$ matrix $A = (a_{ij})$, at all iterations the bit spatial complexity of any element of the basic matrix does not exceed $l = \max_{i=1,\ldots,m;\ j=1,\ldots,n} sc(a_{ij})$.

The elements of the simplex table and of the inverse matrix are subjected to the Jordan–Gauss transformation. All elements of these objects are determined from the solution of the corresponding equation system with the current basic matrix $B$. Therefore, Cramer's formulas for systems of linear algebraic equations together with the above propositions yield the following corollary.

Corollary. If each numerical element of the source data has spatial complexity at most $l$, then:
(1) the elements of the simplex table and of the inverse matrix have spatial complexity at most $4lm^2 + 2lm + 2m \log_2 m$;
(2) the columns of the simplex table and of the inverse matrix have spatial complexity at most $4lm^3 + 2lm^2 + 2m^2 \log_2 m$;
(3) $4lm^3 n + O(lm^2 n)$ bits suffice to represent the simplex table;
(4) $4lm^4 + O(lm^3)$ bits suffice to represent the inverse matrix.

Since $m < n$, from the point of view of the bit spatial complexity of the simplex method the method of inverse matrix is to be preferred. To analyze the computing complexity and the efficiency of parallelization of the simplex method, we consider its parallel versions.
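The measure $sc$ is straightforward to instrument. In the sketch below (ours), int.bit_length() approximates $\lceil \log_2 |z| \rceil$; the printout shows how operand sizes vary dynamically under exact arithmetic, shrinking on cancellation and growing otherwise:

```python
from fractions import Fraction

def sc(x):
    """Spatial bit complexity: sc(0) = 1; for an integer, the bit length
    of |x|; for a rational p/q, the sum over numerator and denominator."""
    if isinstance(x, Fraction):
        return sc(x.numerator) + sc(x.denominator)
    return max(1, abs(x).bit_length())

p, q = Fraction(355, 113), Fraction(-113, 355)
print(sc(p), sc(q))          # 16 16: the operand sizes
print(sc(p * q), sc(p + q))  # 2 33: cancellation shrinks, addition grows
```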


3. PARALLEL VERSIONS OF THE SIMPLEX METHOD

As was noted in the Introduction, there are schemes of parallelization of the linear programming algorithms. Many techniques of parallelization of the simplex method are cited in [25]. The schemes of simplex method parallelization proposed here are oriented to precise rational computations. They practically eliminate latency and reduce the spatial and computing complexity, as well as the number and volume of the interprocess exchanges.

3.1. Method of Simplex Tables

A concrete example of parallelization of the tabular simplex method is given in [26]. It relies on the row-wise decomposition of the simplex table data. This decomposition, however, leads to an unjustified increase in the volume of the interprocess exchange data (a row of $n > m$ values must be transmitted) and to the occurrence of latency (process 0 seeks a candidate to be introduced into the basis while the rest of the processes are idling). The present paper proposes a column-wise decomposition. Owing to the small overhead due to the fact that the vector of the basic variables (the leftmost column of the table) is located in each process, this scheme of parallelization makes it possible to do without latency and to confine oneself to transmitting a column of $m < n$ values. In the case of precise rational computations this may provide appreciable savings in the computing resources.

So, we carry out the column-wise decomposition of the simplex table into blocks whose number is equal to the number $N$ of processes. Except for the leftmost column, all columns of the simplex table are shared in equal proportions among the $N$ processes; the leftmost column, that is, the vector of values of the basic variables together with the value of the objective function on it, is sent to all processes and processed independently. Table 1 presents the decomposition of the simplex table $S$ into blocks $S(K)$, $K = 1, 2, \ldots, N$.

Table 1. Decomposition of the simplex table over the processes

$$
\underbrace{\begin{pmatrix} S_{00} = Z \\ S_{10} = X_{B_1} \\ \vdots \\ S_{m0} = X_{B_m} \end{pmatrix}}_{\text{replicated in every process}}
\qquad
S(K) = \begin{pmatrix}
z_{\lceil (K-1)n/N \rceil + 1} & \cdots & z_{\lceil Kn/N \rceil} \\
S_{1,\, \lceil (K-1)n/N \rceil + 1} & \cdots & S_{1,\, \lceil Kn/N \rceil} \\
\vdots & \ddots & \vdots \\
S_{m,\, \lceil (K-1)n/N \rceil + 1} & \cdots & S_{m,\, \lceil Kn/N \rceil}
\end{pmatrix}, \quad K = 1, 2, \ldots, N.
$$
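The index arithmetic of this split is easy to get wrong with floating-point ceilings; a small sketch (ours; the same boundaries reappear in decompositions (7)-(10) below):

```python
def ceil_div(a, b):
    """Integer ceiling of a/b, exact for arbitrarily large operands."""
    return -(-a // b)

def block_columns(K, n, N):
    """1-based column indices ceil((K-1)n/N)+1 .. ceil(Kn/N) assigned to
    process K by the column-wise decomposition of Table 1."""
    return range(ceil_div((K - 1) * n, N) + 1, ceil_div(K * n, N) + 1)

# n = 10 columns over N = 4 processes: block sizes 3, 2, 3, 2 covering 1..10.
print([list(block_columns(K, 10, 4)) for K in range(1, 5)])
```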

The above conventions suggest the following parallel realization of an iteration of the simplex table method.

Algorithm TS. $k$th iteration.

• Data: simplex table $S(K)$ for each process $K = 1, 2, 3, \ldots, N$.

• Step 1. For each process $K = 1, 2, 3, \ldots, N$:
—determine a column $i_K$ with $z_{i_K} < 0$;
—if the column $i_K$ was not found, assume that $C(K) = \Delta(K) = i_K = 0$ and go to Step 2;
—seek the row
$$l_K = \operatorname{lex\,min} \operatorname{Arg} \min_{l:\, S_{l i_K} > 0} \frac{(X_B)_l}{S_{l i_K}};$$
—if the row $l_K$ was not found, complete all processes and return "Not bounded";
—calculate $\Delta_{i_K} = -(X_B)_{l_K} z_{i_K} / S_{l_K i_K}$, assume that $\Delta(K) = \Delta_{i_K}$, $C(K) = i_K$, and go to Step 2.

Comments. At this step, it is either established that the objective function is unbounded, or the data for changing the basis are determined (the master column $i_K$ as a candidate to be introduced into the basis, the master row $l_K$ defining the column taken out of the basis, and the increment $\Delta_{i_K}$ of the objective function), or the lack of such candidates ($i_K = 0$) is established.

• Step 2. For $L = 1, 2, 4, 8, \ldots, 2^{\lceil \log_2 N \rceil}$, perform the data exchange with the process $K + L$ for each process $K = 1, 2, 3, \ldots, N$ whose number satisfies the condition $(K - 1) \bmod 2L < L$, that is, the residue of dividing $K - 1$ by $2L$ is smaller than $L$:
—if $C(K) = 0$, assume that $C(K) = C(K + L)$, $\Delta(K) = \Delta(K + L)$ and continue computations for the next $L$;
—if $C(K + L) = 0$, assume that $C(K + L) = C(K)$, $\Delta(K + L) = \Delta(K)$ and continue computations for the next $L$;
—if $\Delta(K) \ge \Delta(K + L)$, assume that $C(K + L) = C(K)$, $\Delta(K + L) = \Delta(K)$; otherwise, $C(K) = C(K + L)$, $\Delta(K) = \Delta(K + L)$;
—continue computations for the next $L$.

Comments. Upon completion of this step, the value of $C(K)$ in each process $K = 1, 2, 3, \ldots, N$ is equal to the number of the master process
$$
K^* = \begin{cases}
\operatorname{lex\,min} \operatorname{Arg} \max\limits_{K:\, i_K \neq 0} \Delta_{i_K}, & \text{if } \{K : i_K \neq 0\} \neq \varnothing;\\
0, & \text{otherwise}.
\end{cases}
$$

• Step 3. If $K^* = 0$, then for each process $K = 1, 2, 3, \ldots, N$ and for
$$k = \left\lceil \frac{(K-1)n}{N} \right\rceil + 1,\ \left\lceil \frac{(K-1)n}{N} \right\rceil + 2,\ \ldots,\ \left\lceil \frac{Kn}{N} \right\rceil$$
assume that
$$
x_k = \begin{cases}
(X_B)_l, & \text{if } S_{lk} = 1 \text{ and } S_{ik} = 0\ \ \forall\, i = 0, 1, \ldots, l-1, l+1, \ldots, m;\\
0, & \text{otherwise}.
\end{cases}
$$
Return the optimal solution $x$ of the problem and the optimal value $Z$ of the objective function, and complete the algorithm.

Comments. If the current basic solution of the problem is optimal, then the answer is generated and the algorithm is completed.

• Step 4. The master process $K^*$ transmits to the rest of the processes the master column
$$S_{i_{K^*}} = \left(z_{i_{K^*}}, S_{1 i_{K^*}}, S_{2 i_{K^*}}, \ldots, S_{m i_{K^*}}\right)^T$$
and the number $l_{K^*}$ of the master row.

• Step 5. Recalculate from (3) the simplex table $S(K)$ in each process $K = 1, 2, 3, \ldots, N$. Go to the next iteration.

• End of Algorithm TS.

As can be seen from the above, the description of the parallel tabular simplex method is not much more complicated than that of the sequential algorithm. The parallel "process-to-process" communications are used $\lceil \log_2 N \rceil$ times to determine the master process $K^*$, and one broadcast communication is used to transmit the master column $S_{i_{K^*}}$ and the number $l_{K^*}$ from the master process to all processes.
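Step 2 is a hypercube ("tournament") reduction. Below is a control-flow sketch (ours), simulated in a single address space so that the index pattern can be traced; in the real program each comparison is an MPI point-to-point exchange, and complete propagation presumes that N is a power of two:

```python
from fractions import Fraction

def tournament(vals, N):
    """Step 2 reduction: vals[K] = (Delta(K), C(K)) for K = 1..N.  After
    the rounds every process holds the pair with the largest increment;
    C(K) == 0 marks a process without a candidate column.  Complete
    propagation presumes N is a power of two."""
    L = 1
    while L < N:
        for K in range(1, N + 1):
            if (K - 1) % (2 * L) < L and K + L <= N:
                dK, cK = vals[K]
                dP, cP = vals[K + L]
                # a process lacking a candidate adopts its partner's pair;
                # otherwise the larger increment wins, ties keeping K
                if cK == 0 or (cP != 0 and dP > dK):
                    vals[K] = (dP, cP)
                else:
                    vals[K + L] = (dK, cK)
        L *= 2
    return vals

vals = {1: (Fraction(0), 0), 2: (Fraction(7, 2), 2),
        3: (Fraction(5), 3), 4: (Fraction(5), 4)}
print(tournament(vals, 4))  # every entry becomes (Fraction(5), 3)
```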


Table 2 compiles the algebraic computing resources required by the sequential and parallel realizations of the simplex table method, that is, the numbers of algebraic operations used.

Table 2. Algebraic computing resources of the simplex table method

| Operator | One processor: algebraic operations | N processors: transmitted operands | N processors: load on process K = 1, 2, ..., N |
|---|---|---|---|
| Optimality check | $[t, Nt]$ | — | $t \in [1, \lceil n/N \rceil]$ |
| Determination of master row | $2m + n + 2$ | — | $2m + \lceil n/N \rceil + 2$ |
| Selection of master process | — | $N$ | $\lceil \log_2 N \rceil$ |
| Transmission of master column | — | $m + 2$ | $m + 2$ |
| Recalculation of simplex table | $2(m+1)(n+1)$ | — | $2m(1 + \lceil n/N \rceil)$ |
| Total | $cc_a(\mathrm{TS}) = [t, Nt] + (2m+1)n + 2m + 3$ | $m + N + 2$ | $cc_a(\mathrm{TS}(N)) = t + \lceil n/N \rceil (2m+1) + 5m + 4 + \lceil \log_2 N \rceil$ |

It is clear from the table and from the description of the algorithm that the processors are loaded uniformly. Since any element of the simplex table requires at most $4lm^2 + O(lm)$ bits, the bit computing complexity of recalculation of the simplex table $S$ is $cc(\mathrm{TS}) = (4lm^2 + O(lm))\,cc_a(\mathrm{TS})$. For $N$ processors, the load of process $K = 1, 2, 3, \ldots, N$ is $cc(\mathrm{TS}(N)) = (4lm^2 + O(lm))\,cc_a(\mathrm{TS}(N))$. Consequently, the acceleration from parallelization is equal to

$$U_{\mathrm{TS}} = \frac{cc(\mathrm{TS})}{cc(\mathrm{TS}(N))} = \frac{[t, Nt] + (2m+1)n + 2m + 3}{t + (2m+1)\lceil n/N \rceil + 5m + \lceil \log_2 N \rceil + 4}.$$

With regard for the interval nature of the variable $t$, it follows that

$$\lim_{n \to \infty} U_{\mathrm{TS}} = N \left[ \frac{2m+1}{2m+2},\ 1 \right] = \left[ N \left( 1 - \frac{1}{2m+2} \right),\ N \right] \ni N.$$

Therefore, the efficiency of parallelization grows with the problem dimension and reaches in the limit a value close to 100 %.

3.2. Method of Inverse Matrix

Many parallelization schemes of the inverse matrix method can be found in [25] and the works cited therein. All of them, however, are oriented to the multiplicative representation of the inverse matrix and to improved accuracy of computations. When rational computations are used, other problems come to the foreground: elimination of latency and reduction of the spatial complexity and of the number and volume of the interprocess exchanges. The decomposition scheme given in the present paper is aimed at these problems, as well as at solving both the direct and the dual problems.

It follows from the description of the inverse matrix method that the maximal effect of parallelization in the search for the variable introduced into the basis is attained by the column-wise decomposition of the source data, the vector $c^T$ and the matrix $A$, into blocks shared in equal proportions among the processes:

$$c(K)^T = \left( c_{\lceil (K-1)n/N \rceil + 1}\ \ c_{\lceil (K-1)n/N \rceil + 2}\ \ \cdots\ \ c_{\lceil Kn/N \rceil} \right), \qquad K = 1, 2, \ldots, N; \eqno(7)$$

$$A(K) = \begin{pmatrix}
a_{1,\, \lceil (K-1)n/N \rceil + 1} & a_{1,\, \lceil (K-1)n/N \rceil + 2} & \cdots & a_{1,\, \lceil Kn/N \rceil} \\
a_{2,\, \lceil (K-1)n/N \rceil + 1} & a_{2,\, \lceil (K-1)n/N \rceil + 2} & \cdots & a_{2,\, \lceil Kn/N \rceil} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,\, \lceil (K-1)n/N \rceil + 1} & a_{m,\, \lceil (K-1)n/N \rceil + 2} & \cdots & a_{m,\, \lceil Kn/N \rceil}
\end{pmatrix}, \qquad K = 1, 2, \ldots, N. \eqno(8)$$

It is assumed in addition that the vector of dual variables $y$ is located in each process $K = 1, 2, 3, \ldots, N$.
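Slicing the source data into blocks (7) and (8) uses the same boundaries; a sketch (ours, reusing block_columns from the earlier sketch):

```python
def column_block(c, A, K, N):
    """Blocks (7) and (8) for process K (1-based): the slices of the
    objective coefficients c and of the constraint-matrix columns of A
    assigned to process K."""
    n = len(c)
    cols = [j - 1 for j in block_columns(K, n, N)]   # 0-based indices
    cK = [c[j] for j in cols]
    AK = [[row[j] for j in cols] for row in A]
    return cK, AK
```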


The effect of parallelization at recalculation of the inverse matrix $B^{-1}$ is maximal if it is decomposed row-wise into blocks shared in equal proportions among the processes:

$$B^{-1}(K) = \begin{pmatrix}
b_{\lceil (K-1)m/N \rceil + 1,\, 1} & b_{\lceil (K-1)m/N \rceil + 1,\, 2} & \cdots & b_{\lceil (K-1)m/N \rceil + 1,\, m} \\
b_{\lceil (K-1)m/N \rceil + 2,\, 1} & b_{\lceil (K-1)m/N \rceil + 2,\, 2} & \cdots & b_{\lceil (K-1)m/N \rceil + 2,\, m} \\
\vdots & \vdots & \ddots & \vdots \\
b_{\lceil Km/N \rceil,\, 1} & b_{\lceil Km/N \rceil,\, 2} & \cdots & b_{\lceil Km/N \rceil,\, m}
\end{pmatrix}, \qquad K = 1, 2, 3, \ldots, N. \eqno(9)$$

The description of the inverse matrix method suggests the advisability of decomposing the vector $X_B$ of the basic variables, the vector $c_B$ of the coefficients of the objective function at the basic variables, and the vector $g$ with respect to the same row blocks:

$$X_B(K) = \begin{pmatrix} (X_B)_{\lceil (K-1)m/N \rceil + 1} \\ (X_B)_{\lceil (K-1)m/N \rceil + 2} \\ \vdots \\ (X_B)_{\lceil Km/N \rceil} \end{pmatrix}; \quad
c_B(K) = \begin{pmatrix} (c_B)_{\lceil (K-1)m/N \rceil + 1} \\ (c_B)_{\lceil (K-1)m/N \rceil + 2} \\ \vdots \\ (c_B)_{\lceil Km/N \rceil} \end{pmatrix}; \quad
g(K) = \begin{pmatrix} g_{\lceil (K-1)m/N \rceil + 1} \\ g_{\lceil (K-1)m/N \rceil + 2} \\ \vdots \\ g_{\lceil Km/N \rceil} \end{pmatrix}; \qquad K = 1, 2, 3, \ldots, N. \eqno(10)$$

Decomposition (9) of the inverse matrix over the processes allows each process to calculate a subvector of the dual variables

$$y(K) = \left(B^{-1}(K)\right)^T c_B(K). \eqno(11)$$

The vector of dual variables, obviously, is as follows:

$$y = \sum_{K=1}^{N} y(K).$$

The aforementioned suggests the following parallel realization of an iteration of the inverse matrix method.

Algorithm IBMS. $k$th iteration.

• Data: each process $K = 1, 2, 3, \ldots, N$ holds:
—the vector of dual variables $y$ constructed from the current basic solution of the direct problem;
—the vector $M(K)$ of the numbers of the basic variables;
—the vector $X_B(K)$ of the values of the basic variables;
—the vector $c_B(K)$ of the values of the coefficients of the objective function at the basic variables;
—the block $A(K)$ of the constraint matrix;
—the block $c(K)^T$ of the coefficients of the objective function;
—the block $B^{-1}(K)$ of the inverse matrix.

• Step 1. For each process $K = 1, 2, 3, \ldots, N$:
—determine a column $i_K$ with $z_{i_K} = y^T A(K)_{i_K} - \left(c(K)^T\right)_{i_K} < 0$;


—if the column $i_K$ is not found, assume that $C(K) = \Delta(K) = i_K = 0$; otherwise, assume that $\Delta(K) = z_{i_K}$, $C(K) = i_K$.

• Step 2. For $L = 1, 2, 4, 8, \ldots, 2^{\lceil \log_2 N \rceil}$, each process $K = 1, 2, 3, \ldots, N$ with the number satisfying the condition $(K - 1) \bmod 2L < L$, that is, the residue of dividing $K - 1$ by $2L$ is smaller than $L$, must exchange data with process $K + L$:
—if $C(K) = 0$, assume that $C(K) = C(K + L)$, $\Delta(K) = \Delta(K + L)$ and continue computations for the next $L$;
—if $C(K + L) = 0$, assume that $C(K + L) = C(K)$, $\Delta(K + L) = \Delta(K)$ and continue computations for the next $L$;
—if $\Delta(K) \le \Delta(K + L)$, assume that $C(K + L) = C(K)$, $\Delta(K + L) = \Delta(K)$; otherwise, assume that $C(K) = C(K + L)$, $\Delta(K) = \Delta(K + L)$;
—continue computations for the next $L$.

Comments. Upon completion of this step, the value of $C(K)$ in each process $K = 1, 2, 3, \ldots, N$ is equal to the number of the column master process
$$
K^c = \begin{cases}
\operatorname{lex\,min} \operatorname{Arg} \min\limits_{K:\, i_K \neq 0} z_{i_K}, & \text{if } \{K : i_K \neq 0\} \neq \varnothing;\\
0, & \text{otherwise}.
\end{cases}
$$

• Step 3. If $K^c = 0$, then the problem is solved:
—assume that $x[1 : n] = 0$;
—for each process $K = 1, 2, 3, \ldots, N$, assume that $x_{M(K)_k} = \left(X_B(K)\right)_k$ for
$$k = \left\lceil \frac{(K-1)m}{N} \right\rceil + 1,\ \left\lceil \frac{(K-1)m}{N} \right\rceil + 2,\ \ldots,\ \left\lceil \frac{Km}{N} \right\rceil;$$
—return the optimal solution $x$ of the problem and the optimal solution $y$ of the dual problem, and complete the algorithm.

• Step 4. Process $K^c$ sends the column $A_{i_{K^c}}$ and the value of the coefficient $c_{i_{K^c}}$ to all processes $K = 1, 2, \ldots, N$.

• Step 5. Each process $K = 1, 2, \ldots, N$ calculates $g(K) = B^{-1}(K)\, A_{i_{K^c}}$ and

$$r(K) = \operatorname{lex\,min} \operatorname{Arg} \min_{l:\, g(K)_l > 0} \left( h(K, l) = \frac{\left(X_B(K)\right)_l}{g(K)_l} \right).$$

If $r(K)$ is not found, assume that $C(K) = 0$; otherwise, assume that $C(K) = r(K)$, $\Delta(K) = h(K, r(K))$.

• Step 6. For $L = 1, 2, 4, 8, \ldots, 2^{\lceil \log_2 N \rceil}$, each process $K = 1, 2, 3, \ldots, N$ whose number satisfies the condition $(K - 1) \bmod 2L < L$, that is, the residue of dividing $K - 1$ by $2L$ is smaller than $L$, exchanges data with the process $K + L$:
—if $C(K) = 0$, then assume that $C(K) = C(K + L)$, $\Delta(K) = \Delta(K + L)$ and continue computations for the next $L$;
—if $C(K + L) = 0$, then assume that $C(K + L) = C(K)$, $\Delta(K + L) = \Delta(K)$ and continue computations for the next $L$;
—if $\Delta(K) \le \Delta(K + L)$, then assume that $C(K + L) = C(K)$, $\Delta(K + L) = \Delta(K)$; otherwise, assume that $C(K) = C(K + L)$, $\Delta(K) = \Delta(K + L)$;
—continue computations for the next $L$.


Comments. Upon completion of this step, the value of $C(K)$ in each process $K = 1, 2, 3, \ldots, N$ is equal to the number of the row master process

$$K^r = \operatorname{lex\,min} \operatorname{Arg} \min_{K:\, \exists\, r(K)} h\left(K, r(K)\right).$$

• Step 7. The master process $K^r$ sends to all processes $K = 1, 2, 3, \ldots, N$ the master row $r^* = r(K^r)$ of the inverse matrix,
$$\left(B^{-1}(K)\right)^{(r^*)} = \left( b_{r^* 1}, b_{r^* 2}, \ldots, b_{r^* m} \right),$$
and the value of $g(K^r)_{r^*}$. Assume that $M(K)_{r^*} = i_{K^c}$ and $c_B(K)_{r^*} = c_{i_{K^c}}$.

• Step 8. Each process $K = 1, 2, \ldots, N$ calculates the new values $X_B(K)$ of the basic variables of the process from (5), the new block $B^{-1}(K)$ of the inverse matrix of the process from (6), and the dual subsolution of the block
$$y(K) = \left(B^{-1}(K)\right)^T c_B(K).$$

• Step 9. For $L = 1, 2, 4, 8, \ldots, 2^{\lceil \log_2 N \rceil}$, each process $K = 1, 2, 3, \ldots, N$ whose number satisfies the condition $(K - 1) \bmod 2L < L$, that is, the residue of dividing $K - 1$ by $2L$ is smaller than $L$, exchanges data with the process $K + L$:
—$\tilde{y} = y(K) + y(K + L)$;
—$y(K) = y(K + L) = \tilde{y}$;
—continue computations for the next $L$.

Comments. Upon completion of this step, the vector $y(K)$ in each process $K = 1, 2, 3, \ldots, N$ contains the dual solution $y$ constructed from the current basic solution.

• Step 10. Begin the next iteration.

• End of Algorithm IBMS.

Therefore, the description of the parallel version of the inverse matrix method turns out to be more complex than that of the tabular simplex method, which is mostly due to the greater nonuniformity of the structures used. The following interprocess communications are used in the execution of one iteration:
(1) the parallel "process-to-process" communications are applied $\lceil \log_2 N \rceil$ times to determine the column master process $K^c$;
(2) the broadcast communication "column master process to all processes" for transmission of the master column $A_{i_{K^c}}$ introduced into the basis and of its number;
(3) the parallel "process-to-process" communications are used $\lceil \log_2 N \rceil$ times to determine the row master process $K^r$;
(4) the broadcast communication "row master process to all processes" for transmission of the master row $\left(B^{-1}(K^r)\right)^{(r^*)}$ and its number;
(5) the parallel "process-to-process" communications are used $\lceil \log_2 N \rceil$ times to exchange the dual subsolutions and determine the dual solution.
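Step 9 is an all-reduce: the same pairwise pattern as in Steps 2 and 6, with the comparison replaced by vector addition. A sketch (ours, single address space; each y(K) is the m-dimensional subsolution of (11)):

```python
def exchange_dual(y, N):
    """Step 9: pairwise exchange-and-add of the dual subsolutions.  Here
    y[K] is the m-dimensional subsolution y(K) of (11); after the rounds
    every process holds y = sum over K of y(K).  As in Steps 2 and 6,
    complete propagation presumes N is a power of two."""
    L = 1
    while L < N:
        for K in range(1, N + 1):
            if (K - 1) % (2 * L) < L and K + L <= N:
                s = [a + b for a, b in zip(y[K], y[K + L])]
                y[K] = y[K + L] = s   # both partners keep the partial sum
        L *= 2
    return y
```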


The algebraic computing resources required by the sequential and parallel realizations of the inverse matrix method are compiled in Table 3.

Table 3. Algebraic computing resources of the inverse matrix method

| Operator | One processor: algebraic operations | N processors: transmitted operands | N processors: load on process K = 1, 2, ..., N |
|---|---|---|---|
| Optimality check | $2mt\,[1, N]$ | — | $2mt:\ t \in [1, \lceil n/N \rceil]$ |
| Selection of column master process | — | $N$ | $\lceil \log_2 N \rceil$ |
| Transmission of master column | — | $m + 1$ | $m + 1$ |
| Determination of $g$ | $m(m-1)$ | — | $m(m-1)/N$ |
| Determination of master row | $2m$ | $N$ | $2m/N + \lceil \log_2 N \rceil$ |
| Modification of $X_B$ | $1 + 2m$ | $1$ | $2m/N$ |
| Modification of $B^{-1}$ | $3m^2$ | $m$ | $3m^2/N + m$ |
| Calculation of $y(K)$ | — | — | $2m^2/N - m$ |
| Calculation of $y$ | $m(2m-1)$ | $N$ | $\lceil \log_2 N \rceil$ |
| Total | $cc_a(\mathrm{IBMS}) = 2mt\,[1, N] + 6m^2 + 2m - 1$ | $2m + 3N + 2$ | $cc_a(\mathrm{IBMS}(N)) = 2mt + 6m^2/N + 3m/N + m + 3\lceil \log_2 N \rceil + 1$ |

As can be seen from the table and from the description of the algorithm, the processors are loaded uniformly. Since any element of the simplex table requires at most $4lm^2 + O(lm)$ bits, the bit computing complexity is equal to $cc(\mathrm{IBMS}) = (4lm^2 + O(lm))\,cc_a(\mathrm{IBMS})$. With $N$ processors, the load of one processor is $cc(\mathrm{IBMS}(N)) = (4lm^2 + O(lm))\,cc_a(\mathrm{IBMS}(N))$. Consequently, the acceleration provided by parallelization is as follows:

$$U_{\mathrm{IBMS}} = \frac{cc(\mathrm{IBMS})}{cc(\mathrm{IBMS}(N))} = \frac{2mt\,[1, N] + 6m^2 + 2m - 1}{2mt + 6m^2/N + 3m/N + m + 3\lceil \log_2 N \rceil + 1}.$$

Whence, assuming $m = kn$ and taking into account the interval nature of the variable $t$, we can easily determine

$$\lim_{n \to \infty} U_{\mathrm{IBMS}} = \left[ \frac{2 + 6kN}{2 + 6k},\ N \right] = \left[ N - \frac{N - 1}{1 + 3k},\ N \right] \ni N, \qquad k = \frac{m}{n} < 1.$$

Therefore, the efficiency of parallelization grows with the problem size and reaches in the limit a value close to 100 %.

Therefore, efficiency of parallelization grows with problem size and at the limit reaches a value close to 100 %. 3.3. Comparison of the Methods To sum up we can conclude that the considered above methods of parallelization of the simplex method are efficient. Let us consider in more detail the relation of the processor loads when using the above algorithms at an individual iteration in the case of N processes: I=

2ktnN + 6k2 n2 cc (IBMS(N))

cc (TS(N)) tN + 2kn2

−→ n→∞

[3k, 1 + 3k],

k=

m < 1. n

Therefore, at an individual iteration a positive effect of using the inverse matrix method is possible in the asymptotics for $k = m/n < 1/3$. On the whole, the use of the inverse matrix method is justified if
(1) the main memory size is critical;
(2) it is required to solve both the direct and the dual problems;
(3) $n$ substantially exceeds $m$;
(4) the matrix $A$ is sparse;
(5) permissibility of the dual solution may be optimized.

3.4. Computing Experiment

A computing experiment was carried out to compare the time resources used by the considered algorithms in solving practical problems. The source data for the experiment were linear programming problems from the Netlib library [5], which stores, in the MPS format, freely accessible real linear programming problems of various dimensions. Apart from the problems themselves, the library provides their descriptions and the solutions obtained by two accepted program products, CPLEX and MINOS. The selected problems are compiled in Table 4.

Table 4. Library Netlib problems

| Problem name | Number of constraints | Number of variables | Nonzero elements in matrix | Fill coefficient of nonzero elements |
|---|---|---|---|---|
| SCSD6 | 148 | 1350 | 5666 | 0.0284 |
| SCTAP1 | 301 | 480 | 2052 | 0.014 |
| SHARE1B | 118 | 225 | 1182 | 0.045 |

Computations were carried out on a cluster of Pentium 4-based computers. The experimental results are compiled in Table 5.

Table 5. Results of the computing experiment

| Problem (number of processors) | TS: time, s | TS: acceleration | TS: efficiency | IBMS: time, s | IBMS: acceleration | IBMS: efficiency |
|---|---|---|---|---|---|---|
| SCSD6 (1) | 448.18 | 1.00 | 1.00 | 55.23 | 1.00 | 1.00 |
| SCSD6 (2) | 265.47 | 1.69 | 0.85 | 47.50 | 1.16 | 0.58 |
| SCSD6 (4) | 133.14 | 3.37 | 0.84 | 24.03 | 2.30 | 0.58 |
| SCSD6 (8) | 61.86 | 7.25 | 0.90 | 12.73 | 4.34 | 0.54 |
| SCSD6 (16) | 28.01 | 15.80 | 0.90 | 6.00 | 9.20 | 0.58 |
| SCTAP1 (1) | 401.81 | 1.00 | 1.00 | 251.13 | 1.00 | 1.00 |
| SCTAP1 (2) | 203.95 | 1.97 | 0.98 | 129.48 | 1.99 | 0.99 |
| SCTAP1 (4) | 102.64 | 3.91 | 0.99 | 65.15 | 3.85 | 0.96 |
| SCTAP1 (8) | 50.98 | 7.88 | 0.99 | 33.86 | 7.42 | 0.94 |
| SCTAP1 (16) | 25.37 | 15.84 | 0.99 | 16.85 | 14.90 | 0.88 |
| SHARE1B (1) | 96.84 | 1.00 | 1.00 | 53.80 | 1.00 | 1.00 |
| SHARE1B (2) | 53.84 | 1.80 | 0.90 | 30.77 | 1.75 | 0.87 |
| SHARE1B (4) | 27.78 | 3.49 | 0.84 | 16.87 | 3.39 | 0.85 |
| SHARE1B (8) | 14.66 | 6.61 | 0.83 | 8.54 | 6.60 | 0.83 |
| SHARE1B (16) | 6.73 | 14.40 | 0.83 | 4.74 | 14.39 | 0.90 |

Since the considered practical problems have sparse matrices, it is advisable to use the modified simplex method: the solution time of all the considered examples with the inverse matrix method turned out to be smaller. The experiment also demonstrated high efficiency of parallelization when the number of variables exceeds the number of processors by an order of magnitude.

4. CONCLUSIONS

In the case of rational roundoff-free computations, the positive effect of parallelization lies not only in the acceleration of computations, but also in the possibility of solving large-dimension problems, because the limits at which the matrix no longer fits entirely into the main memory of one node are rather easily reached. The computing experiment demonstrated the efficiency of parallelization on problems of diverse dimensions. As a result, the higher efficiency of solving the problems under consideration by the inverse matrix method deserves noting. The total time of computation may be improved by optimizations of the algorithm realization that allow for the specificity of individual problems, which is a line of future research.

ACKNOWLEDGMENTS

This work was supported by the Russian Foundation for Basic Research, project no. 10-07-96003-r_ural_a.


APPENDIX

Proof of Proposition 1. Consider the upper estimate of the absolute value of the determinant:

$$|\det B| = \left| \sum_{\sigma} \operatorname{sgn}(\sigma) \prod_{k=1}^{m} b_{k\sigma(k)} \right| \le \sum_{\sigma} \prod_{k=1}^{m} \left| b_{k\sigma(k)} \right| \le m!\, L^m \le (mL)^m,$$

where $L = \max\{|b_{ij}| : i, j = 1, 2, \ldots, m\}$. It follows from this estimate that

$$sc(\det B) = \lceil \log_2 |\det B| \rceil \le m(\log_2 m + l).$$

Proof of Proposition 2. Let $D_r$ be the least common multiple of the denominators of all elements of row $r = 1, 2, \ldots, m$ of the matrix $B$. Obviously, $sc(D_r) \le lm$. Let $D$ be the diagonal matrix with $\operatorname{diag}(D) = \{D_r : r = 1, 2, \ldots, m\}$, and consider the matrix $\tilde{B} = DB$, which is an integer one. The upper estimate of the number of bits required by one element of the matrix $\tilde{B}$ is

$$\tilde{l} = \max_{i,j=1,2,\ldots,m} sc(\tilde{b}_{ij}) = \max_{i,j=1,2,\ldots,m} sc(D_i b_{ij}) \le l(m+1).$$

Taking into consideration Proposition 1 and the inequality $\log_2 |\det D^{-1}| \le lm$, we get

$$sc(\det B) = sc\left( \det D^{-1} \times \det \tilde{B} \right) \le m\left( \log_2 m + (2m+1)l \right).$$
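Proposition 1 is easy to check numerically. In the sketch below (ours), the determinant is computed exactly by Gaussian elimination over rationals, and the bit length of the result is tested against the bound (which is loose, since m! is much smaller than m^m):

```python
import random
from fractions import Fraction
from math import ceil, log2

def exact_det(M):
    """Exact determinant of an integer matrix: Gaussian elimination over
    Fractions, so no roundoff; row swaps flip the sign."""
    A = [[Fraction(x) for x in row] for row in M]
    m, det = len(A), Fraction(1)
    for k in range(m):
        p = next((r for r in range(k, m) if A[r][k] != 0), None)
        if p is None:
            return 0
        if p != k:
            A[k], A[p] = A[p], A[k]
            det = -det
        det *= A[k][k]
        for r in range(k + 1, m):
            f = A[r][k] / A[k][k]
            A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    return int(det)

m, l = 8, 16
B = [[random.randrange(-2**l + 1, 2**l) for _ in range(m)] for _ in range(m)]
assert abs(exact_det(B)).bit_length() <= m * (ceil(log2(m)) + l)
```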

REFERENCES

1. Zimpl, http://www.zib.de/koch/zimpl.
2. IPAT-S, http://ipat-s.kb-creative.net/.
3. MPS format, ftp://softlib.cs.rice.edu/pub/miplib/mps_format.
4. Fourer, R., Linear Programming Frequently Asked Questions, Optimization Technology Center of Northwestern University and Argonne National Laboratory, 2005, http://www-unix.mcs.anl.gov/otc/Guide/faq/linear-programming-faq.html.
5. Netlib library collection, ftp://netlib2.cs.utk.edu/lp/data, 1996.
6. CPLEX, http://www.ilog.com/products/cplex/.
7. MINOS, http://www.sbsi-sol-optimize.com/asp/sol_product_minos.htm.
8. GLPK, http://www.gnu.org/software/glpk/glpk.html.
9. EXLP, http://members.jcom.home.ne.jp/masashi777/exlp.html.
10. QSopt-Ex, http://www.dii.uchile.cl/~daespino/.
11. GNU Multiple Precision Arithmetic Library, http://swox.com/gmp/.
12. QSopt, http://www2.isye.gatech.edu/~wcook/qsopt/.
13. Applegate, D.L., Cook, W., Dash, S., and Espinoza, D.G., Exact Solutions to Linear Programming Problems, Florham Park: AT&T Labs Research, 2006.
14. Applegate, D.L., Cook, W., Dash, S., and Espinoza, D., Exact Solutions to Linear Programming Problems, Preprint, Oper. Res. Lett., 2007.
15. Parallel CPLEX, http://www.ilog.com/products/cplex/product/parallel.cfm.
16. Garancha, V.A., Golikov, A.I., and Evtushenko, Yu.G., Parallel Realization of the Newton Method for Solution of Large Linear Programming Problems, Zh. Vychisl. Mat. Mat. Fiz., 2009, vol. 49, no. 8, pp. 1369–1384.
17. Ho, J.K., On the Efficacy of Distributed Simplex Algorithms for Linear Programming, Comput. Optimiz. Appl., 1994, vol. 3, pp. 1237–1240.
18. Bixby, R.E. and Martin, A., Parallelizing the Dual Simplex Method, Inf. J. Comput., 2000, vol. 12, no. 1, pp. 45–56.
19. Hall, J., Towards a Practical Parallelization of the Simplex Method, J. Comput. Manage. Sci., 2010, vol. 7, no. 2, pp. 139–170.
20. Panyukov, A.V. and Gorbik, V.V., Exact and Guaranteed Accuracy Solutions of Linear Programming Problems by Distributed Computer Systems with MPI, Tambov Univ. Reports, Ser. Nat. Techn. Sci., 2010, vol. 15, no. 4, pp. 1392–1404.
21. Panyukov, A.V., et al., Library of "Exact Computational" Classes, in Computer Programs, Databases, Topologies of Integral Circuits, Ofits. Bull. Ross. Agentstva po Patentam i Tovarnym Znakam, 2009, no. 3, p. 251.
22. Panyukov, A.V., Germanenko, M.I., and Gorbik, V.V., Parallel Algorithms to Solve Systems of Linear Algebraic Equations with the Use of Roundoff-free Computations, in Parallel Computing Technologies (PAVT-2007): Proc. Int. Conf., Chelyabinsk: YuUrGU, 2007, vol. 2, pp. 238–249.
23. Panyukov, A.V. and Germanenko, M.I., An Application for Faultless Determination of the Generalized Inverse Matrix by the Moore–Penrose Method and Faultless Solution of Linear Algebraic Equations, Inf. Technol. Model. Upravlen., 2009, no. 1 (53), pp. 78–87.
24. Vasil'ev, F.P. and Ivanitskii, A.Yu., Lineinoe programmirovanie (Linear Programming), Moscow: Faktorial, 2003.
25. Hall, J., Towards a Practical Parallelization of the Simplex Method, http://www.maths.ed.ac.uk/hall/ParSimplex.
26. Badr, E.-S., Moussa, M., Paparrizos, K., et al., Some Computational Results on MPI Parallel Implementation of Dense Simplex Method, World Acad. Sci., Eng. Technol., 2006, vol. 23, pp. 39–42.

This paper was recommended for publication by A.I. Kibzun, a member of the Editorial Board.
