
IEEE TRANSACTIONS ON MAGNETICS, VOL. 33, NO. 2, MARCH 1997


Optimization Algorithms for Magnetics and their Parallelizability

S. Ratnajeevan H. Hoole and Kanishka Agarwal¹
Department of Engineering, Harvey Mudd College, Claremont, CA 91711, USA


Abstract - Parallelization of optimization routines is now increasingly resorted to, because of the heavy computation associated with the optimization of electromagnetic products in the process of their design. This paper evaluates the possibilities for parallelization in electromagnetic product design. The several optimization algorithms available are evaluated for their parallelizability. It is further shown that, given the present limits of technology in relation to the number of processors available on shared memory computers of parallel architecture, parallelization in electromagnetic optimization is not necessarily worth attempting at present. Thus, work of this nature is, of necessity, for the future, in readiness for that day when a new generation of parallel computers is available to us.



I. INTRODUCTION

Several algorithms are presently available for optimizing the design-synthesis process of electromagnetic products through an iterative cycle [1]. In recent years, some of these algorithms have been modified for efficient implementation on parallel computer architectures and compared [2-6]. This paper examines two aspects of this parallelization in the context of the optimization of electromagnetic products using finite element analysis [7].

First, the comparisons that have been made are usually done with simple mathematical functions and are not useful in magnetics. For instance, an algorithm taking several function evaluations, like the random search methods, might be highly parallelizable, but a comparison based on function evaluations with simple object functions like the Rosenbrock function [8] will not be valid when applied to a magnetic field problem where each function evaluation is a finite element solution. That is, while some methods of optimization (such as the random search methods) are easily amenable to parallelization and may be attractive in finding the minimum of a simple mathematical function, when it comes to optimization in finite element design, such methods are not as attractive, because each performance-based function evaluation in the process of optimization is a finite element solution of the object of design for its performance.

¹Presently attached to Booz-Allen & Hamilton, San Francisco, CA.

Thus the number and nature of the function evaluations must also be factored into the comparison, rather than simple iteration counts.

Second, hardware limits on parallelization are considered, keeping in mind that finite element analysis is usually done on shared memory parallel computers (when parallel computers are used). On such machines, parallelization without limit is not possible since only a limited number of processors is available; there is therefore a need to consider which of the computations are best done in parallel.
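To make the first point concrete, the sketch below (our illustration, not part of the original comparisons) contrasts a cheap algebraic object function with a stand-in for a finite element based one; any comparison by number of function evaluations must weigh the very different cost per evaluation. The timing figures are assumptions for illustration only.

```python
import time

def rosenbrock(x):
    """A cheap algebraic object function: microseconds per evaluation."""
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def field_objective(params):
    """Stand-in for a magnetics object function: every call would mesh the
    device and solve a finite element system, so each evaluation is costly."""
    time.sleep(0.5)                       # placeholder for an assumed FE solve time
    return sum(p ** 2 for p in params)    # dummy performance measure

def total_cost(num_evals, seconds_per_eval):
    """Rough total cost of an optimizer needing a given number of evaluations."""
    return num_evals * seconds_per_eval

# A random search needing 10,000 cheap evaluations is trivial on Rosenbrock,
# but 10,000 finite element solves at, say, 60 s each is roughly a week of computing.
print(total_cost(10_000, 1e-6), "s versus", total_cost(10_000, 60.0) / 3600.0, "hours")
```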

II. THE OPTIMIZATION ALGORITHMS EVALUATED FOR PARALLELISM

II.1 General

Broadly, there are three classes of optimization algorithms: zeroth order, first order and second order, relying on the evaluation of the object function itself, its first derivative and its second derivative, respectively. By parallelism is meant the ability to perform some of the computations of the optimization process simultaneously on different processors. Some computations must be inherently sequential, in that one computation needs the numerical outcome of another before it can proceed; whereas if one computation does not need the numerical outcome of the other, the two may be done independently and therefore on different processors. In this section we shall examine the most important algorithms in optimization, important in terms of popularity, for their parallelizability.

II.2 Conjugate Gradients

A first order method [8, pp. 87-92], it relies on the following algorithm:
i) Find the direction of steepest descent.
ii) Line-search to find a minimum along that direction and update the parameters of the object function along that direction.
iii) Exit if the minimum is found; otherwise continue.
iv) Find the gradients of the object function.
v) Find the conjugate direction.
vi) Line-search to find a minimum along that direction and update the parameters of the object function along that direction.

Manuscript Received: 19 March, 1996.

0018-9464/97$10.00 © 1997 IEEE


vii) Exit if the minimum along the direction is small enough;

otherwise repeat steps iv-vii.

The obvious place for parallelism is in the line search, where the object function needs to be evaluated for various values of the design parameter. Another possibility is in computing the partial derivatives, as shown in [12].
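As an illustration of the first possibility (ours, and using Python's standard multiprocessing facilities rather than the SPARC thread library of Section III), the object function evaluations of a line search can be farmed out to different processors. The objective shown is a cheap placeholder for what, in magnetics, would be a finite element solution.

```python
from concurrent.futures import ProcessPoolExecutor

def objective(alpha, x, direction):
    """Stand-in for an expensive evaluation (in magnetics, a finite element solve)."""
    trial = [xi + alpha * di for xi, di in zip(x, direction)]
    return sum((t - 1.0) ** 2 for t in trial)   # placeholder quadratic bowl

def parallel_line_search(x, direction, alphas, workers=3):
    """Evaluate the objective at several step lengths in parallel and keep the best."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(objective, alphas,
                               [x] * len(alphas), [direction] * len(alphas)))
    best = min(range(len(alphas)), key=lambda i: values[i])
    return alphas[best], values[best]

if __name__ == "__main__":
    step, value = parallel_line_search([0.0, 0.0], [1.0, 1.0],
                                       [0.25 * k for k in range(1, 9)])
    print(step, value)
```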


II.3 Powell's Method

This is a zeroth order method that approximates a second order scheme.
i) Line-search along N orthogonal directions and update the parameters, N being the number of parameters. Initially these are the N coordinate directions.

ii) Generate a new search direction by taking the difference between the original parameter set and the now modified parameter set. This is the (N+1)th search direction. Line-search along this and adjust the parameters accordingly.
iii) Exit if at the minimum; otherwise continue.
iv) Throw away search direction 0 and shift down each direction number. That is, direction k+1 becomes the new direction k.
v) Repeat steps i to iv until convergence.

While the line-search may be parallelized as with conjugate gradients, in addition, each of the N+1 directions may be attempted in parallel (the (N+1)th requiring the completion of these N).
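A minimal sketch (ours, not the paper's code) of searching the N directions in parallel from the same starting point follows; as the results in Section III show, this forgoes the benefit of starting each search from the previous direction's end point. The line search shown is an assumed crude grid search, for illustration only.

```python
from concurrent.futures import ProcessPoolExecutor

def line_minimum(x, direction, objective, steps):
    """Crude line search: try a grid of step lengths and return the best step."""
    return min(steps,
               key=lambda a: objective([xi + a * di for xi, di in zip(x, direction)]))

def parallel_powell_sweep(x, directions, objective, steps, workers=3):
    """Search all N directions in parallel from the same point x, then apply all moves.
    The objective must be a module-level (picklable) function."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(line_minimum, x, d, objective, steps) for d in directions]
        alphas = [f.result() for f in futures]
    # Apply the accumulated moves; the sequential algorithm would instead
    # update x after each direction before searching the next.
    return [xi + sum(a * d[i] for a, d in zip(alphas, directions))
            for i, xi in enumerate(x)]
```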


II.4 Simulated Annealing

This is a zeroth order statistical method whose main advantage is that it works with almost all problems, and it is perhaps the most popular in magnetics. There are several other zeroth order search methods which we will not consider here because simulated annealing is generally considered the best. Its steps are [4]:
i) Perturb the parameters randomly.
ii) If the new value of the object function is lower, accept the new set of parameters. Otherwise, accept it with a probability of exp(-ΔC/T) [that is, accept if, for a random number r between 0 and 1, r ≤ exp(-ΔC/T)]. Here ΔC is the change in the object function and T is the temperature. T is initially set to a value comparable to or larger than ΔC.
iii) Repeat steps i and ii until equilibrium has been reached (i.e., the object function does not decrease any more for that temperature).
iv) Update the temperature by "cooling it" (i.e., reducing T) so that the probability of accepting a worse point is decreased, disturbing the equilibrium reached.

This is clearly a sequential algorithm that is not, in the normal sense, parallelizable, because we cannot get the new values of the parameter set {p} until we have determined whether to perturb from the new value of {p} or the old value. That is, from an old value {p} we perturb to a new value, and we must decide whether to accept the new value or not before proceeding, since this determination specifies whether we are going to perturb next about the old value (if we reject) or the new value (if we accept). This sequentiality has been cleverly overcome in [2] by speculatively planning for the scenario of the new value being rejected and starting a new processor on that assumption, calculating about the old value as well. If the assumption is proved correct, then we would have saved time; on the other hand, if the assumption proves incorrect, some unnecessary computation would have been undertaken.
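The following is a minimal sketch of the acceptance rule of step ii (ours, assuming generic objective and perturbation callables); it is not the speculative parallel implementation of [2], but it shows the sequential dependence discussed above: the point about which we perturb next depends on the accept/reject decision.

```python
import math
import random

def anneal_step(params, cost, objective, perturb, T):
    """One simulated annealing move: perturb, then accept or reject (step ii)."""
    trial = perturb(params)
    trial_cost = objective(trial)
    dC = trial_cost - cost
    # Accept downhill moves always; uphill moves with probability exp(-dC/T).
    if dC <= 0 or random.random() <= math.exp(-dC / T):
        return trial, trial_cost   # perturb next about the new value
    return params, cost            # perturb next about the old value
```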



II.5 The Evolution Strategy

While methods like conjugate gradients work on one point, methods like the evolution strategy and the genetic algorithm explore a solution space by working on entire populations at a time. The distinction between exploitative (working on a point) and exploratory (working on a population) has been made. Exploratory strategies are naturally parallelizable, although their stochastic nature will not allow the same results to be repeated in terms of efficiency of parallelization. Between the evolution strategy and the genetic algorithm, it was considered sufficient to work on one to establish the nature of the parallelizability, since the results would be similar. The evolution strategy was chosen because of its popularity in electromagnetics. The steps are, for two numbers NumKeep and NumNeigh:
i) Pick NumKeep, the number of best parameter sets {p} we will keep, and NumNeigh, the number of neighbors about each {p} we would generate. Pick the first set of NumKeep {p}s and evaluate the object function for each {p}.
ii) Perturb each {p} NumNeigh times randomly to generate an additional NumKeep x NumNeigh {p}s and evaluate the corresponding values of the object function.
iii) Pick the NumKeep {p}s corresponding to the lowest values of

the object function.
iv) Return to step ii and repeat until the minimum is found.

The parallelizability is readily evident: all the perturbations and object function evaluations can be done in parallel by different processors.
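A sketch of one generation of steps ii and iii with the evaluations done in parallel is given below (ours; the Gaussian perturbation and the choice of whether the parents also compete are assumptions for illustration, not details taken from the paper).

```python
import random
from concurrent.futures import ProcessPoolExecutor

def evolution_step(kept, objective, num_neigh, sigma=0.1, workers=3):
    """One generation: perturb each kept {p} NumNeigh times, evaluate all
    candidates in parallel, and keep the NumKeep best.
    The objective must be a module-level (picklable) function."""
    num_keep = len(kept)
    candidates = [[pi + random.gauss(0.0, sigma) for pi in p]
                  for p in kept for _ in range(num_neigh)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        costs = list(pool.map(objective, candidates))   # evaluations run in parallel
    # (The parent sets could also be retained in the selection pool.)
    ranked = sorted(zip(costs, candidates), key=lambda t: t[0])
    return [c for _, c in ranked[:num_keep]]
```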

II.6 Newton's Scheme

Newton's method is of second order [8] and requires extensive second derivative computations of the object function with respect to the various parameters to create the Hessian matrix. The second derivatives are computed using finite differences and therefore require several parallelizable object function evaluations. Once the Hessian is computed, the symmetric matrix equation solution can also be done in parallel [1].
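As a hedged illustration of the finite-difference Hessian (ours, with an assumed central-difference stencil; the paper does not specify the stencil used), every required object function evaluation is independent and can be dispatched to a different processor.

```python
from concurrent.futures import ProcessPoolExecutor

def hessian_fd(objective, x, h=1e-4, workers=3):
    """Finite-difference Hessian: each entry needs a handful of objective
    evaluations, all independent, so they can run in parallel."""
    n = len(x)

    def shifted(i, j, si, sj):
        y = list(x)
        y[i] += si * h
        y[j] += sj * h
        return y

    points, index = [], []
    for i in range(n):
        for j in range(i, n):
            for si, sj in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
                points.append(shifted(i, j, si, sj))
                index.append((i, j, si * sj))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(objective, points))      # parallel evaluations
    H = [[0.0] * n for _ in range(n)]
    for (i, j, sign), f in zip(index, values):
        H[i][j] += sign * f / (4.0 * h * h)              # central-difference stencil
    for i in range(n):
        for j in range(i + 1, n):
            H[j][i] = H[i][j]                            # symmetric matrix
    return H
```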

III. RESULTS

III.1: General

This section reports the results of the parallelization suggested in the preceding section. As appropriate, the results from simple algebraic functions are from a Sun SPARCserver 1000 consisting of four superscalar


SPARC version 8 processors, also known as SuperSPARC processors. The parallelization was through multithreading included in the library thread.h. The results from larger finite element computations were on a Sequent Symmetry parallel computer.

III.2: Conjugate Gradients

As pointed out, there are two areas of parallelization: in evaluating the object function during line searches and in doing the partial derivatives. In evaluating the object functions, close to 100% efficiency was accomplished. With 3 processors in use, computational time was down almost by a factor of 3. However, when doing partial derivatives, 100% efficiency was not possible. The difference is best explained with the Rosenbrock function [13, p. 419]

y = 100(x_2 - x_1^2)^2 + (1 - x_1)^2.

With this, evaluating the two derivatives with respect to the two different variables, using two different processors, at best a speed-up of 1.49 was accomplished. This is because the two derivatives are of different orders of complexity, so that the processor computing the derivative with respect to x_2 is done quickly and then idles while the other derivative continues to be calculated. In electromagnetic field computation, similarly, the derivative computations with respect to geometric parameters are much more difficult to implement, while those with respect to permeability or current density are easy [1]. Similarly, even between two geometric parameters, if changing one parameter changes the shape of many triangles while changing another changes the shape of only a few triangles, then the derivative with respect to the first would take much longer, thereby preventing the realization of 100% efficiency in parallelization. Quantification of these differences is not given since it is very problem-specific.
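As an illustration only (ours, assuming finite-difference derivatives rather than the analytic sensitivities of [12]), the two partial derivatives of the Rosenbrock function can be assigned to separate processors; the total time is then set by the worker with the more expensive derivative, which is the load-imbalance point made above.

```python
from concurrent.futures import ProcessPoolExecutor

def rosenbrock(x):
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def partial(i, x, h=1e-6):
    """Central-difference partial derivative with respect to x[i]."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (rosenbrock(xp) - rosenbrock(xm)) / (2.0 * h)

def gradient_parallel(x, workers=2):
    """One derivative per processor; run time is set by the slowest one."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(partial, range(len(x)), [x] * len(x)))

if __name__ == "__main__":
    print(gradient_parallel([1.2, 1.0]))
```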

III.3 Powell's Method

The line searches yielded almost 100% parallelism as before. However, doing the N steps in parallel was not as effective as hoped because, for each iteration, the parallel version, of necessity starting from the original point, searches simultaneously for a minimum in each coordinate direction and, at the end, updates all the movements; whereas the sequential algorithm, more effectively, searches in each coordinate direction only after finishing the previous direction, using that end point as the new starting point. Thus the gains in parallelization are offset by losses in not having the best starting point for each direction of search. Thus only parallelizing the line search is

recommended, especially considering that with limited processors, parallelizing the searches in the N directions means that we cannot parallelize the line searches.

III.4 Simulated Annealing

Since the computation is probabilistic, and because much of the speed-up depends on whether the computations begun on the assumption that a newly perturbed point will be accepted or rejected turn out to be needed, speed-up information is highly random and we simply make recourse to the formula given in [2]:

Speed-up = Σ_i P^(a_i) (1 - P)^(r_i) L_i (1 + a_i + r_i),

the sum running over the nodes i of the tree of speculative computations, where P is the probability of acceptance; a_i is the number of ancestors of a node i that have made accept decisions; r_i the number that made reject decisions; and L_i is the probability of exiting the tree at node i (0 if the node has both children, 1-P if the node has only an accept child, P if the node has only the reject child, and 1 if it has no child).
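The short routine below (ours, and based on our reading of the reconstructed formula above rather than on code from [2]) evaluates this expected speed-up for a given speculation tree, represented here as nested dictionaries with optional "accept" and "reject" children.

```python
def expected_speedup(tree, P):
    """Expected speed-up over a speculative decision tree: the sum over nodes of
    P^a_i (1-P)^r_i L_i (1 + a_i + r_i), following the formula quoted above."""
    def walk(node, a, r):
        has_acc = "accept" in node
        has_rej = "reject" in node
        if has_acc and has_rej:
            L = 0.0          # both children present: never exit the tree here
        elif has_acc:
            L = 1.0 - P      # only an accept child
        elif has_rej:
            L = P            # only a reject child
        else:
            L = 1.0          # leaf: always exit here
        value = (P ** a) * ((1.0 - P) ** r) * L * (1 + a + r)
        if has_acc:
            value += walk(node["accept"], a + 1, r)
        if has_rej:
            value += walk(node["reject"], a, r + 1)
        return value

    return walk(tree, 0, 0)

# Example: speculate twice on rejection at successive nodes.
chain = {"reject": {"reject": {}}}
print(expected_speedup(chain, P=0.3))
```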

III.5: Evolution Strategy

Since the main computation is object function evaluation, all parallel processes took the same time to solve, giving us an expectation of 100% parallelizability, allowing for some time loss in coordinating between the different processes. As remarked, however, the random nature of the perturbations prevents us from realizing a convergence time factored down by the number of processors. However, when the sequential algorithm was run repeatedly and the parallel algorithm 100 times on the pole-face problem described in [1], the average run time for the sequential algorithm was found to be 2 times the average run time for the parallel implementation with 3 processors.

IV. HARDWARE LIMITS TO PARALLELIZATION

Much of the parallelization with finite elements is done in shared memory systems where the number of processors is limited. Shared memory machines, where all the processors operate off the same memory, are essential to parallel finite element solutions because, in the matrix solution associated with the largest component of work in finite element analysis, the finite element matrix equation is a proper equation only after all the local matrices are put together into a global matrix. The other kind of parallel architecture, associated with distributed memory machines, has each processor operating off its own independent memory with only occasional exchanges of information. This kind of machine is used for independent computations,


such as the auditing of several separate income tax returns by the government. Because distributed memory machines' processors act almost independently, they are not too complex, technically speaking, and machines with thousands of processors are not unheard of. It is when the memory is shared that the technology is limiting, and machines with more than 32 processors are rarely seen, although the industry is aiming at 2000 processors in a decade.

In such a limited processor environment as with shared memory systems in finite element analysis, if each processor is used in parallelizing the optimization algorithm, the finite element computation that accompanies each object function evaluation cannot be parallelized since no processor is free for that. As such, an essentially sequential optimization algorithm like simulated annealing could be as effective if the object function evaluations are parallelized.

Clearly then, in a shared memory environment with limits on the processors we have available to us, we must choose carefully those aspects of computation that we wish to implement in parallel. Ideally we would have as many processors as we want and we would then implement in parallel all those tasks that can be implemented in parallel. But such is not the case, and we therefore need to parallelize only those tasks that reach close to 100% efficiency or speed-up (the ideal 100% speed-up occurring when, with n processors, solution time is cut down by a factor of n). As shown in [1, 9-11], the process of matrix computation and finite element equation assembly yields almost 100% speed-up, whereas from the results above, it is seen that none of the above optimization strategies results in the same 100% speed-up. We conclude therefore that, given a limited number of processors, we must not parallelize the optimization process, but rather implement the optimization in a sequential way and use all the processors to parallelize the finite element solution that accompanies the object function evaluation within a sequential implementation of the optimization.

Does this then mean that the parallelizability of optimization is a waste? No. What it means is that the use of these parallel algorithms for optimization must await that day when technological developments give us a computer with as many processors as we want sharing the same memory. At that time, these parallel algorithms may be employed effectively.

V. CONCLUSIONS

The search methods and gradient-based techniques have parallelism built into their algorithms. Simulated annealing, although not naturally parallelizable, can be passed on to different processors in the expectation that it is profitable to do so, as it often is. However, because the number of processors on shared memory machines is limited, parallelization of the optimization can be an academic exercise: for if we truly have parallelized the finite element solution, we would not have spare processors to parallelize the optimization.

REFERENCES

[1] S. Ratnajeevan H. Hoole (Ed.), Finite Elements, Electromagnetics and Design, Elsevier, Amsterdam, 1995.
[2] Michele Marchesi, Giorgio Molinari and Maurizio Repetto, "A Parallel Simulated Annealing Algorithm for the Design of Magnetic Structures," IEEE Trans. Magn., Vol. 30, No. 5, pp. 3439-3442, Sept. 1994.
[3] Panos M. Pardalos, Advances in Optimization and Parallel Computing, Elsevier, Amsterdam, 1992.
[4] J. Ramanujam, F. Ercal and P. Sadayappan, "Task Allocation by Simulated Annealing," Department of Computer and Information Science, Ohio State University. (Unpublished.)
[5] A. Kurzhanski, K. Neumann and D. Pallaschke (Eds.), "Optimization, Parallel Processing and Applications," Lecture Notes in Economics and Mathematical Systems, Volume 304, no date.
[6] Manfred Grauer and Dieter B. Pressmar (Eds.), "Parallel Computing and Mathematical Optimization," Lecture Notes in Economics and Mathematical Systems, Volume 367, no date.
[7] S. Ratnajeevan H. Hoole, Computer-Aided Analysis and Design of Electromagnetic Devices, Elsevier, New York, 1989. (Now handled by Prentice Hall.)
[8] Garret N. Vanderplaats, Numerical Optimization Techniques for Engineering Design, McGraw-Hill, New York, 1984.
[9] G. Mahinthakumar and S. R. H. Hoole, "A Parallel Conjugate Gradients Algorithm for Finite Element Analysis of Electromagnetic Fields," J. Appl. Phys., Vol. 67, No. 9, pp. 5818-5820, 1990.
[10] G. Mahinthakumar and S. R. H. Hoole, "A Parallelized Element by Element Jacobi Conjugate Gradients Algorithm for Field Problems and a Comparison with Other Schemes," Int. J. Appl. Electromagn. in Matls.
[11] S. R. H. Hoole, "Finite Element Electromagnetic Field Computation on the Sequent Symmetry 81 Parallel Computer," IEEE Trans. Magn., Vol. 26, No. 2, pp. 837-840, March 1990.
[12] S. R. H. Hoole, "Optimal Design, Inverse Problems and Parallel Computers," IEEE Trans. Magn., Vol. 27, pp. 4146-4149, May 1991.
[13] Gordon S. Beveridge and Robert S. Schechter, Optimization: Theory and Practice, McGraw-Hill, New York, 1970.

