Comparison of Evolutionary and Deterministic Multiobjective Algorithms for Dose Optimization in Brachytherapy
Natasa Milickovic1, Michael Lahanas1, Dimos Baltas1,2 and Nikolaos Zamboglou1,2

1 Department of Medical Physics and Engineering, Strahlenklinik, Klinikum Offenbach, 63069 Offenbach, Germany
2 Institute of Communication and Computer Systems, National Technical University of Athens, 15773 Zografou, Athens, Greece
Abstract. We compare two multiobjective evolutionary algorithms with deterministic gradient-based optimization methods for the dose optimization problem in high dose rate (HDR) brachytherapy. The optimization considers up to 300 parameters. The objectives are expressed in terms of statistical parameters of the dose distributions, approximated from dose values at a small number of points. For these objectives it is known that the deterministic algorithms converge to the global Pareto front, whereas the evolutionary algorithms produce only local Pareto-optimal fronts. The performance of the multiobjective evolutionary algorithms improves if a small part of the population is initialized with solutions from the deterministic algorithms. An explanation is that only a very small part of the search space is close to the global Pareto front. In some cases we estimate the performance of the algorithms in terms of probabilities relative to a random search method.
1 Introduction

High dose rate (HDR) brachytherapy is a treatment method for cancer in which empty catheters are inserted into the cancer volume. Once the correct position of these catheters is verified, a single 192Ir source is moved inside the catheters to discrete positions (dwell positions) by a computer-controlled machine. The problem we consider is the determination of the n times (dwell times), sometimes termed dwell weights or weights, for which the source is at rest at each of the n dwell positions, so that the resulting three-dimensional dose distribution fulfills defined quality criteria. In modern brachytherapy the dose distribution has to be evaluated with reference to irradiated normal tissues and the planning target volume (PTV), which includes the cancer volume and an additional margin. For a more detailed description see [1], [2]. The number of source positions varies from 20 to 300. We consider the optimization of the dose distribution using as objectives the variance of the dose distribution on the PTV surface and in the PTV, calculated at 1500-4000 positions (sampling points). For variances, and in general for quadratic convex objective functions f(x) of the form

$$f(\mathbf{x}) = (A\mathbf{x} - \mathbf{d})^T (A\mathbf{x} - \mathbf{d}) \qquad (1)$$

it is known that a weighted sum optimization method converges to the global Pareto front [3], where A is a constant matrix and d is a constant vector of the prescribed dose values in the PTV target or surface.
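The convexity property behind this statement can be checked numerically. The sketch below uses a toy matrix A and prescription vector d (illustrative sizes, not clinical data) and verifies that the solution of the normal equations minimizes the quadratic objective of Eq. (1):

```python
import numpy as np

def quadratic_objective(A, d, x):
    """f(x) = (A x - d)^T (A x - d), Eq. (1); A maps dwell times to dose values."""
    r = A @ x - d
    return float(r @ r)

rng = np.random.default_rng(0)
A = rng.random((8, 5))   # toy matrix: 8 sampling points, 5 dwell positions
d = np.ones(8)           # prescribed dose values at the sampling points

# Because f is convex quadratic, the solution of the normal equations
# (A^T A) x = A^T d is its global minimum; any other x does no better.
x_star = np.linalg.solve(A.T @ A, A.T @ d)
x_rand = rng.random(5)
```

Since a nonnegative weighted sum of such objectives is again of the same form, the same closed-form argument applies to each single-objective run of the weighted sum method.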
We have successfully applied multiobjective evolutionary algorithms with dose-volume based objectives [1], [2]. In the past, the effectiveness of evolutionary algorithms has been compared either with other evolutionary algorithms [4] or with manually optimized plans [5], [6]. We have compared the Pareto fronts obtained by multiobjective evolutionary algorithms with the Pareto fronts obtained by a weighted sum approach using deterministic optimization methods such as quasi-Newton algorithms and Powell's modified conjugate gradient algorithm, which does not require derivatives of the objective function [7].

We use here only objectives for which gradient-based algorithms are superior. However, we must also consider critical structures partly inside the target or close to it, which have to be protected from excessive radiation. Other objectives are the optimum position and the minimum number of sources. In such cases the gradient-based algorithms cannot be used. Therefore, before applying evolutionary algorithms to these more complex problems, we have compared their efficiency with deterministic methods using as objectives only the dose variance within the target and on its surface.
2 Methods

2.1 Objectives

We use as objectives the normalized variance of the dose distribution on the PTV surface, fS, and in the PTV, fV:

$$f_S = \frac{1}{m_S^2}\,\frac{1}{N_S}\sum_{i=1}^{N_S} (d_i - m_S)^2 \qquad (2)$$

$$f_V = \frac{1}{m_V^2}\,\frac{1}{N_V}\sum_{i=1}^{N_V} (d_i - m_V)^2 \qquad (3)$$

Here mS, mV are the corresponding mean values, NS, NV the number of points used to estimate these parameters, and di the dose value at the i-th point. We use a weighted sum approach for the multiobjective optimization with the deterministic algorithms: for each set of weights for the volume and surface variance we perform a single-objective optimization of fW:
$$f_W = w_S f_S + w_V f_V \qquad (4)$$

where $w_S, w_V \ge 0$ are the surface and volume importance factors, respectively, and $w_S + w_V = 1$. We used 21 optimization runs in which wS varied from 0 to 1 in steps of 0.05 to determine the shape of the trade-off curve.
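As an illustration, the normalized variances of Eqs. (2)-(3) and the weighted sum of Eq. (4) can be computed as below. The dose samples here are synthetic stand-ins; in the real workflow each weight setting starts a full single-objective optimization of the dwell times rather than a mere evaluation:

```python
import numpy as np

def normalized_variance(doses):
    """Normalized variance, Eqs. (2)-(3): (1/m^2) (1/N) sum_i (d_i - m)^2."""
    m = doses.mean()
    return float(((doses - m) ** 2).mean() / m ** 2)

rng = np.random.default_rng(1)
d_surface = rng.normal(5.0, 0.8, size=1500)  # synthetic surface dose samples
d_volume = rng.normal(7.0, 2.0, size=4000)   # synthetic volume dose samples

f_S = normalized_variance(d_surface)
f_V = normalized_variance(d_volume)

# 21 weight settings, w_S from 0 to 1 in steps of 0.05, as in Eq. (4).
trade_off = [(w_S, w_S * f_S + (1.0 - w_S) * f_V)
             for w_S in np.linspace(0.0, 1.0, 21)]
```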
2.2 Genetic Operators

We use a real representation for the chromosomes. The following variants of genetic operators were used in this study.
Uniform mutation. If gk is the k-th element of a chromosome selected for mutation, it is replaced by a random number from the interval [LB, UB], where LB and UB are the lower and upper bounds of the k-th element.

Non-uniform mutation. If gk is the k-th element of a chromosome at generation t, it is transformed by non-uniform mutation into gk':

$$g_k' = \begin{cases} g_k + \Delta(t,\, UB - g_k) & \text{if } r_1 = 0 \\ g_k - \Delta(t,\, g_k - LB) & \text{otherwise} \end{cases} \qquad (5)$$

$$\Delta(t, y) = y\left(1 - r^{(1 - t/T)^b}\right) \qquad (6)$$
where r1 is a random bit (0 or 1), r is a random number in the range [0, 1], T is the maximal generation number, and b is a parameter controlling the dependency of Δ(t, y) on the generation number. The function Δ(t, y) returns a value in the range [0, y] such that the probability of Δ(t, y) being close to 0 increases as t increases. The space is thus searched uniformly when t is small and very locally at later stages [8].

Flip mutation. We randomly pick genes gk to gj from the chromosome and set each of them to the value of another randomly chosen gene of the same chromosome. This works for any number of gene sets of a given chromosome.

Swap mutation. We randomly swap elements within the chromosome. The number of swapped elements depends on the mutation probability.

Gaussian mutation. This operator adds a Gaussian-distributed random value to the chosen gene. The new value is clipped if it falls outside the specified lower and upper bounds for that gene.

Blend crossover. In blend crossover [9] we generate a uniform distribution based on the distance between parent gene values and draw the offspring values from that distribution. This distance is defined for each pair of corresponding parent genes. If a and b are parent chromosomes and c1 and c2 are the offspring, then for the parents' i-th genes this distance is given by

$$dist = |a_i - b_i| \qquad (7)$$

The lower and upper bounds lo, hi are found as

$$lo = \min(a_i, b_i) - 0.5\,dist, \qquad hi = \max(a_i, b_i) + 0.5\,dist \qquad (8)$$
and corrected if necessary to stay within the allowed gene-value boundaries. Then c1i and c2i are drawn as random numbers from the range [lo, hi].

Geometric crossover. If a and b are parent chromosomes, geometric crossover produces two children c1 and c2:

$$c_{1,i} = a_i^{\omega}\, b_i^{1-\omega}, \qquad c_{2,i} = b_i^{\omega}\, a_i^{1-\omega} \qquad (9)$$

where ω is a uniform random number in [0, 1].
One of these progenies is selected according to its fitness value.

Two-point crossover. The two-point crossover operator randomly selects two crossover points within a chromosome and interchanges the segments of the two parent chromosomes between these points (segment interchange) to produce two new offspring.

Arithmetic crossover. If git and gjt are two chromosomes of the population at generation t, arithmetic crossover produces two new chromosomes git+1, gjt+1 at generation t+1:

$$g_i^{t+1} = \alpha g_j^t + (1-\alpha) g_i^t, \qquad g_j^{t+1} = \alpha g_i^t + (1-\alpha) g_j^t \qquad (10)$$

α is either a constant (uniform arithmetic crossover) or a variable depending on the generation number (non-uniform arithmetic crossover).

2.3 Estimation of the Probabilities of Random Solutions

Using random sets of decision variables (weights), we extrapolated the number of function evaluations a random search method would require to obtain points on the Pareto front. We have found that in some cases a two-dimensional normal density function f(x) can describe the distribution of these random points in the objective space, known also as the bi-loss map for two objectives [10].
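The sampling behind the bi-loss map can be sketched as follows; the dose kernel matrices here are random placeholders standing in for the real dose calculation that maps dwell times to dose values at the sampling points:

```python
import numpy as np

rng = np.random.default_rng(2)

n_times = 50
K_S = rng.random((1500, n_times))  # placeholder surface dose kernel
K_V = rng.random((4000, n_times))  # placeholder volume dose kernel

def normalized_variance(doses):
    m = doses.mean()
    return ((doses - m) ** 2).mean() / m ** 2

# Bi-loss map: image in objective space of random decision vectors.
points = np.array([
    (normalized_variance(K_S @ t), normalized_variance(K_V @ t))
    for t in rng.random((500, n_times))
])
```

Histogramming these (fS, fV) points gives the empirical distribution to which the normal density below is fitted.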
The density function is

$$f(\mathbf{x}) = \frac{1}{2\pi\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right) \qquad (11)$$

where

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} \qquad (12)$$
with σ11 = σx², σ22 = σy², and σ12 = σ21 = cov(x, y) = E[(x − µx)(y − µy)] = ρσxσy. Here E(x) is the expectation value of x, σx and σy are the standard deviations in x and y, respectively, and ρ is the correlation coefficient.
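As a cross-check of the fitted parameters, Eq. (11) can be evaluated directly. A minimal sketch, with σx, σy, and ρ supplied as plain numbers:

```python
import numpy as np

def bivariate_normal_pdf(x, mu, sigma_x, sigma_y, rho):
    """Eq. (11), with the covariance matrix Sigma built as in Eq. (12)."""
    cov = rho * sigma_x * sigma_y
    Sigma = np.array([[sigma_x ** 2, cov],
                      [cov, sigma_y ** 2]])
    diff = np.asarray(x, float) - np.asarray(mu, float)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(Sigma)))
    return float(norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)))

# At the mean the exponent vanishes, so the density equals 1/(2*pi*|Sigma|^(1/2)).
p0 = bivariate_normal_pdf([0.0, 0.0], [0.0, 0.0], 1.0, 1.0, 0.0)
```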
3 Results

The convergence of the Strength Pareto Evolutionary Approach algorithm (SPEA) [11] is shown in Fig. 1. For a breast implant with 250 sources, the population averages ⟨fS⟩ and ⟨fV⟩ of the surface and volume dose variances are shown for 500 generations. Convergence is observed after 100 generations. In order to compare the efficiency of SPEA and the deterministic algorithms, we evaluated the probability of obtaining at random a solution produced by these algorithms. We generated 2500000 random sets of weights. The distribution of these points in the search space for a cervix implant with 215 sources is shown in Fig. 2. We use Powell's optimization method [7] to fit the parameters σij, i, j = 1, 2 and µ of (12) to this distribution. The result is shown in Fig. 3, where the fit and the experimental values for bins with a non-zero number of entries are shown unfolded as a one-dimensional distribution. In Fig. 4 we show the probability distribution for the cervix implant of obtaining a solution at a point fS ± 0.05, fV ± 0.05. The distribution is shown for 100000 random solutions and for the non-dominated solutions from SPEA and the deterministic algorithm. The distribution of weights obtained by the gradient-based optimization for the cervix implant is shown in Fig. 5. Only a small part of the weights is significant. Fig. 6, like Fig. 4, shows the result for an implant with 250 sources, where the global Pareto front is closer to the majority of the random solutions in the search space. Consequently, the difference between the probabilities of the solutions of the deterministic and the SPEA algorithm is smaller. In all cases that we studied, the Pareto fronts of the multiobjective evolutionary algorithms (MOEAs) are local and lie between the global Pareto front found by the deterministic algorithms and the majority of random solutions in the search space. The shape of this set depends on the geometry of the implant and the topology of the sources.
For implants with a large number of sources, the majority of the random solutions is far from the global Pareto front. K. Deb described several reasons why convergence to the true Pareto-optimal front may not occur: multimodality, deception, isolated optima and collateral noise [12]. For the objectives used here, multimodality can be excluded. The most important reason is the isolated minimum. Since only an extremely small part of the search space is located close to the true Pareto front, it is not possible for the evolutionary algorithm to acquire information about the position of the Pareto front from crossover and mutation. As described in [13], code substrings with above-average fitness values are sampled at an exponential rate for inclusion in subsequent generations. This does not, however, imply that convergence to the global Pareto front will occur. In contrast, the gradient-based algorithms use the information from local gradients very efficiently and converge extremely fast to the Pareto-optimal front. Collateral noise seems to be present when there is excessive noise coming from other parts of the solution vector. Without initialization the population moves smoothly and converges to a local Pareto front. If a small part of the population is initialized with solutions from the deterministic algorithm, then members of the population cover a much larger part of the search space. This shows that the solutions require an extremely fine tuning of the decision variables which the conventional genetic operators used in this study cannot provide. We analyzed the influence of the different mutation and crossover operators, described previously in Methods, on the efficiency of different evolutionary algorithms. In all cases a crossover probability of 0.85 and a mutation probability of 0.0065 were used. The best coverage of the objective space was obtained with the geometric crossover. The mutation operator did not influence the efficiency of the evolutionary algorithms as much as the crossover. In order to compare the effectiveness of the evolutionary algorithms, we generated 100000 points in the objective space from a corresponding number of random sets of weights. The efficiency of evolutionary algorithms depends strongly on how far these random solutions are from the global Pareto front. In Fig. 7 the case of a prostate implant with 20 sources is presented.
We can conclude that in this case the random solutions cover the objective space very satisfactorily and approach the Pareto front. This means that the evolutionary algorithms should be able to produce Pareto sets which converge toward the deterministic solutions. In other cases, see Fig. 8, the evolutionary algorithms could only theoretically approach the deterministic Pareto set, after an extremely large number of generations. The Pareto set obtained by the evolutionary algorithms converges toward the global Pareto set better in the direction in which the majority of random solutions is nearer to the global Pareto set. When the majority of random solutions in objective space is far from the global Pareto front, the evolutionary algorithms produce only local Pareto sets. An optimal set of parameters and genetic operators does not improve the convergence to the global Pareto set as significantly as an initialization with solutions from the deterministic algorithm. With only four deterministic solutions, see Figs. 7 and 8, the evolutionary algorithms reproduce the global Pareto front. We have found that the objective functions used in the deterministic algorithms for this initialization need not be exactly the same as those used by the evolutionary algorithms. For example, the avoidance of high dose values inside the PTV can be satisfied by a small dose variance inside the PTV. Another objective function which can be used for this purpose is a penalty function which penalizes solutions with dose values above a given limit. In this case the deterministic gradient-based algorithms cannot be applied due to the presence of local minima. We analyzed the efficiency of different evolutionary algorithms compared with deterministic ones. The Pareto sets produced by SPEA, FFGA, FFGA with elitism [14], [15], and the niched Pareto genetic algorithm (NPGA) [16], and the Pareto set evaluated by the Fletcher deterministic method, are compared.
Pareto fronts from SPEA and FFGA with elitism converge to the deterministically evaluated Pareto front much better than those of FFGA and NPGA, see Fig. 8. The main reason for this is the elitism implemented in the former two methods. In the case of multiobjective evolutionary algorithms it is important to save the nondominated solutions. Even when the previously described initialization is applied, the non-elitist algorithms do not produce better results, as the algorithm does not "remember" the external nondominated set. This means that an initialization of evolutionary algorithms requires the inclusion of elitism. For NPGA and FFGA, an additional problem in comparison to SPEA is that a value for the sharing radius is required, which can vary from case to case. The problem of the SPEA algorithm is that the extension of its Pareto front is not as large as that of FFGA. It does not cover the "ends", see Figs. 7 and 8, as the extension requires a fine tuning of the weights which cannot be reached by the evolutionary algorithms unless an additional initialization is made first. We usually run the algorithms for up to 500 generations, although the population converges to a local Pareto front after 100 generations, as shown in Fig. 9. The Pareto front is not significantly modified even when the calculations are extended to 10000 generations in some cases. For implants with 200 and more sources, the computation with 10000 generations requires up to 10 hours. We solved this problem by initializing four members of the initial population with solutions from the deterministic algorithm. In this case the algorithm converges to the deterministically evaluated Pareto set in fewer than 100 generations. Even if we could use a weighted sum approach with deterministic algorithms for other objectives, this would require a very large number of weights to reconstruct the Pareto set, especially with an increasing number of objectives. Here the evolutionary algorithms in combination with deterministic methods are more effective in generating nondominated solutions.
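The external nondominated set that elitist methods such as SPEA maintain can be sketched as a simple Pareto filter; minimization of both objectives is assumed, and the points are illustrative:

```python
def dominates(a, b):
    """a dominates b (minimization): no worse in every objective, better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate):
    """Elitist external archive: keep the candidate only if it is nondominated,
    and drop any archive members that the candidate dominates."""
    if any(dominates(a, candidate) for a in archive):
        return archive
    return [a for a in archive if not dominates(candidate, a)] + [candidate]

archive = []
for p in [(3.0, 1.0), (1.0, 3.0), (2.0, 2.0), (2.5, 2.5), (0.5, 0.5)]:
    archive = update_archive(archive, p)
# (0.5, 0.5) dominates every earlier point, so only it survives.
```

Without such an archive, a nondominated solution found early can be lost to selection noise, which is the "forgetting" effect noted above for the non-elitist algorithms.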
4 Conclusions

We have compared standard multiobjective evolutionary algorithms with deterministic optimization methods, using objectives for which a weighted sum approach can be used to obtain the global Pareto front. This comparison was done for the dose optimization problem in HDR brachytherapy, where the number of decision variables can be as high as 300. From the distribution of randomly generated weights we were able to estimate in some cases the probability of randomly generating a point in the objective space. This enabled us to estimate the performance of the multiobjective evolutionary and the deterministic algorithms in comparison to a random search. The evolutionary algorithms were found to be a factor of $10^5$-$10^{12}$ more effective than a random search. The deterministic algorithms are even more effective, by a factor that in some cases exceeds $10^{30}$. This could explain why the evolutionary algorithms were not able to generate solutions close to the global Pareto front. The Pareto front reached depends on the probability of generating a point in the objective space. The evolutionary algorithms with the standard genetic operators described in this work are not able to improve the performance significantly. An initialization from deterministic algorithms improves the performance and helps to reconstruct the Pareto front around the initial seeds of the deterministic algorithm. Our previous results with evolutionary algorithms, using objectives for which gradient-based deterministic algorithms cannot be used, showed that the results were comparable with or even better than those of other phenomenological dose optimization methods [1]. With this study we have found that if a part of the population is initialized with a good initial estimate of the Pareto front, the results of the evolutionary algorithms improve significantly more than by any optimization of the GA parameters.
We do not know whether there are special genetic operators which, if applied, could close the gap between the Pareto fronts found by the evolutionary and deterministic algorithms. It seems that the global solutions require a fine tuning of the decision variables which is far beyond what evolutionary algorithms can achieve in an acceptable number of generations. If a part of the population reaches part of this region via initialization, then the evolutionary algorithm is able to find solutions around this region. A weakness of the multiobjective evolutionary algorithms is the large number of function evaluations required to obtain a reasonably good local Pareto front. In the past, MOEAs were compared on problems involving only very few decision variables. In such cases a random set of a few thousand sets of decision variables covers a large part of the objective space, and evolutionary algorithms are able to quickly produce solutions very close to the global Pareto front. Increasing the population size improves the efficiency of the MOEA algorithms by generating solutions closer to the global Pareto set, although population sizes of more than a few hundred are not very practical. In problems like brachytherapy, with up to 300 decision variables, points close to the global Pareto front have in some cases an extremely low probability. In this case the evolutionary algorithms cannot reach these regions, whereas deterministic algorithms using information from local gradients are guided into these regions after only 10-20 iterations. In each such iteration a line minimization is performed a few times. We have considered here only a simple coupling in which the deterministic algorithm delivers some optimal solutions as starting points. We have found that the performance of the MOEA algorithms is significantly enhanced even if the initial solutions are obtained using different objectives for the deterministic algorithm. In the past, algorithms were proposed in which the evolutionary algorithms produce starting points for deterministic algorithms. Another possibility is to use a hybrid version of the evolutionary and deterministic algorithms.
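The simple coupling described above amounts to overwriting a few members of the random initial population with deterministic solutions. A sketch, with placeholder vectors standing in for the four weighted-sum solutions:

```python
import numpy as np

def seeded_population(pop_size, n_vars, seeds, rng):
    """Random initial population in [0, 1) with the first slots
    overwritten by deterministic solutions."""
    pop = rng.random((pop_size, n_vars))
    for i, s in enumerate(seeds[:pop_size]):
        pop[i] = s
    return pop

rng = np.random.default_rng(3)
# Stand-ins for four solutions from deterministic weighted-sum runs.
seeds = [np.full(300, 0.1 * (k + 1)) for k in range(4)]
pop = seeded_population(100, 300, seeds, rng)
```

The rest of the population remains random, so the evolutionary algorithm still explores while the seeds anchor it near the global Pareto front.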
There are versions in which a hill-climbing operator is applied with some probability, which could be adapted to the performance of the algorithm. We have estimated in some cases the performance of the MOEAs from Monte Carlo sampling experiments. In the future we will consider other approaches, such as multiobjective simulated annealing, where an external population is filled with the nondominated solutions found. This population is used in the optimization process by picking random members as starting points. We have to compare whether such an algorithm with a single member can produce better results than MOEAs, whose performance is assumed to be explained by mechanisms such as implicit parallelism.
Acknowledgments. We would like to thank Dr. E. Zitzler and P. E. Sevinc for the FEMO library. This investigation was supported by a European Commission Grant (IST-1999-10618, Project: MITTUG).
References

1) Lahanas, M., Baltas, D., Zamboglou, N.: Anatomy-based three-dimensional dose optimization in brachytherapy using multiobjective genetic algorithms. Med. Phys. 26 (1999) 1904-1918
2) Lahanas, M., Milickovic, N., Baltas, D., Zamboglou, N.: Application of multiobjective evolutionary algorithms for dose optimization problems in brachytherapy. These proceedings
3) Bazaraa, M. S., Sherali, H. D., Shetty, C. M.: Nonlinear Programming: Theory and Algorithms. Wiley, New York, 1993
4) Zitzler, E., Deb, K., Thiele, L.: Comparison of multiobjective evolutionary algorithms: Empirical results. Evolutionary Computation 8 (2000) 173-195
5) Yang, G., Reinstein, L. E., Pai, S., Xu, Z.: A new genetic algorithm technique in optimization of permanent 125I prostate implants. Med. Phys. 25 (1998) 2308-2315
6) Yu, Y., Schell, M. C.: A genetic algorithm for the optimization of prostate implants. Med. Phys. 23 (1996) 2085-2091
7) Press, W. H., Teukolsky, S. A., Vetterling, W. T., Flannery, B. P.: Numerical Recipes in C. 2nd ed. Cambridge University Press, Cambridge, England, 1992
8) Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, 1996
9) Vicini, A., Quagliarella, D.: Airfoil and wing design through hybrid optimization strategies. American Institute of Aeronautics and Astronautics, Report AIAA-98-2729 (1998)
10) Das, I., Dennis, J.: A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems. Structural Optimization 14 (1997) 63-69
11) Zitzler, E., Thiele, L.: Multiobjective evolutionary algorithms: A comparative case study and the Strength Pareto approach. IEEE Transactions on Evolutionary Computation 3 (1999) 257-271
12) Deb, K.: Multi-objective genetic algorithms: Problem difficulties and construction of test problems. Evolutionary Computation 7 (1999) 205-230
13) Holland, J. H.: Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, 1975
14) Fonseca, C. M., Fleming, P. J.: Multiobjective optimization and multiple constraint handling with evolutionary algorithms I: A unified formulation. Research Report 564, Dept. Automatic Control and Systems Eng., University of Sheffield, Sheffield, U.K., Jan. 1995
15) Fonseca, C. M., Fleming, P. J.: An overview of evolutionary algorithms in multiobjective optimization. Evolutionary Computation 3 (1995) 1-16
16) Horn, J., Nafpliotis, N.: Multiobjective optimization using the niched Pareto genetic algorithm. IlliGAL Report No. 93005, Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, 1993
Fig. 1. An example of the convergence of ⟨fS⟩ and ⟨fV⟩ for the SPEA algorithm.
Fig. 2. Probability P(fS, fV) of generating a random point at fS ± 0.05, fV ± 0.05.
Fig. 3. A fit of a two-dimensional normal distribution.
Fig. 4. Probability distribution for a cervix implant with 215 sources for 100000 random, SPEA and deterministic algorithm solutions.
Fig. 5. Distribution of the weights for the deterministic algorithm for the importance factors wV = 0.0 and 0.3, see Eq. (4).
Fig. 6. Probability distribution for a breast implant with 250 sources for 100000 random, SPEA and deterministic algorithm solutions.
Fig. 7. An example of the distribution of 100000 random solutions in objective space for a prostate implant with 20 sources. The Pareto set obtained by the deterministic algorithm is represented by the line. The nondominated solutions of SPEA and FFGA with elitism are shown without initialization. For SPEA, the Pareto set is additionally shown when four members of the population are initialized with solutions from a deterministic algorithm.
Fig. 8. An example of the SPEA algorithm for the case of a breast implant with 250 sources. The majority of random solutions in the objective space is very far from the global Pareto front. Without initialization the evolutionary algorithms do not reach the Pareto front. The path of the deterministic algorithm is shown for a set of fixed weights.
Fig. 9. An example of the SPEA algorithm for the case of a breast implant with 250 sources. The convergence in the objective space is shown for different numbers of generations t.