Multidimensional optimization in image reconstruction from projections
I. García, P.M. Ortigosa and L.G. Casado
Dpto. Arquitectura de Computadores y Electrónica, Universidad de Almería, 04120-Almería, Spain e-mail:
[email protected]
G.T. Herman and S. Matej†
Medical Image Processing Group University of Pennsylvania, Philadelphia, PA 19104, USA
Abstract. A parallel implementation of a global optimization algorithm is described. The algorithm is based on a probabilistic random search method. Computational results are illustrated through application of the algorithm to a time-consuming problem, in a multidimensional space, which arises from the field of image reconstruction from projections.

Key words: Global Optimization, Parallel Algorithms, Random Search, Reconstruction from Projections
1. Introduction

The Image Reconstruction from Projections (IRP) problem is one of recovering a function from its line integrals. It arises in a variety of scientific, medical, and engineering fields such as electron microscopy, radiology or industrial testing [1]. IRP is a topic of study for several research groups which have spent considerable effort on the development of algorithms for solving the IRP problem. Most of these algorithms have several free parameters that strongly affect both the accuracy of the solutions and the computational cost [2]. Our final goal is to establish a methodology for selecting the free parameters of algorithms in the field of IRP. As a first step, we apply optimization techniques to a particular algorithm which has several free parameters. We aim at optimizing the accuracy of the reconstructed images (as measured by a general objective function). The paper is organized as follows: Section 2 introduces the IRP algorithm and the function to be optimized. Section 3 describes our parallel implementation of the Controlled Random Search optimization algorithm. Section 4 contains a discussion of the results for IRP.

This work was supported by the Ministry of Education of Spain (DGICYT PR94-357) and the Consejería de Educación de la Junta de Andalucía (07/FSC/MDM), and (†) by the National Institutes of Health (HL28438 and CA54356).
2. An IRP algorithm: ART using blobs (ARTblob)

The series expansion approach to solving the IRP problem assumes that the function f to be reconstructed can be approximated by a linear combination of a finite set of known and fixed basis functions b_j,

    f(z) \approx \sum_{j=1}^{J} x_j b_j(z),    (1)
and that our task is to estimate the unknowns x_j. Since the measurements depend linearly on the object to be reconstructed, and we know what the measurements would be if the object to be reconstructed were one of the basis functions (we use r_{i,j} to denote the value of the i-th measurement of the j-th basis function), we can conclude that the i-th of our measurements of f is approximately

    \sum_{j=1}^{J} r_{i,j} x_j.    (2)
Our problem is then to estimate the x_j from the measured approximations (for 1 <= i <= I) to (2). To simplify notation, the image is represented by a J-dimensional image vector x (with components x_j) and the data form an I-dimensional measurement vector y. There is an assumed projection matrix R (with entries r_{i,j}). We let r_i denote the transpose of the i-th row of R (1 <= i <= I), so that the inner product \langle r_i, x \rangle is the same as the expression in (2). Then y is approximately Rx. In this formulation R and y are known and x is to be estimated. Substituting the estimated values of x_j into (1) will then provide us with an estimate of the function f.

One possible set of basis functions was proposed by Lewitt [3]. They are spherically symmetric and are not only spatially limited, but can also be chosen to be very smooth. For the purpose of our discussion here, it suffices to say that their mathematical description contains the free parameters blrad and blalpha, the choices of which will affect the quality of reconstructions.

The basic version of ART operates as follows [1]. The method cycles through the measurements repeatedly, considering only one measurement at a time. Only those x_j are updated for which the corresponding r_{i,j} for the currently considered measurement i is nonzero, and the change made to x_j is proportional to r_{i,j}. The initial estimate x^0 of the algorithm is a J-dimensional vector with constant components. For k >= 0, we set

    x^{k+1} = x^k + \lambda \, \frac{y_{i_k} - \langle r_{i_k}, x^k \rangle}{\| r_{i_k} \|^2} \, r_{i_k},    (3)

with i_k = [k (mod I) + 1]. The positive real number \lambda is called the relaxation parameter. It is a free parameter of the algorithm. The essential fact for our paper is that a better selection of \lambda, blalpha and blrad leads to a better accuracy of the reconstruction.

In order to apply an optimization procedure it is necessary to define a figure of merit (FOM) which measures the accuracy of a particular reconstruction. Several functions have been defined for this purpose [2]. In our work we have chosen the RMS (root mean square) of the differences between the values of the original and the reconstructed images, both discretized on a rectangular grid.
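To make these ingredients concrete, the following Python sketch (illustrative only, not the implementation used in the paper) evaluates a generalized Kaiser-Bessel blob profile of the form given by Lewitt [3] (the smoothness order m = 2 is an assumed, commonly used choice), performs one ART cycle implementing update (3), and computes the RMS figure of merit; the dense matrix R and all names are assumptions of the sketch.

import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind, I_m

def kb_blob(r, blrad, blalpha, m=2):
    """Generalized Kaiser-Bessel blob profile (Lewitt [3]); order m = 2 assumed.
    blrad is the support radius; blalpha controls the blob's taper."""
    r = np.asarray(r, dtype=float)
    s = np.sqrt(np.clip(1.0 - (r / blrad) ** 2, 0.0, None))  # 0 outside support
    return np.where(r <= blrad, s ** m * iv(m, blalpha * s) / iv(m, blalpha), 0.0)

def art_cycle(x, R, y, lam):
    """One pass of the basic ART update (3) over all I measurements.
    x: image vector (length J); R: projection matrix (I x J) with nonzero
    rows; y: measurement vector (length I); lam: relaxation parameter."""
    for i in range(R.shape[0]):       # i_k cycles through the measurements
        r_i = R[i]
        x = x + lam * (y[i] - r_i @ x) / (r_i @ r_i) * r_i
    return x

def rms_fom(original, reconstructed):
    """Figure of merit: RMS of the differences between the digitized images."""
    return np.sqrt(np.mean((original - reconstructed) ** 2))

The objective function handed to the optimizer is then the mapping from (lambda, blrad, blalpha) to the RMS figure of merit, with the reconstruction obtained by repeated ART cycles using blobs of the given shape.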
3. A parallel version of the Controlled Random Search algorithm

In the selection of a global optimization method for our particular application we have taken into account the fact that the computational cost of the function to be evaluated is enormous (15-20 minutes on a Sparc 10, 50 MHz). As a consequence, a parallel algorithm for global optimization seems to be the most appropriate. In this work we propose a parallel algorithm which is based on the Controlled Random Search (CRS) algorithm of Price [4, 5]. Some parallel approaches have been proposed by McKeown [6], Sutti [7], Ducksbury [8], Price [9] and Woodhams and Price [10], using various kinds of parallel computers and strategies. Our proposal makes only small modifications to the original sequential version of CRS. These modifications are aimed at evaluating the objective function on several processors simultaneously. The general strategy used in CRS remains in our parallel version.

The Parallel Controlled Random Search (PCRS) algorithm is based on a master-worker communication model. In this strategy the master processor executes the PCRS algorithm and a worker processor only evaluates the objective function at the trial points supplied by the master processor. After every evaluation the worker sends the result back to the master. PCRS starts with the evaluation at N trial points chosen at random from the search domain V, a subset of R^n, over which the objective function is to be optimized. In our description the coordinates of a trial point j are stored in a vector A^j = (A^j_1, ..., A^j_n), and A^j_0 denotes the value of the objective function at that point. The objective function at the N trial points is computed in parallel by the worker processors. Two procedures, called SEND and RECEIVE, are used by the master and worker processors to exchange a vector A = (A_0, ..., A_n) (see the algorithmic description of PCRS in the Appendix at the end of the paper). Once the N trial points have been evaluated, the master processor chooses randomly
n+1 points (R_0, ..., R_n) from the set A^0, ..., A^{N-1}, determines the centroid G of the set R_1, ..., R_n, and computes a trial point P = 2G - R_0. If P is in the domain V, then P is sent to one of the idle worker processors; otherwise a new random choice of (R_0, ..., R_n), G and P is computed. In order to get the best efficiency of the parallel implementation, this procedure is repeated NP times (NP is the number of worker processors). As a consequence, every processor in the parallel system is doing useful work and the workload of the parallel system is balanced. From this moment on, a procedure is executed by the master processor iteratively until a stopping criterion is satisfied. During an iterative step the largest value A^m_0 in the set A^0_0, ..., A^{N-1}_0 is determined. Also a new G and consequently a new P are computed, either as P = 2G - R_0 or as P = (G + R_0)/2 (see Price [4, 5]). Then the master processor waits for the arrival of a new function evaluation (B_0) from any worker processor. After this, the master processor sends the new trial point P to this worker processor for its evaluation. If B_0 is smaller than A^m_0, then A^m is replaced by B. The stopping criterion is based on the maximum distance between any two points in the set A^0, ..., A^{N-1} and on the maximum difference of the objective function values in the set A^0_0, ..., A^{N-1}_0.
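The core CRS iteration just described can be summarized by the following minimal sequential sketch in Python (a simplification: the secondary trial point P = (G + R_0)/2 and the success-rate switch of the Appendix are omitted, the stopping test is a coordinate-wise proxy, and all names and tolerances are illustrative); PCRS keeps this logic but farms the function evaluations out to the workers.

import numpy as np

def crs_minimize(f, lower, upper, N, max_iter=10000, eps_x=1e-6, eps_f=1e-9, seed=0):
    """Sequential sketch of Price's CRS over the box V = [lower, upper] in R^n."""
    rng = np.random.default_rng(seed)
    n = len(lower)
    A = rng.uniform(lower, upper, size=(N, n))   # N random trial points in V
    fA = np.array([f(a) for a in A])             # done in parallel in PCRS
    for _ in range(max_iter):
        worst = np.argmax(fA)                    # point m with largest value A^m_0
        # Simplified stopping rule: spread of the values and of the points.
        if np.ptp(fA) < eps_f and all(np.ptp(A[:, d]) < eps_x for d in range(n)):
            break
        idx = rng.choice(N, size=n + 1, replace=False)
        R0, rest = A[idx[0]], A[idx[1:]]
        G = rest.mean(axis=0)                    # centroid of R_1, ..., R_n
        P = 2.0 * G - R0                         # reflected trial point
        if np.any(P < lower) or np.any(P > upper):
            continue                             # P outside V: draw again
        fP = f(P)
        if fP < fA[worst]:                       # success: replace the worst point
            A[worst], fA[worst] = P, fP
    best = np.argmin(fA)
    return A[best], fA[best]

# Example: minimize a 2-D quadratic over the box [-5, 5]^2.
x_best, f_best = crs_minimize(lambda v: float((v ** 2).sum()),
                              np.array([-5.0, -5.0]), np.array([5.0, 5.0]), N=25)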
A set of test functions was used to check the convergence of the algorithm and its parallel performance. Some results are given in Tables I and II for the problems of Goldstein/Price, Hartman and Shekel. For each problem the same series of five random sequences was used. Data in Table I are the maximum numbers of evaluations over the series (the sum totals for all NP worker processors). The index of success for finding the global minimum was 100% for every test function. The percentage of increase (or decrease) in the number of function evaluations using NP processors, relative to the sequential case, is also given. The results suggest that the number of function evaluations does not increase with the number of worker processors; it seems to depend on the specific function, staying within a range of 20% of the sequential version.

Table I. Maximum number of function evaluations (absolute and relative) for a set of test functions versus the number of worker processors (NP)
NP   Gold/Price  Hartman-6  Hartman-3  Shekel-5  Shekel-7  Shekel-10
 1      338        2904        852       1419      1270      1190
 2      384        2845        853       1215      1242      1258
 4      388        2781        865       1218      1240      1250
 8      395        2869        943       1254      1236      1245
16      376        2784        983       1157      1155      1254
 2    +13.6%       -2.0%      +0.1%     -14.4%     -2.2%     +5.7%
 4    +14.8%       -4.2%      +1.5%     -12.8%     -2.4%     +5.0%
 8    +16.9%       -1.2%     +10.7%     -11.6%     -2.7%     +4.6%
16    +11.2%       -4.1%     +15.3%     -18.5%     -9.1%     +5.4%
The performance of a parallel algorithm is usually measured by the speed-up, defined as the ratio t_1/t_NP, where t_1 and t_NP are the times spent by the algorithm using one and NP worker processors, respectively. It is clear that t_1 and t_NP depend on the number of evaluations in a particular execution of the algorithm, and t_NP is also a function of the delay introduced in the parallel system by the interprocessor communications. Let t_f and t_c be the CPU times for evaluating the objective function once and for the interprocessor communication delay, respectively. Let n_1 and n_NP be the numbers of evaluations for a uniprocessor system and for a multiprocessor system with NP worker processors. Then t_1 = n_1 t_f, t_NP = n_NP (t_f + t_c), and

    \text{speed-up} = \frac{t_1}{t_{NP}} = \frac{n_1}{n_{NP}} \cdot \frac{1}{1 + t_c/t_f}.    (4)
There are two terms in the speed-up equation: the ratio n_1/n_NP and the factor due to the delay for communicating data in the parallel system. Table II provides the ratio NP n_1/n_NP for the set of test functions of Table I. From Table II it can be concluded that, when t_c/t_f < 0.1, an almost linear speed-up, and sometimes a super-linear speed-up, can be achieved. The algorithm has been implemented with up to eight i860 processors and also on a distributed system of workstations. Data in Table II have been obtained from the distributed workstation system.
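As a rough numerical check of (4) against Table I (an illustration only; it reads the Table I entries as total evaluation counts, so each of the NP workers performs about n_NP/NP of them):

def speedup(n1_total, nNP_total, NP, tc_over_tf):
    # Equation (4), with n_NP taken as the per-worker share nNP_total / NP.
    return (n1_total / (nNP_total / NP)) / (1.0 + tc_over_tf)

print(speedup(338, 384, 2, 0.01))    # Gold/Price, 2 workers:  ~1.74
print(speedup(338, 376, 16, 0.01))   # Gold/Price, 16 workers: ~14.2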
[Table II graphic: values for the speed-up, for the test functions of Table I, considering a small ratio t_c/t_f.]

[Figure 2 graphic, panels (a) and (b): objective-function values at the PCRS trial points over the (blalpha, blrad, lambda) space; symbol legend: o = {FOM >= 0.01500}, * = {0.01440 <= FOM < 0.01500}, x = {0.01422 <= FOM < 0.01440}, + = {0.01420 <= FOM < 0.01422}, . = {FOM < 0.01420}.]

... maximum difference between the objective functions is a small fraction of their average value). As a consequence of these results, a multidimensional optimization has been done using one set of projection data only.
[Figure 3 graphic: "Blob Shapes"; both panels plot the blob function value against radius. Panel (a) is labeled Blrad=2.5556103, Blalpha=19.72911419; panel (b) shows an Optimum and a Non Optimum curve and is labeled Blrad=3.352882, Blalpha=4.338262.]
Figure 3. (a) Blob shapes for the set of trial points which belong to the cluster of minimum value of the objective function, depicted as "." in Figure 2. (b) Blob shapes used for reconstructing the images depicted in Figure 4(b) and (c).
The variables optimized were blalpha, blrad and the relaxation parameter λ (represented as lambda in Figure 2). Values of the objective function at the trial points tested in an execution of PCRS are shown in Figure 2. Figure 2(b) is an enlargement of Figure 2(a) around the cluster where the minimum value was found. In this graph the values of the objective function have been represented by several symbols, to show where the cluster is located. The progression from the higher to the smaller values of the objective function is represented in Figure 2 by the sequence o, *, x, +, . of symbols. It can be seen that the shape of the cluster is similar to a line (or "tunnel" through the 3D parameter space) in the plane λ = 0.0340, and a quasi-linear relation between blalpha and blrad is shown, especially in Figure 2(a). It is interesting to mention that the blobs with parameters close to the optimum have a very similar shape, although individually blrad and blalpha vary significantly. This can be seen in Figure 3(a), which shows the shapes of the blobs for the trial points generated by PCRS that belong to the cluster (presented as dots in Figure 2). For this cluster the values of the objective function differ minimally (they are in the range 0.014194-0.014200), while the values of the parameters blrad, blalpha and λ are in the ranges (2.285, 2.556), (14.57, 19.73) and (0.0335, 0.0345), respectively. A similar behavior was observed in [12] for the 2D blob parameter space, where the error function of image representation using blobs showed deep narrow valleys (equivalent to the mentioned "tunnel" in the 3D parameter space). The relationship between the parameters blrad and blalpha which guarantees similarity of blob shape (and consequently similar quality of reconstruction) can be found analytically [12]. This leads to the speculation that for this particular case it would be advantageous to include into the optimization procedure knowledge about the form of the "tunnel", to speed up the search for the optimum parameters.
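This shape similarity can be illustrated numerically (a sketch; the Kaiser-Bessel form and the order m = 2 are assumptions carried over from the sketch in Section 2) by comparing blob profiles at the two ends of the reported cluster ranges:

import numpy as np
from scipy.special import iv

def kb_blob(r, blrad, blalpha, m=2):
    # Generalized Kaiser-Bessel profile (Lewitt [3]); order m = 2 assumed.
    r = np.asarray(r, dtype=float)
    s = np.sqrt(np.clip(1.0 - (r / blrad) ** 2, 0.0, None))
    return np.where(r <= blrad, s ** m * iv(m, blalpha * s) / iv(m, blalpha), 0.0)

r = np.linspace(0.0, 2.6, 261)
b_low = kb_blob(r, 2.285, 14.57)    # one end of the cluster ranges
b_high = kb_blob(r, 2.556, 19.73)   # the other end
print(np.max(np.abs(b_low - b_high)))  # small relative to the unit central value

In this sketch both profiles equal 1 at the center and their pointwise difference stays within a few percent of that peak, consistent with the near-identical shapes in Figure 3(a).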
[Figure 4 graphic: three panels (a), (b), (c).]
Figure 4. The original image (a) and two examples of its reconstruction using optimum values of λ, Blrad, Blalpha (b), and non-optimum values (c).
Figure 3(b) shows examples of two blobs: one optimal and the other not. These blobs were used for the ARTblob reconstruction to produce the images shown in Figure 4(b) and (c), respectively, of the phantom depicted in Figure 4(a). Finally, in Figure 5, the profiles of the images in Figures 4(a), (b), and (c) along the 64th row have been drawn. It can be seen that, in approximating the profile of (a), the profile of (b) is overall better than that of (c).
[Figure 5 graphic: "profile at 64-th row"; curves (a), (b) and (c); vertical axis from -0.05 to 0.25; horizontal axis "x co-ordinate of the image" from 0 to 100.]
Figure 5. Profiles of the images represented in Figure 4 at the 64th row.
Appendix A. The PCRS algorithm
Begin PCRS(N, n, V, NP)
  Choose N points at random over V -> A^0, ..., A^{N-1}
  do j = 0 : min(N-1, NP-1)
    SEND A^j to PE_j                (PE_j computes A_0 = f(A_1, ..., A_n))
  k = 0
  if N > NP
    do j = NP : N-1
      RECEIVE (A, IDP)
      A -> A^k
      SEND A^j to PE_IDP
      k = k + 1
  do j = 0 : min(N-1, NP-1)
    RECEIVE (A, IDP)
    A -> A^k
    k = k + 1
  do j = 0 : NP-1
    Choose randomly n+1 points R_0, ..., R_n from the set A^0, ..., A^{N-1}
    Determine the centroid G of R_1, ..., R_n
    P_j = 2 G - R_0
    if P_j in V
      SEND P_j to PE_j
      FLAG_j = 0
      G -> g_j
      R_0 -> r0_j
    else
      j = j - 1
  flag = 0
  while the stopping criterion is not satisfied
    Determine the stored point m which has the greatest function value A^m_0
    if flag = 0
      Choose randomly n+1 points R_0, ..., R_n from the set A^0, ..., A^{N-1}
      Determine the centroid G of R_1, ..., R_n
      P = 2 G - R_0
      INSIDE = 0
    else
      P = (g_idp + r0_idp) / 2
      INSIDE = 1
      flag = 0
    if P in V
      RECEIVE (B, IDP)
      SEND P to PE_IDP
      if FLAG_IDP = 0 then update success rate (succ)
      if B_0 < A^m_0
        then B -> A^m
        else if (FLAG_IDP = 0 and succ < 50%) flag = 1
      IDP -> idp
      if INSIDE = 0
        G -> g_IDP
        R_0 -> r0_IDP
      FLAG_IDP = INSIDE
  End while
End PCRS
References
1. G.T. Herman, Image Reconstruction from Projections: The Fundamentals of Computerized Tomography. New York: Academic Press, 1980.
2. S. Matej, G.T. Herman, T.K. Narayan, S.S. Furuie, R.M. Lewitt and P.E. Kinahan, "Evaluation of task-oriented performance of several fully 3D PET reconstruction algorithms," Phys. Med. Biol., vol. 39, pp. 355-367, 1994.
3. R.M. Lewitt, "Alternatives to voxels for image representation in iterative reconstruction algorithms," Phys. Med. Biol., vol. 37, no. 3, pp. 705-716, 1992.
4. W.L. Price, "A controlled random search procedure for global optimization," in Towards Global Optimization 2 (L.C.W. Dixon and G.P. Szegő, eds.), pp. 71-84, Amsterdam: North Holland, 1978.
5. W.L. Price, "Global optimization by controlled random search," Journal of Optimization Theory and Applications, vol. 40, pp. 333-348, 1983.
6. J.J. McKeown, "Aspects of parallel computations in numerical optimization," in Numerical Techniques for Stochastic Systems (F. Arcetti and M. Cugiani, eds.), pp. 297-327, 1980.
7. C. Sutti, "Local and global optimization by parallel algorithms for MIMD systems," Annals of Operations Research, vol. 1, pp. 151-164, 1984.
8. P.G. Ducksbury, Parallel Array Processing. Chichester: Ellis Horwood, 1986.
9. W.L. Price, "Global optimization algorithms for a CAD workstation," Journal of Optimization Theory and Applications, vol. 55, pp. 133-146, 1987.
10. F.W.D. Woodhams and W.L. Price, "Optimizing accelerator for CAD workstation," IEE Proceedings Part E, vol. 135, no. 4, pp. 214-221, 1988.
11. I. García and G.T. Herman, "Global optimization by parallel constrained biased random search," in State of the Art in Global Optimization: Computational Methods and Applications (C.A. Floudas and P.M. Pardalos, eds.), Kluwer, in press.
12. S. Matej and R.M. Lewitt, "Practical considerations for 3D image reconstruction using spherically symmetric volume elements," IEEE Trans. on Medical Imaging, vol. 15, no. 1, pp. 68-78, 1996.
13. C.G. Han, P.M. Pardalos and Y. Ye, "Implementation of interior point algorithms for some entropy optimization problems," Optimization and Software, vol. 1, pp. 71-80, 1992.
14. P.M. Pardalos, A. Phillips and J.B. Rosen, Topics in Parallel Computing in Mathematical Programming. Science Press, 1992.