Comparison of Parallel Reconstruction Algorithms for 3D X-Ray Tomography on MIMD Computers

C. LAURENT, F. PEYRIN and J.-M. CHASSERY

TIMC-IMAG Laboratory, URA CNRS D 1618, Institut Albert Bonniot, Domaine de la Merci, 38706 La Tronche cedex, FRANCE

E-mail: [email protected]
Abstract
The problem of reconstructing a 3D image from a set of its 2D cone-beam projections taken from different angles of view arises in medical imaging for 3D tomography. 3D tomography requires voluminous data and long computation times. Parallel computing on MIMD computers seems to be a good approach to manage this problem. In this study, we propose different approaches using an SPMD model on an Intel Paragon, a Cray T3D and a workstations network supporting PVM (five SUN 4). We use both the synchronous and the asynchronous mode for communications. We implement two basic operators (projection, backprojection) and full reconstruction methods (Feldkamp, ART) on the different architectures. We evaluate the theoretical complexities and present experimental computation times for the different implementations. The scalability of the implementations is very good, since we obtain an efficiency of up to 3.7 for a T3D implementation. As an illustration, we show 3D images reconstructed from 2D simulated projections.
1 Introduction
Truly three-dimensional tomography is a generalization of conventional 2D tomography allowing the reconstruction of volumes (3D images). In X-Ray tomography, prototypes using the rotation of one or several cone-beam X-Ray sources have been built [13][6]. Cone-beam acquisition geometries are also used in Single Photon Emission Computed Tomography (SPECT). In all these cases, the computational problem is to reconstruct a 3D image from a set of its 2D cone-beam projections taken from different angles of view. The 2D reconstruction methods are not adapted and the problem has to be considered directly in 3D. In this study, we are interested in the implementation of cone-beam X-Ray reconstruction algorithms. We recall the principles of tomography (section 2) and the main reconstruction methods. We particularly describe two methods, an analytic method, the Feldkamp algorithm (section 2.2.1), and an algebraic method, a block ART method (section 2.2.2), and we present their sequential algorithms (section 2.3).
These reconstruction algorithms involve large amounts of data (536 Mbytes for a 512^3 image) and long computation times. For example, the reconstruction of a 128^3 image from about 100 projections of size 128^2, using the extension of the classical filtered backprojection algorithm known as the Feldkamp algorithm, requires about 10^4 seconds on a SUN 4 workstation. Parallel approaches seem to be a good solution to reduce these computation times and memory requirements, and to reconstruct 3D images of realistic size (128^3, 256^3, 512^3). Some parallel approaches have already been proposed in the literature; they may be classified according to the type of parallel architecture used: SIMD or MIMD. Miller [11] has proposed a SIMD approach for image reconstruction with a maximum likelihood algorithm in SPECT. This approach uses a particular data partition on a 128x128 processors MasPar and a 64x64 processors DECmpp-Sx parallel computer. The 3D image is shared over the PE memories. This approach shows the main limitation of SIMD computers: to perform an implementation, the data have to be arranged so as to keep as many processors as possible active. In a previous study [10], we used a similar approach to implement the Feldkamp algorithm on a 32x32 processors MasPar, using two communication schemes, for 3D cone-beam X-Ray tomography. MIMD architectures have different topologies (hypercube, mesh and torus) and communication schemes (binary tree, ring). Chen [5] used a communication scheme based on a binary tree to compute a 3D CT image with an EM (Expectation Maximization) algorithm divided into two stages: a convolution stage and a backprojection stage. The implementation was done on an iPSC and highlights the load-balancing problem that must be solved to optimize the performance. A simpler communication scheme, a ring on an iPSC, is used by Charles [3] for the 3D filtered backprojection algorithm. He presents experimental processing times as a function of the number of processors and of the object size. A MIMD approach on a hypercube, implemented by Zapata [14], shows a data partition using a Gray code to minimize data transfers. In another study, Chen [4] proposed hybrid data partitions to overcome the load-balancing problem and to reduce the communication cost. To reduce the cost of parallel implementations, Atkins [1] proposed an original solution with a transputer network to reconstruct a PET image. His communication scheme is based on two classes of processors: a control processor called the master node, and multiple transformation processors called worker nodes (a processor farm). This communication model achieves efficient load balancing. He obtained good performances on a network of ten processors, but this approach is limited by the choice of network and processor architecture.

In this study, we propose different solutions for the parallel implementation of the basic operators of tomography (projection and backprojection) and of two full reconstruction methods (Feldkamp, ART) on different MIMD architectures. The target machines are an Intel Paragon, a Cray T3D and a SUN 4 workstations network (section 3.1). The implementations are based on an SPMD model and we consider either synchronous or asynchronous communication modes. We use an appropriate data partition to improve the load balancing (section 3.2). We have chosen a ring as communication scheme, since it may easily be embedded in the MIMD topologies (hypercube, mesh and torus) used in this study (section 3.3). The implementations based on the different communication modes are presented in section 3.4. To reduce the development times, the algorithms have been implemented using the PVM message-passing library [2]. We compare the speed-ups and efficiencies obtained on a MasPar, an Intel Paragon, a Cray T3D and a workstations network. We also present images reconstructed from simulated data.
2 Principle of 3D Tomography
2.1 The Problem of 3D reconstruction
The problem of truly 3D tomography consists in reconstructing a 3D image from a set of 2D data corresponding to 2D projections of the 3D object. Globally, these data are integrals of the 3D object along straight lines of 3D space. Their organization depends on the acquisition geometry. For a 3D reconstruction using cone-beam X-Ray projections, the straight lines are distributed on cones whose vertices correspond to the different positions of the X-Ray source (figure 1).
Figure 1: Projection in cone-beam geometry

For each vertex (or X-Ray source position), we obtain a cone-beam projection, which can be parametrized by the unit vector $\vec{\tau}$ of the cone axis. Let P be a point in the acquisition plane, S the source position and S' its projection on the acquisition plane. The projection value, denoted $P_{\vec{\tau}}(\vec{r}\,')$, at point P may be written as the line integral of the 3D function f along the straight line (SP) (formula 1):
\[
P_{\vec{\tau}}(\vec{r}\,') = \int_{-\infty}^{+\infty} f\!\left(\vec{r}\,'\,\frac{\|SO\|}{\|S'S\|} + \lambda\,\frac{\|S'S\|\,\vec{\tau} - \vec{r}\,'}{\sqrt{\|S'S\|^2 + \|\vec{r}\,'\|^2}}\right) d\lambda
\quad \text{where } \vec{r}\,' = \overrightarrow{S'P}
\tag{1}
\]
The reconstruction problem consists in determining a function $f(x, y, z)$ from a set of $P_{\vec{\tau}}(\vec{r}\,')$, with $\vec{\tau}$ belonging to a subset of $S^2$, the unit sphere. The possible 3D reconstruction methods from cone-beam acquisitions may be classified into analytic methods, algebraic methods and statistical methods [7]. The principles of the first two methods are recalled below.
2.2 Reconstruction methods

2.2.1 Analytic methods
Analytic methods are based on a continuous formulation of the problem and on an inverse analytic formula expressing the 3D object $f(\vec{r})$ to be reconstructed as a function of its 2D cone-beam projections $P_{\vec{\tau}}(\vec{r}\,')$. In the particular case of a cone-beam geometry, not all the 2D reconstruction methods can be generalized. A fundamental operator in this class is the backprojection operator, which may be considered as the dual of the projection operator. This operator associates a 3D image to the set of 2D projections: the value of each voxel (volume element) is defined as the sum of the intensities of all the projection rays passing through it. The generalization of the conventional 2D filtered backprojection method [8] is known in 3D as the Feldkamp method. When the X-Ray source is rotated on a circular orbit (the vector $\vec{\tau}$ stays in a plane), it gives an approximate reconstruction. The algorithm simply consists in filtering (3) and backprojecting (4) the 2D weighted projections (2). The filtering operation (3) may be implemented using the Fast Fourier Transform. Another, exact, analytic reconstruction formula may be derived when the X-Ray source describes the surface of a sphere [12]. The Feldkamp algorithm may be summarized as:
Feldkamp algorithm:
\[
P'_{\vec{\tau}}(\vec{r}\,') = P_{\vec{\tau}}(\vec{r}\,')\,\frac{\|SO\|}{\sqrt{\|SO\|^2 + \|\vec{r}\,'\|^2}}
\tag{2}
\]
\[
\tilde{P}_{\vec{\tau}}(\vec{r}\,') = P'_{\vec{\tau}}(x', y') *_{y'} k(y')
\tag{3}
\]
where $k(y')$ is a filter operator in Fourier space, and
\[
f(\vec{r}) = \int_{\tau=0}^{2\pi} \frac{\|S'S\|^2}{(\|OS\| - \vec{r}\cdot\vec{\tau}\,)^2}\,\tilde{P}_{\vec{\tau}}(\vec{r}\,')\,d\tau
\tag{4}
\]
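The weighting (2) and filtering (3) steps translate directly into a few lines of code. The following is a minimal numpy sketch for one projection, assuming a flat detector sampled on an M x M grid centred on S' with source-to-origin distance d = ||SO||; a pure ramp filter stands in for k, and a practical implementation would zero-pad and apodize before the FFT. Names and defaults are illustrative, not the authors' code.

```python
import numpy as np

def weight_and_filter(proj, d, pixel_size=1.0):
    """Weighting (2) and ramp filtering (3) of one cone-beam projection.

    proj : M x M array indexed by the detector coordinates (x', y')
    d    : source-to-origin distance ||SO||
    """
    M = proj.shape[0]
    # Detector coordinates centred on S'.
    coords = (np.arange(M) - M / 2.0) * pixel_size
    x_det, y_det = np.meshgrid(coords, coords, indexing="ij")
    # Weighting (2): multiply by d / sqrt(d^2 + x'^2 + y'^2).
    weighted = proj * d / np.sqrt(d**2 + x_det**2 + y_det**2)
    # Filtering (3): 1D ramp filter k applied in Fourier space along y'.
    ramp = np.abs(np.fft.fftfreq(M, d=pixel_size))
    return np.real(np.fft.ifft(np.fft.fft(weighted, axis=1) * ramp, axis=1))
```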
2.2.2 Algebraic methods
Algebraic methods are based on a discrete formulation of the problem, which is equivalent to solving a large linear system $Xf = p$, where $f$ represents the image, $p$ the set of acquisitions and $X$ the projection matrix. Since, for the reconstruction of a 128^3 image from 100*128^2 projection values, the size of the matrix $X$ is approximately $4\cdot10^6 \times 8\cdot10^5$, it is scarcely ever generated explicitly on the computer. The ART algorithm [8] is very popular in tomography. This iterative method consists in processing each projection ray sequentially. Iterations may be expressed as:
\[
f^{k+1} = f^{k} + \lambda_k\,\frac{p_n - x_n \cdot f^{k}}{\|x_n\|^2}\,(x_n)^t
\tag{5}
\]
where $x_n$ is the n-th row of the matrix $X$ and $\lambda_k$ a relaxation factor. Many algorithms derived from ART have been proposed in the literature. In particular, a block ART iterative method [12], which allows a complete projection to be processed at each iteration, has been adapted to 3D reconstruction from cone-beam projections. Other algorithms, like SIRT [9], consist in processing, at each iteration, all the rays passing through a same voxel. Conventional reconstruction methods such as ART or SIRT are iterative and may be interpreted as iterations of weighted backprojections of the differences between computed and measured data.
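As an illustration, a minimal numpy sketch of iteration (5) is given below, assuming the n-th row x_n is available as a dense vector; in practice it is sparse and generated on the fly. Function and variable names are illustrative.

```python
import numpy as np

def art_update(f, x_n, p_n, lam=1.0):
    """One ART iteration (5): correct f along the n-th projection ray.

    f   : flattened image estimate f^k
    x_n : n-th row of the projection matrix X (dense here for clarity)
    p_n : measured value of the n-th ray
    lam : relaxation factor lambda_k
    """
    residual = p_n - x_n @ f
    return f + lam * residual / (x_n @ x_n) * x_n
```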
2.3 Sequential Implementations
From this brief review of 3D reconstruction algorithms, it may be seen that most algorithms use the same basic operations: projection and backprojection. The projection goes from 3D space to 2D space, while the backprojection goes from 2D space to 3D space. Since these operations may be considered as dual of each other, their implementations are similar. Such operators may be implemented using either a "Voxel-Driven" method or a "Ray-Tracing" method. In this paper, we use the second approach with a bilinear interpolation; a sketch of the interpolation step is given after the algorithms. The schemes of the algorithms are given below, with the following notations. For convenience, the complexity is simply evaluated as a function of the number of projection operations (->P) and backprojection operations (<-B).

V: reconstructed image, a 3D image of size N^3
v_i: element (voxel) of V
V_i: subimage of V composed of N_dp voxels v_i, where dp stands for "data partition"
P_j: projection or 2D acquisition of size M^2; there are m projections
P_ja: element (pixel) of the projection P_j
->P (projection operation): v_i adds its value to the pixels P_ja, P_jb, P_jc, P_jd
<-B (backprojection operation): v_i gets its value from the pixels P_ja, P_jb, P_jc, P_jd

Projection Algorithm:
  read(V)
  For all P_j
    create(P_j)
    For all v_i
      v_i ->P P_ja, P_jb, P_jc, P_jd
    write(P_j)
  Complexity: O(m.N^3)

Backprojection Algorithm:
  create(V)
  For all P_j
    read(P_j)
    For all v_i
      v_i <-B P_ja, P_jb, P_jc, P_jd
  write(V)
  Complexity: O(m.N^3)
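The following Python sketch illustrates the bilinear interpolation underlying both operators: a voxel value is distributed to, or gathered from, the four pixels P_ja, P_jb, P_jc, P_jd surrounding its continuous detector position (u, v). The mapping from a voxel to (u, v) depends on the acquisition geometry and is omitted; names are illustrative and bounds checks are left out.

```python
import numpy as np

def splat(P, u, v, value):
    """Projection step: distribute one voxel value bilinearly to the
    four detector pixels surrounding the continuous position (u, v)."""
    i, j = int(np.floor(u)), int(np.floor(v))
    du, dv = u - i, v - j
    P[i, j]         += value * (1 - du) * (1 - dv)
    P[i + 1, j]     += value * du * (1 - dv)
    P[i, j + 1]     += value * (1 - du) * dv
    P[i + 1, j + 1] += value * du * dv

def gather(P, u, v):
    """Backprojection step: read one voxel value bilinearly from the
    four detector pixels surrounding (u, v)."""
    i, j = int(np.floor(u)), int(np.floor(v))
    du, dv = u - i, v - j
    return ((1 - du) * (1 - dv) * P[i, j] + du * (1 - dv) * P[i + 1, j]
            + (1 - du) * dv * P[i, j + 1] + du * dv * P[i + 1, j + 1])
```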
We now present two reconstruction algorithms using these basic operators: the Feldkamp algorithm and the block ART iterative algorithm. We use the following notations to describe the algorithms.
P_mj: measured projection or 2D acquisition of size M^2; there are m projections
P_cj: calculated projection of the reconstructed image V
P_dj: difference image between P_mj and P_cj
P_wj: weight image, the projection of V calculated at the first iteration

Feldkamp Algorithm:
  create(V)
  For all P_mj
    read(P_mj)
    weighting(P_mj)
    filtering(P_mj)
    For all v_i
      v_i <-B P_mja, P_mjb, P_mjc, P_mjd
  write(V)
  Complexity: O(m.N^3)

Block ART iterative Algorithm (one cycle):
  For j = 1 to m
    For all v_i
      v_i ->P P_cja, P_cjb, P_cjc, P_cjd
    P_dj = (P_mj - P_cj)/P_wj
    For all v_i
      v_i <-B P_dja, P_djb, P_djc, P_djd
  Complexity: O(2.m.N^3)
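As an illustration, one block ART cycle can be written as the following Python sketch, where project, backproject and weights are assumed wrappers around the operators above (hypothetical names, not the authors' code).

```python
def bloc_art_cycle(V, measured, project, backproject, weights):
    """One cycle of the block ART method: one complete projection is
    processed per sub-iteration.

    measured : list of the m acquired projections P_mj
    weights  : the P_wj images, projections of a uniform image
               computed at the first iteration
    """
    for j, P_mj in enumerate(measured):
        P_cj = project(V, j)               # calculated projection of V
        P_dj = (P_mj - P_cj) / weights[j]  # normalized residual image
        backproject(V, P_dj, j)            # correction of the volume
    return V
```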
3 Parallelization of reconstruction methods
In this section, we present the parallel implementations of the projection and backprojection operators and of the two reconstruction algorithms on the different parallel computers. The main problem for the 3D reconstruction algorithms is data interdependence. The data are divided into two sets: the set of 2D acquisitions (or projections) and the 3D image to be reconstructed. Each voxel of the 3D image has to see all the acquisitions, and each pixel of the 2D projections has to see the whole 3D image. For such a problem, a simple approach is to distribute the data over the processor memories, to make local computations, and to exchange data through the communication network. To compare our different approaches, we use the SPMD model. With MIMD architectures, the problem is to choose appropriate communication schemes. We use asynchronous implementations to gain some efficiency.
3.1 Parallel computers
We present the different architectures used for our implementations, describing their topology, processors and performances.

An Intel Paragon: its topology is a grid of 32 nodes, of which 30 processors are available. Each node has two i860 XP processors, one dedicated to communications and the other to calculations. The peak performance of an i860 XP processor is 75 Mflops and its memory is 64 Mbytes; the Paragon peak performance is thus 2.25 Gflops. The communication protocol is wormhole routing.

A Cray T3D: its topology is a 3D torus (8x4x4) of 128 processors, or 64 nodes. Each node is composed of two ALPHA processors with 64 Mbytes. The peak performance of one processor is 150 Mflops; the T3D peak performance is thus 19.2 Gflops. We use the PVM message-passing library [2] to implement our algorithms.

A workstations network: its topology is a set of five SUN 4 workstations linked by an Ethernet network. The workstations have SPARC processors with 16 Mbytes and a peak performance of 5 or 2.5 Mflops, giving a network peak performance of 17.5 Mflops. The workstations communicate with the PVM library. Two of these workstations are twice as fast as the others; the network is thus heterogeneous, and this heterogeneity may be taken into account in the implementation of the different algorithms.
3.2 Data partition
Since a MIMD computer has only some Mbytes of memory on each of its PE processors, all the data cannot be loaded at the same time. An efficient implementation of a reconstruction algorithm must take into account the input-output exchanges of the data (3D image, set of projections). The choice of the data partition is a function of the total processor memory and of the size of the problem. Each voxel of the 3D image is independent of the other voxels, but the position of its projection depends on the geometric parameters of the acquisition system and changes for each angle of view. We have therefore chosen to decompose the 3D image, rather than the 2D projections, over the processor memories. We propose two data partitions, depending on the size of the processor memory. When the size of the data is greater than the size of the processor memories, we assume that the 3D image (N^3) is decomposed into T subcubes of size N_T.N^2 with T.N_T = N, and that we have m 2D projections (M^2). We then consider that, at any moment, one N_T.N^2 subimage and one M^2 projection are shared over the PE memories. When the size of the data is less than the size of the processor memories, we assume that the 3D image (N^3) is decomposed into PE subimages of size N_PE.N^2 with PE.N_PE = N, and that the m 2D projections (M^2) are distributed over the PE processors, with m_PE projections of size M^2 on each processor (m = m_PE.PE). Because of the data sizes, we are generally in the second case; a sketch of this partition is given below.
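A minimal Python sketch of the second data-partition case, assuming for simplicity that N and m are multiples of PE:

```python
def data_partition(N, m, PE):
    """Second case: the 3D image (N^3) is split into PE slabs of
    N_PE * N^2 voxels (PE * N_PE = N), and the m projections (M^2)
    are spread over the processors, m_PE each (m = m_PE * PE)."""
    N_PE, m_PE = N // PE, m // PE
    slabs = [(p * N_PE, (p + 1) * N_PE) for p in range(PE)]  # slice ranges of V
    views = [(p * m_PE, (p + 1) * m_PE) for p in range(PE)]  # projection ranges
    return slabs, views
```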
3.3 Communication schemes
To compute a local projection or a local backprojection on a processor, a 2D projection and a subimage of the 3D image are required on this processor. As the data are distributed over the processor memories, and to minimize the communication cost, we must choose to communicate either the 3D subimages or the 2D projections. For instance, a global projection on a workstations network, computed on a ring of PE processors with synchronous communications, will cost either m.M^2, if the 2D projections are communicated, or N^3, if the 3D image is communicated. The choice of the communication scheme thus depends on whether N^3 > m.M^2. In this study, we assume that this is the case for all computers, and we prefer to communicate the 2D projections over the network rather than the 3D image; the small check below illustrates the choice. The topology of this exchange is a ring, because all the processors must see all the 2D images.
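For instance, with N = M = 128 and m = 100 projections, m.M^2 is about 1.6 million values against N^3 of about 2.1 million, so circulating the projections is cheaper. A toy version of the test:

```python
def cheaper_to_communicate(N, m, M):
    """Choose what circulates on the ring: the m projections (m * M^2
    values) or the 3D image (N^3 values)."""
    return "projections" if m * M**2 < N**3 else "image"

# Example: cheaper_to_communicate(128, 100, 128) -> "projections"
# (100 * 128**2 = 1,638,400 < 128**3 = 2,097,152)
```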
3.4 Parallel implementations
In this part, we present three methods to implement our algorithms. We show the parallel algorithms of the basic projection and backprojection operators. As for the sequential implementations, the parallel implementations of the Feldkamp and ART algorithms are based on the parallel implementations of these operators. The corresponding algorithms are given below; a sketch of the ring exchange follows the backprojection algorithm.

Synchronous method. Since the data are distributed over the processor memories, we alternate a communication phase and a calculation phase in a "synchronous" mode. For instance, to compute a global projection, each processor must see all the 2D projections to compute the local projection of its 3D subimage. When each 2D projection has seen the whole set of 3D subimages, it may be written. The theoretical cost is O(N_PE.N^2.m) for the calculations and O((PE - 1).m.M^2/PE) for the communications of the 2D projections.

Asynchronous method. The main difference between the asynchronous algorithm and the synchronous algorithm is the use of a non-blocking receive: a processor can compute a projection while it receives another projection. This method allows the communications to be overlapped by the calculations, and the total theoretical cost becomes O(N_PE.N^2.m + e.(PE - 1).m.M^2/PE), where e is the fraction of the communications which is not overlapped.

Asynchronous method with an adaptive data partition. In a MIMD computer all the processors are identical, but the workstations network is heterogeneous, and the slowest workstations penalize the fastest. To improve the total execution time, we distribute the 3D image according to the speed of each workstation, using an "adaptive data partition". For this purpose, after each execution, we evaluate the processor performances and their locations to calculate the workstation speeds. During the next execution, this information is used to allocate the 3D image: all the processors determine their location and their theoretical speed, and communicate them to one processor; the latter calculates the data partition and replies to each processor. The theoretical cost is O(N_speedup.N^2.m + e.(PE - 1).m.M^2/PE), where N_speedup is the subimage size, which depends on the workstation speed.

Projection Algorithm

Synchronous communications:
  For n = 1 to PE
    create(P_j), P_j in {P_(n.m/PE), ..., P_((n+1).m/PE - 1)}
    For all P_j
      For v_i in V_i
        v_i ->P P_ja, P_jb, P_jc, P_jd
      send(P_j)
      recv(P_j)
  write(P_j)

Asynchronous communications:
  While (V_i has not seen all P_j)
    If (recv(P_k) = false)
      create(P_j)
    For v_i in V_i
      v_i ->P P_ja, P_jb, P_jc, P_jd
    If (P_j has seen all PE)
      write(P_j)
    Else
      send(P_j)
Backprojection Algorithm
Synchronous communications:
  For n = 1 to PE
    read(P_j), P_j in {P_(n.m/PE), ..., P_((n+1).m/PE - 1)}
    For all P_j
      For v_i in V_i
        v_i <-B P_ja, P_jb, P_jc, P_jd
      send(P_j)
      recv(P_j)
  write(V_i) in parallel

Asynchronous communications:
  While (V_i has not seen all P_j)
    If (recv(P_k) = false)
      read(P_j)
    For v_i in V_i
      v_i <-B P_ja, P_jb, P_jc, P_jd
    If (P_j has not seen all PE)
      send(P_j)
  write(V_i) in parallel
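To make the communication pattern concrete, the sketch below shows an asynchronous ring version of the projection operator in Python. The paper's implementations use PVM [2]; here mpi4py's non-blocking Irecv plays the role of PVM's non-blocking receive, and project_into stands for the local v_i ->P loop, so both the library and the names are illustrative assumptions, not the authors' code.

```python
from mpi4py import MPI   # illustrative stand-in for PVM
import numpy as np

def ring_projection(V_local, block, project_into, PE):
    """Asynchronous ring exchange for the projection operator (sketch).

    Each processor starts with its own block of 2D projections and a 3D
    subimage V_local. Blocks circulate around the ring; at each step the
    processor accumulates the projection of V_local into the block it
    holds while the next block is already being received, overlapping
    communications with calculations.
    """
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    left, right = (rank - 1) % PE, (rank + 1) % PE
    for _ in range(PE):
        incoming = np.empty_like(block)
        rreq = comm.Irecv(incoming, source=left)  # non-blocking receive
        project_into(block, V_local)              # compute while receiving
        comm.Send(block, dest=right)              # forward the updated block
        rreq.Wait()
        block = incoming
    return block  # back home, having seen all PE subimages
```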
4 Experiments

4.1 Execution Times
We give the execution times obtained for the two operators (projection, backprojection) and for an analytic reconstruction algorithm (the Feldkamp algorithm), which performs the backprojection of the filtered 2D projections. We have tested the following cases. On a single SUN 4 workstation, we run the sequential algorithms for comparison. On a SIMD computer (MasPar 1K), we implemented the synchronous algorithm in a previous study [10]. On the Intel Paragon, we have used the synchronous algorithms. On the Cray T3D, we have implemented both the synchronous and the asynchronous algorithms. On the workstations network, we have implemented the synchronous and asynchronous algorithms, and also the asynchronous algorithms with adaptive data partition.
Figure 2: Comparison of speed-up for different computers and different implementations. (a) Relative speed-up, on a log scale, for the workstations network, MasPar, Paragon, SP1, FARM and T3D. (b) Efficiency of the Feldkamp implementations on the workstations network (PVM synchronous, PVM asynchronous, PVM adaptive partition) as a function of the image size N.
In figure 2(a), we show the relative speed-up (SUN 4 time / parallel time), on a log scale, for the workstations network with synchronous algorithms (Net. Sync) and asynchronous algorithms with adaptive data partition (Net. A.A.), for the MasPar implementation, for the Paragon implementation and for the T3D implementation with synchronous algorithms. This figure compares the computer performances on a reconstruction problem. The algorithms have been tested with 128 2D acquisitions of size 128^2 and a 128^3 3D image. The implementations on the MasPar 1K are penalized by the small memory size, which increases the input-output exchanges. The T3D implementations reach very large relative speed-ups.
Figure 2(b) shows the efficiency of the Feldkamp implementations on the workstations network with 128 2D acquisitions of size 128^2 and a 3D image of size N^3. The efficiency increases with the size of the 3D image. We observe a gain of approximately 20 to 30 percent in efficiency between the different implementations. The asynchronous algorithms with the adaptive data partition allow the calculations and communications to be optimized on this network of heterogeneous workstations. We obtain a good improvement by using overlapping and adaptive techniques on the workstations network, since the communication share is large.
Figure 3: Comparison of Feldkamp implementations on Cray T3D. (a) Times (sec) of the synchronous and asynchronous implementations as a function of the number of processors PE. (b) Efficiency of the synchronous and asynchronous implementations as a function of PE.

We compare the different times obtained on the T3D when increasing the number of processors. Contrary to the workstations network, the improvement of the asynchronous implementation on the T3D is weak, because of the good efficiency of the communications on this machine (see figure 3(a)). Nevertheless, the scalability of the implementation is very good: we obtain an average efficiency of 1.5 and a peak at 3.7 for 32 processors (figure 3(b)). This peak can be explained by a data size which is lower than the cache memory size. We have tested the implementations with larger data, but the efficiency calculations are limited by the size of the processor memory. The ART method has also been implemented, but we are limited by the number of iterations of this method for an acceptable data size. One iteration of ART takes a little less than twice the time of the Feldkamp algorithm, which is in accordance with the theoretical complexities.
4.2 Reconstructed Image from simulated data
As an illustration, figure 4 presents a plane of the simulated data 4(a), the same plane of the data reconstructed with the Feldkamp algorithm 4(b), and the profiles of these planes 4(c).
Figure 4: A sphere simulated and reconstructed by the Feldkamp algorithm. (a) Simulated image. (b) Feldkamp algorithm. (c) Profiles of the same plane in the simulated image and in the Feldkamp reconstruction.
5 Conclusion and Perspectives

We have presented the implementation of basic reconstruction operators and full reconstruction algorithms on different architectures (Paragon, Cray T3D and a workstations network). Different communication modes (synchronous and asynchronous) have been used to improve the computation times. The overlapping and adaptive techniques increase the efficiency, but the improvement depends on the communication share. Moreover, the different results show that the efficiency increases with the size of the 3D image, which demonstrates the scalability of the implementations of the reconstruction algorithms. Similar implementations of the projection and backprojection operators and of the full reconstruction methods (Feldkamp, ART) will be tested on an SP1 (IBM) and on a farm of processors with an ATM network; they will be easily developed thanks to the portability of PVM. Other algebraic reconstruction methods, derived from the ART method, will be easily elaborated using the same parallel methods. Experimental data will be used to reconstruct a physical 3D image. The parallelization of reconstruction algorithms allows large images to be reconstructed with reduced computation times.
Acknowledgments
This work has been done within the French CNRS research group (GDR 134) TDSI (Traitement du signal et de l'image).
References
[1] M. S. Atkins, D. Murray, and R. Harrop. Use of transputers in a 3-D Positron Emission Tomograph. IEEE Transactions on Medical Imaging, 12(2):173-181, 1993.

[2] A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. PVM 3 user's guide and reference manual. Technical report, Oak Ridge National Laboratory, 1993.

[3] H.-P. Charles, J.-J. Li, and S. Miguet. 3D image processing on distributed memory parallel computers. In SPIE, volume 1905, pages 379-390, 1993.

[4] C. M. Chen and S.-Y. Lee. On Parallelizing the EM Algorithm for 3-D PET Image Reconstruction. IEEE Transactions on Parallel and Distributed Systems, 5(8):860-873, 1994.

[5] C. M. Chen, S.-Y. Lee, and Z. H. Cho. A Parallel Implementation of 3-D CT Image Reconstruction on Hypercube Multiprocessor. IEEE Transactions on Nuclear Science, 37(3):1333-1346, 1990.

[6] D. Saint-Felix et al. A new system for 3D computerized X-Ray angiography: first in vivo results. In Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 2051-2052, 1992.

[7] L. Garnero and F. Peyrin. Methodes de Reconstruction 3D en Tomographie X. Technical report, GDR TDSI CNRS, France, May 1993. Rapport de synthese 93-01.

[8] G. T. Herman. Image Reconstruction from Projections. Academic Press, New York, 1980.

[9] A. V. Lakshminarayanan and A. Lent. Methods of least squares and SIRT in reconstruction. J. Theor. Biol., 76:267-295, 1979.

[10] C. Laurent, F. Peyrin, and J.-M. Chassery. Calculs Paralleles de Projection en Tomographie 3D. In Actes des 6iemes rencontres sur le parallelisme (RenPar6), page 324, 1994.

[11] M. I. Miller and C. S. Butler. 3-D maximum a posteriori Estimation for Single Photon Emission Computed Tomography on Massively-Parallel Computers. IEEE Transactions on Medical Imaging, 12(3):560-565, 1993.

[12] F. Peyrin. Methodes de Reconstruction d'Images 3D a partir de Projections Coniques de Rayons X. PhD thesis, University Claude Bernard LYON 1, 1990.

[13] E. L. Ritman, J. H. Kinsey, R. A. Robb, L. D. Harris, and B. K. Gilbert. Physics and technical considerations in the design of the DSR. J. of Roentgenology, 134:369-374, 1980.

[14] E. L. Zapata, I. Benavides, F. F. Rivera, J. D. Bruguera, and J. M. Carazo. Image Reconstruction on Hypercube Computers. In The 3rd Symposium on the Frontiers of Massively Parallel Computation, pages 127-133, 1990.