Parallel Algorithm Based on a Frequential Decomposition for Dynamic 3D Computed Tomography

Thomas Rodet, Laurent Desbat
TIMC-IMAG, UMR CNRS 5525, IAB, Faculté de médecine, UJF, 38706 La Tronche Cedex, FRANCE

Pierre Grangeat
LETI-DSIS-DRT-CEA, 17 Rue des Martyrs, F 38054 GRENOBLE Cedex 9, FRANCE

Abstract

We present a tomographic reconstruction algorithm based on a frequential decomposition of the data. We show that the frequential components of the attenuation function to be identified can be reconstructed from the frequential decomposition of the data. Moreover, downsampling techniques, combined with the identification of null components and coupled with compression techniques, speed up the reconstruction by a factor of up to six compared to the classical FBP. We identify the optimal number of frequential components. We show reconstructions from real data. A parallel implementation of our new algorithm is then proposed and evaluated on two small PC clusters.

1. Introduction

Dynamic X-ray tomography is a new medical imaging modality. In a conventional scanner, a static cross-section of the patient is reconstructed from a set of X-ray projections acquired while an X-ray source and a set of detectors rotate around the patient. In dynamic tomography, a set of successive cross-sections is reconstructed, enabling the imaging of naturally moving organs such as the heart, or of organs in dynamic interaction with exterior elements. This is the case for CAS (Computer Assisted Surgery), where we would like to control under CT imaging the progression of surgical tools (example: pericardial puncture). The diffusion of a contrast agent in angiography, or the diffusion of a marker in nuclear imaging, can also be studied dynamically. Thus dynamic tomography is becoming a very promising tool for the diagnosis of diseases and the study of their evolution, but also for guiding surgeons during CAS.

1.1. Tomography

We first recall the basics of 2D reconstruction of an attenuation function from projections. For simplicity, we suppose that $f \in S(\mathbb{R}^2)$, where $S$ is the Schwartz space of rapidly decreasing, infinitely smooth functions; see [21] for generalizations and proofs. The Radon transform of $f \in S(\mathbb{R}^2)$ is the integral of $f$ on the lines $L(\theta, s) = \{x = (x_1, x_2) \in \mathbb{R}^2 : x_1 \cos\theta + x_2 \sin\theta = s\}$, $\theta \in [0, \pi[$, $s \in \mathbb{R}$ (see figure 1(a)):

$$Rf(\theta, s) = \int_{\mathbb{R}} f(s\cos\theta - t\sin\theta,\; s\sin\theta + t\cos\theta)\, dt. \tag{1}$$

In the following, we consider the Radon transform at a fixed angle $\theta$ as a one-dimensional function of $s$, denoted by $R_\theta f(s) = Rf(\theta, s)$. The backprojection operator $R^{\#}$ is the dual operator associated with $R$: $\forall g \in S([0, \pi[ \times \mathbb{R})$, $\forall x \in \mathbb{R}^2$,

$$\left(R^{\#} g\right)(x) = \int_0^{\pi} g(\theta,\; x_1\cos\theta + x_2\sin\theta)\, d\theta. \tag{2}$$
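The definitions (1) and (2) can be checked numerically on a case with a known answer. The sketch below is only an illustration and not part of the algorithm described in this paper: the function names (`radon`, `backproject`, `gaussian`, `gaussian_sinogram`), the midpoint quadrature and the Gaussian test function are assumptions made for the example. For $f(x) = e^{-|x|^2}$, the Radon transform is $\sqrt{\pi}\, e^{-s^2}$ for every angle, which gives a simple reference value.

```python
import math

def radon(f, theta, s, t_max=8.0, n=4000):
    """Approximate Rf(theta, s) of eq. (1) with a midpoint rule over t in [-t_max, t_max]."""
    dt = 2.0 * t_max / n
    c, sn = math.cos(theta), math.sin(theta)
    total = 0.0
    for k in range(n):
        t = -t_max + (k + 0.5) * dt
        total += f(s * c - t * sn, s * sn + t * c)
    return total * dt

def backproject(g, x1, x2, n_angles=720):
    """Approximate (R# g)(x) of eq. (2) with a midpoint rule over theta in [0, pi[."""
    dtheta = math.pi / n_angles
    total = 0.0
    for k in range(n_angles):
        theta = (k + 0.5) * dtheta
        total += g(theta, x1 * math.cos(theta) + x2 * math.sin(theta))
    return total * dtheta

def gaussian(x1, x2):
    """Test function f(x) = exp(-|x|^2) (an assumption of this example)."""
    return math.exp(-(x1 * x1 + x2 * x2))

def gaussian_sinogram(theta, s):
    """Closed-form Radon transform of the Gaussian above."""
    return math.sqrt(math.pi) * math.exp(-s * s)

numeric = radon(gaussian, theta=0.3, s=0.5)
exact = gaussian_sinogram(0.3, 0.5)
print(numeric, exact)
print(backproject(gaussian_sinogram, 0.0, 0.0), math.pi * math.sqrt(math.pi))
```

The backprojection alone does not return $f$: at the origin it gives $\pi\sqrt{\pi}$ rather than $f(0) = 1$, which is why a filtering step is needed before backprojecting.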

The most efficient reconstruction algorithm used in CT scanners is filtered backprojection. It is based on Fourier analysis. The Fourier transform of $f \in S(\mathbb{R}^2)$ is a function of the frequencies $\nu \in \mathbb{R}^2$:

$$\hat{f}(\nu) = \frac{1}{2\pi}\int_{\mathbb{R}^2} f(x)\, e^{-i\nu\cdot x}\, dx = \frac{1}{2\pi}\int_{\mathbb{R}^2} f(x_1, x_2)\, e^{-i(\nu_1 x_1 + \nu_2 x_2)}\, dx_1\, dx_2.$$

The Fourier transform of the Radon transform is taken with respect to the second variable:

$$\widehat{R_\theta f}(\sigma) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} R_\theta f(s)\, e^{-is\sigma}\, ds.$$

0-7695-1926-1/03/$17.00 (C) 2003 IEEE

The key theorem in tomography is the so-called projection-slice theorem: the Fourier transform $\widehat{R_\theta f}$ of the Radon transform of $f$ in the direction $\theta$ is the slice along the same direction $\theta$ of the Fourier transform $\hat{f}$ of $f$ (see figure 1). Let $f \in S(\mathbb{R}^2)$; we have, $\forall \theta \in [0, \pi[$, $\forall \sigma \in \mathbb{R}$,

$$\widehat{R_\theta f}(\sigma) = (2\pi)^{1/2}\, \hat{f}(\sigma\cos\theta, \sigma\sin\theta). \tag{3}$$
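A discrete analogue of the projection-slice theorem (3) can be verified in a few lines. The sketch below is an illustration under stated assumptions: it uses NumPy, a random 16 × 16 test image, and the single direction $\theta = 0$, for which the projection is a plain sum along one image axis. The unnormalized DFT convention of `numpy.fft` differs from the $(2\pi)^{1/2}$ normalization of (3), so the comparison is made without that constant. The helper name `projection_along_axis0` is hypothetical.

```python
import numpy as np

def projection_along_axis0(image):
    """Discrete parallel projection for theta = 0: sum the image along its first axis."""
    return image.sum(axis=0)

rng = np.random.default_rng(0)
image = rng.random((16, 16))

# 1D DFT of the projection ...
projection_spectrum = np.fft.fft(projection_along_axis0(image))
# ... versus the slice of the 2D DFT through the origin, along the same direction.
central_slice = np.fft.fft2(image)[0, :]

max_error = float(np.max(np.abs(projection_spectrum - central_slice)))
print(max_error)
```

For other angles the slice no longer falls on the Cartesian DFT grid, which is the origin of the interpolation problem of Fourier-based reconstruction methods.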

Figure 1. (a) The Radon transform of $f$, (b) the Fourier transform of $f$ and the Fourier transform of the Radon transform of $f$.

From the projection-slice theorem, we directly obtain the convolution theorem: $\forall g \in S(\mathbb{R}^2)$, $\forall f \in S(\mathbb{R}^2)$,

$$R_\theta (f * g) = R_\theta f * R_\theta g. \tag{4}$$

Moreover, from (3) we obtain the Filtered BackProjection (FBP) inversion formula for the Radon transform: $\forall f \in S(\mathbb{R}^2)$,

$$f = \frac{1}{4\pi} R^{\#} I^{-1} R f, \tag{5}$$

where $I^{-1}$ is the filter operator (the ramp filter) defined by: $\forall g \in S(\mathbb{R})$,

$$I^{-1} g(t) = (2\pi)^{-1/2}\int_{\mathbb{R}} |\sigma|\, \hat{g}(\sigma)\, e^{i\sigma t}\, d\sigma. \tag{6}$$

For proofs, see [21, 23]. FBP algorithms are discretizations of (5) with an apodized version of the filter (6). The discrete projections (1) measured in a scanner are numerically filtered and then backprojected onto the reconstructed image, which is a 2D discretization of $f$.

1.2. Tomofluoroscopy

Tomofluoroscopy is a particular case of dynamic tomography: it is the CT reconstruction of moving organs. Tomofluoroscopy is used to guide the surgeon during an intervention in a CT scanner. This is the case for pericardial puncture, in which we want to control the introduction of a puncture tool close to a beating heart [3], or more generally for biopsy close to moving critical organs such as the lungs or the heart [8, 6]. A "real time" reconstruction is needed, so fast acquisition systems and fast image reconstruction methods are necessary. That is the reason why tomofluoroscopy is a recent technique. The early developments in 1993 led to the first clinical tests in 1996 [17]. Indeed, the speed of the latest generation of scanners, which acquire more than 2 slices per second, is necessary for dynamic CT. Moreover, specialized computer architectures are also mandatory for fast reconstruction. Nowadays tomofluoroscopy is becoming 3D in space [15, 16]. New 3D scanners with multi-line detectors acquire several slices in parallel in less than one turn of the source around the patient. In 3D tomofluoroscopy, a weakly 3D volume (multi-slice, up to 512 × 512 × 32) must be reconstructed continuously in time (4D data). A huge amount of data must be processed in real time. For the reconstruction of $L$ images of $N \times N$ pixels, where $L$ is the number of detector lines (number of slices) and $N$ is the number of pixels along one direction of the image, we need $M = \pi N/2$ X-ray projections on $N$ detector cells per detector line (see the sampling conditions [21, 22, 10]), times the number of detector lines $L$. The number of data is thus $\pi N^2 L/2$. The reconstruction of $L$ images $N \times N$ from these $\pi N^2 L/2$ data requires close to $3L\pi N^3/2$ floating point operations [27]. For $N = 512$ and $L = 16$, this means about $10^{10}$ floating point operations (10 Gflop) per volume. Tens of such 3D images are needed each second for CAS, thus more than 100 Gflops are necessary.
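The orders of magnitude quoted above follow directly from the formulas $M = \pi N/2$ and $3L\pi N^3/2$. The short computation below reproduces them; the function names and the rate of 10 volumes per second (standing in for "tens of 3D images each second") are assumptions made for illustration.

```python
import math

def projection_count(n):
    """Number of projections M = pi * N / 2 required by the sampling conditions."""
    return math.pi * n / 2

def fbp_operation_count(n, l):
    """Approximate floating point operations to reconstruct L images of N x N pixels."""
    return 3 * l * math.pi * n ** 3 / 2

n, l = 512, 16
volumes_per_second = 10  # assumed rate, for illustration only

ops_per_volume = fbp_operation_count(n, l)
print(f"M = {projection_count(n):.0f} projections")
print(f"{ops_per_volume / 1e9:.1f} Gflop per volume")
print(f"{ops_per_volume * volumes_per_second / 1e9:.0f} Gflops at {volumes_per_second} volumes/s")
```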
Computer performance keeps growing, but the performance of flat panel digital detector technology currently grows at least as fast. Flat panels of $L \times N$ pixels, with $L$ and $N$ close to 2000 and with up to 30 frames per second, are emerging [7]. Thus, in order to obtain real-time reconstructions, we have to speed up the reconstruction algorithms. In this paper, we describe our approach to speeding up image reconstruction, based on computation compression. We then show that this new algorithm can be further sped up on small PC clusters. After a brief state of the art in section 2, we present in section 3 our Multi Channel FBP algorithm and in section 4 its parallelization on two PC clusters.

2. Tomographic reconstruction speed up

2.1. Hardware speed up

Since the very beginning of the development of CT scanners, dedicated computer architectures have been developed for image reconstruction. Indeed, radiologists want an image immediately after the acquisition, in order to establish a diagnosis available to the patient or to other physicians. Dedicated ASICs have been developed by the CT companies. These very specialized hardware systems are very efficient, but their development costs are high and they are not very flexible. Only big companies can implement ASICs for CT, on products having a sufficiently large market. FPGAs are less efficient but have the great advantage of being programmable. Their development is much easier and their cost is much lower than for ASICs. Recently, several teams have developed FPGA systems for the fast processing of large volumes of tomographic data [13, 18]. Dedicated processor architectures are also developed for CT applications. Contrary to ASICs, they can be programmed and are thus much more flexible. I/O and data bandwidth are often the critical parts of these systems [26]. Finally, ordinary researchers or users wanting to develop new acquisition geometries, new usages of CT or mobile X-ray (for interventional imaging) generally only have access to standard computers such as PCs. PC clusters now offer high performance at a very low cost, but they generally suffer from relatively low I/O and communication performance.

2.2. Algorithm speed up

Three families of fast tomographic reconstruction algorithms can be identified. The first is the family of Fast Fourier Transform (FFT)-based algorithms. These algorithms are directly derived from the well-known Fourier slice theorem [21]. This approach requires an interpolation from a polar grid to a rectangular grid in the Fourier space in order to make use of the 2D inverse FFT. This interpolation is the main issue, and it is difficult to offer good accuracy together with a significant speedup [20, 19]. Recently, however, work based on the linogram geometry [11, 12], on new interpolation techniques [14, 29] or on the fractional Fourier transform [1] has appeared, showing that these approaches could become a good way to produce faster and efficient algorithms.

The second family speeds up the backprojection step. The basic idea is that partial sums of (2) can be reused several times in a recursive and hierarchical way to build up the full backprojection. In this approach, just as in the FFT algorithm, the backprojection of the complete image is computed simultaneously [4, 28, 27, 9, 5].

The last family is based on a divide and conquer approach. The reconstructed image is divided into smaller reconstructed images. S. Basu and Y. Bresler [2] divide the reconstructed image in the direct space into a grid of small images. In order to reconstruct each of them, they shift and downsample the projections. This approach is not an exact method and some artifacts appear in the final image. Our approach is also based on a divide and conquer method, but we divide the Fourier transform of the image into a set of smaller images through a grid decomposition of the Fourier space. Using the Fourier slice theorem, we can prove that our method is theoretically exact [24, 25]. In the next section, we introduce our speedup approach based on compression techniques.

3. A computation compression approach for FBP

3.1. Multi Channel FBP

The main idea of our approach is to divide the reconstructed image into $B^2$ smaller reconstructed images. We divide the problem in the Fourier space. In practice, we reconstruct $B^2$ frequential channels $(f_{g_j})$ of $f$ in the direct space using a classical backprojection operator. Only the significant channels are reconstructed. We define a frequential decomposition of $f$ on the set of functions $(g_j)_{j \in \{1, \ldots, B^2\}}$ by

$$f \approx \sum_{j=1}^{B^2} f * g_j. \tag{7}$$

The functions $f_{g_j}$ are elementary frequential channels of $f$: $f_{g_j} = f * g_j$, with $j = 1, \ldots, B^2$. In practice, we have chosen the functions $g_j$ such that the $\hat{g}_j$ are the indicators of adjacent squares covering a 2D band region of the Fourier space. These functions $\hat{g}_j$ are the most compact way to cover a square region. They also maximize the downsampling factor of $f * g_j$. After the reconstruction of all frequential components, we merge them to obtain the full image (see figure 2). Our new reconstruction algorithm is divided into three steps: the multichannel data decomposition, the backprojection producing the $f_{g_j}$, and their merging into $f$ (see figure 2). This algorithm is a MultiChannel Filtered BackProjection (MCFBP) algorithm.
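The decomposition (7) can be illustrated directly on an image, without any tomographic data. The sketch below is an assumption-laden illustration rather than the MCFBP algorithm itself: it uses NumPy, a random 32 × 32 image, and hypothetical helpers `channel_masks` and `frequential_channels`, and it builds the channels by masking the 2D DFT with the indicators of $B \times B$ adjacent squares. Because the squares tile the whole discrete Fourier grid here, the sum of the channels recovers the image up to rounding.

```python
import numpy as np

def channel_masks(n, b):
    """Indicators of the b x b adjacent squares tiling the (centered) n x n DFT grid."""
    assert n % b == 0, "this sketch assumes that b divides n"
    edges = np.arange(0, n + 1, n // b)
    masks = []
    for r in range(b):
        for c in range(b):
            mask = np.zeros((n, n), dtype=bool)
            mask[edges[r]:edges[r + 1], edges[c]:edges[c + 1]] = True
            masks.append(mask)
    return masks

def frequential_channels(image, b):
    """Return the b*b channels f * g_j, each obtained by masking the centered spectrum."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return [np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
            for mask in channel_masks(image.shape[0], b)]

rng = np.random.default_rng(1)
image = rng.random((32, 32))
channels = frequential_channels(image, b=4)

# Merging step: summing all channels gives back the full image.
merged = np.sum(channels, axis=0)
merge_error = float(np.max(np.abs(merged - image)))
print(len(channels), merge_error)
```

Each channel is complex-valued on its own, since a single square is not symmetric about the origin of the Fourier plane; only the sum is real. Each channel also occupies a square of side $N/B$ in the Fourier grid, which is what allows the downsampling mentioned above.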


Figure 2. MCFBP algorithm scheme based on the frequential decomposition of the image through a filtering applied to the sinogram. (Stages shown in the diagram: sinogram, indirect decomposition, frequential component sinograms, backprojection, frequential components, merging step, full image.)

3.2. Theoretical foundation

We apply the inversion formula of the Radon transform (5) to the frequential components $f_{g_j}$:

$$f_{g_j} = f * g_j = \frac{1}{4\pi} R^{\#} I^{-1} R (f * g_j). \tag{8}$$

Using the convolution theorem (4), we get:

$$f_{g_j} = \frac{1}{4\pi} R^{\#} I^{-1} (Rf * Rg_j). \tag{9}$$

Thus, to reconstruct $f_{g_j}$, we apply the filter $Rg_j$ to the data $Rf$. After this filtering step, we use a classical Filtered Backprojection (FBP) algorithm on $Rf_{g_j}$ to obtain the frequential component $f_{g_j}$. When all the small problems of identifying the $f_{g_j}$ are solved, we compute the full image $f$ by summing the frequential components ($f = \sum_{j=1}^{B^2} f_{g_j}$).

3.3. Speeding up the MCFBP algorithm

Null projections. The projections of the frequential components $Rf_{g_j}$ are usually equal to zero for $\theta$ outside a relatively small interval $[\theta_{\min}, \theta_{\max}]$. In figure 3, for instance, we have:

$$\operatorname{supp}\left(\hat{f}_{g_5}\right) \subset \operatorname{supp}\left(\hat{g}_5\right) \;\Rightarrow\; \widehat{R_\theta f_{g_5}} = 0, \quad \forall \theta \notin [\theta_{\min}, \theta_{\max}].$$

Thus, for the reconstruction of the frequential components $f_{g_j}$, we backproject only the angles between the corresponding $\theta_{\min}$ and $\theta_{\max}$. The speedup obtained depends on the frequential component. For example, we backproject all angles for $f_{g_0}$ and only approximately $M/B$ angles for the frequential component $f_{g_{B-1}}$, if $B$ is large. To analyze the speedup factor obtained from this property, we count the number of projections which contribute to each frequential component. This problem is equivalent to counting the number of lines (the supports of the $\widehat{R_\theta f}$) which intersect a square of the 2D grid ($\operatorname{supp}(\hat{f}_{g_j})$) (see figure 3).

Figure 3. Angular interval containing frequential information. (Labels in the sketch: axes $\nu_x$ and $\nu_y$, origin $O$, grid width $2b$, square $\hat{g}_5$, angles $\theta_{\min}$ and $\theta_{\max}$, radii $b_5^-$ and $b_5^+$.)

If the grid is composed of $B \times B$ squares, it is easy to see that each line intersects fewer than $2B$ squares. The mean number of contributions per frequential component is therefore less than $2BM/B^2 = 2M/B$. Thus, we reduce the computation by at least a factor $B/2$ compared to a classical backprojection algorithm.
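The counting argument of section 3.3 (each line through the origin of the Fourier plane crosses fewer than $2B$ squares of a $B \times B$ grid, so a channel receives fewer than $2M/B$ projections on average) can be checked by brute force. The sketch below is an illustration under assumptions: the grid is normalized to $[-1, 1]^2$, the $M$ angles are taken at the midpoints of a uniform subdivision of $[0, \pi[$, squares are treated as open sets, and all function names are hypothetical.

```python
import math

def sigma_interval(lo, hi, direction):
    """Open interval of sigma such that lo < sigma * direction < hi, or None if empty."""
    if abs(direction) < 1e-12:
        return (-math.inf, math.inf) if lo < 0.0 < hi else None
    a, b = lo / direction, hi / direction
    return (min(a, b), max(a, b))

def line_hits_square(theta, x_lo, x_hi, y_lo, y_hi):
    """True if the line sigma * (cos theta, sin theta) crosses the open square."""
    ix = sigma_interval(x_lo, x_hi, math.cos(theta))
    iy = sigma_interval(y_lo, y_hi, math.sin(theta))
    if ix is None or iy is None:
        return False
    return max(ix[0], iy[0]) < min(ix[1], iy[1])

def count_contributions(b_squares, m_angles):
    """Return (angles hitting each square, squares hit by each angle) on the grid [-1, 1]^2."""
    edges = [-1.0 + 2.0 * i / b_squares for i in range(b_squares + 1)]
    per_square = [[0] * b_squares for _ in range(b_squares)]
    per_angle = []
    for k in range(m_angles):
        theta = (k + 0.5) * math.pi / m_angles
        hits = 0
        for r in range(b_squares):
            for c in range(b_squares):
                if line_hits_square(theta, edges[c], edges[c + 1], edges[r], edges[r + 1]):
                    per_square[r][c] += 1
                    hits += 1
        per_angle.append(hits)
    return per_square, per_angle

B, M = 8, 180
per_square, per_angle = count_contributions(B, M)
mean_per_square = sum(map(sum, per_square)) / B ** 2
print(max(per_angle), mean_per_square, 2 * M / B)
```

The printed values compare the largest number of squares crossed by a single line with $2B$, and the mean number of contributing angles per square with the bound $2M/B$.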

Sampling theory. The sampling conditions of the Radon transform ($Rf$) depend only on the maximum radial frequency of $f$ [21]. Let us denote by $b_j^+$ the maximum radial frequency of $f_{g_j}$. The function $f_{g_j}$ is a $b_j^+$-band-limited function (see figure 3). We assume that the number of detector cells $N$ and the number of projections $M$ satisfy the sampling conditions for reconstructing $f$. We define the restricted number of detector cells $N_{g_j}$ and the restricted number of projections $M_{g_j}$, adapted to the frequency support of the function $f_{g_j}$:

$$N_{g_j} = \frac{b_j^+}{b} N \quad \text{and} \quad M_{g_j} = \frac{b_j^+}{b} M. \tag{10}$$

$N_{g_j}$ and $M_{g_j}$ satisfy the sampling condition too [21]. We can show that if we backproject only $M_{g_j}$ projections, we speed up the backprojection step by a factor of two.

4. Parallelization and numerical experimentation

4.1. Parallel algorithm

In the previous section, we have seen that the frequential decomposition of the reconstruction reduces the number of floating point operations. This decomposition is also naturally parallel. Indeed, each frequential component can be computed independently. In the following, we present the general scheme of our parallel MCFBP algorithm using MPI.

MCFBP Algorithm
1: Initialisation: decomposition filters $\widehat{R g_j}$ and maps of non-zero projections $\eta_j(\theta)$ (performed off line)
2: rebin the fan beam projections into parallel beam geometry ($Rf$)
3: for all angles $\theta \in [0, \pi[$ do
4:   1D FFT of $R_\theta f$
5: end for
6: for all channels $j \in [1, B^2]$ do
7:   for all projections $\widehat{R_\theta f}$ belonging to the non-zero projection map $\eta_j(\theta)$ do
8:     multiply $\widehat{R_\theta f}$ by the decomposition filter $\widehat{R_\theta g_j}$ and the ramp filter
9:     inverse FFT of $\widehat{R_\theta f_{g_j}}$
10:    for all pixels $(x_i, x_l)$ belonging to the downsampled grid $\frac{N}{B} \times \frac{N}{B}$ do
11:      evaluate the address of the projection of $x$ on $R_\theta f_{g_j}$
12:      linear interpolation at this address
13:      accumulate this value into $f_{g_j}(x_i, x_l)$
14:    end for
15:  end for
16:  send the frequential channel to the master process
17: end for
18: receive all frequential channels
19: merge all channels in Fourier space

The different steps of the MCFBP algorithm are distributed over $P$ processes. One process is the master and the other $P - 1$ processes are slaves. Steps 2 to 5 are executed by all processes, steps 6 to 17 are distributed over the $P - 1$ slave processes, and steps 18 and 19 are computed only by the master process.

4.2. Numerical experiments

We reconstruct a 3D phantom $f$ (see figure 4(a)) built from a scanner volume reprojected in a cone beam geometry. The reconstruction is computed on a 32×512×512 voxel grid. The data are rebinned to produce a fan-parallel projection geometry collected on 720 angles, uniformly spaced over $[0, \pi[$.

Figure 4. (a) Reconstruction with a classical fan-parallel beam algorithm (z=6), (b) reconstruction with our fan-parallel beam algorithm, using 256 frequential components (z=6), (c) slice of the volume reconstructed with the classical algorithm (y=353), (d) slice of the volume reconstructed with the MCFBP algorithm (y=353).

Figure 5(b) shows that the software speedup of the MCFBP compared to FBP is 6 when the problem is decomposed into 256 frequential components. However, the price to pay is a slight degradation of the reconstructed image quality, see figure 4(b), due to interpolation errors in the Fourier space [25]. Indeed, the Discrete Fourier Transform only yields a numerical estimate of the samples of the Fourier transform on a Cartesian grid.

Our MPI implementation of the MCFBP with 256 channels has been tested on two clusters of bi-processors. The first one is composed of Athlon MP1800+ processors interconnected by an Ethernet 100 network (see the top curves of figure 6 (c) and (d)) and the second one is composed of Athlon MP2000+ processors interconnected by an Ethernet 1000 network (see the bottom curves of figure 6 (c) and (d)). In figure 6(d) we see that we obtain a parallel speedup of two on four processors and a speedup of 3 on 10 processors. The use of an Ethernet 1000 network on the second cluster does not significantly increase the performance of the program. Moreover, the parallel speedup on the Ethernet 1000 cluster is lower because its processors (MP2000+) are around 15% faster than the MP1800+ (Ethernet 100 cluster processors) on this application.

The relatively poor speedup can be explained by two facts. Firstly, more than 16% of the computation is executed sequentially before the beginning of the parallel execution. This means that the speedup is essentially bounded by 6 (about 1/0.16). Secondly, non-blocking communications were used in order to overlap communication with computation. However, they do not improve the elapsed time: the communications could not be masked by computations.
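The bound of about 6 quoted above is what the standard Amdahl model gives for a sequential fraction of 16%. The short computation below applies that model, which is brought in here as an illustration and is not named in the measurements; the function name and the exact value 0.16 (the text says "more than 16%") are assumptions.

```python
def amdahl_speedup(serial_fraction, processors):
    """Ideal speedup when a fixed fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

serial_fraction = 0.16  # assumed value for "more than 16%" of sequential work

for processors in (4, 10):
    print(processors, round(amdahl_speedup(serial_fraction, processors), 2))

# Limit for an unbounded number of processors.
upper_bound = 1.0 / serial_fraction
print(upper_bound)
```

Under this model the ideal speedups are about 2.7 on 4 processors and 4.1 on 10 processors, so the measured values of 2 and 3 leave a gap that is consistent with the unmasked communication cost described above.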

Figure 5. (a) CPU time (seconds), (b) software speedup factor, as functions of the number of frequential channels.

5. Conclusions and perspectives

We have presented the MCFBP algorithm, a new approach to speed up tomographic reconstruction. The MCFBP is derived from the FBP method, the most efficient sequential algorithm used in scanners. The MCFBP speeds up the FBP algorithm by a factor of up to 6 for 32 simultaneous reconstructions of 512 × 512 images. We have shown that an MPI implementation of the MCFBP on a small Ethernet (100 or 1000) PC cluster yields a speedup of two on 4 processors and up to three on 10 processors. This means that a speedup factor of 15.5 can be obtained on a 6-processor PC cluster compared to the FBP algorithm. In future work, we want to test our MPI implementation on a small Myrinet PC cluster in order to get a better parallel speedup. Indeed, the use of a small PC cluster in an operating room is possible and can make 3D interventional imaging feasible.

Acknowledgment

This work is supported by EC and regional grants (MI3 EC project, IST 1999 12338, see mi3.vitamib.com; DynCT EC project, IST 1999 10515; the ADéMo project from the Région Rhône-Alpes; and the National/Regional project CIMENT). The software has been developed on the i-cluster from ID-IMAG (see www.id-imag.fr) and tested on the PhyNum cluster and on the BioIMAGe cluster (see www.ujf-grenoble.fr/CIMENT).

Figure 6. (c) CPU time (seconds), (d) hardware speedup factor, as functions of the number of processors, for the Ethernet 100 and Ethernet 1000 clusters.

References

[1] A. Averbuch, R. R. Coifman, D. L. Donoho, M. Israeli, and J. Waldén. Fast slant stack: a notion of Radon transform for data in a cartesian grid which is rapidly computable, algebraically exact, geometrically faithful and invertible. Submitted to SIAM Scientific Computing (http://www.math.tau.ac.il/~amir), 2002.
[2] S. Basu and Y. Bresler. O(n^2 log_2 n) filtered backprojection reconstruction algorithm for tomography. IEEE Trans. Image Processing, 9:1760–1773, 2000.
[3] J. Bellow, H. Wright, and C. Unger. CT-guided pericardial drainage catheter placement with subsequent pericardial sclerosis. J. Comput. Assist. Tomog., 19:672–675, 1995.
[4] M. Brady. A fast discrete approximation algorithm for the Radon transform. SIAM J. Sci. Comput., 27(1):107–119, 1998.
[5] A. Brandt, J. Mann, M. Brodski, and M. Galun. A fast and accurate multilevel inversion of the Radon transform. SIAM Journ. on Appl. Math., 60(2):437–462, 1999.
[6] S. Carlson, C. Bender, K. Classic, F. Zink, J. Quam, E. Ward, and A. Oberg. Benefit and safety of CT fluoroscopy in interventional radiologic procedures. Radiology, 219:515–520, 2001.
[7] C. Chaussat, J. Chabbal, T. Ducourant, V. Spinnler, and G. Vieux. New superior detectivity CsI/a-Si 43cm x 43cm x-ray flat panel detector for general radiography provides immediate direct digital output and easy interfacing to digital radiographic systems. In H. Lemke, editor, CAR, 1998.
[8] B. Daly and P. A. Templeton. Real-time CT fluoroscopy: evolution of an interventional tool. Radiology, 211:309–315, 1999.
[9] P. Danielsson and M. Ingerhed. Backprojection in O(n^2 log_2 n) time. In IEEE Medical Imaging Conference 1997, Albuquerque, New Mexico, USA, 1997.
[10] L. Desbat. Efficient sampling in 3D tomography: parallel schemes. In P. Grangeat and J. Amans, editors, Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine, pages 87–100. Kluwer Academic, 1996.
[11] P. Edholm and G. Herman. Linograms in image reconstruction from projections. IEEE Trans. Med. Im., 6:301–307, 1987.
[12] P. Edholm, G. Herman, and D. Roberts. Image reconstruction from linograms: implementation and evaluation. IEEE Trans. Med. Im., 7(3):239–246, 1988.
[13] I. Goddard and M. Trepanier. High-speed cone-beam reconstruction: an embedded systems approach. In Proc. SPIE: Medical Imaging Conference, volume 4681, pages 483–491, San Diego (CA), USA, Feb. 2002.
[14] D. Gottlieb, B. Gustafsson, and P. Forssén. On the direct Fourier method for computer tomography. IEEE Trans. Med. Im., 19(3):223–232, 2000.
[15] P. Grangeat, A. Koenig, T. Rodet, and S. Bonnet. Theoretical framework for a dynamic cone-beam reconstruction algorithm based on a dynamic particle model. In 3D-2001, pages 171–174, 2001.
[16] P. Grangeat, A. Koenig, T. Rodet, and S. Bonnet. Theoretical framework for a dynamic cone-beam reconstruction algorithm based on a dynamic particle model. Phys. Med. Biol., 47(15):2611–2625, August 2002.
[17] K. Katada, H. Anno, et al. Guidance with real-time CT fluoroscopy: early clinical experience. Radiology, 200:851–856, 1996.
[18] K. Kornmesser, B. Schädler, and J. Hesser. Fast Feldkamp reconstruction for real-time reconstruction using C-arm systems. In Conf. Rec. CARS 2002, pages 430–434, 2002.
[19] M. Magnusson. Linogram and other Direct Fourier Methods for tomographic Reconstruction. Thesis no 672, Linköping University, 1993.
[20] M. Magnusson, P. Danielsson, and P. Edholm. Artefacts and remedies in direct Fourier tomographic reconstruction. In Conf. Rec. 1992 IEEE Med. Imag. Conf., pages 1138–1140, 1992.
[21] F. Natterer. The Mathematics of Computerized Tomography. Wiley, 1986.
[22] F. Natterer. Sampling in fan beam tomography. SIAM Journ. on Appl. Math., 53:358–380, 1993.
[23] T. Rodet. Algorithmes rapides de reconstructions en tomographie par compression de calculs. Application à la tomofluoroscopie 3D. Mémoire de thèse, Institut National Polytechnique de Grenoble, 2002.
[24] T. Rodet, P. Grangeat, and L. Desbat. A new computation compression scheme based on a multifrequential approach. In Conf. Rec. 2000 IEEE Med. Imag. Conf., volume 15, pages 267–271, 2000.
[25] T. Rodet, P. Grangeat, and L. Desbat. Multichannel algorithm for fast reconstruction. Phys. Med. Biol., 47(15):2659–2671, August 2002.
[26] Terarecon, 2002. http://www.terarecon.com.
[27] H. Turbell. Cone-Beam reconstruction using filtered backprojection. Thesis no 672, University of Linköping, 2001.
[28] H. Turbell and P. Danielsson. Fast Feldkamp reconstruction. In 3D-1999, volume 5, pages 311–314, Egmond aan Zee, Holland, 1999.
[29] J. Waldén. Analysis of direct Fourier method for computed tomography. IEEE Trans. Med. Im., 19(3):211–222, 2000.