
Inverse Problems 27 (2011) 065002 (16pp)

doi:10.1088/0266-5611/27/6/065002

An algorithm for total variation regularization in high-dimensional linear problems

Michel Defrise1, Christian Vanhove1 and Xuan Liu2

1 Department of Nuclear Medicine, Vrije Universiteit Brussel, Laarbeeklaan 101, B-1090 Brussels, Belgium
2 Skyscan, Kartuizersweg 3B, 2550 Kontich, Belgium

E-mail: [email protected], [email protected] and [email protected]

Received 29 October 2010, in final form 21 March 2011
Published 10 May 2011
Online at stacks.iop.org/IP/27/065002

Abstract

This paper describes an iterative algorithm for high-dimensional linear inverse problems, which is regularized by a differentiable discrete approximation of the total variation (TV) penalty. The algorithm is an interlaced iterative method based on optimization transfer with a separable quadratic surrogate for the TV penalty. The surrogate cost function is optimized using the block-iterative regularized algebraic reconstruction technique (RSART). A proof of convergence is given, and convergence is illustrated by numerical experiments with simulated parallel-beam computerized tomography (CT) data. The proposed method provides a block-iterative and convergent, hence efficient and reliable, algorithm to investigate the effects of TV regularization in applications such as CT.

(Some figures in this article are in colour only in the electronic version)

1. Introduction

We consider in this paper image reconstruction from CT data with total variation (TV) regularization. The use of TV to regularize image denoising problems has been proposed in [1] and extended to inverse problems such as image deblurring in optics. TV regularization is not only effective for piecewise smooth, cartoon-like images, but has also been proposed for medical imaging applications such as the reconstruction from CT data with low signal-to-noise ratio, or with incomplete, irregular or sparse sampling. Promising results have motivated several studies on the benefits and limitations of TV regularization for these applications [2–11]. We refer to [12] for a more general discussion of the role of iterative reconstruction in CT, and to [7, 13] for enlightening discussions on potential pitfalls in using TV and other edge-preserving regularizations.


The goal of this paper is to derive an efficient block-iterative algorithm for CT reconstruction with TV regularization, to prove its convergence to a regularized solution, and to verify the convergence on simulated data sets. The paper makes no claim on the relevance of TV regularization for specific CT applications.

A variety of optimization algorithms have been proposed for inverse problems with TV regularization, which define the problem either as penalized likelihood estimation [3, 8, 14, 15], or as a TV minimization [4, 6] or TV superiorization [16, 17] under the constraint of fitting the data exactly or within a prescribed error margin. In the former case, even for low dose CT, the photon flux is sufficient to assume a Gaussian distribution of the measurement noise, and therefore a Gaussian likelihood is usually considered, leading to a penalized least squares problem.

Due to the large size of 3D CT data, a majority of previous studies on iterative CT reconstruction considered algorithms which sequentially process sufficiently small subsets of the data, updating the solution once after processing each subset. Examples include the 'row-action' algebraic reconstruction technique (ART) [18] (the Kaczmarz method) but also the simultaneous algebraic reconstruction technique (SART) [19, 20], which in cone-beam CT is typically implemented by assigning one projection to each data subset. An alternative is to use coordinate-descent methods such as the Gauss–Seidel algorithm, which updates sequentially individual image voxels [8, 21].

A TV regularized version of ART has been introduced by Sidky and Pan [4, 6]. This algorithm, the adaptive steepest descent projection onto convex sets (ASD-POCS), interleaves ART iterations with gradient descent steps for the TV penalty, aiming at the solution of the constrained minimization problem. Although its convergence properties are unclear, the algorithm takes advantage of the numerical efficiency of ART and has been validated on simulated and real data. Related approaches include TV superiorization based on a block-iterative orthogonal projection algorithm [16, 17] and the algorithm proposed by Jia et al [22], which interlaces SART iterations with iterative TV denoising.

This paper proposes an alternative algorithm for TV-penalized weighted least squares estimation. Similar to ASD-POCS, this algorithm is based on an interlaced iteration and is a simple combination of two ingredients. The first one is an external iteration, which uses optimization transfer [23] with a separable quadratic TV surrogate similar to the surrogate proposed in [14]. The second ingredient is the quadratically regularized fixed-block iterative algorithm of Eggermont et al [20], which is used here as an internal iteration to minimize the surrogate cost function. We use the name regularized SART (RSART) for this algorithm, and TV-RSART for the combined interlaced method. To our knowledge, TV-RSART is the first algorithm for TV-penalized weighted least squares estimation which is block-iterative and has proven convergence in the limit where the number of RSART iterations is infinite (recall that the non-regularized SART is not convergent unless the data are consistent or strong under-relaxation is applied).

After defining the problem in section 2, we describe the optimization transfer algorithm in section 3, with the separable surrogate of the TV penalty. This is applied in section 4 to derive TV-RSART. Section 5 briefly describes an alternative algorithm used in the results section for comparison purposes: the TV regularized image space reconstruction algorithm (ISRA) [24–26]. Numerical results with simulated data are presented in section 6.
2. Problem definition

We consider a linear system $y = Af$, with a matrix $A : \mathbb{R}^N \to \mathbb{R}^{MP}$. This system models the reconstruction of an image $f \in \mathbb{R}^N$ discretized on a grid of $N$ voxels from a set of $M$ projections measured on a detector with $P$ pixels. The data can be written as a vector of $M$ projections as $y = (y_1, y_2, \ldots, y_M)$, with $y_j \in \mathbb{R}^P$ for $j = 1, \ldots, M$.


In CT applications the index $j$ defines the position of the source-detector assembly as it moves around the object during data acquisition. Using the symbol $p$ to index the detector pixels, we note the components of the $j$-th projection as $y_{j,p}$. Similarly, the system matrix $A$ is the concatenation of $M$ projection matrices $A_j : \mathbb{R}^N \to \mathbb{R}^P$, for $j = 1, \ldots, M$:

$$A = \left(A_1^t\; A_2^t\; \cdots\; A_M^t\right)^t. \qquad (1)$$

In CT the data $y_j$ are the linearized data, and the matrix elements $A_{(j,p),i}$ are usually modeled as the length of intersection of ray $p$ of projection $j$ with voxel $i$, though more sophisticated models can be used as long as they are linear, see e.g. [27, 28]. We define a positive vector $\sigma \in \mathbb{R}^{M\times P}$, also written as $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_M)$, with $\sigma_j \in \mathbb{R}^P$ and $\sigma_{j,p}^2 > 0$ equal to the (estimated) variance of the measurement $y_{j,p}$. Assuming a Gaussian distribution of the measurements, we wish to calculate the penalized weighted least squares solution

$$f_\beta^* = \arg\min_{f\in\mathbb{R}^N} \Phi(f) \qquad (2)$$

which minimizes the cost function

$$\Phi(f) = \left\|\frac{y - Af}{\sigma}\right\|^2 + \beta\, P(f) \qquad (3)$$

where $\|g\|^2 = \sum_{j=1}^{M}\sum_{p=1}^{P} |g_{j,p}|^2$ and the fraction must be understood as a component-wise division. The first term in (3) is the data fidelity term and the second term is the penalty weighted by the regularization parameter $\beta > 0$. The penalty is the usual differentiable discrete approximation of the TV penalty,

$$P(f) = \sum_{i=1}^{N} |\nabla_i f| \qquad (4)$$

with the notation

$$|\nabla_i f| = \left\{\epsilon^2 + \sum_{j\in B_i}(f_i - f_j)^2\right\}^{1/2} \qquad (5)$$

for the discrete approximation of the magnitude of the gradient of image $f$ at voxel $i$, calculated using some neighborhood $B_i \subset \{1, \ldots, N\}$ of voxel $i$. Typically in 2D, with $i = (i_x, i_y)$, one might take $B_{(0,0)} = \{(1, 0), (0, 1)\}$, with the two neighbors of $(0, 0)$ giving a finite difference estimate of the image gradient components along the $x$ and $y$ axes at $(0, 0)$. In this paper we use a small value of the parameter $\epsilon > 0$, so that $P(f)$ is close to the 'true' discrete TV penalty. For larger values of $\epsilon$, $P(f)$ becomes similar to other edge-preserving penalties such as the Huber penalty or the penalty proposed by Thibault et al [21]. If $\epsilon > 0$ and if the constant image $f_i = 1$, $i = 1, \ldots, N$ does not belong to the null space of $A$, the cost function (3) is strictly convex and has a unique global minimum.
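For concreteness, the following minimal Python/NumPy sketch evaluates the penalty (4)–(5) and the cost (3) for a 2D image, assuming the two-neighbor stencil $B_i = \{(i_x+1, i_y), (i_x, i_y+1)\}$ mentioned above; the operator A, the data y and the weights sigma are generic placeholders, not the projector of the experiments below.

```python
import numpy as np

def tv_penalty(f, eps=1e-3):
    # Differentiable TV penalty (4)-(5) with B_i = {down, right} neighbours;
    # differences past the image border are set to zero.
    dx = np.zeros_like(f); dx[:-1, :] = f[:-1, :] - f[1:, :]   # f_i - f_j (down)
    dy = np.zeros_like(f); dy[:, :-1] = f[:, :-1] - f[:, 1:]   # f_i - f_j (right)
    return np.sum(np.sqrt(eps**2 + dx**2 + dy**2))

def cost(f, A, y, sigma, beta, eps=1e-3):
    # Penalized weighted least squares cost (3); the division by sigma
    # is component-wise, as in the text.
    r = (y - A @ f.ravel()) / sigma
    return r @ r + beta * tv_penalty(f, eps)
```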


3. Penalized least squares estimation by optimization transfer

We minimize the penalized cost function (3) using the optimization transfer technique [23] (also referred to as majorization–minimization). At each iteration, this technique replaces the minimization of $\Phi(f)$ by the minimization of a substitute, or surrogate, function $\Phi_{sur}(f, \hat f)$, which depends on the current solution estimate $\hat f$ and satisfies the two conditions

(i) $\Phi(f) = \Phi_{sur}(f, f)$ for each $f \in \mathbb{R}^N$,
(ii) $\Phi(f) \leq \Phi_{sur}(f, \hat f)$ for each $f, \hat f \in \mathbb{R}^N$.

The aim is to find a surrogate that is easy to minimize and at the same time has a low curvature, so as to provide a good approximation of the cost function $\Phi(f)$. As the likelihood term is quadratic, we only need a surrogate for the TV penalty. Oliveira et al [14] propose the following quadratic surrogate for $P(f)$:

$$P_{sur}^{Oliv}(f, \hat f) = \sum_{i=1}^{N} \frac{2\epsilon^2 + \sum_{j\in B_i}\left\{(\hat f_i - \hat f_j)^2 + (f_i - f_j)^2\right\}}{2\,|\nabla_i \hat f|}. \qquad (6)$$

We use instead an alternative surrogate,

$$P_{sur}(f, \hat f) = \sum_{i=1}^{N} \frac{\epsilon^2 + \sum_{j\in B_i}\left\{(f_i - f_j)(\hat f_i - \hat f_j) + (\hat f_j - f_j)^2 + (\hat f_i - f_i)^2\right\}}{|\nabla_i \hat f|}. \qquad (7)$$

The rationale for replacing $P_{sur}^{Oliv}$ by $P_{sur}$ is that the latter is separable as a sum of terms each depending on a single parameter $f_i$. As will be seen in the next section, this property allows for an efficient block-iterative minimization. The price to pay is that $P_{sur}$ has a higher curvature than $P_{sur}^{Oliv}$. The proof that $P_{sur}$ satisfies the surrogate properties (i) and (ii) is given in appendix A. The separability property is easily verified by noting that (7) can be written as

$$P_{sur}(f, \hat f) = \sum_{i=1}^{N} p_i(\hat f)\left(f_i - z_i(\hat f)\right)^2 + \text{terms independent of } f \qquad (8)$$

with the curvature

$$p_i(\hat f) = \sum_{j\in B_i}\frac{1}{|\nabla_i \hat f|} + \sum_{j\in \bar B_i}\frac{1}{|\nabla_j \hat f|} \qquad (9)$$

and

$$z_i(\hat f) = \frac{1}{2\, p_i(\hat f)}\left(\sum_{j\in B_i}\frac{\hat f_i + \hat f_j}{|\nabla_i \hat f|} + \sum_{j\in \bar B_i}\frac{\hat f_i + \hat f_j}{|\nabla_j \hat f|}\right). \qquad (10)$$

In equations (9) and (10), $\bar B_j$ denotes the 'adjoint' neighborhood of $B_j$, defined by $k \in \bar B_j$ iff $j \in B_k$ (for example, in 2D, with $i = (i_x, i_y)$ and $B_{(i_x,i_y)} = \{(i_x+1, i_y), (i_x, i_y+1)\}$, the adjoint neighborhood is $\bar B_{(i_x,i_y)} = \{(i_x-1, i_y), (i_x, i_y-1)\}$). The curvature $p_i(\hat f)$ is (up to a constant factor) a weighted local average of the inverse gradient of $\hat f$ at voxel $i$, and $z_i(\hat f)$ is a local average of $\hat f$ weighted by the inverse gradient values. We describe in the next section an algorithm for the minimization of the cost function (3), which is based on surrogate (7).
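As an illustration, a direct loop-based (unoptimized) transcription of (9) and (10) for the 2D stencil of the example above; a sketch assuming at least a 2 × 2 image and the convention that differences across the image border vanish:

```python
import numpy as np

def grad_mag(f, eps):
    # |nabla_i f| of (5) with B_i = {down, right}; zero differences at the border
    dx = np.zeros_like(f); dx[:-1, :] = f[1:, :] - f[:-1, :]
    dy = np.zeros_like(f); dy[:, :-1] = f[:, 1:] - f[:, :-1]
    return np.sqrt(eps**2 + dx**2 + dy**2)

def curvature_and_center(fh, eps=1e-3):
    # p_i of (9) and z_i of (10); adjoint neighborhood is {up, left}
    g = grad_mag(fh, eps)
    nx, ny = fh.shape
    p, num = np.zeros_like(fh), np.zeros_like(fh)
    for ix in range(nx):
        for iy in range(ny):
            for jx, jy in [(ix + 1, iy), (ix, iy + 1)]:      # j in B_i
                if jx < nx and jy < ny:
                    p[ix, iy] += 1.0 / g[ix, iy]
                    num[ix, iy] += (fh[ix, iy] + fh[jx, jy]) / g[ix, iy]
            for jx, jy in [(ix - 1, iy), (ix, iy - 1)]:      # j in adjoint B_i
                if jx >= 0 and jy >= 0:
                    p[ix, iy] += 1.0 / g[jx, jy]
                    num[ix, iy] += (fh[ix, iy] + fh[jx, jy]) / g[jx, jy]
    return p, num / (2.0 * p)
```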

4. A TV regularized SART algorithm

The algorithm, TV-RSART, is an interlaced iterative algorithm, with the same external iteration as in [14], except for the replacement of the surrogate (6) by the separable surrogate (7):

• Start with an initial estimate $\hat f^0 \in \mathbb{R}^N$ of the solution.


• Obtain the $(k+1)$-th estimate as

$$\hat f^{k+1} = \arg\min_{f\in\mathbb{R}^N} \Phi_{sur}(f, \hat f^k) \qquad (11)$$

with

$$\Phi_{sur}(f, \hat f) = \left\|\frac{y - Af}{\sigma}\right\|^2 + \beta\, P_{sur}(f, \hat f). \qquad (12)$$
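A minimal self-contained sketch of this outer iteration on a small 1D toy problem (chain neighborhood $B_i = \{i+1\}$, unit weights $\sigma_j = 1$, random placeholder A and y), where the quadratic subproblem (11) is solved exactly with a dense solver instead of the RSART iteration introduced below; the assert checks the monotone decrease of the cost guaranteed by optimization transfer:

```python
import numpy as np

def grad_mag(f, eps):                    # |nabla_i f| of (5), 1D chain B_i = {i+1}
    d = np.append(np.diff(f), 0.0)
    return np.sqrt(eps**2 + d**2)

def p_and_z(fh, eps):                    # curvature (9) and centre (10), 1D chain
    g = grad_mag(fh, eps)
    n = fh.size
    p, num = np.zeros(n), np.zeros(n)
    for i in range(n):
        if i + 1 < n:                    # j = i+1 belongs to B_i
            p[i] += 1 / g[i]; num[i] += (fh[i] + fh[i + 1]) / g[i]
        if i >= 1:                       # j = i-1 belongs to the adjoint of B_i
            p[i] += 1 / g[i - 1]; num[i] += (fh[i] + fh[i - 1]) / g[i - 1]
    return p, num / (2 * p)

rng = np.random.default_rng(1)
M, N, beta, eps = 30, 20, 0.5, 1e-3
A, y = rng.standard_normal((M, N)), rng.standard_normal(M)
phi = lambda f: np.sum((y - A @ f)**2) + beta * np.sum(grad_mag(f, eps))

f = np.zeros(N)
for k in range(50):                      # external iteration (11), exact solve
    p, z = p_and_z(f, eps)
    f_new = np.linalg.solve(A.T @ A + beta * np.diag(p), A.T @ y + beta * p * z)
    assert phi(f_new) <= phi(f) + 1e-9   # monotone decrease (cf. lemma 1)
    f = f_new
```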

One has the following convergence property.

Theorem 1. Consider the cost function (3) and assume the following hypotheses:

(i) $\epsilon > 0$ and $\beta > 0$.
(ii) The uniform object $u \in \mathbb{R}^N$ with $u_j = 1$, $j = 1, \ldots, N$ does not belong to the null-space of $A$, i.e. $\|Au\| = m\,\|u\|$ for some $m > 0$.
(iii) Any pixel $j$ can be linked to pixel 1 via a sequence of $N_j$ pixels $k_{j,1} = 1, k_{j,2}, \ldots, k_{j,N_j} = j$ such that two successive pixels in the sequence belong to a neighborhood, i.e. $k_{j,l-1} \in B_{k_{j,l}}$.

Then the iterates defined by recurrence (11) converge to the minimizer $f_\beta^*$ of the cost function (3).

Oliveira et al [14] note that a convergence proof for problem (11) with surrogate (6) instead of the separable surrogate (7) could be given by following the general scheme of the convergence proof for the EM algorithm. Appendix B gives an explicit proof for the case of our separable surrogate, and clarifies the origin of hypotheses (ii) and (iii).

To solve the quadratic minimization problem (11), we apply the regularized fixed-block algorithm [20], which is based on the idea introduced by Herman et al [18, 29] to regularize ART, see also [30]. The derivation for our specific problem, given in appendix C, uses the convergence theorem in [31] and leads to an internal iteration, RSART, which is interlaced within the $k$-th optimization transfer iteration of TV-RSART.

Before defining RSART, additional notations are needed. We denote by $\hat f^{k,s}$ the image estimate at the $s$-th internal iteration. Define the $P \times P$ positive definite diagonal matrices $D_j$ with elements

$$(D_j)_{p,p'} = \delta_{p,p'}\,\sqrt{\beta}\,\sigma_{j,p}, \qquad j = 1, \ldots, M;\; p, p' = 1, \ldots, P \qquad (13)$$

and the $N \times N$ positive definite diagonal matrix $W^k$ related to the curvature of the surrogate:

$$W^k_{i,i'} = \delta_{i,i'}\,\frac{1}{\sqrt{p_i(\hat f^k)}}, \qquad i, i' = 1, \ldots, N. \qquad (14)$$

Note that $W^k$ depends on the external iteration $k$. Assume we have a set of $P \times P$ relaxation matrices $C_j^k$ which satisfy

$$C_j^k \geq A_j\,(W^k)^2 A_j^t + D_j^2. \qquad (15)$$

The algorithm uses an auxiliary data vector $\rho = \{\rho_1, \ldots, \rho_M\} \in \mathbb{R}^{MP}$. The internal iteration is initialized with

$$\hat f_i^{k,0} = z_i(\hat f^k), \qquad \rho^0 = y \qquad (16)$$


and the iterative RSART mapping is

$$\begin{aligned} r_j^s &= \left(C_j^k\right)^{-1}\left(\rho_j^s - A_j \hat f^{k,s}\right) \\ \hat f^{k,s+1} &= \hat f^{k,s} + \omega\,(W^k)^2 A_j^t\, r_j^s \\ \rho_j^{s+1} &= \rho_j^s - \omega\, D_j^2\, r_j^s \\ \rho_{j'}^{s+1} &= \rho_{j'}^s, \qquad j' \neq j \end{aligned} \qquad (17)$$

where projection $j = J(s)$ is processed at iteration $s$ and the relaxation parameter $\omega \in (0, 2)$. The access order of the projections is defined by a permutation $\{J(1), J(2), \ldots, J(M)\}$ of $\{1, 2, \ldots, M\}$, extended periodically as $J(s) = J(s + M)$. We use a random permutation in section 6 (see [32, 33] for alternative ordering schemes).

When $s \to \infty$, the estimate $\hat f^{k,s}$ converges to the minimizer $\hat f^{k+1}$ in equation (11). In practice one stops the internal iteration after $S$ steps (corresponding to $S/M$ passes through the data set) and sets $\hat f^{k+1} = \hat f^{k,S}$. A few remarks are in order.

• The RSART iteration (17) requires keeping a full auxiliary data vector $\rho \in \mathbb{R}^{MP}$ in memory. Note that the updated component $\rho_{J(s)}^s$ is reused only at iteration $s + M$, and therefore the auxiliary vector $\rho$ does not need to be stored or even computed if one makes only $M$ internal RSART iterations (i.e. a single pass through the data set) for each external iteration.

• We use a diagonal relaxation matrix $C_j^k = c_j^k\, I$, with $c_j^k$ an estimate of the largest eigenvalue of $A_j (W^k)^2 A_j^t + D_j^2$. In principle $c_j^k$ should be calculated for each projection $j$, but our implementation estimates the largest eigenvalue of $A_j (W^k)^2 A_j^t + D_j^2$ for a few projections $j$ using the power method, and then uses the largest of the values thus obtained for all projections $j = 1, \ldots, M$. This calculation must be repeated at each external iteration since $W^k$ depends on $\hat f^k$.

• The internal iteration (17) starts with the estimate $\hat f^{k,0} = z(\hat f^k)$ (see equation (16)). Using equation (4), equation (10) can be rewritten as

$$z_i(\hat f^k) = \hat f_i^k - \frac{1}{2\, p_i(\hat f^k)}\left.\frac{\partial P(f)}{\partial f_i}\right|_{f = \hat f^k}. \qquad (18)$$

This is one step of a scaled gradient descent for TV minimization. The proposed TV-RSART algorithm can thus be seen as alternating such TV gradient descent steps with RSART iterations. This is similar to the ASD-POCS algorithm of Sidky and Pan [6], with the two differences that equation (18) has a well-prescribed step length related to the curvature of the surrogate, and that TV-RSART uses the regularized iteration (17), which guarantees convergence. Another advantage of TV-RSART is that it depends on a smaller number of parameters than ASD-POCS, namely $\beta$, $\omega$ and the number of RSART iterations per external iteration.
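A self-contained numerical sketch of the internal iteration (17) on a toy problem: the relaxation constants c_j are obtained here from an exact eigenvalue computation rather than the power method, a fixed random access order J is used, and the result is checked against a direct solve of the quadratic problem (11). All arrays are random placeholders, and the number of sweeps and the tolerance of the final check are ad hoc choices for this toy problem.

```python
import numpy as np

rng = np.random.default_rng(2)
M, P, N = 10, 6, 8                       # M projections of P pixels, N voxels
beta, omega = 0.4, 1.0
A = [rng.standard_normal((P, N)) for _ in range(M)]     # blocks A_j
y = [rng.standard_normal(P) for _ in range(M)]
sigma = np.ones(P)                       # unweighted case, sigma_{j,p} = 1
W = np.diag(rng.uniform(0.5, 1.5, N))    # W_ii = 1/sqrt(p_i), cf. (14)
z = rng.standard_normal(N)

D2 = [np.diag(beta * sigma**2) for _ in range(M)]       # D_j^2, cf. (13)
c = [np.linalg.eigvalsh(Aj @ W @ W @ Aj.T + Dj2).max()  # C_j = c_j I, cf. (15)
     for Aj, Dj2 in zip(A, D2)]

f, rho = z.copy(), [yj.copy() for yj in y]              # initialization (16)
order = rng.permutation(M)               # fixed random access order J
for sweep in range(1000):
    for j in order:                      # one RSART update (17) per projection
        r = (rho[j] - A[j] @ f) / c[j]
        f = f + omega * (W @ W) @ (A[j].T @ r)
        rho[j] = rho[j] - omega * D2[j] @ r

# direct minimizer of (11): (sum_j A_j^t A_j + beta W^-2) f = ...
Wm2 = np.linalg.inv(W @ W)
lhs = sum(Aj.T @ Aj for Aj in A) + beta * Wm2
rhs = sum(Aj.T @ yj for Aj, yj in zip(A, y)) + beta * Wm2 @ z
assert np.allclose(f, np.linalg.solve(lhs, rhs), atol=1e-4)
```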


Figure 1. Reconstruction of the thorax phantom from 2D noisy data (M = 600 projections). The horizontal axis is the number of passes through the data (external iterations × internal iterations). (a): the cost function $\Phi(\hat f^k)$. (b): the RMSE $e_k$. Thick full line: TV-RSART with S = M and $f^{FBP}$ as the initial image. Dot-dashed: TV-RSART with S = 10M and $f^{FBP}$ as the initial image. Dashed: TV-ISRA with 10 subsets and the initial image equal to $\max\{f^{FBP}, 0.01 \max(f^{FBP})\}$. Thin full line: TV-RSART with S = M and 'random' initial image.

5. A TV regularized ISRA algorithm

To validate the numerical results on the convergence of TV-RSART in section 6, we use the ISRA algorithm [24, 25] as an independent method for minimizing the cost function (3). This regularized TV-ISRA algorithm [26] is based on surrogate (7) and is applicable when the elements of the matrix $A$ and of the data $y$ are non-negative, i.e. when $A_{j,i} \geq 0$ and $y_j \geq 0$ for $j = 1, \ldots, MP$; $i = 1, \ldots, N$, where the combined index $j = (j, p)$ stands for the pixel $p$ of projection $j$. The TV-ISRA iteration is

$$\hat f_i^{k+1} = \hat f_i^k\;\frac{b_i + \beta\, p_i(\hat f^k)\, z_i(\hat f^k)}{b_i^k + \beta\, \hat f_i^k\, p_i(\hat f^k)}, \qquad i = 1, \ldots, N \qquad (19)$$

with $p_i$ and $z_i$ defined by equations (9) and (10), and

$$b_i = \sum_{j=1}^{MP} \frac{1}{\sigma_j^2}\, A_{j,i}\, y_j, \qquad b_i^k = \sum_{j=1}^{MP} \frac{1}{\sigma_j^2}\, A_{j,i}\,(A\hat f^k)_j \qquad (20)$$

are respectively the weighted backprojection of the measured data and of the data estimate obtained by projecting the current solution estimate. We use an ordered-subset implementation [34, 35], which accelerates initial convergence by a factor of order L, but does not guarantee convergence or monotonic decrease of the cost function. Contrary to TV-RSART, the TV-ISRA algorithm generates a sequence of non-negative images.
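A minimal sketch of the TV-ISRA update (19)–(20) on a 1D toy problem ($\sigma_j = 1$, no ordered subsets, chain neighborhood $B_i = \{i+1\}$; the matrix and the data are non-negative random placeholders):

```python
import numpy as np

rng = np.random.default_rng(3)
MP, N, beta, eps = 40, 16, 0.3, 1e-3
A = rng.uniform(0.0, 1.0, (MP, N))       # non-negative matrix elements
y = A @ rng.uniform(0.5, 1.5, N)         # non-negative data

def p_and_z(fh):                         # (9)-(10), 1D chain B_i = {i+1}
    d = np.append(np.diff(fh), 0.0)
    g = np.sqrt(eps**2 + d**2)
    p, num = np.zeros(N), np.zeros(N)
    for i in range(N):
        if i + 1 < N:
            p[i] += 1 / g[i]; num[i] += (fh[i] + fh[i + 1]) / g[i]
        if i >= 1:
            p[i] += 1 / g[i - 1]; num[i] += (fh[i] + fh[i - 1]) / g[i - 1]
    return p, num / (2 * p)

b = A.T @ y                              # backprojection of the data, cf. (20)
f = np.ones(N)                           # positive initial image
for k in range(500):                     # multiplicative update (19)
    p, z = p_and_z(f)
    bk = A.T @ (A @ f)                   # backprojection of the estimate, (20)
    f = f * (b + beta * p * z) / (bk + beta * f * p)
assert np.all(f > 0)                     # iterates remain non-negative
```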

6. Numerical experiments

The TV-RSART algorithm is motivated by applications in cone-beam CT, where block-iterative (row-action or, in our case, 'projection-action') methods may be a prerequisite for practical processing of the very large data sets. In this section we nevertheless consider a 2D problem, which allows analyzing the convergence and stability by performing a large number of iterations. We stress again that the aim of the experiments is not to evaluate the benefit of TV regularization for specific clinical imaging problems, compared to analytic algorithms such as filtered backprojection. This is why physical effects such as scatter, cross-talk and polychromaticity are not considered.

We simulate parallel-beam 2D CT data of one slice of the FORBILD thorax phantom (see www.imp.uni-erlangen.de/forbild/english/results/index.htm). This elliptical phantom has external diameters equal to 400 and 200 mm, with a linear attenuation coefficient equal to 0.017 mm⁻¹ in the background and 0.0044 mm⁻¹ in the lungs.


Figure 2. Reconstruction of the thorax phantom from 2D noisy data (left: M = 600 projections, right: M = 200 projections). Gray scale [0.0153, 0.0187] mm⁻¹. (a) M = 600, FBP; (b) M = 200, FBP; (c) M = 600, TV-RSART 1000 iter.; (d) M = 200, TV-RSART 1000 iter.; (e) M = 600, TV-RSART 40 iter.; (f) M = 200, TV-RSART 40 iter.

The image is digitized on a 600 × 600 grid with pixel size Δ = 0.7 mm. The data consist of M = 600 or M = 200 angular views uniformly spaced on [0, π), and, for each angular sample, of P = 600 parallel lines of response (LORs) with radial sampling Δ. Each LOR is calculated as the average of three analytic line integrals through the phantom, so as to roughly simulate the finite size of the detector. Poisson noise is added corresponding to 3 × 10⁵ incident photons per LOR, the mean number of transmitted photons in the most attenuated LOR being then 773. We present results for an unweighted least squares estimation ($\sigma_j = 1$), and the TV regularization parameter in equation (3) is equal to β = 0.25 and β = 0.08 for M = 600 and M = 200, respectively. These values were selected empirically to provide a good regularization, but were not optimized. The convergence of the algorithm was also verified for β = 2.5 and β = 0.025, though the limit images are then, as expected, too oversmoothed or too noisy (results not shown). Unless otherwise specified we initialize the algorithm with the FBP reconstruction of the data (with a rectangular apodization of the ramp filter) [36]. For TV-RSART the relaxation coefficient is ω = 1.


Figure 3. Reconstruction of the thorax phantom from 2D noisy data (M = 200 projections). The horizontal axis is the number of passes through the data (external iterations × internal iterations). (a): the cost function $\Phi(\hat f^k)$. (b): the RMSE $e_k$. Thick full line: TV-RSART with S = M and $f^{FBP}$ as the initial image. Dot-dashed: TV-RSART with S = 10M and $f^{FBP}$ as the initial image. Dashed: TV-ISRA with 10 subsets and the initial image equal to $\max\{f^{FBP}, 0.01 \max(f^{FBP})\}$. Thin full line: TV-RSART with S = M and 'random' initial image.

The projector A is discretized with Joseph's method [27], and the backprojector $A^t$ is discretized using the pixel-driven method. We report the value of the cost function (3) and of a normalized root mean square error (RMSE) defined by

$$e_k = \left(\sum_{i\in E} \left|\hat f_i^k - f_{exact,i}\right|^2 \Big/ \sum_{i\in E} \left|f_{exact,i}\right|^2\right)^{1/2} \qquad (21)$$

where $E = \{i \mid f_{exact,i} \neq 0\}$ is the support of the digital image $f_{exact}$ of the thorax phantom. The computation time (M = 600) was 5.85 s per iteration for TV-RSART, compared to 2.15 s for FBP (3.33 GHz 6-core Intel Xeon, 16 GB memory). One practical disadvantage of TV-RSART is that it requires keeping track of a full data set ($\rho \in \mathbb{R}^{MP}$ in (17)).

Figure 1 compares the convergence of TV-RSART implemented with S = M or S = 10M internal SART iterations (corresponding respectively to 1 or 10 passes through the whole data set) for each external iteration. Using a single pass (S = M) does not seem to affect convergence in this specific case. This observation suggests that in cone-beam CT, TV-RSART could be implemented with a single pass per external iteration, with the benefit that the auxiliary variable ρ does not need to be stored. The initial RMSE in figure 1(b) corresponds to the error of the FBP reconstruction used to start the iteration. Therefore, the decrease of $e_k$ below $e_0$ represents the improvement achieved by the TV regularization for this particular figure of merit.

Figure 2 compares the FBP reconstruction and the TV-RSART reconstruction with 1000 × M iterations, and with 40 × M iterations. The result illustrates the dramatic reduction of the variance achieved by the TV regularization, as described in other works. Even with 40 iterations, well before convergence, a significant variance reduction is obtained.

A promising potential application of TV regularization is the reconstruction from coarsely sampled data [3, 4, 7, 10, 11]. Figure 3 illustrates the application of TV-RSART to data with a reduced angular sampling (M = 200 projections). The convergence is similar to that observed with fine sampling (M = 600) in figure 1, and, as expected from theorem 1, convergence is not affected by the fact that the system matrix A now has a large null-space. To verify that the convergence is independent of the initial image, we have also run the algorithm by initializing each voxel with independent pseudo-random variates with uniform distribution in (0, 1) [14] (called 'random image' in the legends).
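For reference, a direct transcription of the figure of merit (21), treating the current estimate and the exact phantom as flattened arrays:

```python
import numpy as np

def rmse(f_k, f_exact):
    # normalized RMSE (21), restricted to the support E = {i : f_exact,i != 0}
    E = f_exact != 0
    return np.sqrt(np.sum((f_k[E] - f_exact[E])**2) / np.sum(f_exact[E]**2))
```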


Figures 1 and 3 suggest that the limit values of the cost function and of the RMSE are very close to those obtained when using the FBP reconstruction as the initial image. A similar observation holds when the data are reconstructed using TV-ISRA [26] with L = 10 subsets.

7. Conclusions

We have considered the inversion of high-dimensional linear problems by minimization of a cost function equal to the sum of an L2 data fidelity term and of a TV penalty. An iterative algorithm, TV-RSART, has been proposed, which interleaves optimization transfer iterations with a block-iterative algorithm. As shown by previous works, block-iterative methods are efficient when dealing with very large data sets in problems such as cone-beam CT. The usual methods such as the widely used SART, however, do not guarantee convergence with inconsistent data. In contrast, TV-RSART uses a regularized block-iterative algorithm introduced by Eggermont et al [20] and was shown in appendix B to converge to the minimizer of the cost function. In addition, TV-RSART is based on optimization transfer and thereby guarantees good stability, because the cost function is non-increasing as iterations proceed. Formally, these attractive properties only hold if an infinite number of SART iterations is performed for each optimization transfer step, but our numerical experiments suggest that the cost function decreases monotonically even with a single SART iteration. This result allows overcoming in practice a weakness of the regularized SART, namely that it requires keeping track of an auxiliary data set during successive iterations.

The convergence of TV-RSART has been illustrated by applying a large number of iterations to 2D simulated data. We have checked that the limiting values of the cost function and of the reconstruction error are very close to the values obtained with an alternative algorithm, TV-ISRA [26], and to the values obtained using a random initial image estimate instead of the FBP reconstruction.

Acknowledgments

We thank Christine De Mol (Université Libre de Bruxelles) for helpful discussions and suggestions and Frédéric Noo (University of Utah) for generating the simulated data in section 6. This work was supported in part by the grant G.0569.08 of the Fund for Scientific Research Flanders (FWO).

Appendix A. Proof of the surrogate properties

Consider the proposed surrogate function (7) for the TV penalty (4). The first property $P_{sur}(f, f) = P(f)$ is immediate. For the second, note that

$$P_{sur}(f, \hat f) - P(f) = \sum_{i=1}^{N} \frac{K_i}{\left\{\epsilon^2 + \sum_{j\in B_i}(\hat f_i - \hat f_j)^2\right\}^{1/2}}$$

with

$$\begin{aligned} K_i &= -\left\{\epsilon^2 + \sum_{j\in B_i}(f_i - f_j)^2\right\}^{1/2}\left\{\epsilon^2 + \sum_{j\in B_i}(\hat f_i - \hat f_j)^2\right\}^{1/2} \\ &\quad + \epsilon^2 + \sum_{j\in B_i}\left\{(f_i - f_j)(\hat f_i - \hat f_j) + (\hat f_j - f_j)^2 + (\hat f_i - f_i)^2\right\} \\ &\geq -\frac{1}{2}\left(\epsilon^2 + \sum_{j\in B_i}(f_i - f_j)^2 + \epsilon^2 + \sum_{j\in B_i}(\hat f_i - \hat f_j)^2\right) \\ &\quad + \epsilon^2 + \sum_{j\in B_i}\left\{(f_i - f_j)(\hat f_i - \hat f_j) + (\hat f_j - f_j)^2 + (\hat f_i - f_i)^2\right\} \\ &= \frac{1}{2}\sum_{j\in B_i}\left(\hat f_i + \hat f_j - f_i - f_j\right)^2 \;\geq\; 0, \end{aligned}$$

where the first inequality uses $xy \leq (x^2 + y^2)/2$,

which shows that $P(f) \leq P_{sur}(f, \hat f)$, thus concluding the proof that $P_{sur}(f, \hat f)$ is a valid surrogate of $P(f)$.
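The two surrogate properties can also be checked numerically; a minimal sketch for the 1D chain neighborhood $B_i = \{i+1\}$ with random test vectors:

```python
import numpy as np

rng = np.random.default_rng(4)
N, eps = 12, 1e-2

def P(f):                                # penalty (4)-(5), B_i = {i+1}
    d = np.append(np.diff(f), 0.0)
    return np.sum(np.sqrt(eps**2 + d**2))

def P_sur(f, fh):                        # separable surrogate (7)
    d = np.append(np.diff(f), 0.0)       # -(f_i - f_{i+1})
    dh = np.append(np.diff(fh), 0.0)
    g = np.sqrt(eps**2 + dh**2)          # |nabla_i fhat|
    e = (fh - f)**2                      # (fhat_i - f_i)^2
    en = np.append(e[1:], 0.0)           # (fhat_j - f_j)^2 for j = i+1
    mask = np.ones(N); mask[-1] = 0.0    # B_i is empty for the last voxel
    return np.sum((eps**2 + mask * (d * dh + en + e)) / g)

f, fh = rng.standard_normal(N), rng.standard_normal(N)
assert np.isclose(P_sur(fh, fh), P(fh))  # property (i)
assert P_sur(f, fh) >= P(f) - 1e-12      # property (ii)
```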

Appendix B. Convergence of the optimization transfer algorithm

We prove theorem 1. To simplify notations we consider the unweighted least squares case with $\sigma_j = 1$; the generalization to non-uniform data variance $\sigma_j \neq 1$ is straightforward by rescaling the matrix $A$ and the data. Surrogate (12) is quadratic and the iterative update (11) can be written explicitly as

$$f^{k+1} = T(f^k) = (A^t A + \beta H^k)^{-1}(A^t g + \beta H^k z^k) \qquad (B.1)$$

with the $N \times N$ diagonal matrix $(H^k)_{i,j} = \delta_{i,j}\, p_i(f^k)$ and the vector $z^k \in \mathbb{R}^N$ with components $z_i^k = z_i(f^k)$ (see (10)). Note that $H^k = (W^k)^{-2}$, with $W^k$ the matrix defined in equation (14); we use $H^k$ to simplify notations. From (9), one has $p_i(f^k) > 0$ and therefore, as $\beta > 0$, the matrix $(A^t A + \beta H^k)$ is non-singular. In addition, with $\epsilon > 0$, $H^k$ and $z^k$ are differentiable functions of $f^k$, and therefore the mapping $f^k \to f^{k+1} = T(f^k)$ is continuous. We first introduce some lemmas.

Lemma 1. The sequences $\Phi(f^k)$ and $\Phi_{sur}(f^{k+1}, f^k)$ defined by recurrence (11) are non-increasing.

Proof. This is the usual proof for optimization transfer algorithms, see e.g. [23, 37]. Using equation (12) and the two properties $P_{sur}(f, f^k) \geq P(f)$ and $P(f) = P_{sur}(f, f)$ of the penalty surrogate (7), one has

$$\Phi(f^{k+1}) \leq \Phi_{sur}(f^{k+1}, f^k) \leq \Phi_{sur}(f^k, f^k) = \Phi(f^k) \leq \Phi_{sur}(f^k, f^{k-1}) \qquad (B.2)$$

where the second inequality follows from the fact that $f^{k+1}$ is the minimizer of $\Phi_{sur}(f, f^k)$. □

Lemma 2. The sequence $f^k$ is uniformly bounded.

Proof. Consider any $f \in \mathbb{R}^N$ and define $v = f - f_1 u$ (with $f_1$ equal to the first component of the vector $f$ and $u$ the constant vector defined in hypothesis (ii)). For each $j = 1, \ldots, N$,

$$|v_j| = |f_j - f_1| = \left|\sum_{l=2}^{N_j}\left(f_{k_{j,l}} - f_{k_{j,l-1}}\right)\right| \leq \sum_{l=2}^{N_j}\left|f_{k_{j,l}} - f_{k_{j,l-1}}\right| \leq \sum_{l=2}^{N_j}\left|\nabla_{k_{j,l}} f\right| \leq P(f) \qquad (B.3)$$


where we have used definition (5) of the gradient approximation and the fact that two successive $f_{k_{j,l}}$ belong to a neighborhood (see hypothesis (iii)). Therefore,

$$\|v\| = \left(\sum_{i=1}^N v_i^2\right)^{1/2} \leq \sum_{i=1}^N |v_i| \leq N\, P(f) \leq \frac{N}{\beta}\,\Phi(f). \qquad (B.4)$$

Now,

$$m\,\|f_1 u\| = \|A(f_1 u)\| \leq \|Af - g\| + \|g\| + \|A\|\,\|f - f_1 u\| \leq \sqrt{\Phi(f)} + \|g\| + \frac{\|A\|\, N}{\beta}\,\Phi(f). \qquad (B.5)$$

And since $\|f\| = \|v + f_1 u\| \leq \|v\| + \|f_1 u\|$, we obtain

$$\|f\| \leq \frac{N}{\beta}\,\Phi(f) + \frac{1}{m}\left(\sqrt{\Phi(f)} + \|g\| + \frac{\|A\|\, N}{\beta}\,\Phi(f)\right). \qquad (B.6)$$

Since by lemma 1 the iterates satisfy $\Phi(f^k) \leq \Phi(f^0)$, we have that $\|f^k\| \leq C$ with $C$ independent of $k$ and given by the RHS of (B.6) with $f = f^0$. □

Lemma 3. A fixed point of the iterative mapping (11) is a minimizer of the cost function (3).

Proof. With $\epsilon > 0$, both the surrogate $\Phi_{sur}(f, f^k)$ and the cost function $\Phi(f)$ are differentiable. The iterate $f^{k+1}$ satisfies

$$\left.\frac{\partial \Phi_{sur}(f, f^k)}{\partial f_i}\right|_{f = f^{k+1}} = 2\left(A^t(Af^{k+1} - g)\right)_i + 2\beta\, p_i(f^k)\left(f_i^{k+1} - z_i(f^k)\right) = 0 \qquad (B.7)$$

for $i = 1, \ldots, N$. If $f^k = f^{k+1} = f$ is a fixed point, this equation becomes

$$2\left(A^t(Af - g)\right)_i + 2\beta\, p_i(f)\left(f_i - z_i(f)\right) = 0. \qquad (B.8)$$

But the LHS of (B.8) is equal to $\partial \Phi(f)/\partial f_i$ and therefore $f$ is a minimizer of $\Phi$. □

Lemma 4.

$$\Phi(f^k) - \Phi(f^{k+1}) \geq (f^k - f^{k+1})^t\,(A^t A + \beta H^k)\,(f^k - f^{k+1}) \qquad (B.9)$$

Proof. From the proof of lemma 1,

$$\Phi(f^k) - \Phi(f^{k+1}) \geq \Phi_{sur}(f^k, f^k) - \Phi_{sur}(f^{k+1}, f^k) = \Delta_k. \qquad (B.10)$$

Using the definition of the surrogate cost function, equations (8) and (12), the RHS becomes

$$\Delta_k = \|Af^k - g\|^2 - \|Af^{k+1} - g\|^2 + \beta \sum_{i=1}^N p_i(f^k)\left(f_i^k - f_i^{k+1}\right)\left(f_i^k + f_i^{k+1} - 2 z_i(f^k)\right). \qquad (B.11)$$

Multiplying for each $i = 1, \ldots, N$ equation (B.7) by $\left(f_i^k - f_i^{k+1}\right)$ and subtracting from (B.11) yields

$$\begin{aligned} \Delta_k &= \|Af^k - g\|^2 - \|Af^{k+1} - g\|^2 - 2\sum_{i=1}^N \left(f_i^k - f_i^{k+1}\right)\left(A^t(Af^{k+1} - g)\right)_i + \beta\sum_{i=1}^N p_i(f^k)\left(f_i^k - f_i^{k+1}\right)^2 \\ &= \|A(f^{k+1} - f^k)\|^2 + \beta\sum_{i=1}^N p_i(f^k)\left(f_i^k - f_i^{k+1}\right)^2 \end{aligned} \qquad (B.12)$$


which, recalling inequality (B.10) and the definition $(H^k)_{i,j} = \delta_{i,j}\, p_i(f^k)$, concludes the proof. □

Lemma 5.

$$\lim_{k\to\infty} \|f^k - f^{k+1}\| = 0 \qquad (B.13)$$

Proof. From lemma 1, the sequence $\Phi(f^k)$, $k = 1, 2, \ldots$ is positive and non-increasing, and therefore it converges and

$$\lim_{k\to\infty}\left(\Phi(f^k) - \Phi(f^{k+1})\right) = 0. \qquad (B.14)$$

We now show that the matrix $A^t A + \beta H^k$ in equation (B.1) is uniformly bounded below. From lemma 2, the sequence $f^k$ is uniformly bounded with $\|f^k\| \leq C$ for some constant $C$ independent of $k$. This implies that for each $k$ and each $i = 1, \ldots, N$, $|f_i^k| \leq C$, and from definition (5), one has

$$|\nabla_i f^k| \leq \sqrt{\epsilon^2 + 4 C^2 N_B} \qquad (B.15)$$

where $N_B < N$ is defined as the largest number of pixels in the neighborhoods $B_i$. Inequality (B.15) then leads with (9) to the uniform lower bound

$$\beta\, p_i(f^k) \geq \frac{\beta}{\sqrt{\epsilon^2 + 4 C^2 N_B}} =: \xi > 0. \qquad (B.16)$$

Therefore, for any vector $x \in \mathbb{R}^N$,

$$x^t\,(A^t A + \beta H^k)\, x \geq \beta\, x^t H^k x = \beta \sum_{i=1}^N p_i(f^k)\, x_i^2 \geq \xi\,\|x\|^2. \qquad (B.17)$$

Applying this inequality to $x = f^k - f^{k+1}$ and recalling lemma 4, one obtains $\|f^k - f^{k+1}\| \leq \xi^{-1/2}\left(\Phi(f^k) - \Phi(f^{k+1})\right)^{1/2}$. As the sequence $\Phi(f^k)$ is converging (see equation (B.14)), this implies that $\|f^k - f^{k+1}\| \to 0$. □

(B.18)

The three terms in the RHS tend to 0 as j → ∞: • ||T (f˜ ) − T (f k(j ) )|| → 0 because T is continuous and f k(j ) → f˜ , • ||T (f k(j ) ) − f k(j ) || = ||f k(j )+1 − f k(j ) || → 0 owing to lemma 5, • ||f k(j ) − f˜ || → 0 because the subsequence f k(j ) converges to f˜ . Therefore, equation (B.18) implies T (f˜ ) = f˜ , so f˜ is a fixed point of mapping (11) and by lemma 3, a minimizer of the cost function. This minimizer is unique and therefore each subsequence of the sequence of iterates converges to this minimizer.  13


Appendix C. Derivation of RSART We give for completeness the derivation of the regularized SART iteration (17), which essentially transposes the derivation in [18] to the present problem. We consider a fixed ˆk external iteration k of TV-RSART and omit the index k, thus denoting zi = zi (f ) as defined k by (10), and W for the matrix with elements Wi,j = δi,j / pi (fˆ ) > 0 in (14). With these notations the minimization problem (11) to be solved at the k-th external iteration becomes     y − Af 2 −1 2   ˆ f = arg min  + β ||W (f − z)|| . (C.1) σ  f ∈RN

The idea of regularized SART is to cast (C.1) as the problem of finding the minimum norm solution of an underdetermined system of linear equations. Define the vector h = W −1 (f − z) ∈ RN and a vector n ∈ R

M×P

(C.2)

with components

yj,p − (Af )j,p yj,p − (Az)j,p − (AW h)j,p = √ √ βσj,p βσj,p j = 1, . . . , M, p = 1, . . . , P . nj,p =

(C.3)

One can interpret nj,p as a scaled data fit error. With these definitions the cost function in the RHS of (C.1) is equal to β(||n||2 + ||h||2 ). Therefore, calculating fˆ is equivalent to finding the minimum-norm solution of the system of MP equations for the N + MP unknowns (h, n), defined by (C.3) and written in matrix form as   h y − Az = (AW D) (C.4) n with the MP × MP diagonal matrix D with elements  D(j,p),(j  ,p ) = δj,j  δp,p βσj,p

(C.5)

−1

The system (C.4) is consistent since (h, D (y − Az − AW h)) is a solution for any vector h. The minimum norm solution of such a consistent system of linear equations can be obtained using the SART algorithm with a zero initial estimate (h0 , n0 ) = 0. SART is a block-iterative algorithm and a natural choice for CT is to take M data blocks corresponding to the projections y1 , y2 , . . . , yM , with a projection access order defined by the function J (s) as explained in section 4. The SART iteration that maps the current solution estimate (hs , ns ) at iteration s onto the next iterate (hs+1 , ns+1 ) is   hs+1 = hs + ωW t Atj Cj−1 yj − Aj z − Aj W hsj − Dj nsj   ns+1 = nsj + ωDj Cj−1 yj − Aj z − Aj W hsj − Dj nsj (C.6) j s ns+1 j  = nj 

j  = j

with j = J (s) and the P ×P diagonal matrix Dj is the j -th block of D (see equation (13)). With appropriate redefinitions, equation (C.6) is equivalent to equation (2.17) in [20]. As shown in Natterer and Wubbeling ([31], theorem 5.1), algorithm (C.6) initialized with h0 = n0 = 0 converges to the minimum norm solution of (C.4) if 0 < ω < 2 and if the P × P relaxation matrix Cj satisfies Cj  Aj W W t Atj + Dj2 . 14

(C.7)

Inverse Problems 27 (2011) 065002

M Defrise et al

The TV-RSART update (17) is finally obtained by reexpressing the iterative mapping (C.6) in s terms of f . Recalling from (C.2) that the image estimate at the s-th iteration is fˆ = (W hs +z), defining a new variable ρρ s = y − Dns , and reintroducing the dependence on the external iteration k, one obtains equation (17). References [1] Rudin L, Osher S and Fatemi E 1992 Nonlinear total variation based noise removal algorithms Physica D 60 259–68 [2] Panin V Y, Zeng G L and Gullberg G T 1999 Total variation regulated EM Algorithm IEEE Trans. Nucl. Sci. 46 2202–10 [3] Persson M, Bone D and Elmqvist H 2001 Total variation norm for three-dimensional iterative reconstruction in limited view angle tomography Phys. Med. Biol. 46 853–66 [4] Sidky E Y, Kao C M and Pan X H 2006 Accurate image reconstruction from few-views and limited-angle data in divergent-beam CT J. X-Ray Sci. Technol. 14 119–39 [5] Velikina J, Leng S and Chen G-H 2007 Limited view angle tomographic image reconstruction via total variation minimization Proc. SPIE 6510 651020 [6] Sidky E Y and Pan X H 2008 Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization Phys. Med. Biol. 53 4777–807 [7] Herman G T and Davidi R 2008 Image reconstruction from a small number of projections Inverse Problems 24 045011 [8] Tang J, Nett B E and Chen G-H 2009 Performance comparison between total variation (TV)-based compressed sensing and statistical iterative reconstruction algorithms Phys. Med. Biol. 54 5781–804 [9] Yu H and Wang G 2009 Compressed sensing based interior tomography Phys. Med. Biol. 54 2791–805 [10] Bian J, Wang J, Han X, Sidky E Y, Ye J, Shao L and Pan X H 2010 Reconstruction from sparse data in offset-detector CBCT Proc. First Int. Con. on Image Formation in CT (Salt Lake City) pp 96–100 [11] Ritschl L, Bergner F, Fleischmann C and Kachelriess M 2011 Improved total variation-based CT image reconstruction applied to clinical data Phys. Med. Biol. 56 1545–61 [12] Pan X H, Sidky E Y and Vannier M 2009 Why do commercial CT scanners still employ traditional FBP for image reconstruction? Inverse Problems 25 123009 [13] Kohler T and Proksa R 2009 Proc. 10th Int. Conf. Fully 3D Reconstruction, Beijing pp 263–66 [14] Oliveira J P, Bioucas-Dias J M and Figueiredo M A T 2009 Adaptive total variation image deblurring: a majorization–minimization approach Signal Process 89 1683–93 [15] Beck A and Teboulle M 2009 Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems IEEE Trans. Image Proc. 18 2419–34 [16] Penfold S N, Schulte R W, Censor Y and Rosenfeld A B 2010 Total variation superiorization schemes in proton computed tomography image reconstruction Med. Phys. 37 5887–95 [17] Censor Y, Elfving T, Herman G T and Nikazad T 2008 On diagonally relaxed orthogonal projection methods SIAM J. Sci. Comput. 30 473–504 [18] Herman G T 1980 Image Reconstruction from Projections, the Fundamentals of Computerized Tomography (New York: Academic) [19] Andersen A H and Kak A C 1984 Simultaneous algebraic reconstruction technique (SART): a superior implementation of the ART algorithm Ultrason. Imag. 6 81–941 [20] B. Eggermont P P, Herman G T and Lent A 1981 Iterative algorithms for large partitioned linear systems, with applications to image reconstruction Linear Algebr. Appl. 40 37–67 [21] Thibault J-B, Sauer K D, Bouman C A and Hsieh J 2007 A three-dimensional statistical approach to improved image quality for multislice helical CT Med. Phys. 
34 4526–44 [22] Jia X, Lou Y, Lewis J, Li R, Gu X, Men C and Jiang S B 2010 GPU-based cone beam CT reconstruction via total variation regularization Med. Phys. 37 1757 [23] Lange K, Hunter D R and Yang I 2000 Optimization transfer algorithms using surrogate objective functions J. Comput. Graph. Stat. 9 1–59 [24] Daube-Witherspoon M and Muehllehner G 1986 An iterative image space reconstruction algorithm suitable for volume ECT IEEE Trans. Med. Imag. MI-6 61–6 [25] De Pierro A 1993 On the relation between the ISRA and the EM algorithm for positron emission tomography IEEE Trans. Med. Imag. MI-12 328–33 [26] Defrise M, Vanhove C and Liu X 2010 Iterative reconstruction in micro-CT Proc. First Int. Conf. Image Formation in CT (Salt Lake City) pp 82–5 15

Inverse Problems 27 (2011) 065002

M Defrise et al

[27] Joseph P 1983 An improved algorithm for reprojecting rays through pixel images IEEE Trans. Med. Imag. 1 192–6 [28] De Man B and Basu S 2004 Distance-driven projection and backprojection in three dimensions Phys. Med. Biol. 49 2463–75 [29] Herman G T, Lent A and Hurwitz H 1980 A storage-efficient algorithm for finding the regularized solution of a large inconsistent system of equations J. Inst. Math. Appl. 25 361–6 [30] Lu H H-S, Chen C-M and Yang I-H 1998 Cross-reference weighted least squares estimate for positron emission tomography IEEE Trans. Med. Imag. 17 1–8 [31] Natterer F and Wubbeling F 2001 Mathematical Methods in Image Reconstruction (Philadelphia: SIAM) [32] Mueller K, Yagel R and Cornhill J F 1997 The weighted-distance scheme: a globally optimizing projection ordering method for ART IEEE Trans. Med. Imag. 16 223–30 [33] Winkelmann S, Schaeffter T, Koehler T, Eggers H and Doessel O 2007 An optimal radial profile order based on the golden ratio for time-resolved MRI IEEE Trans. Med. Imag. 26 66–76 [34] Hudson H M and Larkin R S 1994 Accelerated image reconstruction using ordered subsets of projection data IEEE Trans. Med. Imag. 13 60–609 [35] Kontaxakis G, Strauss L G and Van Kaick G 1998 Optimized image reconstruction for emission tomography using ordered subsets, median root prior and a web-based interface Conf. Record IEEE Nucl. Sc. Symp. 2 1347–52 [36] Zbijewski W and Beekman F J 2004 Suppression of intensity transition artifacts in statistical x-ray computer tomography reconstruction through Radon inversion initialization Med. Phys. 31 62–9 [37] Daubechies I, Defrise M and De Mol C 2004 An iterative thresholding algorithm for linear inverse problems with a sparsity constraint Commun. Pure Appl. Math. 57 1413–57

16