Toeplitz embedding for fast iterative regularized imaging

Ahmad R.ᵃ, Austin C.D.ᵇ, and Potter L.C.ᵇ

ᵃDepartment of Internal Medicine, The Ohio State University, Columbus OH 43210, USA;
ᵇDepartment of Electrical and Computer Engineering, The Ohio State University, Columbus OH 43210, USA

ABSTRACT

For large-scale linear inverse problems, a direct matrix-vector multiplication may not be computationally feasible, rendering many gradient-based iterative algorithms impractical. For applications where data collection can be modeled by Fourier encoding, the resulting Gram matrix possesses a block Toeplitz structure. This special structure can be exploited to replace matrix-vector multiplication with FFTs. In this paper, we identify some of the important applications which can benefit from the block Toeplitz structure of the Gram matrix. Also, for illustration, we have applied this idea to reconstruct 2D simulated images from undersampled non-Cartesian Fourier encoding data using three popular optimization routines, namely, FISTA, SpaRSA, and optimization transfer. Keywords: Toeplitz, Circulant, Tomography, NUFFT, Sparse

1. INTRODUCTION
Linear inverse problems are encountered across several disciplines, including economics, statistics, and engineering. There exists a vast literature on developing and testing algorithms for solving inverse problems.1 Despite this progress, mathematical and algorithmic development for solving inverse problems remains an active area of research. In the last four decades, an exponential increase in computational power has broadened the applicability of well-established algorithms to new areas; and new applications, in turn, have led to renewed interest in further improving the algorithms in terms of speed, convergence behavior, and the ability to handle the diversity of mathematical constraints that may accompany certain applications. For a basic linear inverse problem, we assume the discrete model

y = Ax + ε, (1)

where y ∈ Cᴹ is a known vector of measurements, A ∈ Cᴹˣᴺ is a known matrix, x ∈ Cᴺ is an unknown vector to be estimated, and ε ∈ Cᴹ is additive measurement noise. For example, for spotlight-mode synthetic aperture radar (SAR) imaging, the matrix A represents an operator that computes Fourier transform samples along radial lines in the spatial frequency domain of an unknown complex reflectivity image x;2 the vector y represents the resulting Fourier transform samples of x; and the vector ε represents a zero-mean complex Gaussian random vector with covariance σ²I. For images in two or more dimensions, the image pixels are lexicographically ordered as a complex vector x. Solving a linear inverse problem using gradient-based methods invariably involves computing Axⁿ or AᴴAxⁿ in each iteration, with Aᴴ being the adjoint (conjugate transpose) of A and xⁿ being the current estimate of x. For a shift-invariant impulse response, the matrix A possesses a block circulant structure, and the product Axⁿ can be computed in O(N log N) operations via FFTs. For applications, such as magnetic resonance imaging (MRI), where data collection can be modeled via Fourier encoding, the Gram matrix AᴴA possesses a block Toeplitz structure. By embedding these matrices in larger block circulant matrices, it becomes possible to accelerate the computation of AᴴAxⁿ via FFTs, which is the subject of this paper. The remainder of the paper is organized as follows: Section 2 presents the link between the regularized least squares solution of a linear problem and maximum a posteriori estimation; Section 3 points to some important
Send correspondence to: Lee C. Potter. E-mail: [email protected], Telephone: 1 614 292 7596
Algorithms for Synthetic Aperture Radar Imagery XVIII, edited by Edmund G. Zelnio, Frederick D. Garber, Proc. of SPIE Vol. 8051, 80510E · © 2011 SPIE · CCC code: 0277-786X/11/$18 · doi: 10.1117/12.888952

Proc. of SPIE Vol. 8051 80510E-1 Downloaded From: http://proceedings.spiedigitallibrary.org/ on 02/26/2015 Terms of Use: http://spiedl.org/terms

applications where data can be accurately modeled by a partial sampling of the Fourier transform of x; Section 4 discusses how the block Toeplitz structure of AᴴA can be exploited to compute the product AᴴAxⁿ using FFTs; Section 5 briefly discusses three popular image restoration algorithms which benefit from this special structure of AᴴA and summarizes their performance on a simulation example; and conclusions are drawn in Section 6.

2. REGULARIZED LEAST SQUARES
A classical approach to the linear problem in Eq. 1 is to find the least squares (LS) solution, in which the estimator is chosen to minimize the ℓ₂ norm of the difference between Ax and y, i.e.,

x̂_LS = argmin_x ∥Ax − y∥²₂. (2)

A direct solution to Eq. 2 can be found by inverting A, yielding x̂_LS = A⁺y, with A⁺ being the Moore-Penrose pseudoinverse. For a nonsingular square matrix, A⁺y = A⁻¹y. For M > N, A⁺y = (AᴴA)⁻¹Aᴴy generates the least squares solution to an overdetermined problem, while for M < N, A⁺y = Aᴴ(AAᴴ)⁻¹y generates the minimum norm solution to an underdetermined problem. There are serious limitations associated with this direct approach. First, depending on the nature of x, minimum norm might not be the best criterion for reconstructing x. Second, when the matrix A is ill-conditioned, the matrix inversion becomes sensitive to small eigenvalues of AᴴA or AAᴴ and may result in unwanted oscillations in x̂_LS. To overcome these challenges, regularization methods are used to generate stable solutions. The basic idea behind regularization is to tolerate a small additional bias in the estimate of x in exchange for a significant reduction in the estimate's variance. Regularization is achieved by adding a penalty term to the error term of Eq. 2, yielding

x̂_R = argmin_x ∥Ax − y∥²₂ + λ∥Ψx∥^κ_κ, (3)
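The two pseudoinverse formulas can be checked numerically; the sketch below (the random A and y are purely illustrative, not from the paper) verifies both against numpy's general pseudoinverse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined case, M > N: A+ y = (A^H A)^{-1} A^H y (least squares).
A = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
y = rng.standard_normal(8) + 1j * rng.standard_normal(8)
x_ls = np.linalg.solve(A.conj().T @ A, A.conj().T @ y)
assert np.allclose(x_ls, np.linalg.pinv(A) @ y)

# Underdetermined case, M < N: A+ y = A^H (A A^H)^{-1} y (minimum norm).
A = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)
x_mn = A.conj().T @ np.linalg.solve(A @ A.conj().T, y)
assert np.allclose(x_mn, np.linalg.pinv(A) @ y)
```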

where ∥·∥_κ denotes the ℓ_κ norm, Ψ is an operator generally selected to penalize the roughness or the size of the estimate, and λ is a nonnegative real parameter that controls the trade-off between the error term and the regularization term. Common choices for κ are 1 and 2, while common choices for Ψ are the identity matrix, a discrete approximation to the gradient operator, and a discrete wavelet transform operator.3 When Ψ is a linear operator and κ = 2, Eq. 3 reduces to Tikhonov regularization. At first glance, the choices for κ and Ψ may appear arbitrary. The selection process, however, can be justified from a Bayesian point of view. Consider an estimate that yields the maximum a posteriori (MAP) probability of x, i.e.,

x̂_MAP = argmax_x p(x|y) = argmax_x p(y|x) p(x)/p(y), (4)

which, due to the monotonicity of the logarithm, can equivalently be expressed as

x̂_MAP = argmin_x {− ln p(y|x) − ln p(x)}. (5)

For independent identically distributed zero-mean Gaussian noise, the likelihood term, p(y|x), has the form

p(y|x) ∝ exp{−∥Ax − y∥²₂}, (6)

and if the prior probability p(x) is given by

p(x) ∝ exp{−λ∥Ψx∥^κ_κ}, (7)

the corresponding maximum a posteriori estimate, x̂_MAP, is also the solution to Eq. 3. For a Gaussian prior, p(x) ∝ exp{−λ∥x∥²₂}, for example, the minimizer of Eq. 3 has the closed-form solution

x̂_R = (AᴴA + λI)⁻¹ Aᴴy, (8)

but minimization of Eq. 3 in general does not yield a closed-form solution. Therefore, numerical minimization techniques such as the conjugate gradient (CG) method, Newton's method, or quasi-Newton methods are called upon to find x̂_R.
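As a sketch (the random A, y, and λ below are illustrative, not from the paper), the closed-form minimizer of Eq. 8 can be checked against its stationarity condition AᴴAx̂ + λx̂ = Aᴴy:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, lam = 12, 6, 0.5
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)

AHA = A.conj().T @ A
AHy = A.conj().T @ y
x_r = np.linalg.solve(AHA + lam * np.eye(N), AHy)   # Eq. 8

# Stationarity: the gradient A^H(Ax - y) + lam*x vanishes at the minimizer.
grad = AHA @ x_r - AHy + lam * x_r
assert np.allclose(grad, 0)
```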


3. FOURIER ENCODING
There are a number of imaging applications where data can be accurately modeled by a partial sampling of the Fourier transform of the unknown image x. The image may reside in two or three spatial dimensions and may be complex, and the data sampling grid may be non-Cartesian. SAR imaging, magnetic resonance imaging, paramagnetic resonance imaging, and positron emission tomography are some of the important examples which obey this mode of data collection. Let x̃ be a d-dimensional image expressed in terms of unknown coefficients x = [x_k]_k and a known basis function b, i.e.,

x̃(r⃗) = Σ_{k=1}^{N} x_k b_k(r⃗), (9)

where r⃗ represents spatial coordinates in d dimensions and b_k(r⃗) = b(r⃗ − r⃗_k) denotes translates of b, which is usually a narrow support function such as the rect function for square pixels. The data y, which are the Fourier coefficients of x̃ computed at known locations w⃗_i, can be expressed as

y_i = ∫_{ℝᵈ} x̃(r⃗) e^{−j w⃗_i·r⃗} dr⃗,  i = 1, …, M
    = Σ_{k=1}^{N} x_k ∫_{ℝᵈ} b_k(r⃗) e^{−j w⃗_i·r⃗} dr⃗ (10)
    = Σ_{k=1}^{N} x_k a_{ik},

where the second equality uses Eq. 9, and a_{ik} = ∫_{ℝᵈ} b_k(r⃗) e^{−j w⃗_i·r⃗} dr⃗ represents the elements of A = [a_{ik}]_{i,k}.

If B(w⃗) is the d-dimensional Fourier transform of b(r⃗), then using the shift property of the Fourier transform we can write

a_{ik} = B(w⃗_i) e^{−j w⃗_i·r⃗_k}, (11)

which in matrix form is

A = BF, (12)

where B ∈ Cᴹˣᴹ is a diagonal matrix, B = diag{B(w⃗_i)}, i = 1, …, M; and the Fourier encoding matrix F ∈ Cᴹˣᴺ has elements F_{ik} = e^{−j w⃗_i·r⃗_k}. Now the image restoration problem reduces to estimating x in the linear model of Eq. 1 with A given by Eq. 12.
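The Toeplitz structure of the Gram matrix exploited in the next section is easy to verify numerically. The sketch below (illustrative sizes; d = 1 with ideal pixels, i.e., B = I so that A = F) builds the encoding matrix of Eq. 12 and confirms that FᴴF is Toeplitz, i.e., constant along every diagonal:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 50, 16
w = rng.uniform(-np.pi, np.pi, M)    # nonuniform frequency locations w_i
r = np.arange(N)                     # uniform pixel locations r_k
F = np.exp(-1j * np.outer(w, r))     # F_ik = exp(-j w_i r_k), Eq. 12

G = F.conj().T @ F                   # Gram matrix F^H F
# Each entry G_kl = sum_i exp(j w_i (k - l)) depends only on k - l,
# so every diagonal of G is constant:
for off in range(-(N - 1), N):
    d = np.diag(G, off)
    assert np.allclose(d, d[0])
```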

4. TOEPLITZ EMBEDDING
The main computational cost of all gradient-based iterative methods is in multiplying AᴴA with the current estimate, xⁿ, in each iteration. A direct matrix-vector multiplication to compute AᴴAxⁿ requires O(2NM) operations, which can be computationally expensive for large problems. Storage of A is also problematic for large problems; on-the-fly computation of A can alleviate the storage issue, but only at the cost of additional computation. It is well known that for the Fourier encoding problems discussed in Section 3, the resulting Gram matrix, AᴴA ∈ Cᴺˣᴺ, possesses a block Toeplitz structure.4 Such matrices can be embedded in larger 2ᵈN × 2ᵈN block circulant matrices, which are diagonalized by discrete Fourier matrices.5 For illustration, let d = 1, N = 3, and T = AᴴA be the 3 × 3 Toeplitz matrix

T = [ t11 t12 t13
      t21 t11 t12
      t31 t21 t11 ].


For convenience, we define a 3 × 3 matrix V by

V = [ γ   t31 t21
      t13 γ   t31
      t12 t13 γ   ],

where γ is an arbitrary number. If we define a larger 6 × 6 circulant matrix C by

C = [ T V
      V T ]

  = [ t11 t12 t13 γ   t31 t21
      t21 t11 t12 t13 γ   t31
      t31 t21 t11 t12 t13 γ
      γ   t31 t21 t11 t12 t13
      t13 γ   t31 t21 t11 t12
      t12 t13 γ   t31 t21 t11 ], (13)

then it is easy to see that

C [ x ] = [ Tx ]
  [ 0 ]   [ Vx ], (14)

which, after cropping, yields Tx. For Fourier encoding, the first column of the circulant (block circulant for d > 1) matrix can be computed by calling the nonuniform FFT (NUFFT) and its adjoint.4 The NUFFT is an efficient way to compute Fourier coefficients at nonuniform locations w⃗_i, i = 1, …, M.6,7 The primary operations in each NUFFT or its adjoint are an oversampled FFT and a frequency-domain interpolation between Cartesian and non-Cartesian grids. The computational cost of the NUFFT depends on M, N, and the width of the interpolation kernel, which controls the accuracy. Once the first column of C has been constructed and rearranged to form a d-dimensional array, its FFT, costing O(2ᵈN log(2ᵈN)), generates a d-dimensional discrete Fourier operator G. Multiplication of AᴴA with the current estimate xⁿ is then equivalent to: zero-padding xⁿ (equally along all dimensions) to size 2ᵈN; taking the d-dimensional FFT of the zero-padded xⁿ, denoted xⁿ_F; pointwise multiplying xⁿ_F with G; taking the inverse d-dimensional FFT of the product; and cropping the result (equally in all dimensions) back to size N. Although NUFFTs can be used instead of the Toeplitz approach to compute AᴴAxⁿ in each iteration, the additional frequency-domain interpolation makes the NUFFT an unattractive option. Figure 1 compares the speeds of the two approaches for various ratios of N and M.


Figure 1. Times to compute AᴴAxⁿ with NUFFT and the Toeplitz embedding approach. The width of the NUFFT interpolation kernel is 6, N = 256², and N/M = 2ⁱ, i = −1, …, 6. The computation times are independent of the values of xⁿ or the sampling pattern. Due to the additional frequency domain interpolation step, the computation time for NUFFT depends on M.
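The zero-pad, FFT, multiply, inverse-FFT, crop procedure can be sketched for d = 1 as follows (an illustrative example, with a random Toeplitz T standing in for AᴴA and γ = 0; in the paper the first column of C would instead be built via NUFFT calls):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 64
gamma = 0.0                          # arbitrary corner value

# A test Toeplitz matrix T (standing in for A^H A), defined by its first
# column and first row.
col = rng.standard_normal(N) + 1j * rng.standard_normal(N)
row = np.concatenate([col[:1], rng.standard_normal(N - 1)
                      + 1j * rng.standard_normal(N - 1)])
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(N)]
              for i in range(N)])

# First column of the 2N x 2N circulant C = [[T, V], [V, T]] of Eq. 13;
# its FFT gives the eigenvalues of C, i.e., the operator G.
c_first = np.concatenate([col, [gamma], row[:0:-1]])
G = np.fft.fft(c_first)

def toeplitz_matvec(x):
    """Tx via zero-pad -> FFT -> pointwise multiply -> inverse FFT -> crop."""
    xpad = np.concatenate([x, np.zeros(N, dtype=complex)])
    return np.fft.ifft(G * np.fft.fft(xpad))[:N]

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
assert np.allclose(toeplitz_matvec(x), T @ x)   # O(N log N) matches O(N^2)
```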


In addition to G, the product Aᴴy, which is also used in each iteration, can be precomputed using the NUFFT adjoint. Since both Aᴴy and G are computed only once during the algorithm, this overhead is generally small compared to the overall computational cost of the iterative algorithm.

5. DEMONSTRATION There is a vast literature on gradient-based optimization algorithms for solving regularized least squares (Eq. 3). In this section, we briefly discuss three popular methods, out of several available, to reconstruct images from undersampled non-Cartesian Fourier encoding data. These methods were selected because of their ability to handle complex images, nonsmooth penalty terms, and non-Cartesian sampling; these constraints are frequently encountered in SAR and MRI applications.

5.1 Selected Methods for ℓ2-ℓ1 Minimization
Specifically, we will solve Eq. 3 for Ψ = I and κ = 1, which is a commonly encountered ℓ2-ℓ1 optimization problem. See Fu et al.8 and the references therein for more details. Three algorithms for solving Eq. 3, namely FISTA, SpaRSA, and optimization transfer, are briefly discussed below.
5.1.1 FISTA
FISTA stands for fast iterative shrinkage-thresholding algorithm.9 It is a variation of the iterative soft thresholding algorithm,10 and is used to solve optimization problems with a smooth error term and a nonsmooth regularization function. The FISTA iteration to solve the ℓ2-ℓ1 problem is given by

xⁿ = S(zⁿ − µ(AᴴAzⁿ − Aᴴy), λµ/2)
aⁿ⁺¹ = (1 + √(1 + 4(aⁿ)²))/2 (15)
zⁿ⁺¹ = xⁿ + ((aⁿ − 1)/aⁿ⁺¹)(xⁿ − xⁿ⁻¹),

where µ is the user-defined step size and S, the soft threshold function, is defined by

S(x, λ) = max{|x| − λ, 0} x/|x|. (16)
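The iteration of Eqs. 15 and 16 can be sketched as follows. This is a hedged illustration on a small dense problem with assumed sizes and an assumed step size µ = 1/∥AᴴA∥₂; the paper would instead apply AᴴA via the Toeplitz/FFT trick of Section 4:

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, lam = 40, 20, 0.05
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
x_true = np.zeros(N, dtype=complex)
x_true[[3, 11]] = [2.0, -1.5j]      # sparse complex scene for the demo
y = A @ x_true                      # noiseless data

def soft(v, t):
    """Soft threshold of Eq. 16, element-wise and safe at v = 0."""
    mag = np.abs(v)
    return np.where(mag > t, (mag - t) * v / np.maximum(mag, 1e-30), 0)

AHA, AHy = A.conj().T @ A, A.conj().T @ y
mu = 1.0 / np.linalg.norm(AHA, 2)   # assumed step size, <= 1/||A^H A||
x = z = np.zeros(N, dtype=complex)
a = 1.0
for _ in range(300):
    x_old = x
    x = soft(z - mu * (AHA @ z - AHy), lam * mu / 2)   # first line of Eq. 15
    a_next = (1 + np.sqrt(1 + 4 * a * a)) / 2          # momentum parameter
    z = x + ((a - 1) / a_next) * (x - x_old)
    a = a_next

assert np.linalg.norm(x - x_true) <= 0.05 * np.linalg.norm(x_true)
```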

All operations in Eq. 16 are performed on an element-by-element basis.
5.1.2 SpaRSA
SpaRSA stands for sparse reconstruction by separable approximation.13 Like FISTA, it is closely related to iterative shrinkage-thresholding. The algorithm is capable of handling a smooth error term and a nonsmooth regularization function. The SpaRSA framework is defined by the following pseudo-algorithm:

Initialize x⁰, x¹, η > 1, αmin, and αmax
n = 1
Repeat
    αⁿ = ∥A(xⁿ − xⁿ⁻¹)∥²₂ / ∥xⁿ − xⁿ⁻¹∥²₂, constrained to αⁿ ∈ [αmin, αmax]
    Repeat
        uⁿ = xⁿ − (2/αⁿ)(AᴴAxⁿ − Aᴴy)
        xⁿ⁺¹ = S(uⁿ, λ/αⁿ) (17)
        αⁿ ← ηαⁿ
    Until xⁿ⁺¹ satisfies an acceptance criterion
    n = n + 1
Until stopping criterion satisfied

where S is defined in Eq. 16. The different flavors of SpaRSA rely on different design choices for α and for the acceptance criterion that ends the inner loop of the pseudo-algorithm.
5.1.3 Optimization Transfer
In optimization transfer (OT), also called majorization-minimization, the objective function is majorized by a surrogate function that is easier to minimize. The surrogate function satisfies certain properties that ensure that its minimization converges, monotonically for a quadratic surrogate function, to a local minimum of the original objective function.11 The OT framework can be used to solve Eq. 3 for any 0 < κ ≤ 2. The OT iteration12 is given by

xⁿ⁺¹ = (AᴴA + (λ/2) D(xⁿ))⁻¹ Aᴴy, (18)

where D(xⁿ) = diag{κ|xⁿᵢ|^(κ−2)}, i = 1, …, N. This algorithm intrinsically involves a nested loop: the outer loop is over the index n, and the inner loop is a CG method that solves the linear system in each iterate. For 1 ≤ κ < 2, the objective function is convex, and thus Eq. 18 converges to the global minimum.
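A hedged sketch of the OT iteration of Eq. 18 for κ = 1 follows. The sizes are illustrative, a direct dense solver stands in for the paper's CG inner loop, and a small floor ε is an added assumption to keep |xᵢ|^(κ−2) finite at zero:

```python
import numpy as np

rng = np.random.default_rng(5)
M, N, lam, kappa = 40, 20, 0.05, 1.0
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
x_true = np.zeros(N, dtype=complex)
x_true[[2, 7, 13]] = [1.0, -2.0, 1.5j]
y = A @ x_true

AHA, AHy = A.conj().T @ A, A.conj().T @ y
x = AHy / np.diag(AHA).real         # crude initial estimate
eps = 1e-8                          # floor so |x_i|^(kappa-2) stays finite
for _ in range(50):
    d = kappa * np.maximum(np.abs(x), eps) ** (kappa - 2)   # D(x^n) of Eq. 18
    x = np.linalg.solve(AHA + (lam / 2) * np.diag(d), AHy)  # Eq. 18 update

obj = lambda v: np.linalg.norm(A @ v - y) ** 2 + lam * np.sum(np.abs(v))
# The converged iterate should score at least as well as the true image
# under the objective of Eq. 3:
assert obj(x) <= obj(x_true) + 1e-3
```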

5.2 Simulations
We applied the three algorithms discussed in Section 5.1 to a 512 × 512 simulated phantom. The phantom was generated to mimic the sparse set of point scatterers typically seen in SAR images. Each point in the image was represented by an intensity and a phase. The data y were simulated by Fourier encoding of the phantom using a fast Gaussian gridding-based NUFFT routine.14 The non-Cartesian samples, residing along radial lines, were distributed with uniform angular density. Each spoke of the radial sampling consisted of 1024 samples, with a total of 32 radial lines, as shown in Fig. 2.


Figure 2. A 512 × 512 digital phantom (left) with ten randomly generated scatterer locations of varying size. The intensity (shown) and phase at each point were selected randomly. A non-Cartesian radial grid (right) with a total of 32 radial lines, each with 1024 samples, shows the location of Fourier encoding data.

We empirically determined that λ = 5 × 10⁻¹ was approximately the optimal choice in terms of the normalized mean-square-error (NMSE), defined by ∥x − x̂∥²₂/N. The remaining tuning parameters for each of the three algorithms were tested in various combinations, and the combination with the fastest convergence is presented here. For SpaRSA, due to the large number of variants and tuning parameters, it was not possible to try all parameter values across wide ranges; therefore, the SpaRSA performance shown here may have room for improvement. All three algorithms were terminated when the normalized difference between two successive estimates of x fell below 1 × 10⁻⁵. After termination, all three estimates were visually indistinguishable from the original image for the noiseless case. The quantitative results, from both noiseless and noisy data, are shown in Fig. 3. From the simulations presented here and other examples not shown, the speed of OT was consistently superior to that of the other two methods. The difference was more dramatic for highly underdetermined problems. Using the OT method, we were able to reconstruct 2048 × 2048 2D images as well as 128 × 128 × 128



Figure 3. Objective function value and normalized mean-squared error (NMSE) versus CPU time for the three optimization methods used to restore the simulated image shown in Fig. 2. The first column corresponds to noiseless data, while the second column corresponds to noisy data. For noisy data, σ = max|yᵢ|/200, with yᵢ, i = 1, …, M, being the noiseless measurements.

3D images in less than ten minutes, except for highly underdetermined (N/M ≥ 8) cases where the convergence took considerably longer. Figure 4 shows the computation times of the OT algorithm for various combinations of N and M. All reconstructions were carried out in Matlab on a 2.8 GHz Intel Core i5 processor equipped with 4 GB RAM.


Figure 4. Computation times for various values of N and M for the OT method. Each point on the plot is an average over five trial runs, each with a different randomly generated sparse phantom. All simulations were conducted using radial sampling and σ = 0.

6. CONCLUSION
The Gram matrix for applications involving Fourier encoding possesses a block Toeplitz structure. This special structure of AᴴA allows efficient computation of the matrix-vector product AᴴAxⁿ via FFTs, making gradient-based iterative methods tractable for large-scale problems. For illustration, we carried out 2D imaging of a sparse digital phantom using three common algorithms, namely FISTA, SpaRSA, and optimization transfer. The optimization transfer-based algorithm was the best of the three and was able to reconstruct images as large as 2048 × 2048 in less than ten minutes using ordinary computing resources.

ACKNOWLEDGMENTS This work was supported by the Air Force Office of Scientific Research under grant FA9550-06-1-0324.

REFERENCES
[1] Ramm, A. G., [Inverse Problems: Mathematical and Analytical Techniques with Applications to Engineering], Springer, New York (2010).
[2] Munson, D. C., O'Brien, J. D., and Jenkins, W. K., "A tomographic formulation of spotlight-mode synthetic aperture radar," Proc. IEEE 71, 917–925 (1983).
[3] Lustig, M., Donoho, D., and Pauly, J. M., "Sparse MRI: The application of compressed sensing for rapid MR imaging," Magn. Reson. Med. 58, 1182–1195 (2007).
[4] Fessler, J. A., Lee, S., Olafsson, V. T., Shi, H. R., and Noll, D. C., "Toeplitz-based iterative image reconstruction for MRI with correction for magnetic field inhomogeneity," IEEE Trans. Signal Processing 53, 3393–3402 (2005).
[5] Chan, R. H. and Ng, M. K., "Conjugate gradient methods for Toeplitz systems," SIAM Rev. 38, 427–482 (1996).
[6] Nguyen, N. and Liu, Q. H., "The regular Fourier matrices and nonuniform fast Fourier transforms," SIAM J. Sci. Comput. 21, 283–293 (1999).
[7] Greengard, L. and Lee, J. Y., "Accelerating the nonuniform fast Fourier transform," SIAM Rev. 46, 443–454 (2004).
[8] Fu, H., Ng, M. K., Nikolova, M., and Barlow, J. L., "Efficient minimization methods of mixed ℓ2-ℓ1 and ℓ1-ℓ1 norms for image restoration," SIAM J. Sci. Comput. 27, 1881–1902 (2006).
[9] Beck, A. and Teboulle, M., "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM J. Img. Sci. 2, 183–202 (2009).
[10] Daubechies, I., Defrise, M., and De Mol, C., "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Comm. Pure Appl. Math. 57, 1413–1457 (2004).
[11] Hunter, D. R. and Lange, K., "A tutorial on MM algorithms," Amer. Statistician 58, 30–37 (2004).
[12] Kragh, T. J. and Kharbouch, A. A., "Monotonic iterative algorithms for SAR reconstruction," IEEE International Conference on Image Processing (ICIP), 645–648 (Oct 2006).
[13] Wright, S. J., Nowak, R. D., and Figueiredo, M. A. T., "Sparse reconstruction by separable approximation," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3373–3376 (2008).
[14] Ferrara, M., "Implementation of 1D-3D NUFFTs via fast Gaussian gridding," Personal Communication (2010).
