Optimized Compressed Sensing for Curvelet-based Seismic Data Reconstruction

Wen Tang 1, Jianwei Ma 1∗, Felix J. Herrmann 2

1 Institute of Seismic Exploration, School of Aerospace, Tsinghua University, Beijing 100084, China

2 Seismic Laboratory for Imaging and Modeling, Department of Earth and Ocean Sciences, University of British Columbia, Vancouver, V6T 1Z4, BC, Canada

∗ Corresponding author: [email protected]

First draft, December 2008.

Abstract. Compressed sensing (CS), or compressive sampling, provides a new sampling theory that reduces data acquisition: it says that compressible signals can be reconstructed exactly from highly incomplete sets of measurements. Very recently, CS has been applied to seismic exploration and has begun to impact traditional data acquisition. In this paper, we present an optimized sampling strategy for CS data acquisition, which leads to better performance of the curvelet sparsity-promoting inversion than random sampling and the jittered sampling scheme. One motivation is to reduce the mutual coherence between the measurement sampling scheme and the curvelet sparse transform in the CS framework. The basic idea of our optimization is to directly redistribute the energy in the frequency domain so that the original spectrum is easily discriminated from the random noise induced by random undersampling, while offering control on the maximum gap size. Numerical experiments on synthetic and real seismic data show good performance of the proposed optimized CS for seismic data reconstruction.

Key words: Compressed sensing, compressive sampling, curvelets, optimized sampling, mutual coherence, seismic data recovery

1 Introduction

Seismic data are often irregularly sampled due to missing traces along spatial coordinates, caused by dead or severely contaminated traces or by constraints of topography. Recovery of missing traces is one of the important issues in seismic processing. On the other hand, one always hopes to obtain high-resolution data from highly incomplete measurements, while saving acquisition cost. Recently, a new mathematical theory named compressed sensing (CS) (see e.g. [3, 5, 6, 7, 8, 15]) makes it possible to solve such a problem. Traditional measurements are limited by the Shannon/Nyquist sampling theorem: the sampling rate must be at least twice the maximum frequency of the signal. CS provides a new sampling theory for data acquisition that only obeys a sub-Shannon/Nyquist sampling ratio. It says that one can recover a compressible signal from a small set of incomplete measurements, far fewer than traditional measurement uses. CS essentially includes two steps: randomly compressed measurement and nonlinear off-line recovery. Based on the CS theory, curvelet-based recovery by sparsity-promoting inversion (CRSI) was recently developed by Herrmann and Hennenfent [24]. The authors indicated that regular undersampling gives rise to the well-known periodic aliases that look like original signal components and cause the sparsity-promoting inversion to fail, whereas random undersampling renders coherent aliases into harmless incoherent random noise in the amplitude spectrum (as shown in Fig. 1), and this incoherent noise can be removed easily by an iterative denoising method. Unfortunately, random undersampling does not control the size of the maximum missing gap. If by any chance the missing gaps are larger than the size of the elements of the curvelet transform, one cannot obtain a favorable recovery. To address this problem, Hennenfent and Herrmann [23] proposed a so-called jittered undersampling scheme that shares the benefits of random sampling while offering control on the maximum gap size.

[Figure 1 about here: two seismic sections (Offset (m) vs. Time (s)) and their amplitude spectra (Wavenumber (1/m) vs. Frequency (Hz)).]

Figure 1: Comparison between different undersampling schemes. (a) Regular four-fold undersampling along the spatial axis and its amplitude spectrum. (b) Random four-fold undersampling along the spatial axis and its amplitude spectrum.

Recent results in CS theory show that a sampling scheme optimally designed for a transform dictionary [19] or for a certain signal [18] can further improve the recovery accuracy or further reduce the necessary number of samples. Elad [19] proposed to design optimal adjustable measurement matrices for a given sparse transform or dictionary by using an average measure of the mutual coherence of the effective dictionary; the performance can be improved to some extent in comparison with random measurement matrices. Duarte-Carvajalino and Sapiro [18] proposed learned compressed sensing, i.e., simultaneous sensing-matrix and sparsifying-dictionary optimization by learning from measured signals. In this paper, motivated by Elad [19] and Hennenfent et al. [23], we propose an optimized undersampling scheme that favors sparsity-promoting recovery. Here, however, we fix the dictionary to the curvelet transform rather than the previously used discrete cosine transform. This leads to a somewhat different algorithm, because the curvelet transform has no explicit expression in the spatial domain, while such an expression is required in Elad's method. We also present several experimental results and comparisons between different undersampling schemes. It should be noted that the sampling strategy in the fields of seismic petroleum and gas exploration is trace/column sampling, which differs from the sampling strategies in other fields such as medical imaging and digital cameras. We emphasize that the proposed method indicates a potential usage for seismic data acquisition, leading to a new design of receiver locations and to reduced data acquisition and cost. In the remainder of this paper, we give a brief review of CS theory in section 2 and of the curvelet transform in section 3. The optimized CS is contributed in section 4. In section 5, we describe the iterative curvelet thresholding for recovery from incomplete trace-missing measurement data. We apply our method to seismic data in section 6, and draw a conclusion in section 7.


2 Compressed sensing

Compressed sensing asserts that one can recover certain signals from far fewer samples or measurements than traditional methods use. In the measurements

y = Ax + ε,   (2.1)

the measurement matrix A has far fewer rows than columns, and ε denotes measurement noise. Normally, recovering x from its measurements y is an underdetermined, ill-posed problem. To make it possible, CS relies on two tenets, namely, sparsity and incoherence.

Sparsity: Let x be seismic data and D a fixed dictionary such as wavelets or curvelets. One can decompose x as x = Dα, where α is the vector of coefficients that represents x in D. A signal or seismic profile is said to be sparse if all but a few entries of α are zero, or if they can be discarded without much loss of information. Seismic data are known to be sparse when represented in an appropriate basis such as curvelets [22, 24]. Sparse representations provide a priori knowledge for many successful cases involving data recovery and inverse problems.

Incoherence: Let R be a sampling matrix. Compressive sampling requires that R and the sparse representation matrix D be as incoherent as possible. A measure of the coherence between R and D is given by

μ(R, D) := √N · max_{1≤i≤n, 1≤j≤N} |⟨R_i, D_j⟩|.

μ(R, D) ∈ [1, √N] measures the maximal correlation between elements of the two matrices; here i and j denote row and column indices. In particular, random sampling matrices are largely incoherent with most fixed transform bases. The incoherence between the sampling matrix and the sparse transform indicates that one can obtain new information from sampling that is not already represented by the known dictionary D; that is to say, the measurements are global. Consider y = Rx = RDα, the vector of n measurements of the sparse signal x, with number of non-zero coefficients S = ||α||_0 ≪ n ≪ N, so that there are many more unknowns than observations. It then turns out that, if the number of measurements satisfies n ≥ C · μ²(R, D) · S · log N, the original signal x can be reconstructed exactly from y with overwhelming probability by solving the convex program

(P1)   min ||α||_1   s.t.   y = RDα,

where ||α||_1 = Σ_{i=1}^{N} |α_i|. By solving (P1), one seeks the sparsest coefficients among all possible α satisfying y = RDα. If the solution coincides with α, one obtains a perfect reconstruction of the original unknown signal. We emphasize again that CS includes two steps: random measurement and nonlinear sparsity-constrained recovery, which will be addressed in the next sections. The measurement matrices should be largely incoherent with the sparse transforms.
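To make the coherence measure concrete, the following sketch (our illustration, not part of the paper; it uses an orthonormal DCT dictionary as a simple stand-in for curvelets) computes μ(R, D) for a random point-sampling matrix, and contrasts it with the maximally coherent case of sampling in the dictionary's own domain:

```python
import numpy as np

def dct_matrix(N):
    # Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector.
    n = np.arange(N)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0] *= np.sqrt(1.0 / N)
    C[1:] *= np.sqrt(2.0 / N)
    return C

def mutual_coherence(R, D):
    # mu(R, D) = sqrt(N) * max_{i,j} |<R_i, D_j>| for unit-norm rows/columns.
    return np.sqrt(D.shape[0]) * np.max(np.abs(R @ D))

N = 64
D = dct_matrix(N).T                      # columns of D: sparsifying basis vectors
rng = np.random.default_rng(0)
rows = np.sort(rng.choice(N, 16, replace=False))
R = np.eye(N)[rows]                      # point sampling: selected identity rows

# Spikes vs. cosines sit near the incoherent end of [1, sqrt(N)] (about sqrt(2)),
# whereas sampling rows of the DCT itself is maximally coherent (mu = sqrt(N)).
print(mutual_coherence(R, D))
print(mutual_coherence(dct_matrix(N)[rows], D))
```

The second value shows why sampling in the same domain as the dictionary defeats CS: each measurement then sees exactly one dictionary element.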

3 Curvelet transform

Although applications of wavelets have become increasingly popular in scientific and engineering fields, traditional wavelets perform well only at representing point singularities, since they ignore the geometric properties of structures and do not exploit the regularity of wave fronts and seismic textures. Seismic data basically show the behavior of C²-continuous curves: the main characteristic of seismic wave fronts is their relative smoothness in the direction along the fronts and their oscillatory behavior in the normal direction. The curvelet transform [10, 11] is one of the new anisotropic directional wavelet transforms that allow an optimal sparse representation of objects with C² singularities. Fig. 2 shows the elements of curvelets in comparison with wavelets. Note that tensor-product 2D wavelets are not strictly isotropic but have only three directional selectivities, while curvelets have almost arbitrary directional selectivity. Researchers have shown that seismic data can be sparsely represented by the curvelet transform [22, 24, 25].

[Figure 2 about here: two 500 × 500 spatial-domain panels.]

Figure 2: The elements of wavelets (left) and curvelets (right) on various scales, directions and translations in the spatial domain.

Unlike wavelets, the system of curvelets is indexed by three parameters: a scale 2^{-j}, j ∈ N_0; an equispaced sequence of rotation angles θ_{j,l} = 2πl · 2^{-⌊j/2⌋}, 0 ≤ l ≤ 2^{⌊j/2⌋} − 1; and a position x_k^{(j,l)} = R_{θ_{j,l}}^{-1} (k_1 2^{-j}, k_2 2^{-⌊j/2⌋})^T, (k_1, k_2) ∈ Z², where

R_{θ_{j,l}} = ( cos θ_{j,l}  −sin θ_{j,l} ; sin θ_{j,l}  cos θ_{j,l} )

is the 2 × 2 rotation matrix with angle θ_{j,l}. The curvelets are defined by

φ_{j,l,k}(ξ) := φ_j(R_{θ_{j,l}}(ξ − ξ_k^{(j,l)})),   ξ = (ξ_1, ξ_2) ∈ R².

Let μ = (j, l, k) be the collection of the triple index. The curvelet coefficients of a function f are given by c_μ(f) := ⟨f, φ_μ⟩. The digital implementation of the curvelet transform [11] can be outlined roughly in three steps: apply a 2D FFT, multiply with frequency windows, and apply a 2D inverse FFT for each window. Roughly speaking, for convenience of the derivations below, we can write the curvelet transform as C = W_F F_2, where F_2 is the 2D Fourier transform matrix and W_F denotes the windowing operator followed by a 2D inverse Fourier transform in each scale and each direction. The forward and inverse curvelet transforms have the same computational cost of O(N² log N) for N × N data [11]. We refer to [10, 11, 28, 29] for details of the second-generation curvelet transform used here.
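As a small illustration of this indexing (a sketch of the parameterization only, not of the full transform; all names are ours), the rotation angles and grid positions can be computed directly from the formulas above:

```python
import numpy as np

def curvelet_angles(j):
    # theta_{j,l} = 2*pi*l*2^(-floor(j/2)), l = 0, ..., 2^floor(j/2) - 1:
    # the number of orientations doubles every other scale (parabolic scaling).
    L = 2 ** (j // 2)
    return 2 * np.pi * np.arange(L) / L

def curvelet_position(j, l, k1, k2):
    # x_k^(j,l) = R_theta^{-1} (k1*2^-j, k2*2^-floor(j/2))^T;
    # R is a rotation, so R^{-1} = R^T.
    theta = 2 * np.pi * l / 2 ** (j // 2)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return R.T @ np.array([k1 * 2.0 ** (-j), k2 * 2.0 ** (-(j // 2))])

# Orientation count at scales j = 4, 5, 6 doubles every other scale: 4, 4, 8.
print(len(curvelet_angles(4)), len(curvelet_angles(5)), len(curvelet_angles(6)))
```

Note the anisotropic grid spacing (2^{-j} along the needle, 2^{-⌊j/2⌋} across it), which is exactly the width ≈ length² scaling that makes curvelets efficient on C² wave fronts.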

4 Optimized measurement matrices for CS

Random matrices are often taken as measurement matrices because they are incoherent with most sparse transforms. Alternatively, Romberg [32] recently presented a new framework named random convolution measurement, i.e., convolution with a random waveform followed by random time-domain subsampling. In this paper, we focus on the approach of optimized measurement [19]. An important notion for optimized projection is the mutual coherence of a dictionary. For a fixed dictionary D, its mutual coherence is defined as the largest absolute normalized inner product between different columns of D, i.e.,

μ(D) = max_{1≤i,j≤k, i≠j} |d_i^T d_j| / (||d_i|| ||d_j||).   (4.2)

A different way to understand the mutual coherence is to consider the Gram matrix G := D^T D, computed from the dictionary after normalizing each of its columns: the mutual coherence is the off-diagonal entry g_ij with the largest magnitude. If x can be constructed as x = Dα (or y = RDα) and the following inequality holds [16, 21],

||α||_0 < (1/2)(1 + 1/μ(D)),   (4.3)

then α is the sparsest solution describing x, i.e., x = Dα (or y = RDα, considering the equivalent dictionary D′ = RD), and a basis-pursuit algorithm can be used to recover α successfully. This is also true for redundant transforms [19]. Hence, if the sampling matrix is designed such that μ(RD) is minimal for a dictionary D, CS works well.

Since curvelets have no explicit expression in the spatial domain, computation of μ(RC^H) is very expensive [26]. But if any subset/submatrix of RC^H (taking C^H as the dictionary of curvelets) is approximately orthogonal (i.e., incoherent), we have C R^T R C^H α ≈ α for an arbitrary vector α. The forward and inverse curvelet transforms can be roughly written as W_F F_2 and F_2^H W_F^H respectively, where F_2 is the discrete 2D Fourier transform matrix, W_F is a specially designed frequency windowing matrix followed by a 2D inverse Fourier transform in each scale and each direction, and F_2^H, W_F^H are their adjoints. We have W_F L_{F_2} · f ≈ α, where f = W_F^H α is the spectrum of the arbitrary data y corresponding to α and L_{F_2} = F_2 R^T R F_2^H. Since F_2 · f = F_1 · y · F_1^T, we get L_{F_2} · y = F_1 R^T R F_1^H · y · (F_1^T)^H F_1^T = L_{F_1} · y. So if L_{F_1} ≈ I, then W_F L_{F_2} f ≈ W_F f = C C^H α, and we obtain the minimum distance between α and C R^H R C^H α, i.e., the minimum mutual coherence of RC^H.

On the other hand, take y as arbitrary seismic data and f = F_2 y as its spectrum; then F_2 R^T R F_2^H f = F_1 R^T R F_1^H f (F_1^T)^H F_1^T = L_{F_1} f, where L_{F_1} is a circulant matrix. The spectrum of the undersampled data with zero-filled missing traces is thus the superposition of circulantly shifted versions of the original spectrum, with factors corresponding to the off-diagonals of L_{F_1}. The matrix L = L_{F_1} − γI is known as the spectral leakage [35]. When R corresponds to a regular undersampling matrix, the off-diagonals of L contain some large entries, i.e., unwanted coherent aliases; these aliases are sparse as well and therefore produce artifacts in the recovery process. When R corresponds to a random sampling matrix, however, L is a random matrix and the spectral leakage is approximated by additive Gaussian noise, i.e., A^H A x_0 = αx_0 + n (see Fig. 1).

In conclusion, the method for optimizing the sampling matrix is to reduce the average and the maximum of the off-diagonals of L_{F_1}, in order to make the original spectrum easily discriminated from the random noise induced by random undersampling. During the optimization procedure, we control the maximum gap size in each step, until both the average and the maximum reach desired values. For instance, for a three-fold undersampling with a maximum gap size of 4 traces, the proposed algorithm for optimizing the sampling matrix is given as follows:

Algorithm (Fig. 3)

1. Initialization:
   • Initialize R_0:
     1. Initialize I_0^(0) = R_0^(0)T R_0^(0) with an n-by-n matrix of zeros (⌊k⌋ is the nearest integer less than or equal to k).
     2. For i = 1, 2, ..., ⌊n/3⌋: in the interval [3i − 2, 3i], change to 1 the entry in the diagonal of I_0^(i−1) (giving I_0^(i)) that makes the average and the maximum of the off-diagonals of L_{F_1}^(i) = F_1 I_0^(i) F_1^H as small as possible.
     3. Set R_0^T R_0 = I_0.
   • Choose desired values of the average M and the maximum N.
   • Compute the average m_0 and the maximum n_0 of the off-diagonals of F_1 R_0^T R_0 F_1^H.

2. While m_i > M or n_i > N:
   • Reducing the maximum:
     1. Find the position of the maximum, e.g., (1, q).
     2. Find the biggest component (t, q) of the maximum in R_i^T R_i F^H. In the interval that restriction (t, q) belongs to, find (t′, q) that is not a component of the maximum but can furthest reduce the maximum.
     3. Set R_i^T R_i(t, t) to 0 and R_i^T R_i(t′, t′) to 1.
   • Reducing the average:
     1. Study the components of the average. Find the biggest component (v, :) of the average. In the interval that (v, :) belongs to, find (v′, :) that is not a component of the average but can furthest reduce the average.
     2. Set R_i^T R_i(v, v) to 0 and R_i^T R_i(v′, v′) to 1.
   • Compute m_{i+1} and n_{i+1}; update R_{i+1}.

Typically, about 230 seconds are needed on a computer with a 1.8 GHz Intel Pentium processor and 1 GB of memory to optimize a sampling matrix of size 512 × 512.

Our optimized undersampling scheme reduces the maximum and the average directly, which is more useful than the jittered undersampling scheme in real applications. For the jittered undersampling scheme, the maximum gap size is tied to the sampling ratio, e.g., the maximum gap size is 4 for a three-fold jittered undersampling scheme. In addition, since the jittered undersampling scheme depends on randomization, we cannot place a gap at an arbitrary position dictated by the constraints of topography, which limits its operational significance. With our optimized undersampling scheme, however, we can initialize the gap sizes adapted to the constraints induced by the topography and adjust the positions of the other receivers to reduce the average and the maximum of L_{F_1}.
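The spectral-leakage argument can be checked numerically. The sketch below (our illustration with a 1D mask and a unitary DFT, not the paper's optimizer) builds L_{F_1} = F_1 diag(mask) F_1^H and compares its off-diagonals, exactly the quantities the algorithm above minimizes, for regular and random three-fold undersampling:

```python
import numpy as np

n = 120
F = np.fft.fft(np.eye(n)) / np.sqrt(n)        # unitary 1D DFT matrix F_1

def leakage(mask):
    # Maximum and average of the off-diagonals of L_{F1} = F1 diag(mask) F1^H.
    L = F @ np.diag(mask) @ F.conj().T
    off = np.abs(L - np.diag(np.diag(L)))
    return off.max(), off.sum() / (n * (n - 1))

reg = np.zeros(n)
reg[::3] = 1.0                                 # regular three-fold undersampling
rng = np.random.default_rng(1)
rand = np.zeros(n)
rand[rng.choice(n, n // 3, replace=False)] = 1.0   # random, same trace count

# Regular sampling concentrates all leakage in a few coherent aliases
# (off-diagonal maximum exactly 1/3); random sampling spreads it as
# noise-like small entries with a much lower maximum.
print("regular: max=%.3f avg=%.4f" % leakage(reg))
print("random:  max=%.3f avg=%.4f" % leakage(rand))
```

The optimization in the algorithm above can be viewed as greedily exchanging mask entries (within gap-size constraints) so that both numbers returned by `leakage` shrink.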

5 Iterative curvelet thresholding for CS recovery

To recover x in Eq. (2.1), the number of missing traces, their locations, and their amplitudes are all completely unknown a priori. The prior knowledge we can use is the sparsity of x. A few CS recovery algorithms have recently been proposed in the field of image processing, based on, e.g., linear programming [4], gradient projection for sparse reconstruction [20], orthogonal matching pursuit [34], Bregman iteration [36, 27], SPGL1 (spectral projected gradient for one-norm minimization) [1], and iterative thresholding [12, 2, 31]. In this paper, we apply iterative curvelet thresholding for recovery from compressively sensed data. The motivation for using the curvelet transform in our CS recovery is that seismic profiles, consisting of curve-singularity wave fronts, are very sparse in the curvelet domain [22]. We define a thresholding function

S_τ(f, Ψ) = Σ_μ τ(c_μ(f)) ψ_μ,   (5.4)

where τ can be taken as, e.g., the soft thresholding function defined by a threshold σ_p > 0: τ(x) = sgn(x) · (|x| − σ_p) if |x| ≥ σ_p, and τ(x) = 0 if |x| < σ_p. The soft thresholding sets the small coefficients to zero and shrinks the large coefficients. The iterative curvelet thresholding is then

x^{p+1} = S_τ(x^p + Φ^T(y − Φx^p), Ψ).   (5.5)

Here p denotes the iteration index. The initial value x^0 can be set to zero. The threshold σ_p can be taken initially close to the largest curvelet coefficient and decreased slightly after each iteration. The recovery x is obtained by stopping the iteration once a given criterion, e.g., ||x^{p+1} − x^p|| < ε, is satisfied. The mathematical properties of the iterative thresholding method can be found in [12, 2].
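A minimal sketch of iteration (5.5) is given below; since a curvelet library is not assumed here, an orthonormal DCT stands in for the curvelet analysis/synthesis pair (C, C^T), Φ is zero-filled trace undersampling, and the threshold decays geometrically as described above. All names are our own illustration:

```python
import numpy as np

def dct_matrix(N):
    # Orthonormal DCT-II, used as a simple stand-in sparsifying transform.
    n = np.arange(N)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0] *= np.sqrt(1.0 / N)
    C[1:] *= np.sqrt(2.0 / N)
    return C

def iterative_thresholding(y, mask, C, n_iter=300, decay=0.97):
    # x^{p+1} = S_tau(x^p + Phi^T (y - Phi x^p)); here Phi x = mask * x
    # (zero-filled undersampling), so Phi^T = Phi.
    x = np.zeros_like(y)
    sigma = None
    for _ in range(n_iter):
        r = x + mask * (y - mask * x)            # Landweber/gradient step
        c = C @ r                                # analysis coefficients
        if sigma is None:
            sigma = 0.95 * np.abs(c).max()       # start near largest coefficient
        c = np.sign(c) * np.maximum(np.abs(c) - sigma, 0.0)   # soft threshold
        x = C.T @ c                              # synthesis
        sigma *= decay                           # slowly lower the threshold
    return x

# Toy example: a signal with 5 non-zero DCT coefficients, half the traces kept.
N = 128
C = dct_matrix(N)
rng = np.random.default_rng(3)
alpha = np.zeros(N)
alpha[rng.choice(N, 5, replace=False)] = rng.standard_normal(5) + 2.0
x0 = C.T @ alpha
mask = np.zeros(N)
mask[rng.choice(N, N // 2, replace=False)] = 1.0
x_rec = iterative_thresholding(mask * x0, mask, C)
print("relative error:", np.linalg.norm(x_rec - x0) / np.linalg.norm(x0))
```

The cooling schedule (large initial threshold, geometric decay) is what lets the iteration pick up the strongest coefficients first and treat the undersampling aliases as removable noise.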

6 Applications to seismic data

In this section, we present five experimental comparisons to show the improvement from our sampling optimization. In each experiment, we also present the recovery results of wavelet-based reconstruction, using Daubechies 'db4' wavelets, to highlight the advantage of curvelets in seismic data recovery. We define the signal-to-noise ratio (SNR) as

SNR = 10 · log( Σ_{i∈I} u_0(i)² / Σ_{i∈I} (u(i) − u_0(i))² ),

where u_0 is the original data, u is the recovered data and I denotes the set of all traces in the data. The recovery error is defined as Error = ||u − u_0||_2.

Since we do not have any prior knowledge of a given seismic data set, in the first experiment we consider the average performance of CS before and after the optimization of the sampling matrix. We generate 20 data sets of size 250 × 250 with 10% non-zero coefficients in each scale and each direction; the non-zero locations were chosen at random. We consider small measurement noise produced by zero-mean Gaussian random numbers with variance 100. The normalized recovery error was evaluated as a function of the number of measurements before and after the optimization. The initial threshold value was chosen as 0.95 c_max, where c_max is the largest curvelet coefficient, and decreases by a factor 0.85 in each iteration. The result is shown in Fig. 4. It can be seen that the optimized sampling measurement indeed leads to improved performance, especially for numbers of measurements between 30% and 50%, where the recovery error can be reduced by about 10% compared with jittered sampling.

In the second experiment we apply our algorithm to real seismic data with 60% missing traces. Fig. 5 (a) shows the original data. Fig. 5 (b) shows the off-diagonal entries before and after sampling optimization; our algorithm reduces the off-diagonals by approximately 50%. Fig. 5 (c) and (d) show the data restored by CRSI from three-fold jittered undersampling and optimized undersampling, respectively. Fig. 5 (e) and (f) show their recovery errors, i.e., the differences between the original and the restored data in (c) and (d), respectively. The improvement from the optimization is obvious. For comparison, in Fig. 6 (a) and (b) we display the recovery result and error using random undersampling without control on the maximum gap size. Fig. 6 (c) and (d) are the results and errors using jittered undersampling with iterative wavelet thresholding reconstruction. Fig. 6 (e) and (f) show those for optimized undersampling with wavelet-based reconstruction. Comparing the results in Fig. 5 and Fig. 6, it can be seen clearly that the optimal sampling in the measurement step and the curvelet-based reconstruction in the recovery step indeed improve the performance.

In the third experiment we applied our algorithm to synthetic data. Fig. 7 (a) and (b) show the data under three-fold jittered undersampling and optimized undersampling. Fig. 7 (c) and (d) show the respective amplitude spectra, where the noise in the frequency domain is visibly reduced after optimization. Fig. 7 (e) and (f) compare the off-diagonal entries before and after sampling optimization. Fig. 8 (a) and (b) show the data restored by CRSI from jittered undersampling and optimized undersampling corresponding to Fig. 7 (a) and (b), respectively; Fig. 8 (c) and (d) are their recovery errors. In Fig. 9, for comparison, we display the results from random undersampling without control on the maximum gap size, from jittered undersampling with wavelet-based reconstruction, and from optimized undersampling with wavelet-based reconstruction.

In the fourth experiment, we applied our algorithm to real seismic data with 60% missing traces. Since the amplitude spectrum of real seismic data is not as sparse as that of the synthetic data, we emphasize the reduction of the average. Fig. 10 (a) shows the original data and Fig. 10 (b) its amplitude spectrum. Fig. 10 (c) and (d) are the data sampled by jittered undersampling and optimized undersampling, respectively. Fig. 10 (e) and (f) show their recovery by the curvelet-based method. Fig. 11 shows the comparisons for different sampling schemes and recovery methods.

In the last experiment, we suppose that 7 traces are missing from the seismic data due to a physical restriction on the measurements, e.g., a lake. We perform a three-fold optimized sampling as well as a three-fold regular sampling with maximum gap size 7 traces on the seismic data. Fig. 12 (a) and (b) show the synthetic data under regular sampling and optimized sampling, respectively. Fig. 12 (c) and (d) are the off-diagonals of L_{F_1} corresponding to the two sampling schemes. Fig. 12 (e) and (f) show the recovery by CRSI. Our optimized undersampling can place the receivers near the lake without generating aliases, and is more favorable than regular undersampling.
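The SNR and error measures used throughout these experiments can be written as a short helper (our sketch; we read the paper's "log" as log base 10, the usual dB convention):

```python
import numpy as np

def snr_db(u, u0):
    # SNR = 10 * log10( sum_i u0(i)^2 / sum_i (u(i) - u0(i))^2 )
    return 10.0 * np.log10(np.sum(u0 ** 2) / np.sum((u - u0) ** 2))

def recovery_error(u, u0):
    # Error = ||u - u0||_2
    return np.linalg.norm(u - u0)

u0 = np.ones(100)
u = u0 + 0.1          # constant 0.1 residual on every sample
print(snr_db(u, u0))  # 20.0: signal energy 100 vs. error energy 1
```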

7 Conclusion

In this paper, we proposed an optimized sampling strategy for CRSI to improve reconstruction accuracy. By reducing the noise in the frequency domain, we obtain better recovery than with random sampling or jittered sampling. In addition, unlike jittered sampling, the proposed strategy reduces the spectral leakage and controls the maximum gap size directly. As a result, we can set the gap size at any place to satisfy the constraints induced by topography. With the optimized sampling, we can reconstruct seismic data from fewer measurements than with random sampling. There is great room for real seismic applications of CS methods in the future; using a speed-up method such as SPGL1 [1] for large-scale sparse reconstruction is our next work.

Acknowledgements. J. Ma would like to thank the National Science Foundation of China for financial support under Grants No. 40704019 and 40674061, the Tsinghua Basic Research Fund (JC2007030), and the PetroChina Innovation Fund (060511-1-1).


APPENDIX A: JITTERED UNDERSAMPLING

A practical requirement of wavefield reconstruction with a localized sparsifying transform is control on the maximum gap size. To tackle this problem, Hennenfent and Herrmann [23] proposed an undersampling scheme, coined jittered undersampling, which shares the benefits of random sampling while offering control on the maximum gap size. The basic idea of jittered undersampling is to regularly decimate the interpolation grid and subsequently perturb the coarse-grid sample points on the fine grid. Consider an undersampling factor γ taken to be odd, i.e., γ = 1, 3, 5, ..., and assume that the size N of the interpolation grid is a multiple of γ, so that the number of acquired data points n = N/γ is an integer. For these choices, the jitter-sampled data points are given by

y[i] = f[j]   for   i = 1, 2, ..., n   and   j = γ(i − 1) + ε_i,

where the discrete random variables ε_i are integers independently and identically distributed (i.i.d.) according to a uniform distribution on the interval [1, γ]. The above sampling can be adapted to the case where γ is even.
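A direct transcription of this scheme (our sketch; the 1-based formula above is mapped to 0-based array indices) makes the gap-size control explicit: consecutive jittered samples are at most 2γ − 1 traces apart.

```python
import numpy as np

def jittered_indices(N, gamma, rng=None):
    # j = gamma*(i - 1) + eps_i, eps_i i.i.d. uniform on {1, ..., gamma};
    # returned as 0-based indices into the fine grid of size N.
    assert gamma % 2 == 1 and N % gamma == 0
    rng = np.random.default_rng() if rng is None else rng
    n = N // gamma
    i = np.arange(1, n + 1)
    eps = rng.integers(1, gamma + 1, size=n)
    return gamma * (i - 1) + eps - 1

# One sample per cell of width gamma, so gaps never exceed 2*gamma - 1.
idx = jittered_indices(120, 3, np.random.default_rng(7))
print(len(idx), idx.min(), idx.max(), np.diff(idx).max())
```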


References

[1] E. van den Berg, M. Friedlander, Probing the Pareto frontier for basis pursuit solutions, SIAM J. Scientific Computing 31 (2), 890-912 (2008).
[2] T. Blumensath, M. Davies, Iterative thresholding for sparse approximations, J. Fourier Anal. Appl. 14 (5), 629-654 (2008).
[3] E. Candès, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inform. Theory 52, 489-509 (2006).
[4] E. Candès, T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory 51, 4203-4215 (2005).
[5] E. Candès, M. Wakin, An introduction to compressive sampling, IEEE Signal Processing Magazine 25 (2), 21-30 (2008).
[6] E. Candès, J. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Comm. Pure Appl. Math. 59, 1207-1223 (2006).
[7] E. Candès, J. Romberg, Quantitative robust uncertainty principles and optimally sparse decompositions, Found. Comput. Math. 6 (2), 227-254 (2005).
[8] E. Candès, T. Tao, Near-optimal signal recovery from random projections: universal encoding strategies, IEEE Trans. Inform. Theory 52, 5406-5425 (2006).
[9] E. Candès, D. Donoho, Curvelets: a surprisingly effective nonadaptive representation for objects with edges, in Curves and Surface Fitting: Saint-Malo 1999, A. Cohen, C. Rabut, L. L. Schumaker (Eds.), Vanderbilt Univ. Press, Nashville, 105-120 (2000).
[10] E. Candès, D. Donoho, New tight frames of curvelets and optimal representations of objects with piecewise singularities, Comm. Pure Appl. Math. 57, 219-266 (2004).
[11] E. Candès, L. Demanet, D. Donoho, L. Ying, Fast discrete curvelet transforms, Multiscale Model. Simul. 5, 861 (2006).
[12] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Appl. Math. 57, 1413-1457 (2004).
[13] R. DeVore, Deterministic constructions of compressed sensing matrices, J. Complexity 23, 918-925 (2007).
[14] T. Do, T. Tran, L. Gan, Fast compressive sampling with structurally random matrices, Proc. ICASSP (2008).
[15] D. Donoho, Compressed sensing, IEEE Trans. Inform. Theory 52 (4), 1289-1306 (2006).
[16] D. Donoho, M. Elad, Optimally sparse representation in general (non-orthogonal) dictionaries via l1 minimization, Proc. Nat. Acad. Sci. 100 (5), 2197-2202 (2003).
[17] D. Donoho, Y. Tsaig, I. Drori, J.-L. Starck, Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit, Technical report, Stanford Statistics Department (2006).
[18] J. Duarte-Carvajalino, G. Sapiro, Learning to sense sparse signals: simultaneous sensing matrix and sparsifying dictionary optimization, IEEE Trans. Signal Processing (2008).
[19] M. Elad, Optimized projections for compressed sensing, IEEE Trans. Signal Processing 55 (12), 5695-5702 (2007).
[20] M. Figueiredo, R. Nowak, S. Wright, Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems, IEEE J. Selected Topics in Signal Processing 1 (4), 586-597 (2007).
[21] R. Gribonval, M. Nielsen, Sparse representations in unions of bases, IEEE Trans. Inform. Theory 49 (12), 3320-3325 (2003).
[22] G. Hennenfent, F. Herrmann, Seismic denoising with nonuniformly sampled curvelets, Comput. Sci. Eng. 8 (3), 16-25 (2006).
[23] G. Hennenfent, F. J. Herrmann, Simply denoise: wavefield reconstruction via jittered undersampling, Geophysics 73 (3), V19-V28 (2008).
[24] F. Herrmann, G. Hennenfent, Non-parametric seismic data recovery with curvelet frames, Geophysical J. Int. 173 (1), 233-248 (2008).
[25] F. Herrmann, U. Boeniger, D. Verschuur, Nonlinear primary-multiple separation with directional curvelet frames, Geophysical J. Int. 170, 781-799 (2007).
[26] E. Lebed, Sparse signal recovery in a transform domain, M.S. thesis, SLIM, University of British Columbia, Canada (2008).
[27] J. Ma, Compressed sensing by inverse scale space and curvelet thresholding, Appl. Math. Comput. 206, 980-988 (2008).
[28] J. Ma, G. Plonka, A review of curvelets and recent applications, IEEE Signal Processing Magazine, submitted (2008).
[29] J. Ma, G. Plonka, Combined curvelet shrinkage and nonlinear anisotropic diffusion, IEEE Trans. Image Processing 16 (9), 2198-2206 (2007).
[30] J. Ma, F.-X. Le Dimet, Deblurring from highly incomplete measurements for remote sensing, IEEE Trans. Geosci. Remote Sensing, to appear (2008).
[31] G. Peyré, Best basis compressed sensing, Proc. SSVM 2007, 80-91 (2007).
[32] J. Romberg, Compressive sensing by random convolution, SIAM J. Imaging Sci., submitted (2008).
[33] F. Sebert, L. Ying, Y. Zou, Toeplitz block matrices in compressed sensing and their applications in imaging, Proc. ITAB 2008, 47-50 (2008).
[34] J. Tropp, A. Gilbert, Signal recovery from random measurements via orthogonal matching pursuit, IEEE Trans. Inform. Theory 53 (12), 4655-4666 (2007).
[35] S. Xu, Y. Zhang, D. Pham, G. Lambaré, Antileakage Fourier transform for seismic data regularization, Geophysics 70 (4), V87-V95 (2005).
[36] W. Yin, S. Osher, D. Goldfarb, J. Darbon, Bregman iterative algorithms for l1-minimization with applications to compressed sensing, SIAM J. Imaging Sci. 1 (1), 143-168 (2008).


(Figure 3 here. The panel annotations read: in each step, initialize the entries in an interval of 2 on the diagonal of R^T R to 0 or 1, making both the maximum and the average as small as possible; at iteration i the exchange that can furthest reduce the maximum off-diagonal entry is chosen, and at iteration i+1 the same is done for the average.)

Figure 3: Details of our algorithm. (a) Initialization of R_0^T R_0. (b) Control on the maximum gap size. (c) Reducing the maximum. (d) Reducing the average.
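The greedy exchange procedure sketched in Figure 3 can be illustrated in code. The sketch below is our reading of the figure, not the authors' implementation: since F R^T R F^H is circulant, its first row is the DFT of the binary sampling mask, so the off-diagonal entries are the non-DC DFT magnitudes. The routine alternately tries to reduce their maximum and their average by moving one sample to an empty slot, rejecting any move that violates the maximum-gap constraint. All function names and parameters here are our own illustrative choices.

```python
import numpy as np

def offdiag_stats(mask):
    """Max and mean of the off-diagonal magnitudes of F diag(mask) F^H.

    The matrix is circulant, so its first row is the DFT of the mask;
    dropping the DC term leaves exactly the off-diagonal magnitudes.
    """
    row = np.abs(np.fft.fft(mask))
    off = row[1:]
    return off.max(), off.mean()

def max_gap(mask):
    """Largest gap between consecutive samples (boundary gaps ignored)."""
    idx = np.flatnonzero(mask)
    return np.diff(idx).max() if idx.size > 1 else len(mask)

def optimize_mask(mask, gap_limit, iters=200, seed=0):
    """Greedy pairwise exchange (a sketch of Figure 3, panels (c)-(d)).

    Even iterations try to reduce the maximum off-diagonal entry,
    odd iterations the average; a move is accepted only if it improves
    the current criterion and respects the maximum-gap constraint.
    """
    rng = np.random.default_rng(seed)
    mask = mask.copy()
    for it in range(iters):
        cur_max, cur_avg = offdiag_stats(mask)
        score = cur_max if it % 2 == 0 else cur_avg
        i = rng.choice(np.flatnonzero(mask))        # a sampled position
        j = rng.choice(np.flatnonzero(mask == 0))   # an empty position
        trial = mask.copy()
        trial[i], trial[j] = 0, 1
        t_max, t_avg = offdiag_stats(trial)
        t_score = t_max if it % 2 == 0 else t_avg
        if t_score < score and max_gap(trial) <= gap_limit:
            mask = trial
    return mask
```

A real implementation would search for the exchange that "can furthest reduce" the criterion rather than sampling candidate swaps at random, but the accept/reject structure is the same.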

(Figure 4 here: recovery error plotted against the percentage of measurements.)

Figure 4: Recovery error as a function of the number of measurements, for the jittered undersampling scheme and our optimized sampling scheme.


(Figure 5 here.)

Figure 5: Recovery results by different undersampling schemes. (a) Original seismic data. (b) Off-diagonal entries before and after optimization. (c) (d) Restored images by CRSI from jittered undersampling (SNR = 30.8 dB) and optimized sampling (SNR = 32.1 dB), respectively. (e) (f) Recovery errors of (c) and (d), respectively.
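The SNR values quoted in the figure captions are presumably the standard reconstruction signal-to-noise ratio, 20 log10 of the ratio between the norm of the true data and the norm of the recovery error; a minimal sketch (the function name is ours):

```python
import numpy as np

def snr_db(x, x_rec):
    """Reconstruction SNR in dB: 20 * log10(||x|| / ||x - x_rec||)."""
    return 20 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_rec))
```

For example, a recovery with a uniform 10% amplitude error scores 20 dB.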


(Figure 6 here.)

Figure 6: Recovery results by different undersampling schemes. (a) Recovered data using random undersampling without control on the maximum gap size (SNR = 23.9 dB). (b) Its recovery error. (c) (d) Recovered data and error by jittered sampling and wavelet-based reconstruction (SNR = 24.0 dB). (e) (f) Recovered data and error by optimized sampling and wavelet-based reconstruction (SNR = 24.2 dB).


(Figure 7 here. Panel annotations: before optimization, maximum 25.95 and average 2.76; after optimization, maximum 20.36 and average 1.84.)

Figure 7: Recovery results by different undersampling schemes. (a) (b) Synthetic seismic data under jittered undersampling and optimized undersampling, respectively. (c) (d) The respective amplitude spectra. (e) (f) Off-diagonal entries before and after optimization.


(Figure 8 here.)

Figure 8: Recovery results by different undersampling schemes. (a) (b) Restored images by CRSI from jittered undersampling (SNR = 14.8 dB) and optimized undersampling (SNR = 15.2 dB), respectively. (c) (d) Their recovery errors, respectively.


(Figure 9 here.)

Figure 9: Comparisons of different sampling schemes and recovery methods. (a) (b) (c) Data by random undersampling, recovery by CRSI (SNR = 14.4 dB), and error. (d) (e) (f) Data by three-fold jittered undersampling, recovery by wavelet-based reconstruction (SNR = 11.79 dB), and error. (g) (h) (i) Data by three-fold optimized undersampling, recovery by wavelet-based reconstruction (SNR = 11.76 dB), and error.


(Figure 10 here.)

Figure 10: Recovery results by different sampling schemes. (a) Original seismic data. (b) Its amplitude spectrum. (c) (d) Data under jittered undersampling and optimized undersampling, respectively. (e) (f) The restored images corresponding to (c) (d). The SNRs are 11.26 dB and 11.47 dB, respectively.

(Figure 11 here.)

Figure 11: Comparisons of different sampling schemes and recovery methods. (a) Recovery by random undersampling and the curvelet method (SNR = 11.1 dB). (b) Recovery by jittered sampling and the wavelet method (SNR = 9.2 dB). (c) Recovery by optimized sampling and the wavelet method (SNR = 9.4 dB).


(Figure 12 here.)

Figure 12: Comparison between the regular undersampling scheme and our optimized undersampling scheme with fixed gap size 7. (a) Data under regular sampling. (b) Data under optimized sampling. (c) (d) The coefficients corresponding to (a) (b). (e) Restored image from (a) (SNR = 14.40 dB). (f) Restored image from (b) (SNR = 14.77 dB).
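The advantage of randomized over regular undersampling that Figure 12 illustrates can be reproduced numerically: a regular mask concentrates its spectral energy in a few large aliases (which mimic true signal components), while a jittered mask spreads the same energy into low-level incoherent noise. A small sketch under illustrative parameters (the length, undersampling rate, and seed are ours, not taken from the paper):

```python
import numpy as np

N = 120
# Three-fold regular undersampling: one sample in every third slot.
regular = np.zeros(N)
regular[::3] = 1

# Three-fold jittered undersampling: one sample placed at random
# within each slot of length 3 (so the maximum gap stays bounded).
rng = np.random.default_rng(1)
jitter = np.zeros(N)
jitter[np.arange(0, N, 3) + rng.integers(0, 3, N // 3)] = 1

# Non-DC amplitude spectra of the two masks.
spec_reg = np.abs(np.fft.fft(regular))[1:]
spec_jit = np.abs(np.fft.fft(jitter))[1:]
# Regular sampling puts all off-diagonal energy into two aliases of
# height N/3; jittering spreads it, lowering the peak substantially.
```

Both masks keep the same number of traces, so the total spectral energy is comparable; only its distribution differs, which is exactly what the proposed optimization manipulates directly.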