GEOPHYSICS, VOL. 74, NO. 3 (MAY-JUNE 2009); P. R15–R24, 18 FIGS. 10.1190/1.3112572

Waveform inversion using a back-propagation algorithm and a Huber function norm

Taeyoung Ha1, Wookeen Chung2, and Changsoo Shin2

ABSTRACT

Waveform inversion faces difficulties when applied to real seismic data, including the existence of many kinds of noise. The ℓ1-norm is more robust to noise with outliers than the least-squares method. Nevertheless, the least-squares method is preferred as an objective function in many algorithms because the gradient of the ℓ1-norm has a singularity when the residual becomes zero. We propose a complex-valued Huber function for frequency-domain waveform inversion that combines the ℓ2-norm (for small residuals) with the ℓ1-norm (for large residuals). We also derive a discretized formula for the gradient of the Huber function. Through numerical tests on simple synthetic models and Marmousi data, we find the Huber function is more robust to outliers and coherent noise. We apply our waveform-inversion algorithm to field data taken from the continental shelf under the East Sea in Korea. In this setting, we obtain a velocity model whose synthetic shot profiles are similar to the real seismic data.

INTRODUCTION

When low-frequency data are available, and in the absence of computational restrictions, seismic-waveform inversion provides a more detailed subsurface velocity model than traveltime tomography or conventional velocity analysis (Yilmaz and Claerbout, 1980; Bishop et al., 1985; Bording et al., 1987; Deregowski, 1990; Vidale, 1990; Bednar, 1999). Twenty-five years ago, Lailly (1983) and Tarantola (1984) tackled the seismic inversion problem using reverse time migration. Ever since, geoscientists and applied mathematicians (Pratt et al., 1998; Pratt, 1999; Shin, Jang, et al., 2001; Shin, Yoon, et al., 2001; Ha et al., 2006; Shin and Min, 2006; Ha and Shin, 2007) have used similar back-propagation techniques for waveform inversion.

However, real seismic data present many obstacles to waveform inversion: the absence of low-frequency data, 2D acoustic approximations of 3D real earth wave propagation, source-receiver coupling to the earth, and noise. Noise is perhaps the most important of these issues because ambient background vibrations always contaminate real seismic data. An objective function for waveform inversion that is robust to noise would be valuable. The ℓ1-norm is more robust to noise than the ℓ2-norm when outliers are present (Claerbout and Muir, 1973; Crase et al., 1990; Aster et al., 2004; Tarantola, 2005). However, the gradient of the ℓ1-norm has a singularity when the residuals approach zero.

To achieve robustness and stability in waveform inversion, we construct a new objective function based on the Huber function. The Huber function uses the ℓ1-norm when residuals are large and the ℓ2-norm when residuals are small (relative to a predefined threshold). By combining the ℓ1-norm with the ℓ2-norm in this manner, Crase et al. (1990), Bube and Langan (1997), and Guitton and Symes (2003) show that they can obtain more robust results for several types of noise than by using the ℓ2-norm alone. Guitton and Symes (2003) use a quasi-Newton method for the inverse seismic problem, implicitly calculating the gradient of a real-valued Huber function. In this study, we perform waveform inversion in the frequency domain and therefore construct a complex Huber function. Specifically, we define two Huber functions: one is a combination of the complex ℓ1- and ℓ2-norms, and the other uses only the complex ℓ2-norm. We also perform a full-waveform inversion to validate the robustness of the Huber function approach.

Shin and Min (2006) calculate the gradient efficiently by exploiting the self-adjoint property of the wave equation. To compute the gradient of a Huber objective function, we could also exploit the back-propagation theory of reverse time migration (Shin and Min, 2006). Because the source wavelet is usually unknown, the source wavelet and the velocity model are updated simultaneously (Shin et al., 2007). We test our algorithm on synthetic seismic data with two different types of random noise, with coherent noise, and with band-limited spike noise. Compared with least-squares waveform inversion (Shin et al., 2007), we obtain improved numerical results for coherent noise and band-limited spike noise.

We begin with a brief introduction to the complex Huber function and seismic waveform inversion. Next, we present an efficient method for calculating the gradient of an objective function using the Huber norm. Finally, we demonstrate the robustness of this function by inverting synthetic and real seismic data.

Manuscript received by the Editor 1 August 2007; revised manuscript received 4 September 2008; published online 27 April 2009.
1 National Institute for Mathematical Sciences, Daejeon, Korea. E-mail: [email protected].
2 Seoul National University, School of Civil, Urban, and Geosystem Engineering, Seoul, Korea. E-mail: [email protected]; [email protected].
© 2009 Society of Exploration Geophysicists. All rights reserved.

Downloaded 03 Aug 2009 to 147.46.136.35. Redistribution subject to SEG license or copyright; see Terms of Use at http://segdl.org/

WAVEFORM INVERSION USING THE COMPLEX HUBER FUNCTION

Complex Huber function

We begin with the objective function suggested by Huber (1973):

$$
M_{\varepsilon}(r) =
\begin{cases}
\dfrac{|r|^{2}}{2\varepsilon}, & |r| \le \varepsilon, \\[6pt]
|r| - \dfrac{\varepsilon}{2}, & |r| > \varepsilon,
\end{cases}
\tag{1}
$$

where $\varepsilon$ marks the threshold between ℓ1 and ℓ2 errors for real values of $r$ (see Figure 1). For complex values $r = a + bi$, we define two norms: the ℓ1-norm $|r|_1$ is $|a| + |b|$, and the ℓ2-norm $|r|_2$ is $\sqrt{a^2 + b^2}$. We extend the function proposed by Huber over the complex plane as follows:

$$
M_{\varepsilon,c}(r) =
\begin{cases}
\dfrac{|r|_2^{2}}{2\varepsilon}, & |r|_2 \le \varepsilon, \\[6pt]
|r|_1 - \dfrac{\varepsilon}{2}, & |r|_2 > \varepsilon.
\end{cases}
\tag{2}
$$

We name function 2 the complex Huber function. Because $|r|_1$ is never smaller than $|r|_2$, singularities do not occur. Alternatively, we can extend Huber's function to the complex plane using only $|r|_2$:

$$
M_{\varepsilon,2}(r) =
\begin{cases}
\dfrac{|r|_2^{2}}{2\varepsilon}, & |r|_2 \le \varepsilon, \\[6pt]
|r|_2 - \dfrac{\varepsilon}{2}, & |r|_2 > \varepsilon.
\end{cases}
\tag{3}
$$

The choice of threshold is very important. In this paper, we use the threshold suggested by Bube and Nemeth (2007).

The objective function and its gradient

Suppose we have $N_r$ experimental observations, recorded at a subset of nodal points corresponding to receiver locations. We define a discretized model in terms of a parameter vector $p$. For a wave equation in the frequency domain, we can calculate the model response at each receiver via a finite-element or finite-difference method. In forward modeling for a wave equation in the frequency domain, the discretized matrix equation can be expressed as (Marfurt, 1984)

$$
S\hat{u} = \hat{f},
\tag{4}
$$

where $S$ is a complex matrix composed of the mass matrix, the stiffness matrix, and the damping matrix. The Fourier-transformed wavefield is $\hat{u}$, and $\hat{f}$ is the Fourier-transformed source vector. Note that $S$ and $\hat{u}$ are both dependent on the model $p$. We can define the objective function at each frequency as

$$
E(p) = \sum_{s}^{N_s} \sum_{r}^{N_r} M_{\varepsilon,c}(r_{s,r}),
\tag{5}
$$

where $r_{s,r} = \hat{u}_{s,r} - \hat{d}_{s,r}$ and where $N_s$ and $N_r$ are the number of sources and receivers, respectively. To adjust the model, we need the gradient of the objective function $E$ in equation 5. If $|r|_2 \le \varepsilon$, we can easily obtain this gradient using the method described by Shin and Min (2006). The gradient of the objective function with respect to the $k$th velocity parameter $p_k$ is

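$$
\frac{\partial E}{\partial p_k}
= \mathrm{Re} \sum_{s,r} \frac{\partial \hat{u}_{s,r}}{\partial p_k} \left(\hat{u}_{s,r} - \hat{d}_{s,r}\right)^{*}
= \mathrm{Re} \sum_{s}
\left[ \frac{\partial \hat{u}_{s,1}}{\partial p_k} \;\cdots\; \frac{\partial \hat{u}_{s,N_r}}{\partial p_k} \right]
\begin{bmatrix} r_{s,1}^{*} \\ \vdots \\ r_{s,N_r}^{*} \end{bmatrix},
\tag{6}
$$

where $*$ denotes the complex conjugate. If $|r|_2 > \varepsilon$, then we have $r = \hat{u} - \hat{d} = r_{re} + i r_{im}$:

$$
\frac{\partial E}{\partial p_k}
= \mathrm{Re} \sum_{s,r} \frac{\partial \hat{u}_{s,r}}{\partial p_k}\, r_{s,r,\mathrm{sgn}}^{*}
= \mathrm{Re} \sum_{s}
\left[ \frac{\partial \hat{u}_{s,1}}{\partial p_k} \;\cdots\; \frac{\partial \hat{u}_{s,N_r}}{\partial p_k} \right]
\begin{bmatrix} r_{s,1,\mathrm{sgn}}^{*} \\ \vdots \\ r_{s,N_r,\mathrm{sgn}}^{*} \end{bmatrix}.
\tag{7}
$$

The sign function is

$$
\mathrm{sgn}(a) =
\begin{cases}
1, & \text{if } a > 0, \\
-1, & \text{if } a < 0, \\
0, & \text{if } a = 0.
\end{cases}
\tag{8}
$$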
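The definitions in equations 1 through 3 and the sign function of equation 8 can be illustrated with a short NumPy sketch. This is an illustrative sketch rather than the authors' implementation: the function names (`huber_c`, `huber_2`, `residual_c`, `residual_2`) are hypothetical, and the two residual helpers assume the piecewise residual forms that the paper states in equations 10 and 13.

```python
import numpy as np

def l1_norm(r):
    """Complex l1-norm |r|_1 = |a| + |b| for r = a + bi."""
    return np.abs(r.real) + np.abs(r.imag)

def huber_c(r, eps):
    """Complex Huber function of equation 2 (l2 below eps, l1 above)."""
    r = np.asarray(r, dtype=complex)
    small = np.abs(r) <= eps
    return np.where(small, np.abs(r) ** 2 / (2.0 * eps), l1_norm(r) - eps / 2.0)

def huber_2(r, eps):
    """Variant of equation 3, which uses only the l2-norm |r|_2."""
    a = np.abs(np.asarray(r, dtype=complex))
    return np.where(a <= eps, a ** 2 / (2.0 * eps), a - eps / 2.0)

def sign_residual(r):
    """sgn(Re r) + i sgn(Im r), with sgn as in equation 8."""
    r = np.asarray(r, dtype=complex)
    return np.sign(r.real) + 1j * np.sign(r.imag)

def residual_c(r, eps):
    """Residual to back-propagate for M_{eps,c} (form of equation 10)."""
    r = np.asarray(r, dtype=complex)
    return np.where(np.abs(r) <= eps, r / eps, sign_residual(r))

def residual_2(r, eps):
    """Residual to back-propagate for M_{eps,2} (form of equation 13).

    Dividing by max(|r|, eps) gives r/eps for small residuals and
    r/|r| for large ones without ever dividing by zero.
    """
    r = np.asarray(r, dtype=complex)
    return r / np.maximum(np.abs(r), eps)

# One small and one large residual, with a hypothetical threshold eps = 1.
r = np.array([0.1 + 0.1j, 3.0 - 4.0j])
print(huber_c(r, 1.0), huber_2(r, 1.0))
print(residual_c(r, 1.0), residual_2(r, 1.0))
```

For the large residual, `residual_c` keeps only the signs of the real and imaginary parts, while `residual_2` keeps the phase and discards the amplitude; in both cases an outlier contributes a bounded term to the gradient.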

Figure 1. A sketch of the Huber function.

The elements of the column vector in equation 7 are defined as $r_{s,k,\mathrm{sgn}} = \mathrm{sgn}(\mathrm{Re}(r_{s,k})) + i\,\mathrm{sgn}(\mathrm{Im}(r_{s,k}))$. The gradient of the objective function can therefore be written succinctly as

$$
\frac{\partial E}{\partial p_k} = \mathrm{Re} \sum_{s} \left[ \frac{\partial \hat{u}_{s,r}}{\partial p_k} \right]^{T} r^{*},
\tag{9}
$$

where $r = (r_{s,1}, \ldots, r_{s,N_r})$ and

$$
r_{s,k} =
\begin{cases}
\dfrac{1}{\varepsilon}\left(\hat{u}_{s,k} - \hat{d}_{s,k}\right), & |r_{s,k}|_2 \le \varepsilon, \\[6pt]
r_{s,k,\mathrm{sgn}}, & |r_{s,k}|_2 > \varepsilon,
\end{cases}
\tag{10}
$$

for $k = 1, \ldots, N_r$. Alternatively, if we replace $M_{\varepsilon,c}$ with $M_{\varepsilon,2}$ in the objective function in equation 5, the gradient is modified as follows. For $|r|_2 > \varepsilon$, we have

$$
\frac{\partial E}{\partial p_k}
= \mathrm{Re} \sum_{s,r} \frac{\partial \hat{u}_{s,r}}{\partial p_k}\, \frac{1}{\sqrt{r_{re}^{2} + r_{im}^{2}}} \left(\hat{u}_{s,r} - \hat{d}_{s,r}\right)^{*}
= \mathrm{Re} \sum_{s}
\left[ \frac{\partial \hat{u}_{s,1}}{\partial p_k} \;\cdots\; \frac{\partial \hat{u}_{s,N_r}}{\partial p_k} \right]
\begin{bmatrix} r_{s,1}^{*} / |r_{s,1}|_2 \\ \vdots \\ r_{s,N_r}^{*} / |r_{s,N_r}|_2 \end{bmatrix}.
\tag{11}
$$

The gradient of the objective function is still written as

$$
\frac{\partial E}{\partial p_k} = \mathrm{Re} \sum_{s} \left[ \frac{\partial \hat{u}_{s,r}}{\partial p_k} \right]^{T} r^{*},
\tag{12}
$$

where $r = (r_{s,1}, \ldots, r_{s,N_r})$. Now,

$$
r_{s,k} =
\begin{cases}
\dfrac{1}{\varepsilon}\left(\hat{u}_{s,k} - \hat{d}_{s,k}\right), & |r_{s,k}|_2 \le \varepsilon, \\[6pt]
\dfrac{1}{|r_{s,k}|_2}\left(\hat{u}_{s,k} - \hat{d}_{s,k}\right), & |r_{s,k}|_2 > \varepsilon,
\end{cases}
\tag{13}
$$

for $k = 1, \ldots, N_r$.

Back-propagation algorithm

Matrix equation 9 can be augmented by adding zero elements:

$$
\nabla_{p_k} E = \mathrm{Re} \sum_{s}
\left[ \frac{\partial \hat{u}_{s,1}}{\partial p_k} \;\cdots\; \frac{\partial \hat{u}_{s,N_r}}{\partial p_k} \;\cdots\; \frac{\partial \hat{u}_{s,N}}{\partial p_k} \right]
\begin{bmatrix} r_{s,1} \\ \vdots \\ r_{s,N_r} \\ 0 \\ \vdots \\ 0 \end{bmatrix}^{*},
\tag{14}
$$

where $N$ is the number of unknowns in the complex impedance matrix in equation 4. Taking the derivative of equation 4 with respect to $p_k$ yields

$$
S \frac{\partial \hat{u}}{\partial p_k} = -\frac{\partial S}{\partial p_k} \hat{u}
\quad \text{or} \quad
\frac{\partial \hat{u}}{\partial p_k} = S^{-1} v_k,
\tag{15}
$$

where

$$
v_k = -\frac{\partial S}{\partial p_k} \hat{u}.
\tag{16}
$$

The term $v_k$ is referred to as the virtual source vector with respect to the $k$th model parameter (Shin and Min, 2006). Using this notation, equation 14 becomes

$$
\nabla_{p_k} E
= \mathrm{Re} \sum_{s} \left[ \frac{\partial \hat{u}_{s,r}}{\partial p_k} \right]^{T} \tilde{r}^{*}
= \mathrm{Re} \sum_{s} \left[ S^{-1} v_k(\omega) \right]^{T} \tilde{r}^{*}
= \mathrm{Re} \sum_{s} v_k^{T} \left[ S^{-1} \right]^{T} \tilde{r}^{*}
= \mathrm{Re} \sum_{s} v_k^{T} \left[ S^{-1} \tilde{r}^{*} \right],
\tag{17}
$$

where $\tilde{r} = (r_{s,1} \ldots r_{s,N_r}\, 0 \ldots 0)^{T}$. The last equality uses the symmetry of $S$. In equation 17, $[S^{-1}\tilde{r}^{*}]$ is the back-propagated wavefield. We can therefore generate the back-propagated wavefield by treating the residual as a source function in forward modeling. We obtain the gradient by calculating the zero-lag convolution between the virtual source and the back-propagated wavefield. To speed computation, we use the back-propagation algorithm suggested by Pratt et al. (1998) and Pratt (1999).

Conjugate gradient method and inversion flow

For each iteration $l$, the objective function $E$ in equation 5 is minimized by progressing along a modified conjugate gradient $d_l$. Given $g_l = \nabla_p E$, the parameter update is

$$
p_{l+1} = p_l - \alpha_l d_l.
\tag{18}
$$

The initial value of the step length $\alpha_l$ is the same as the grid interval $h$ of the finite-element mesh. For instance, the mesh might be designed to minimize the frequency dispersion of the numerical model. We reduce the step length $\alpha_l$ in each iteration, with the goal of decreasing the rms error. The choice of how to reduce the step length, however, is subjective. Our modified gradient $g_l^{**}$ (see Appendix A) is scaled following Shin, Jang, et al. (2001):

$$
g_l^{**} = \mathrm{NRM}\left( \sum_{\omega} \mathrm{NRM}\left( \mathrm{Re} \sum_{s} g_l^{*} \right) \right),
\tag{19}
$$

$$
g_l^{*} = \left( \mathrm{diag}(v_k^{T} v_k) + \lambda I \right)^{-1} v_k^{T} \left[ S^{-1} \tilde{r}^{*} \right],
\tag{20}
$$

where $I$ is an identity matrix, $\lambda$ is a damping factor, $v_k$ is the virtual source vector, $g_l^{*}$ is given by equation 20, and NRM is a normalizing operation. The gradient direction $g_l$ at each frequency is normalized by the maximum absolute value of the gradient vector. In other words, we first scale each frequency component using the diagonal elements of the pseudo-Hessian matrix (Shin, Yoon, et al., 2001) and the damping factor (see Appendix B). Next, we normalize the gradient vector at each frequency with its largest absolute value, so that all components lie between −1 and 1. Finally, we sum over all frequencies and once more normalize the resulting vector. The modified gradient vector $g_l^{**}$ is therefore guaranteed to have numerical values between −1 and 1. Following the approach of Brandsberg-Dahl et al. (2003), we can express the modified conjugate gradient direction (see Appendix C) as

$$
d_l = g_l^{**} - \frac{g_l^{**T} g_l^{**}}{g_{l-1}^{**T} g_{l-1}^{**}}\, \mathrm{NRM}(d_{l-1}).
\tag{21}
$$

In the next section, we present numerical results for synthetic data and real seismic data. Test results for objective functions $M_{\varepsilon,c}(r)$ and $M_{\varepsilon,2}(r)$ are very similar. To save space, we only refer to $M_{\varepsilon,2}(r)$.
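The back-propagation identity of equation 17 can be checked numerically on a toy problem. The sketch below is not the authors' finite-element code: it assumes a small, dense, complex-symmetric matrix $S(p) = K + C\,\mathrm{diag}(p)$ (the names `K`, `C`, `build_S`, and the receiver indices are all hypothetical), a single source and frequency, and residuals below the threshold so that only the quadratic branch of the Huber function is active. It computes the gradient once with a single extra solve (virtual source times back-propagated wavefield) and once by solving explicitly for every $\partial \hat{u} / \partial p_k$.

```python
import numpy as np

C = 10.0  # hypothetical sensitivity of S to each parameter: dS/dp_k = C e_k e_k^T

def build_S(p, K):
    """Toy complex-symmetric 'impedance' matrix (stand-in for equation 4)."""
    return K + C * np.diag(p)

def objective(p, K, f, d, rec, eps):
    """Quadratic (small-residual) branch of the Huber objective."""
    u = np.linalg.solve(build_S(p, K), f)
    return np.sum(np.abs(u[rec] - d) ** 2) / (2.0 * eps)

def gradient_backprop(p, K, f, d, rec, eps):
    """Gradient from one forward solve and one back-propagation solve."""
    S = build_S(p, K)
    u = np.linalg.solve(S, f)                    # forward wavefield, S u = f
    r_tilde = np.zeros_like(u)                   # residual padded with zeros
    r_tilde[rec] = (u[rec] - d) / eps            # small-residual form of r
    back = np.linalg.solve(S, np.conj(r_tilde))  # back-propagated wavefield
    # v_k = -(dS/dp_k) u = -C u_k e_k, so v_k^T back = -C u_k back_k.
    return np.real(-C * u * back)

def gradient_explicit(p, K, f, d, rec, eps):
    """Same gradient, but with one solve per parameter (equation 15)."""
    S = build_S(p, K)
    u = np.linalg.solve(S, f)
    r = (u[rec] - d) / eps
    grad = np.empty(p.size)
    for k in range(p.size):
        v_k = np.zeros_like(u)
        v_k[k] = -C * u[k]                       # virtual source (equation 16)
        du = np.linalg.solve(S, v_k)             # du/dp_k = S^{-1} v_k
        grad[k] = np.real(du[rec] @ np.conj(r))
    return grad

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
K = A + A.T                                      # complex symmetric, not Hermitian
p = rng.uniform(1.0, 2.0, size=n)
f = np.zeros(n, dtype=complex)
f[0] = 1.0                                       # point source at node 0
rec = np.array([1, 3, 5])                        # hypothetical receiver nodes
d = rng.standard_normal(3) + 1j * rng.standard_normal(3)
eps = 10.0

g_bp = gradient_backprop(p, K, f, d, rec, eps)
g_ex = gradient_explicit(p, K, f, d, rec, eps)
print(np.max(np.abs(g_bp - g_ex)))
```

The explicit version needs one linear solve per model parameter, whereas the back-propagation version needs only two solves in total, which is the reason the residual is treated as a source. The two results agree only because the toy matrix is symmetric, mirroring the step $[S^{-1}]^{T} = S^{-1}$ in equation 17.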


NUMERICAL RESULTS

Synthetic data with spike noise

To test the robustness of waveform inversion using the Huber function, we chose the simple model shown in Figure 2, which is 2 × 2 km in the computational domain. The computational model is composed of five layers, and the velocity of each layer is homogeneous. Deeper layers have larger velocities. To generate synthetic seismograms of this model, we use the frequency-domain finite-element method. We produce 100 shots, spaced 40 m apart. On the surface, 201 receivers collect the shot data. The maximum recording time is 3 s, and the sampling interval is 4 ms. The source wavelet is the first derivative of a Gaussian function, with 9.375-Hz maximum frequency. Figure 3a shows a synthetic seismogram without noise; the source is located at 2 km. Figure 3b shows the same seismogram with six randomly added spikes and 10 missing traces. The amplitudes of the six spikes vary between the maximum signal and half of the maximum signal. They simulate momentary impulses during acquisition.

We now compare the results of least-squares waveform inversion and our new method. The initial velocity model is a linear function varying from 1500 m/s at the surface to 4500 m/s at maximum depth: V = V0 + kz, where k is 15 m/s. Figure 4a shows the inverted velocity model obtained by least-squares waveform inversion at the hundredth iteration. Figure 4b shows the model produced by our algorithm. In the source model (Figure 2), the bottom edge of the third layer (i.e., the small body) is inclined. The least-squares method does not recover this boundary well; the edge is weak, and its shape is distorted to an arch. Our algorithm is far from perfect, but it does define an inclined bottom edge. In Figure 2, the top of the fourth layer has two slopes on the left-hand side: first it runs horizontally, and then it inclines downward. The new objective function does a better job of recovering this change in slope.

Figure 2. A simple synthetic model with five layers.

Figure 3. A seismogram (a) without noise and (b) with six spikes and 10 missing traces. The source is located at 2 km on the surface.

Figure 4. Inverted velocity models obtained by (a) the least-squares algorithm and (b) our new algorithm. The initial velocity model is a linearly increasing function. Both images represent the hundredth iteration.

Synthetic data with coherent noise

We now test the proposed method on a data set contaminated with coherent noise, generating synthetic shot profiles from the same simple model. This time, however, we rescale the model to a width typical for field data (Figure 5). Synthetic profiles are created with 299 shots, each received by 100 geophones spaced 15 m apart. The offsets vary from 15 to 1500 m. The sampling interval is 4 ms, and the maximum frequency of the source wavelet is 20 Hz. Figure 6a shows the profile of the hundredth shot without noise. We make an inclined impulsive event whose amplitude is constant along the offset distance and convolve it with the first derivative of a Gaussian function. To make it consistent with the data in terms of amplitude spectrum, we use noise of 20-Hz maximum frequency. To simulate coherent noise, we superimpose the convolved data on the synthetic shot profile (Figure 6b) (Lu et al., 2006).

We apply both waveform-inversion methods to the data, using 40 frequencies ranging from 0.5 to 20 Hz. The initial velocity model is again a linearly increasing function of depth: V = V0 + kz, where V0 is 1500 m/s and k is 20 m/s. Figure 7a shows the model obtained by least-squares waveform inversion at the hundredth iteration; Figure 7b shows the model obtained by the new algorithm. Neither inverted velocity model is exactly right because of the coherent noise. Note that the least-squares algorithm does not correctly define the first layer boundary. It also seems to recover a nonexistent layer (indicated by the arrow) between the first and second layers in Figure 7a.

Figure 5. The model of Figure 2, rescaled to a width typical of real data.

Figure 6. A shot profile (a) without noise and (b) with coherent noise added.

Figure 7. Coherent noise velocity models obtained by (a) least-squares waveform inversion and (b) our new inversion algorithm. The initial velocity model is a linear function of depth. Both images represent the hundredth iteration.

Synthetic data with background noise

We apply our new inversion algorithm to synthetic data drawn from the Marmousi model (Versteeg, 1994) and contaminated by background noise. Figure 8 shows the Marmousi model with a 16-m grid interval. Synthetic data are created using the frequency-domain finite-element method, with 288 shots and 577 receivers per shot. The source wavelet is a Gaussian first derivative with a maximum frequency of 18.67 Hz.

Two different types of random noise were generated. The first is random noise with a uniform distribution; the uniformly distributed random deviates were generated by the intrinsic function of Fortran 90. The other is random noise with a Gaussian distribution. To generate random deviates with a Gaussian distribution, we use the Box-Muller method (Box and Muller, 1958). The magnitude of the noise term is based on the maximum value of the synthetic data, calculated after excluding the 10 traces nearest to the source. The signal-to-noise ratio is set to five in both cases. By adding each of these two types of random noise to our data set, we make two different data sets. The synthetic data sets are displayed in Figure 9.

We performed waveform inversion using 57 discrete frequencies, ranging from 0.33 to 18.67 Hz. The initial velocity model is V = V0 + kz, where V0 is 1500 m/s and k is 10 m/s. Figure 10a shows the velocity model inverted by the new method at iteration 200 with uniform random noise, and Figure 10b shows the model with Gaussian random noise. The two data sets produce similar results.

Figure 8. The Marmousi model.

Figure 9. A shot profile with random noise occurring (a) with a uniform distribution and (b) with a Gaussian distribution. The signal-to-noise ratio is five in both seismograms.
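The noise construction described for the Marmousi test can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' Fortran 90 code: the function names are hypothetical, the uniform deviates are assumed to lie in [−1, 1), and the noise is assumed to be scaled by the peak amplitude of the far traces divided by the signal-to-noise ratio, because the paper does not give the exact scaling formula.

```python
import numpy as np

def box_muller(n, rng):
    """Standard normal deviates from uniform deviates (Box and Muller, 1958)."""
    u1 = 1.0 - rng.random(n)          # in (0, 1], so log(u1) is finite
    u2 = rng.random(n)
    return np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)

def add_random_noise(data, src_trace, snr, rng, kind="gaussian", n_exclude=10):
    """Add random noise to a shot gather of shape (n_samples, n_traces).

    The reference amplitude is the peak absolute value of the gather after
    excluding the n_exclude traces nearest to the source (assumed scaling).
    """
    n_traces = data.shape[1]
    order = np.argsort(np.abs(np.arange(n_traces) - src_trace))
    far = order[n_exclude:]                       # traces kept for the peak
    peak = np.max(np.abs(data[:, far]))
    if kind == "gaussian":
        noise = box_muller(data.size, rng).reshape(data.shape)
    else:                                         # uniform deviates in [-1, 1)
        noise = rng.uniform(-1.0, 1.0, size=data.shape)
    return data + (peak / snr) * noise

# Hypothetical gather: a strong near-source arrival and a weak far arrival.
rng = np.random.default_rng(0)
gather = np.zeros((50, 30))
gather[10, 15] = 100.0                            # near the source, excluded
gather[20, 0] = 5.0                               # far trace sets the peak
noisy = add_random_noise(gather, src_trace=15, snr=5.0, rng=rng, kind="uniform")
print(np.max(np.abs(noisy - gather)))
```

Excluding the near-source traces keeps the strong direct arrival from inflating the reference amplitude, so the added noise is scaled relative to the weaker reflected energy.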

Figure 10. The inverted velocity models obtained from data contaminated with random noise occurring (a) with a uniform distribution and (b) with a Gaussian distribution. The initial velocity model is a linear function of depth. Both velocity models represent iteration 200.

Original IFP Marmousi

Before applying our new inversion algorithm to real data, which are usually contaminated by noise and lack low-frequency components, we performed an inversion of the Institut Français du Pétrole (IFP) Marmousi model (Versteeg, 1993). Because our computational resources are limited, we used low-pass data up to 18.75 Hz, with a 0.32-Hz interval. The grid interval is 16 m, the horizontal distance is 8.67 km, and the depth is 3.04 km. The original Marmousi data set has 240 shots, each with 147 or 148 receivers. Figure 11 shows a seismogram interpolated on our computational grid, where the source is located at a horizontal position of 5120 m. The same initial velocity model is used for the inversion, and we assume that no information on the source is available. We update the amplitude and phase of the source wavelet for every iteration, using the approach described by Shin et al. (2007).

Figure 12a shows the velocity model obtained by the least-squares method at iteration 300. Figure 12b shows the velocity model obtained by our new inversion algorithm at the same iteration. Neither algorithm recovers the left-hand region (0–2 km) because of limitations of the real seismic data. In Figure 12a, we observe a region (indicated by an arrow) where the least-squares algorithm does not recover the correct velocity model. The layers are broken and distorted. In contrast, our method recovers the correct model (Figure 12b).

Figure 11. An interpolation of the original Marmousi seismogram, used as input for our inversion. The source is located at 5120 m on the surface.

Figure 12. Inverted velocity models obtained using (a) the least-squares inversion algorithm and (b) our new inversion algorithm. Model (a) is inaccurate at the position indicated. The initial velocity model is a linear function of depth. Both velocity models represent iteration 300.

Field data: The continental shelf under the East Sea in Korea

The preceding tests demonstrate that our new algorithm is more robust than least-squares inversion for some types of synthetic data. In this section, we invert a real data set. Real data present many additional problems: the 2D acoustic approximation, source-receiver coupling, grouping of receivers, and missing low-frequency components. Furthermore, classical optimization techniques such as steepest descent are faced with local minima and the possibility of multiple solutions.

Our seismic data set was collected from the continental shelf under the East Sea in Korea. The data were collected by 120 receivers for 235 shots. The receivers were spaced at 25-m intervals, starting 300 m from the source. The maximum recording time was 4 s, and the time interval was 2 ms. We used an 18.75-Hz low-pass filter because of computational resources, inverting the filtered data at 76 frequencies (0.25-Hz intervals). To avoid the effects of elastic wave propagation, which are strongest at large offsets, we only inverted data within 1500 m of the source. The initial velocity model is the same as in previous subsections. The interpolated seismogram at the hundredth shot location is shown in Figure 13. The velocity model obtained after iteration 50 is shown in Figure 14. The inversion is stopped when the rms error ($\sqrt{E(p_l)/N}$, where $N$ is the product of the total frequency, source, and receiver numbers) is minimized. Figure 15 shows how the relative rms error ($\sqrt{E(p_l)/E(p_1)}$) decreases with iteration number. After iteration 50, the error tends to increase slightly. In Figure 14, a layer boundary is visible in the area indicated.

To see how much information our algorithm recovered, we made a synthetic seismogram from the inverted velocity model and compared it to the low-pass-filtered real data. Figure 16 compares two seismograms obtained in this manner. Figure 16a shows a real seismic trace at 960 m when the source is located at 528 m (solid line) and the trace implied by our inverted velocity model (dotted line). Figure 16b shows a seismic trace obtained at 4800 m when the source was located at 3392 m. Although there are some discrepancies, all traces are compatible.

To investigate the demigration effect of our algorithm, we generated prestack depth images and common-image gathers (CIGs) in the frequency domain by modifying our inversion program. For our frequency-domain reverse time migration (RTM), we used 37-Hz low-pass-filtered data at 0.25-Hz intervals. Figure 17a shows a prestack depth migration (PSDM) produced from a linearly increasing velocity model (V = V0 + kz, where V0 is 1500 m/s and k is 10 m/s). Figure 17b shows a prestack depth image from the inverted velocity model after 50 iterations. Unfortunately, we see little difference between the two. Because we do not have a reliable low-frequency component from 0 to 5 Hz in the real data, we speculate that we could not recover a long-wavelength velocity model, therefore retrieving only a short-wavelength velocity model. Figure 18 shows a CIG at 1.6 km for both velocity models. Figure 18a shows a CIG obtained using a linearly increasing model, and Figure 18b shows the CIG obtained using our inverted velocity model. From Figure 18, we observe little difference between the CIGs, which confirms that we recovered only a short-wavelength velocity model with our algorithm.

Figure 13. Seismogram obtained from the continental shelf under the East Sea in Korea, interpolated from the hundredth shot.

Figure 14. Velocity model of the real seismogram, obtained at iteration 50 using our new algorithm. The initial velocity model is a linear function of depth. An arrow denotes the visible structure.

Figure 15. The relative rms error as a function of iteration number.

Figure 16. (a) Seismic traces obtained at 960 m when the source is located at 528 m. (b) Seismic traces obtained at 4800 m when the source is located at 3392 m. The solid line represents the field seismic data, while the dotted line is derived from the velocity model.

Figure 17. Migrated images obtained by using (a) a linearly increasing model (1.5–4.5 km/s) and (b) the inverted velocity model using our algorithm after 50 iterations.

CONCLUSION

Because the ℓ1-norm is more robust to noise with outliers than the ℓ2-norm, we define a complex Huber function for waveform inversion. This objective function takes the form of an ℓ1-norm when residuals are large and an ℓ2-norm when the residuals are small. Unlike the ℓ1-norm, the gradient of the Huber function does not have a singularity. The residual to be back-propagated when using the ℓ2-norm can be expressed in the usual manner. The residual of the ℓ1-norm, however, is expressed as a signed function. We propose two kinds of complex Huber functions; however, the final results of these two objective functions are similar to each other.

The proposed algorithm is tested on synthetic data contaminated with two different types of random noise, with coherent noise, and with band-limited spike noise. Our new algorithm improves on the least-squares approach with coherent noise and band-limited spike noise. There is little difference between the two approaches for data with the two types of random noise. The Huber function is less sensitive to outlier noise and coherent noise than the least-squares objective function. We have used the threshold suggested by Bube and Nemeth (2007). Determination of an appropriate threshold between the ℓ1-norm and the ℓ2-norm remains an important issue for future study.

When we apply the new algorithm to field data obtained from the continental shelf of the East Sea in Korea, we could obtain only a short-wavelength velocity model. Because we do not have reliable low-frequency components in the real data, we could not recover a velocity model containing long-wavelength structure.

ACKNOWLEDGMENTS

This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD, Basic Research Promotion Fund) (KRF-2007-314-D00320), the energy technology innovation (ETI) project funded by the Ministry of Knowledge Economy, and by the Brain Korea 21 project of the Ministry of Education. The work of T. Ha was supported by the National Institute of Mathematical Sciences (NIMS). We are grateful to our reviewers for their comments and encouragement, which greatly improved the paper.

Figure 18. CIGs at 1.6 km obtained by using (a) a linearly increasing model (1.5–4.5 km/s) and (b) the inverted velocity model using our algorithm after 50 iterations.

APPENDIX A

NORMALIZATION

In making an objective function for waveform inversion in the frequency domain, the weighting function of rms error at each frequency component is often implicitly assumed to be equal at each frequency. In other words, a weighting function is applied uniformly 共i.e., the weighting constant is one兲 or implicitly. In this paper, we use an explicit weighting function, which makes the rms error at each frequency contribute differently. Weighting constants therefore fluctuate at each frequency and in each iteration. The numerical value of the weighting constant does not need to be fixed in advance, and it is decided automatically by normalizing the gradient vector at each frequency.
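The two-stage normalization described above can be sketched as follows. This is a minimal sketch with hypothetical names (`nrm`, `modified_gradient`); it assumes the per-frequency gradients have already been summed over sources, reduced to their real part, and scaled by the damped pseudo-Hessian diagonal, so that only the NRM steps remain.

```python
import numpy as np

def nrm(g):
    """Divide g by its largest absolute value so entries lie in [-1, 1]."""
    peak = np.max(np.abs(g))
    return g / peak if peak > 0 else g

def modified_gradient(grads_per_freq):
    """Normalize each frequency's gradient, sum, then normalize again.

    grads_per_freq has shape (n_freq, n_param). The first normalization
    plays the role of the implicit weight g(omega); the second plays the
    role of the overall weight G.
    """
    g = np.asarray(grads_per_freq, dtype=float)
    per_freq = np.array([nrm(row) for row in g])
    return nrm(per_freq.sum(axis=0))

# Two frequencies whose raw gradients differ in amplitude by a factor of 200.
grads = [[2.0, -4.0, 1.0],
         [0.01, 0.02, -0.005]]
print(modified_gradient(grads))
```

In this example the low-amplitude frequency contributes as much to the final direction as the high-amplitude one, which is the sense in which the weighting constants are decided automatically at each frequency and iteration.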

Suppose the objective function at each frequency is expressed as

$$
E(p) = \left[ \sum_{s}^{N_s} \sum_{r}^{N_r} M_{\varepsilon,c}(r_{s,r}) \right] g(\omega),
\tag{A-1}
$$

where $r_{s,r} = \hat{u}_{s,r} - \hat{d}_{s,r}$; $N_s$ and $N_r$ are the number of sources and receivers, respectively; $M_{\varepsilon,c}(r)$ is the complex Huber function in equation 2; and $g(\omega)$ is the yet undetermined weighting function. To be specific, we express the weighted gradient direction for the entire frequency band as

$$
\frac{\partial E}{\partial p_k} =
\left[ \sum_{\omega} \left[ \mathrm{Re} \sum_{s} \left( \mathrm{diag}(v_k^{T} v_k) + \lambda I \right)^{-1} \left[ \frac{\partial \hat{u}_{s,r}}{\partial p_k} \right]^{T} r^{*} \right] g(\omega) \right] G,
\tag{A-2}
$$

where $v_k$ is the virtual source vector, $\lambda$ is a positive damping constant, $I$ is an identity matrix, $r$ is the residual vector in equation 9, $*$ denotes the complex conjugate, and $g(\omega)$ is the weighting function that is the inverse of the maximum absolute value of the gradient vector at each frequency. The numerical value of this function is decided implicitly at each frequency, when the gradient vector is first normalized. In a similar manner, $G$ is the weighting function that fixes the gradient (summed over all frequencies) to lie between −1 and 1. Of course, we assume that $g(\omega)$ and $G$ are not functions of a parameter such as the velocity. We choose $g(\omega)$ and $G$ as

$$
g(\omega) = \frac{1}{\max\, \mathrm{abs} \left[ \mathrm{Re} \left[ \displaystyle\sum_{s} \left( \mathrm{diag}(v_k^{T} v_k) + \lambda I \right)^{-1} \left[ \frac{\partial \hat{u}_{s,r}}{\partial p_k} \right]^{T} r^{*} \right] \right]},
\tag{A-3}
$$

$$
G = \frac{1}{\max\, \mathrm{abs} \left[ \displaystyle\sum_{\omega} \left[ \mathrm{Re} \sum_{s} \left( \mathrm{diag}(v_k^{T} v_k) + \lambda I \right)^{-1} \left[ \frac{\partial \hat{u}_{s,r}}{\partial p_k} \right]^{T} r^{*} \right] g(\omega) \right]},
\tag{A-4}
$$

where abs( ) is the absolute value of an element of the gradient vector and max( ) is the maximum value of the absolute value of an element of the gradient vector. Thus, equation A-2 is equivalent to equation 17:

$$
\frac{\partial E}{\partial p} = \mathrm{NRM} \left[ \sum_{\omega} \mathrm{NRM} \left[ \mathrm{Re} \sum_{s} \left( \mathrm{diag}(v_k^{T} v_k) + \lambda I \right)^{-1} \left[ \frac{\partial \hat{u}_{s,r}}{\partial p_k} \right]^{T} r^{*} \right] \right].
\tag{A-5}
$$

APPENDIX B

SCALING

The classical Gauss-Newton method used in a waveform inversion (Pratt et al., 1998) is given as

$$
(J^{T} J)\, \Delta p = J^{T} (\hat{u} - \hat{d})^{*}
\tag{B-1}
$$

or

$$
\sum_{\omega} (J^{T} J)\, \Delta p = \sum_{\omega} J^{T} (\hat{u} - \hat{d})^{*},
\tag{B-2}
$$

where $J$ is a Jacobian matrix of size $N \times M$; $N$ is the number of data points; and $M$ is the number of parameters. The $J^{T}J$ matrix could be singular because of the shadow zone or the geometric spreading of propagating waves (Chavent and Plessix, 1999). Given this, we may better use the Levenberg-Marquardt method. In the Levenberg-Marquardt method, the normal equation B-1 is modified to

$$
(J^{T} J + \lambda I)\, \Delta p = J^{T} (\hat{u} - \hat{d})^{*},
\tag{B-3}
$$

where $\lambda$ is a positive damping constant and $I$ is an identity matrix. If we use a very large damping constant, the matrix on the left side can be approximated as

$$
(J^{T} J + \lambda I) \approx \lambda I
\quad \text{and} \quad
\Delta p \approx -\frac{1}{\lambda} J^{T} (\hat{u} - \hat{d})^{*}.
\tag{B-4}
$$

Equation B-4 tells us that, even in the Gauss-Newton method, the model parameter is updated according to the steepest-descent direction scaled by a positive value (the inverse value of the damping constant $\lambda$). Given this, the arbitrarily chosen small-step length is the Gauss-Newton direction for strong damping. The steepest-descent direction in our paper is a weighted steepest-descent direction, and it can be used in implementing the conjugate gradient algorithm without violating the mathematics of the conjugate gradient method.

APPENDIX C

MODIFIED CONJUGATE GRADIENT METHOD

In general, the algorithm for the conjugate gradient method is as follows:

$$
d_1 = -\nabla E_1,
\tag{C-1}
$$

$$
P_{l+1} = P_l + \alpha d_l,
\tag{C-2}
$$

$$
d_l = -\nabla E_l + \beta_l d_{l-1}
\tag{C-3}
$$

(Fletcher and Reeves, 1964). Here, $d_l$ is the conjugate gradient direction, $P_l$ is the model parameter, and $\alpha$ is the step length. The conjugate gradient direction $d_1$ at the first iteration is the same as the steepest-descent direction. From the second iteration, we can calculate the conjugate gradient direction using equation C-3. In the nonlinear conjugate gradient method, there are several different choices for determining $\beta_l$ (Tarantola, 2005). We use Fletcher and Reeves' (1964) method. Because we use the weighted or modified gradient direction (which is normalized), we also modify the Fletcher and Reeves method to

$$
d_l = -g_l^{**} + \frac{g_l^{**T} g_l^{**}}{g_{l-1}^{**T} g_{l-1}^{**}}\, \mathrm{NRM}(d_{l-1}),
\tag{C-4}
$$

where $g_l^{**}$ is the modified gradient in equation 19 and NRM( ) is the normalizing operation. From the second iteration, the previous con-

R24

Ha et al.

jugate direction is needed to calculate the conjugate gradient direction. In the first iteration, we use the modified gradient direction as the steepest descent direction 共and it is also normalized兲. We call the corresponding direction the modified conjugate gradient direction.
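The normalization in equation A-5 and the direction update in equation C-4 can be illustrated with a minimal sketch. It assumes that the real-valued per-frequency gradient vectors (the bracketed term inside the inner NRM of equation A-5) have already been computed by back-propagation. The function names `nrm`, `weighted_gradient`, and `modified_cg_direction` are hypothetical, and the guards against zero vectors are added assumptions that the text does not discuss. The step length $\alpha$ of equation C-2 is not treated here.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def nrm(v: Sequence[float]) -> Vector:
    """NRM( ): divide v by its largest absolute element (range becomes [-1, 1])."""
    peak = max(abs(x) for x in v)
    if peak == 0.0:
        # Assumption: a zero vector is returned unchanged to avoid dividing by zero.
        return [0.0 for _ in v]
    return [x / peak for x in v]


def weighted_gradient(per_frequency: Sequence[Sequence[float]]) -> Vector:
    """Equation A-5: normalize each frequency's gradient, sum, normalize again."""
    total = [0.0] * len(per_frequency[0])
    for grad in per_frequency:
        for k, x in enumerate(nrm(grad)):  # inner NRM plays the role of g(omega)
            total[k] += x
    return nrm(total)  # outer NRM plays the role of G


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def modified_cg_direction(
    g: Sequence[float],
    g_prev: Optional[Sequence[float]] = None,
    d_prev: Optional[Sequence[float]] = None,
) -> Vector:
    """Equation C-4; falls back to the steepest-descent direction (C-1) at l = 1."""
    if g_prev is None or d_prev is None:
        return [-x for x in g]
    denom = dot(g_prev, g_prev)
    if denom == 0.0:
        # Assumption: restart with steepest descent if the previous gradient vanished.
        return [-x for x in g]
    beta = dot(g, g) / denom  # Fletcher-Reeves ratio of modified gradients
    return [-x + beta * y for x, y in zip(g, nrm(d_prev))]


if __name__ == "__main__":
    # Hypothetical gradients at two frequencies with very different amplitudes.
    g1 = weighted_gradient([[2.0, -4.0, 1.0], [0.125, 0.0625, -0.25]])
    d1 = modified_cg_direction(g1)
    g2 = weighted_gradient([[1.0, -2.0, 0.5]])
    d2 = modified_cg_direction(g2, g1, d1)
    print(g1, d1, d2)
```

In this sketch the second frequency, whose raw gradient is 16 times weaker than the first, contributes with the same weight after the inner normalization, which is the effect attributed to $g(\omega)$ in Appendix A.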

REFERENCES

Aster, R., B. Borchers, and C. Thurber, 2004, Parameter estimation and inverse problems: Academic Press Inc.
Bednar, J., 1999, Theoretical comparison of equivalent-offset migration and dip moveout pre-stack imaging: Geophysics, 64, 191–196.
Bishop, T., K. Bube, R. Cutler, R. Langan, P. Love, J. Resnick, D. Spindler, and H. Wyld, 1985, Tomographic determination of velocity and depth in laterally varying media: Geophysics, 50, 903–923.
Bording, R., A. Gersztenkorn, L. Lines, J. Scales, and S. Treitel, 1987, Applications of seismic traveltime tomography: Geophysical Journal of the Royal Astronomical Society, 90, 285–303.
Box, G., and M. Muller, 1958, A note on the generation of random normal deviates: Annals of Mathematical Statistics, 29, 610–611.
Brandsberg-Dahl, S., B. Ursin, and M. de Hoop, 2003, Seismic velocity analysis in the scattering-angle/azimuth domain: Geophysical Prospecting, 51, 295–314.
Bube, K., and R. Langan, 1997, Hybrid ℓ1/ℓ2 minimization with applications to tomography: Geophysics, 62, 1183–1195.
Bube, K., and T. Nemeth, 2007, Fast line searches for the robust solution of linear systems in the hybrid ℓ1/ℓ2 and Huber norms: Geophysics, 72, no. 2, A13–A17.
Chavent, G., and R. Plessix, 1999, An optimal true-amplitude least-squares prestack depth-migration operator: Geophysics, 64, 508–515.
Claerbout, J., and F. Muir, 1973, Robust modeling with erratic data: Geophysics, 38, 826–844.
Crase, E., A. Pica, M. Noble, J. McDonald, and A. Tarantola, 1990, Robust elastic nonlinear waveform inversion: Application to real data: Geophysics, 55, 527–538.
Deregowski, S., 1990, Common-offset migration and velocity analysis: First Break, 8, 225–234.
Fletcher, R., and C. M. Reeves, 1964, Function minimization by conjugate gradients: The Computer Journal, 7, 149–154.
Guitton, A., and W. Symes, 2003, Robust inversion of seismic data using the Huber norm: Geophysics, 68, 1310–1319.
Ha, T., S. Pyun, and C. Shin, 2006, Efficient electric resistivity inversion using adjoint state of mixed finite-element method for Poisson's equation: Journal of Computational Physics, 214, 171–186.
Ha, T., and C. Shin, 2007, Magnetotelluric inversion via reverse time migration algorithm of seismic data: Journal of Computational Physics, 225, 237–262.
Huber, P. J., 1973, Robust regression: Asymptotics, conjectures and Monte Carlo: Annals of Statistics, 1, 799–821.
Lailly, P., 1983, The seismic inverse problem as a sequence of before stack migrations: SIAM Conference on Inverse Scattering: Theory and Application.
Lu, W., W. Zhang, and D. Liu, 2006, Local linear coherent noise attenuation based on local polynomial approximation: Geophysics, 71, V163–V169.
Marfurt, K. J., 1984, Accuracy of finite-difference and finite-element modeling of the scalar and elastic wave equations: Geophysics, 49, 533–549.
Pratt, R. G., 1999, Seismic waveform inversion in the frequency domain, part I: Theory and verification in a physical scale model: Geophysics, 64, 888–901.
Pratt, R. G., C. S. Shin, and G. J. Hicks, 1998, Gauss-Newton and full Newton methods in frequency-space seismic waveform inversion: Geophysical Journal International, 133, 341–362.
Shin, C., S. Jang, and D. J. Min, 2001, Improved amplitude preservation for prestack depth migration by inverse scattering theory: Geophysical Prospecting, 49, 592–606.
Shin, C., and D. Min, 2006, Waveform inversion using a logarithm wavefield: Geophysics, 71, no. 3, R31–R42.
Shin, C., S. Pyun, and J. B. Bednar, 2007, Comparison of waveform inversion, part 1: Conventional wavefield vs logarithmic wavefield: Geophysical Prospecting, 55, 449–464.
Shin, C., K. Yoon, K. J. Marfurt, K. Park, D. Yang, H. Lim, S. Chung, and S. Shin, 2001, Efficient calculation of a partial-derivative wavefield using reciprocity for seismic imaging and inversion: Geophysics, 66, 1856–1863.
Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49, 1259–1266.
Tarantola, A., 2005, Inverse problem theory and methods for model parameter estimation: Society for Industrial and Applied Mathematics.
Versteeg, R., 1993, Sensitivity of prestack depth migration to the velocity model: Geophysics, 58, 873–882.
Versteeg, R., 1994, The Marmousi experience: Velocity model determination on a synthetic complex data set: The Leading Edge, 13, 927–936.
Vidale, J., 1990, Finite-difference calculation of traveltime in three dimensions: Geophysics, 55, 521–526.
Yilmaz, O., and J. Claerbout, 1980, Pre-stack partial migration: Geophysics, 45, 1753–1779.
