Lectures on Full Waveform Inversion - Part 2: Synthetic Data Applications
Daniel Köhn, Denise De Nil, Wolfgang Rabbel
June 22, 2017
1 Review of the FWT algorithm
2 Conjugate Gradient and Quasi-Newton l-BFGS
3 Simple example: A spherical low velocity anomaly
4 The CTS Test Problem
5 The Marmousi-2 model
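Review of the FWT algorithm
Pure gradient method
[Figure: residual energy E as a function of P-wave velocity Vp and density ρ]
Gradient method: $m_{n+1} = m_n - \mu_n P_n \left.\frac{\partial E}{\partial m}\right|_n$, with step length $\mu_n$ and preconditioning operator $P_n$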
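The update rule can be sketched in a few lines of Python. This is a minimal sketch, not the code used for the figures: the step length µn comes from a simple backtracking (Armijo) line search, which is an assumption, and the energy in the usage example is a hypothetical two-parameter quadratic with an elongated valley standing in for E(Vp, ρ). Passing the exact inverse Hessian as P (an idealized preconditioner) shows why a good Pn matters.

```python
import numpy as np

def gradient_method(energy, grad, m0, precond=None, tol=1e-8, max_iter=10_000):
    """Preconditioned gradient method m_{n+1} = m_n - mu_n P_n dE/dm|_n."""
    m = np.asarray(m0, dtype=float)
    for n in range(max_iter):
        g = grad(m)
        if np.linalg.norm(g) < tol:
            return m, n
        # Search direction -P_n g (P_n is the identity if no preconditioner is given)
        d = -(precond(g) if precond is not None else g)
        # Assumed step-length rule: halve mu until the Armijo condition holds
        mu, e0 = 1.0, energy(m)
        while energy(m + mu * d) > e0 + 1e-4 * mu * (g @ d):
            mu *= 0.5
        m = m + mu * d
    return m, max_iter

# Hypothetical quadratic energy E(m) = 0.5 m^T A m with a narrow valley
A = np.diag([1.0, 50.0])
energy = lambda m: 0.5 * m @ A @ m
grad = lambda m: A @ m

m_plain, n_plain = gradient_method(energy, grad, [1.0, 1.0])
# Idealized preconditioner: P = A^{-1}, the exact inverse Hessian
m_prec, n_prec = gradient_method(energy, grad, [1.0, 1.0],
                                 precond=lambda g: np.linalg.solve(A, g))
print(n_plain, n_prec)
```

Without preconditioning the method zigzags across the valley for many iterations; with the exact inverse Hessian it reaches the minimum in a single step. Practical preconditioners only approximate this ideal.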
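Final gradients
The gradients for the Lamé parameters λ, µ and the density ρ can be written as
$$\frac{\partial E}{\partial \lambda(\mathbf{x})} = -\sum_{\text{sources}} \int dt \left(\frac{\partial u_x}{\partial x} + \frac{\partial u_y}{\partial y}\right)\left(\frac{\partial \Psi_x}{\partial x} + \frac{\partial \Psi_y}{\partial y}\right)$$
$$\frac{\partial E}{\partial \mu(\mathbf{x})} = -\sum_{\text{sources}} \int dt \left[\left(\frac{\partial u_x}{\partial y} + \frac{\partial u_y}{\partial x}\right)\left(\frac{\partial \Psi_x}{\partial y} + \frac{\partial \Psi_y}{\partial x}\right) + 2\left(\frac{\partial u_x}{\partial x}\frac{\partial \Psi_x}{\partial x} + \frac{\partial u_y}{\partial y}\frac{\partial \Psi_y}{\partial y}\right)\right]$$
$$\frac{\partial E}{\partial \rho(\mathbf{x})} = \sum_{\text{sources}} \int dt \left(\Psi_x \frac{\partial^2 u_x}{\partial t^2} + \Psi_y \frac{\partial^2 u_y}{\partial t^2}\right)$$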
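A discretized version of the three gradient formulas can be sketched with NumPy. The sketch assumes that the forward wavefield u and the adjoint wavefield Ψ of every source are stored on a common regular grid and a common time axis, with the hypothetical array layout (n_sources, nt, ny, nx). In practice the adjoint field is back-propagated in reverse time and the correlation is accumulated on the fly, so storing both fields in full is a simplification; the function name is hypothetical.

```python
import numpy as np

def fwi_gradients(ux, uy, psix, psiy, dx, dy, dt):
    """Discretized dE/dlambda, dE/dmu and dE/drho on the (ny, nx) grid.

    All wavefields have shape (n_sources, nt, ny, nx); axis 2 is y, axis 3 is x.
    Spatial and temporal derivatives use np.gradient (central differences),
    the time integral uses the rectangle rule.
    """
    d = lambda f, h, ax: np.gradient(f, h, axis=ax)
    ux_x, ux_y = d(ux, dx, 3), d(ux, dy, 2)
    uy_x, uy_y = d(uy, dx, 3), d(uy, dy, 2)
    px_x, px_y = d(psix, dx, 3), d(psix, dy, 2)
    py_x, py_y = d(psiy, dx, 3), d(psiy, dy, 2)

    # Sum over sources (axis 0) and time samples (axis 1), times dt
    integrate = lambda f: np.sum(f, axis=(0, 1)) * dt

    g_lam = -integrate((ux_x + uy_y) * (px_x + py_y))
    g_mu = -integrate((ux_y + uy_x) * (px_y + py_x)
                      + 2.0 * (ux_x * px_x + uy_y * py_y))
    # Second time derivatives of the forward wavefield
    ux_tt = np.gradient(np.gradient(ux, dt, axis=1), dt, axis=1)
    uy_tt = np.gradient(np.gradient(uy, dt, axis=1), dt, axis=1)
    g_rho = integrate(psix * ux_tt + psiy * uy_tt)
    return g_lam, g_mu, g_rho
```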
Conjugate Gradient and Quasi-Newton l-BFGS
The gradient method requires 200 iterations
[Figure: residual energy E as a function of P-wave velocity Vp and density ρ]
I’m not happy with the far too slow convergence speed ...
The gradient method gets stuck in a narrow valley
[Figure: residual energy E as a function of P-wave velocity Vp and density ρ]
... and then there could be cases like this.
Conjugate Gradient
- Minimization of the quadratic form by using conjugate search directions instead of the gradient (Hestenes and Stiefel, 1952)
- Extension to nonlinear objective functions (Fletcher and Reeves, 1964; Polak and Ribière, 1969)
- Details and mathematical proofs: [Nocedal and Wright, 1999]
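Conjugate Gradient Algorithm
1. Calculate the steepest descent direction: $\Delta x_n = -\left.\frac{\partial E}{\partial m}\right|_n$
2. Compute $\beta_n$ according to
   Fletcher-Reeves: $\beta_n^{FR} = \frac{\Delta x_n^T \Delta x_n}{\Delta x_{n-1}^T \Delta x_{n-1}}$
   Polak-Ribière: $\beta_n^{PR} = \frac{\Delta x_n^T (\Delta x_n - \Delta x_{n-1})}{\Delta x_{n-1}^T \Delta x_{n-1}}$
   Hestenes-Stiefel: $\beta_n^{HS} = -\frac{\Delta x_n^T (\Delta x_n - \Delta x_{n-1})}{s_{n-1}^T (\Delta x_n - \Delta x_{n-1})}$
   Dai-Yuan: $\beta_n^{DY} = -\frac{\Delta x_n^T \Delta x_n}{s_{n-1}^T (\Delta x_n - \Delta x_{n-1})}$
   A popular choice is $\beta_n = \max\{0, \beta_n^{PR}\}$, which allows an automatic direction reset.
3. Update the conjugate direction: $s_n = \Delta x_n + \beta_n s_{n-1}$
4. Estimate the step length $\mu_n$
5. Update the material parameters: $m_{n+1} = m_n + \mu_n s_n$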
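The five steps can be tried on the two-parameter Rosenbrock function E(Vp, ρ), which these slides also use as a uni-modal test objective. This is a minimal sketch, not the code behind the figures: it uses the popular choice β = max{0, β^PR}, resets the direction if it is not a descent direction, and replaces the step-length estimation of step 4 with a simple shrink/expand search along s, which is an assumption.

```python
import numpy as np

def energy(m):
    vp, rho = m
    return (1.0 - vp) ** 2 + 100.0 * (rho - vp ** 2) ** 2

def gradient(m):
    vp, rho = m
    return np.array([-2.0 * (1.0 - vp) - 400.0 * vp * (rho - vp ** 2),
                     200.0 * (rho - vp ** 2)])

def step_length(m, s, g, c=1e-4):
    """Step 4 (assumed): shrink mu until the Armijo condition holds,
    then expand it while the energy keeps decreasing along s."""
    e0, slope, mu = energy(m), g @ s, 1.0
    while energy(m + mu * s) > e0 + c * mu * slope:
        mu *= 0.5
    while energy(m + 2.0 * mu * s) < energy(m + mu * s):
        mu *= 2.0
    return mu

def conjugate_gradient(m0, tol=1e-6, max_iter=20_000):
    m = np.asarray(m0, dtype=float)
    dx = -gradient(m)                  # step 1: steepest descent direction
    s = dx.copy()
    for n in range(max_iter):
        if np.linalg.norm(dx) < tol:
            return m, n
        if s @ dx <= 0.0:              # reset if s is not a descent direction
            s = dx.copy()
        mu = step_length(m, s, -dx)    # step 4: step length mu_n
        m = m + mu * s                 # step 5: m_{n+1} = m_n + mu_n s_n
        dx_new = -gradient(m)          # step 1 at the updated model
        beta = max(0.0, dx_new @ (dx_new - dx) / (dx @ dx))  # step 2: max{0, beta_PR}
        s = dx_new + beta * s          # step 3: s_n = dx_n + beta_n s_{n-1}
        dx = dx_new
    return m, max_iter

m_cg, n_cg = conjugate_gradient([-1.2, 1.0])
print(n_cg, m_cg)
```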
Quasi-Newton l-BFGS
Idea: approximate the product of the inverse Hessian with the gradient by finite differences of the model and gradient vectors from previous iterations.
This is the quasi-Newton limited-memory Broyden-Fletcher-Goldfarb-Shanno (l-BFGS) method.
The L-BFGS Algorithm
Quasi-Newton L-BFGS method (loop 1)
The limited-memory Broyden-Fletcher-Goldfarb-Shanno method (see also Nocedal & Wright (1999), Brossier (2009)), where m denotes the number of stored pairs. At iteration step n:
1. Compute $g_n = \left.\frac{\partial E}{\partial m}\right|_n$
2. Compute and store $s_{n-1} = m_n - m_{n-1}$ and $y_{n-1} = g_n - g_{n-1}$
3. $q = g_n$
4. for i = n-1 down to n-m do
   $\rho_i = \frac{1}{y_i^T s_i}$
   $\alpha_i = \rho_i s_i^T q$
   $q = q - \alpha_i y_i$
   end for
Quasi-Newton L-BFGS method (loop 2)
1. Compute $H_n^0 = \frac{s_{n-1}^T y_{n-1}}{y_{n-1}^T y_{n-1}}$
2. Compute $z = H_n^0 q$
3. for i = n-m to n-1 do
   $\beta_i = \rho_i y_i^T z$
   $z = z + s_i (\alpha_i - \beta_i)$
   end for
4. $H_n g_n = z$
5. Update the model: $m_{n+1} = m_n - \mu_n H_n g_n$
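Both loops translate almost line by line into Python. In this sketch (assumptions, not the lecture's code) the most recent pairs (s_i, y_i) are kept in a deque, pairs with non-positive curvature y_i^T s_i are skipped so that H_n stays positive definite, the first iteration uses the plain gradient, and µn comes from backtracking (Armijo) starting at µ = 1. The test objective is the Rosenbrock function used on the slides as a uni-modal example; `n_pairs` plays the role of m in the loop bounds.

```python
from collections import deque

import numpy as np

def energy(m):
    vp, rho = m
    return (1.0 - vp) ** 2 + 100.0 * (rho - vp ** 2) ** 2

def gradient(m):
    vp, rho = m
    return np.array([-2.0 * (1.0 - vp) - 400.0 * vp * (rho - vp ** 2),
                     200.0 * (rho - vp ** 2)])

def two_loop(g, pairs):
    """Return z = H_n g_n from the stored pairs (s_i, y_i), ordered oldest first."""
    q = g.copy()
    history = []
    for s, y in reversed(pairs):                # loop 1: i = n-1 down to n-m
        rho_i = 1.0 / (y @ s)
        alpha = rho_i * (s @ q)
        q -= alpha * y
        history.append((rho_i, alpha))
    s, y = pairs[-1]
    z = (s @ y) / (y @ y) * q                   # z = H_n^0 q with scalar H_n^0
    for (s, y), (rho_i, alpha) in zip(pairs, reversed(history)):  # loop 2: i = n-m to n-1
        beta = rho_i * (y @ z)
        z += s * (alpha - beta)
    return z

def lbfgs(m0, n_pairs=5, tol=1e-6, max_iter=1000):
    m = np.asarray(m0, dtype=float)
    g = gradient(m)
    pairs = deque(maxlen=n_pairs)               # the n_pairs most recent (s_i, y_i)
    for n in range(max_iter):
        if np.linalg.norm(g) < tol:
            return m, n
        d = two_loop(g, list(pairs)) if pairs else g   # H_n g_n (plain gradient at first)
        mu, e0 = 1.0, energy(m)
        while energy(m - mu * d) > e0 - 1e-4 * mu * (g @ d):  # backtracking for mu_n
            mu *= 0.5
        m_new = m - mu * d                      # m_{n+1} = m_n - mu_n H_n g_n
        g_new = gradient(m_new)
        s, y = m_new - m, g_new - g
        if s @ y > 1e-12:                       # keep only positive-curvature pairs
            pairs.append((s, y))
        m, g = m_new, g_new
    return m, max_iter

m_opt, n_iter = lbfgs([-1.2, 1.0])
print(n_iter, m_opt)
```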
Gradient method (200 iterations)
[Figure: residual energy E as a function of P-wave velocity Vp and density ρ]
Conjugate Gradient (30 iterations)
[Figure: residual energy E as a function of P-wave velocity Vp and density ρ]
Quasi-Newton l-BFGS (20 iterations)
[Figure: residual energy E as a function of P-wave velocity Vp and density ρ]
Problems related to local non-linear optimization
Uni-modal objective function (1 minimum)
[Figure: residual energy E as a function of P-wave velocity Vp and density ρ]
$E = (1 - V_p)^2 + 100\,(\rho - V_p^2)^2$ (Rosenbrock, 1960)
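Why the gradient method crawls along this valley can be quantified. The gradient vanishes at the minimum (Vp, ρ) = (1, 1), but the Hessian there is [[802, −400], [−400, 200]], whose eigenvalues (roughly 0.4 and 1000) differ by a factor of about 2500. A short NumPy check (a sketch; the function names are hypothetical):

```python
import numpy as np

def gradient(vp, rho):
    """Gradient of E = (1 - vp)^2 + 100 (rho - vp^2)^2."""
    return np.array([-2.0 * (1.0 - vp) - 400.0 * vp * (rho - vp ** 2),
                     200.0 * (rho - vp ** 2)])

def hessian(vp, rho):
    """Analytic Hessian of the same function."""
    return np.array([[2.0 - 400.0 * (rho - vp ** 2) + 800.0 * vp ** 2, -400.0 * vp],
                     [-400.0 * vp, 200.0]])

g_min = gradient(1.0, 1.0)                       # vanishes at the minimum
eigenvalues = np.linalg.eigvalsh(hessian(1.0, 1.0))
condition_number = eigenvalues.max() / eigenvalues.min()
print(eigenvalues, condition_number)
```

A large condition number means the energy changes about 2500 times faster across the valley than along it, which is exactly the situation where steepest descent zigzags and curvature information (CG, l-BFGS) pays off.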
Multi-modal objective function (multiple minima)
[Figure: residual energy E as a function of P-wave velocity Vp and density ρ]
$E = (V_p^2 + \rho - 11)^2 + (V_p + \rho^2 - 7)^2$ (Himmelblau, 1972)
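With several minima, a local method returns whichever minimum lies in the basin of its starting model. The sketch below runs a plain gradient descent with backtracking from four hypothetical starting models on this function and compares each end point with the four known minima of Himmelblau's function, (3, 2), (−2.805118, 3.131312), (−3.779310, −3.283186) and (3.584428, −1.848126).

```python
import numpy as np

def energy(m):
    vp, rho = m
    return (vp ** 2 + rho - 11.0) ** 2 + (vp + rho ** 2 - 7.0) ** 2

def gradient(m):
    vp, rho = m
    a, b = vp ** 2 + rho - 11.0, vp + rho ** 2 - 7.0
    return np.array([4.0 * vp * a + 2.0 * b, 2.0 * a + 4.0 * rho * b])

def local_descent(m0, tol=1e-8, max_iter=10_000):
    """Gradient descent with backtracking: a purely local method."""
    m = np.asarray(m0, dtype=float)
    for _ in range(max_iter):
        g = gradient(m)
        if np.linalg.norm(g) < tol:
            break
        mu, e0 = 1.0, energy(m)
        while energy(m - mu * g) > e0 - 1e-4 * mu * (g @ g):
            mu *= 0.5
        m = m - mu * g
    return m

KNOWN_MINIMA = np.array([[3.0, 2.0], [-2.805118, 3.131312],
                         [-3.779310, -3.283186], [3.584428, -1.848126]])

starts = [(4.0, 4.0), (-4.0, 4.0), (-4.0, -4.0), (4.0, -4.0)]
ends = [local_descent(m0) for m0 in starts]
for m0, m_end in zip(starts, ends):
    k = int(np.argmin(np.linalg.norm(KNOWN_MINIMA - m_end, axis=1)))
    print(m0, "->", np.round(m_end, 4), "nearest known minimum:", KNOWN_MINIMA[k])
```

Every run ends at E ≈ 0, so the residual energy alone cannot tell which minimum is the true model; in FWI this is why the starting model and the frequency content of the data matter.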
Simple example: A spherical low velocity anomaly
Pressure wavefield: simple acoustic test problem
[Figure panels: Vp [m/s] of the true model and of the starting model; axes x [m] and y [m]]
Simple acoustic test problem: A spherical low velocity anomaly in a homogeneous full space.
Starting model
[Figure panels: Vp [m/s] of the true model (background Vp = 2000 m/s, spherical anomaly Vp = 1700 m/s) and of the starting model (Vp0 = 2000 m/s); axes Distance [m] and Depth [m]]
Simple acoustic test problem: homogeneous starting model.
Seismic sections: initial model, true model, data residuals
[Figure panels: starting model $u_y^{mod}$, true model $u_y^{obs}$ and initial data residuals $\delta u_y = u_y^{mod} - u_y^{obs}$; axes trace # and time [s]]
Seismic sections of the y-component for the simple test problem: the starting model (left), the true model (center) and the data residuals (right).
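The data residuals enter the inversion through the objective function. The slides only call it the residual energy E; assuming the usual L2 form E = 1/2 Σ_sources Σ_receivers ∫ (u^mod − u^obs)² dt, a NumPy sketch looks as follows. For this L2 misfit the adjoint sources are the data residuals themselves, injected at the receiver positions in reverse time. The function names and the array layout (n_sources, n_receivers, nt) are hypothetical.

```python
import numpy as np

def residual_energy(u_mod, u_obs, dt):
    """Assumed L2 residual energy E = 1/2 sum_s sum_r int (u_mod - u_obs)^2 dt.

    u_mod, u_obs: seismograms of shape (n_sources, n_receivers, nt).
    """
    du = u_mod - u_obs                  # data residuals delta u = u_mod - u_obs
    return 0.5 * np.sum(du ** 2) * dt

def adjoint_sources(u_mod, u_obs):
    """Time-reversed residuals, injected at the receivers to compute Psi."""
    return (u_mod - u_obs)[..., ::-1]
```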
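Non-linear optimization of the P-wave velocity model
Minimize the objective function by CG for the P-wave velocity $v_p$:
$$v_p^{n+1} = v_p^n - \mu^n \left(H^n\right)^{-1} \frac{\partial E}{\partial v_p}$$
with gradient $\partial E/\partial v_p$, Hessian $H$ and step length $\mu$.
Efficient gradient calculation by the time-domain adjoint method:
$$\frac{\partial E}{\partial v_p} = -2\rho v_p \sum_{\text{sources}} \int dt \left(\frac{\partial u_x}{\partial x} + \frac{\partial u_y}{\partial y}\right)\left(\frac{\partial \Psi_x}{\partial x} + \frac{\partial \Psi_y}{\partial y}\right),$$
with the forward wavefield $u$ and the adjoint wavefield $\Psi$, respectively.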
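The factor −2ρvp follows from the Lamé-parameter gradient ∂E/∂λ by the chain rule, assuming the (vp, vs, ρ) parametrization λ = ρ(vp² − 2vs²), µ = ρvs², in which µ and ρ do not depend on vp:

```latex
\begin{aligned}
\lambda &= \rho\,(v_p^2 - 2 v_s^2), \qquad \mu = \rho\, v_s^2, \\
\frac{\partial E}{\partial v_p}
  &= \frac{\partial E}{\partial \lambda}\frac{\partial \lambda}{\partial v_p}
   + \frac{\partial E}{\partial \mu}\frac{\partial \mu}{\partial v_p}
   + \frac{\partial E}{\partial \rho}\frac{\partial \rho}{\partial v_p}
   = 2\rho v_p\,\frac{\partial E}{\partial \lambda}
   = -2\rho v_p \sum_{\text{sources}} \int dt
     \left(\frac{\partial u_x}{\partial x} + \frac{\partial u_y}{\partial y}\right)
     \left(\frac{\partial \Psi_x}{\partial x} + \frac{\partial \Psi_y}{\partial y}\right).
\end{aligned}
```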
Forward, adjoint and correlated wavefields (gradient) for shot 45
The effect of the preconditioning operator P
[Figure panels: gradient δλ, rescaled gradient δλ and preconditioned gradient δλ; axes x [m] and y [m]]
The effect of the preconditioning operator P: the gradient δλ0 before (left) and after the application of the preconditioning operator (right). Artifacts due to low ray coverage are more prominent in the rescaled image of the unpreconditioned gradient (center).
P-wave velocity model: FWT result
[Figure panels: Vp [m/s] after iteration 10, after iteration 155, and of the true model; axes x [m] and y [m]]
Inversion results for the P-wave velocity model of the spherical low velocity anomaly after 10 (left) and 155 FWT iterations (center) compared with the true model (right).
Seismic sections: FWT result, true model, data residuals
[Figure panels: final model (iteration 155) $u_y^{mod}$, true model $u_y^{obs}$ and final data residuals $\delta u_y = u_y^{mod} - u_y^{obs}$; axes trace # and time [s]]
Seismic sections (y-component) for the inversion result (left), the true model (center) and the data residuals (right).
The CTS Test Problem
The Cross-Triangle-Square (CTS) model
The CTS model by D. De Nil and D. Köhn [Köhn et al., 2012]
[Figure panels: P-wave velocity [m/s], S-wave velocity [m/s] and density ρ [kg/m³]; axes x [m] and y [m]]
CTS model: acquisition geometry
[Figure: acquisition geometry with 100 sources and 400 receivers; axes x [m] and y [m]]
CTS model: starting model
[Figure panels: starting models for P-wave velocity [m/s], S-wave velocity [m/s] and density ρ [kg/m³]; axes x [m] and y [m]]
Influence of frequency filtering: no frequency filter
[Figure panels: inversion results for P-wave velocity [m/s], S-wave velocity [m/s] and density ρ [kg/m³]; axes x [m] and y [m]]
Influence of frequency filtering: low-pass frequency filters 5.0-10.0 Hz
[Figure panels: inversion results for P-wave velocity [m/s], S-wave velocity [m/s] and density ρ [kg/m³]; axes x [m] and y [m]]
Influence of frequency filtering: low-pass frequency filters 2.0-5.0-10.0 Hz
[Figure panels: inversion results for P-wave velocity [m/s], S-wave velocity [m/s] and density ρ [kg/m³]; axes x [m] and y [m]]
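The multi-stage strategy (first 2.0 Hz, then 5.0 Hz, then 10.0 Hz) amounts to low-pass filtering the observed data before each inversion stage. The slides do not state the filter type or order; the sketch below assumes a zero-phase filter with a Butterworth magnitude response of order 4, applied in the frequency domain with NumPy, and the function name and test trace are hypothetical. In a real workflow the synthetic data are filtered identically, and each stage starts from the model of the previous stage.

```python
import numpy as np

def lowpass_zero_phase(data, dt, fc, order=4):
    """Zero-phase low-pass filter along the last (time) axis.

    Multiplies the spectrum by the Butterworth magnitude
    |H(f)| = 1 / sqrt(1 + (f/fc)^(2*order)); since |H| is real,
    no phase shift is introduced.
    """
    nt = data.shape[-1]
    f = np.fft.rfftfreq(nt, d=dt)
    H = 1.0 / np.sqrt(1.0 + (f / fc) ** (2 * order))
    return np.fft.irfft(np.fft.rfft(data, axis=-1) * H, n=nt, axis=-1)

# Hypothetical trace: 1 Hz plus 30 Hz sine, 1 s long, dt = 1 ms
dt = 1e-3
t = np.arange(1000) * dt
trace = np.sin(2 * np.pi * 1.0 * t) + np.sin(2 * np.pi * 30.0 * t)

# Corner frequencies of the 2.0-5.0-10.0 Hz sequence
stages = [2.0, 5.0, 10.0]
filtered = {fc: lowpass_zero_phase(trace, dt, fc) for fc in stages}
```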
Influence of the model parametrization: Lamé parameters, low-pass frequency filters 2.0-5.0-10.0 Hz
[Figure panels: inversion results for Lamé parameter λ [Pa], Lamé parameter µ [Pa] and density ρ [kg/m³]; axes x [m] and y [m]]
Influence of the model parametrization: seismic impedances, low-pass frequency filters 2.0-5.0-10.0 Hz
[Figure panels: inversion results for P-wave impedance [kg/(s m²)], S-wave impedance [kg/(s m²)] and density ρ [kg/m³]; axes x [m] and y [m]]
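The three parametrizations compared above describe the same isotropic elastic medium and are related pointwise by µ = ρvs², λ = ρvp² − 2µ, Ip = ρvp and Is = ρvs. A small NumPy sketch of the conversions (function names and example values are hypothetical):

```python
import numpy as np

def lame_from_velocities(vp, vs, rho):
    """Lamé parameters from (vp, vs, rho): mu = rho vs^2, lambda = rho vp^2 - 2 mu."""
    mu = rho * vs ** 2
    lam = rho * vp ** 2 - 2.0 * mu
    return lam, mu

def velocities_from_lame(lam, mu, rho):
    """Inverse mapping: vp = sqrt((lambda + 2 mu) / rho), vs = sqrt(mu / rho)."""
    return np.sqrt((lam + 2.0 * mu) / rho), np.sqrt(mu / rho)

def impedances(vp, vs, rho):
    """P- and S-wave impedances Ip = rho vp, Is = rho vs in kg/(s m^2)."""
    return rho * vp, rho * vs

lam, mu = lame_from_velocities(2000.0, 1000.0, 2000.0)   # Pa
ip, is_ = impedances(2000.0, 1000.0, 2000.0)              # kg/(s m^2)
```

The parametrization changes the inversion because the gradient is taken with respect to different variables; the models themselves convert one-to-one.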
The Marmousi-2 model
[Martin et al., 2006] NX = 500 gridpoints × NY = 174 gridpoints → 87000 gridpoints × 3 parameter classes (Vp, Vs, density) → 261000 model parameters
Seismic modelling and inversion codes are benchmarked on one node of the NEC cluster at Kiel University: 2 Intel Xeon E5-2670 CPUs (16 cores, clock speed 2.6 GHz), 128 GB DDR4 RAM.
Marmousi-2 benchmarks (forward problem)
First-arrival traveltime map: RAJZEL eikonal FD, run time (1 core): 0.05 s
Pressure wavefield (time = 1.922 s): DENISE time-domain FD, run time (16 cores): 2.1 s
10 Hz monochromatic pressure wavefield: GERMAINE frequency-domain FD, run time (1 core): 1.3 s
[Figure panels: traveltime map and wavefields; axes Distance [km] and Depth [km]]
Marmousi-2: acquisition geometry
[Figure: acquisition geometry; axes Distance [km] and Depth [km]]
100 airgun sources, 40 m below the free surface
Source wavelet: low-pass filtered spike (fmax = 15 Hz)
Ocean-bottom cable (OBC) with 400 multi-component receivers (x- and y-components)
Propagation of the pressure wavefield
[Figure: pressure wavefield at time = 1.351 s; axes Distance [km] and Depth [km]]
Preconditioning operator
[Figure panels: gradient δVp without preconditioning and with preconditioning; axes x [km] and y [km]]
Marmousi-2 (Vp), start model
[Figure panels: P-wave velocity Vp [m/s] from traveltime tomography (start model) and of the true model; axes Distance [km] and Depth [km]]
Marmousi-2 (Vp), frequency 2 Hz, 50 iterations
[Figure panels: P-wave velocity Vp [m/s] from waveform tomography and of the true model; axes Distance [km] and Depth [km]]
Marmousi-2 (Vp), frequencies 2-5 Hz, 75 iterations
[Figure panels: P-wave velocity Vp [m/s] from waveform tomography and of the true model; axes Distance [km] and Depth [km]]
Marmousi-2 (Vp), frequencies 2-5-10 Hz, 90 iterations
[Figure panels: P-wave velocity Vp [m/s] from waveform tomography and of the true model; axes Distance [km] and Depth [km]]
Marmousi-2 (Vp), frequencies 2-5-10-20 Hz, 70 iterations
[Figure panels: P-wave velocity Vp [m/s] from waveform tomography and of the true model; axes Distance [km] and Depth [km]]
Marmousi-2 (Vs), frequencies 2-5-10-20 Hz, 70 iterations
[Figure panels: S-wave velocity Vs [m/s] from waveform tomography and of the true model; axes Distance [km] and Depth [km]]
Marmousi-2 (density ρ), frequencies 2-5-10-20 Hz, 70 iterations
[Figure panels: density ρ [kg/m³] from waveform tomography and of the true model; axes Distance [km] and Depth [km]]
Seismic section for shot 50 (start model)
[Figure: seismic section; axes channel # and time [s]]
Seismic section for shot 50 (FWT result)
[Figure: seismic section; axes channel # and time [s]]
Seismic section for shot 50 (true model)
[Figure: seismic section; axes channel # and time [s]]
Evolution of the L2 norm
[Figure: normalized residual energy (logarithmic scale) versus iteration step no. for the frequency stages 1 Hz, 2.5 Hz, 5 Hz and 10 Hz]
Influence of Hessian approximations
So far we used a simple linear scaling with depth as the Hessian approximation: $\{H_{a1}\}^{-1} = \text{depth}$
More sophisticated: the integrated forward wavefield plus an approximation of the receiver Green's function (Plessix & Mulder, 2004):
$$\{H_{a2}\}^{-1} = \left[\int dt\, |u(\mathbf{x}_s, \mathbf{x}, t)|^2 \left(\operatorname{asinh}\frac{x_r^{max} - x}{z} - \operatorname{asinh}\frac{x_r^{min} - x}{z}\right)\right]^{-1}$$
with the minimum and maximum receiver positions $x_r^{min}$, $x_r^{max}$, the source position $\mathbf{x}_s$ and the depth $z$.
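Both diagonal approximations can be sketched with NumPy for a single source. This is a minimal sketch under stated assumptions: the forward wavefield is stored as an array of shape (nt, nz, nx) on a grid with strictly positive depths z, and the small water level `eps`, which keeps the division finite where the wavefield energy vanishes, is a hypothetical stabilization that the slide does not give. Summing the bracketed term over several sources would be a further assumption. The preconditioned gradient is the element-wise product of {H}^{-1} with the gradient.

```python
import numpy as np

def inv_hessian_depth(z, nx):
    """{H_a1}^{-1} = depth, repeated along x; shape (nz, nx)."""
    return np.repeat(z[:, None], nx, axis=1)

def inv_hessian_plessix_mulder(u_fwd, dt, x, z, xr_min, xr_max, eps=1e-3):
    """{H_a2}^{-1} for one source, following the formula above.

    u_fwd : forward wavefield of one source, shape (nt, nz, nx)
    x, z  : 1-D horizontal and depth coordinates, all z > 0
    eps   : hypothetical water level relative to the maximum, avoids 1/0
    """
    energy = np.sum(np.abs(u_fwd) ** 2, axis=0) * dt      # int dt |u(x_s, x, t)|^2
    X, Z = np.meshgrid(x, z)                              # both of shape (nz, nx)
    receiver_term = np.arcsinh((xr_max - X) / Z) - np.arcsinh((xr_min - X) / Z)
    h = energy * receiver_term
    return 1.0 / (h + eps * h.max())

# Usage on a toy grid (all values hypothetical)
x = np.array([0.0, 50.0, 100.0])
z = np.array([10.0, 20.0])
u = np.ones((2, z.size, x.size))
grad = np.ones((z.size, x.size))                          # placeholder gradient
inv_h1 = inv_hessian_depth(z, x.size)
inv_h2 = inv_hessian_plessix_mulder(u, dt=1.0, x=x, z=z, xr_min=0.0, xr_max=100.0)
precond_grad_h1 = inv_h1 * grad
precond_grad_h2 = inv_h2 * grad
```

Both operators boost the gradient at depth; Ha2 additionally accounts for the source illumination and the receiver aperture.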
Marmousi-2 - influence of Hessian: PCG + Ha1
[Figure panels: S-wave velocity Vs [m/s] from waveform tomography and of the true model; axes Distance [km] and Depth [km]]
Marmousi-2 - influence of Hessian: PCG + Ha2
[Figure panels: S-wave velocity Vs [m/s] from waveform tomography and of the true model; axes Distance [km] and Depth [km]]
Marmousi-2 - influence of Hessian: l-BFGS + Ha2
[Figure panels: S-wave velocity Vs [m/s] from waveform tomography and of the true model; axes Distance [km] and Depth [km]]
References
Köhn, D., De Nil, D., Kurzmann, A., Przebindowska, A., and Bohlen, T. (2012). On the influence of model parametrization in elastic full waveform tomography. Geophysical Journal International, 191(1):325–345.
Martin, G., Wiley, R., and Marfurt, K. (2006). Marmousi2 - An elastic upgrade for Marmousi. The Leading Edge, 25:156–166.
Nocedal, J. and Wright, S. (1999). Numerical Optimization. Springer, New York.