SPECIAL SECTION: Seismic inversion for reservoir properties
Full-waveform inversion of seismic data with the Neighborhood Algorithm MORITZ M. FLIEDNER, SVEN TREITEL, and LUCY MACGREGOR, Rock Solid Images
Stochastic (Monte Carlo) optimization methods like the Genetic Algorithm (GA) and Simulated Annealing (SA) have become increasingly popular for the inversion of geophysical data. In contrast to deterministic gradient-descent methods that search for the local minimum of the misfit function near a given starting guess, stochastic methods search for the global minimum of the misfit function even in the absence of a good starting model. Stochastic methods do not require the calculation of gradients of error surfaces; only forward modeling is needed to evaluate the objective function. In addition to a single "best" model, some stochastic methods yield statistical information about the range of acceptable models for a given error tolerance by estimating Bayesian integrals of the posterior probability density distribution (PPD). Having a statistically significant sampling of the model space and the associated error surfaces, rather than a single "best" model, allows us to assess the reliability and resolving power of different inversions given the available data and prior knowledge of geologically reasonable constraints on the expected solution. The SEG SEAM Earth model, along with its seismic synthetic data simulations, provides a convenient test case to evaluate our approach.
Method
The problem to be addressed is to approximate an observed geophysical data set d with a theoretical model whose response matches the data in some sense. This model is a function of n parameters; in other words, a particular model m occupies a unique point in model space. In the present case, our model is a 1D stratified medium, described by a series of layer thicknesses, layer velocities, and layer densities. Established equations can be chosen to solve the forward problem, that is, to calculate the theoretical seismic response of a given stratified medium, which can then be compared to the actual data d.

Gradient-based deterministic approaches tend to work well only if one has a good idea of the solution. Such methods provide a single "best" model satisfying the data to a certain degree of fit. However, it is too often unclear whether this is the only model that agrees with the data; in fact, there may be a large number of other models which agree equally well. This is a manifestation of the uniqueness problem, or the "curse of nonuniqueness," as it is often called. In contrast, stochastic methods provide us with a formalism to explore the range of models satisfying the observations within a given degree of fit, yet they cannot usually reveal a single "correct" answer. What we can do is study the behavior of such sets of models, perform statistical calculations on them, and make inferences about a region (or regions) within which we may hope that useful solutions actually lie.

We may visualize an n-parameter model as occupying a point in a model space of dimension n; with growing n, the size of this model space grows exponentially. At this point purely random search methods to locate well-fitting models become an exercise in futility, a state of affairs known as the "curse of dimensionality." The problem is exacerbated by the fact that every model selected for fitness testing requires the solution of a forward problem. These two factors limit the practical usefulness of many stochastic inversion methods.

Among the algorithms proposed to significantly reduce the number of forward model response evaluations, we chose one developed by Sambridge (1999). This is a derivative-free approach, which samples model space in such a way that sampling is densest in regions where the responses of the trial models best fit the observed data, and sparsest where such fits are poor. The method, called the Neighborhood Algorithm (NA), makes use of Voronoi cells (Sambridge, 1998), which allow the n-dimensional model space to be divided uniquely into regions based on the distance to the nearest evaluated forward model. Where samples are close together in model space, the cells are smaller; where the sampled models are further apart, the cells are larger.

The calculation begins by defining the bounds of the model space; in this paper, lower and upper bounds for seismic velocities and density are set. In contrast to local deterministic methods, there is no need for a starting guess. Initially, the model space is seeded with a collection of randomly chosen model n-vectors, and from these the model space is partitioned into Voronoi cells. Each cell is linked to its particular forward model, and the assumption is now made that all models lying within a cell share the same misfit value. The next step is to rank the Voronoi cells by misfit and select a subset with the lowest misfits, then perform uniform random walks (Gibbs sampler) within each selected Voronoi cell. The endpoint of each walk determines a model whose forward response, and hence misfit value, will be calculated. This iterative procedure is repeated until a given convergence criterion has been reached. The algorithm, described by Sambridge in his 1999 paper, can be summarized as follows:

1) Generate an initial set of ns models uniformly (or otherwise) in model space;
2) Calculate the misfit function for the most recently generated set of ns models and determine the nr models with the lowest misfit of all models generated so far;
3) Generate ns new models by performing a uniform random walk in the Voronoi cell of each of the nr chosen cells (i.e., ns/nr trial models are created in each of the nr retained cells);
4) Go to step 2.

As this calculation evolves, a progressively larger number of smaller Voronoi cells will tend to isolate the interesting parts of model space where misfit is low; where misfit is high, model space will be sampled more coarsely. Within each cell, the misfit is assumed to be constant (nearest-neighbor approximation), resulting in an estimate of the entire error surface. In addition to finding a best-fitting model, NA can be used to appraise the confidence in the result by evaluating Bayesian probability integrals (Sambridge, 1999): an ensemble of random "walkers" selects points from the Voronoi cell approximation that will contribute to the Bayesian integral.

Figure 1. (a) Two-layer P-wave velocity reservoir model. The four panels show the seismic misfit function for bulk reservoir properties (single reservoir layer with thickness and VP varying; outside the variable layer, the true model is assumed), generated by exhaustive, dense grid search. Triangles indicate the correct solution (global minimum). (b) Error surface for data modeled with a 25-Hz Ricker wavelet. On the right is a zoomed and rescaled view to enhance detail. Note the different color scales to improve contrast. (c), (d) Comparison of error surfaces for data with different frequency content: (c) 25-Hz Ricker wavelet, same as (b) right; (d) same model space, 50 Hz.

Figure 2. (a) East-west section through the SEAM P-wave velocity model at 20 km north. Location of the 1D extraction at east 28 km marked by a yellow line. Position of the lower Pleistocene reservoir indicated by an arrow. (b) Elastic parameters (density = red; P-wave velocity = green; S-wave velocity = blue) in the zone of interest bracketing two reservoirs. (c) Detail of the 1D seismic response. The two Pleistocene reservoirs are marked by arrows pointing to the vertical traveltimes of their respective tops and bases (blue reservoir, SEAM index = 9.4; orange reservoir, SEAM index = 9.7). The model is parameterized with 10-m thick layers of constant elastic properties.

The Leading Edge, May 2012
Downloaded 15 Jun 2012 to 216.198.85.26. Redistribution subject to SEG license or copyright; see Terms of Use at http://segdl.org/
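The four NA steps above translate almost directly into code. The sketch below is a minimal, self-contained illustration, not Sambridge's implementation: rejection sampling stands in for his exact Gibbs sampler along Voronoi-cell boundaries, cells are only updated between iterations, and the step size and iteration count are arbitrary choices; only the ns and nr parameters come from the text.

```python
import numpy as np

def neighborhood_algorithm(misfit, bounds, ns=9, nr=3, n_iter=20, seed=0):
    """Simplified Neighborhood Algorithm (after Sambridge, 1999).

    misfit : callable mapping a model vector to a scalar misfit
    bounds : (ndim, 2) array of lower/upper bounds per parameter
    ns     : models generated per iteration
    nr     : best Voronoi cells resampled per iteration
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    # Step 1: seed model space with a uniform random sample
    models = rng.uniform(lo, hi, size=(ns, len(lo)))
    misfits = np.array([misfit(m) for m in models])
    for _ in range(n_iter):
        # Step 2: rank all models generated so far, keep the nr best cells
        best = models[np.argsort(misfits)[:nr]]
        new = []
        for centre in best:
            # Step 3: random walk inside the cell; rejection sampling
            # (accept a step only if the nearest evaluated model is still
            # the cell centre) stands in for the exact Gibbs sampler
            x = centre.copy()
            for _walk in range(ns // nr):
                for _try in range(200):
                    cand = np.clip(x + rng.normal(scale=0.05 * (hi - lo)),
                                   lo, hi)
                    d = np.linalg.norm(models - cand, axis=1)
                    if np.allclose(models[np.argmin(d)], centre):
                        x = cand
                        break
                new.append(x.copy())
        new = np.array(new)
        # Step 4: evaluate the new models and iterate
        models = np.vstack([models, new])
        misfits = np.concatenate([misfits, [misfit(m) for m in new]])
    return models, misfits
```

On a toy quadratic misfit, the sampling visibly concentrates around the minimum while still covering the rest of the model space coarsely, which is the behavior described above.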
Figure 3. Full-waveform inversion with the Neighborhood Algorithm given only bounds on the velocity range within the SEAM index = 9.7 reservoir as prior knowledge (uniform or flat prior probability distribution). 1D marginal posterior probability densities (PPD) for VP in the reservoir above (top left) and below (top right) the oil-water contact at 3000 m depth. (bottom) 2D PPDs of the two velocities (darker shade means higher probability density) and their confidence level (contoured).
In our study, we use 1D inversion of surface seismic data to estimate elastic parameters within a reservoir. Thus, in contrast to more regional 3D tomographic studies for seismic depth-imaging purposes, the number of parameters (dimensions) is manageable, and the forward modeling is fast with a semi-analytical solution of the elastic wave equation. Dimensionality and modeling speed are crucial for the viability of stochastic optimization methods. Seismic forward modeling was carried out with Kennett's reflectivity code (Kennett, 1983).
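Forward-modeling cost dominates a stochastic inversion, which is why the convolutional approximation used later in this paper is attractive. The sketch below is an illustration only: it convolves a wavelet with primary PP reflectivity, but substitutes the linearized Aki-Richards approximation for the full Zoeppritz equations and for Kennett's reflectivity method, and it places one reflection coefficient per interface in depth order rather than at true two-way times; the numerical layer values in the test are made up.

```python
import numpy as np

def aki_richards_pp(vp1, vs1, rho1, vp2, vs2, rho2, theta_deg):
    """Linearized PP reflection coefficient at an interface
    (Aki-Richards approximation; a stand-in for full Zoeppritz)."""
    th = np.radians(theta_deg)
    vp, vs, rho = (vp1 + vp2) / 2, (vs1 + vs2) / 2, (rho1 + rho2) / 2
    dvp, dvs, drho = vp2 - vp1, vs2 - vs1, rho2 - rho1
    k = (vs / vp) ** 2 * np.sin(th) ** 2
    return (0.5 * (1 - 4 * k) * drho / rho
            + dvp / (2 * vp * np.cos(th) ** 2)
            - 4 * k * dvs / vs)

def convolutional_avo(vp, vs, rho, wavelet, theta_deg=0.0):
    """Convolutional AVO trace: wavelet convolved with the primary PP
    reflectivity of a layer stack (top-to-bottom interfaces)."""
    rc = np.array([aki_richards_pp(vp[i], vs[i], rho[i],
                                   vp[i + 1], vs[i + 1], rho[i + 1],
                                   theta_deg)
                   for i in range(len(vp) - 1)])
    return np.convolve(rc, wavelet)
```

At normal incidence the expression reduces to the familiar ½(ΔVP/VP + Δρ/ρ), which is a quick sanity check on any implementation.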
Figure 4. Best models from stochastic NA inversion using 150,000 convolutional (cyan) and 3300 full-waveform (black) forward modeling evaluations. True model in red and model space bounds outlined with green dashes.
Figure 5. Plot of 900 VP models generated by NA sampling within the green a priori bounds (±200 m/s around the smoothed true model) that have a data misfit smaller than that of the AVO synthetic (convolution of the true wavelet with the P-wave reflectivity calculated with the full Zoeppritz equations) generated from the true elastic model ("spaghetti plot," after Lomax and Snieder).
Examples
To set the scene, we illustrate the shape of a seismic misfit surface for a simple, two-parameter case (Figure 1), which is based on the Luva gas field case study by Du and MacGregor (2010). Even within reasonably narrow bounds on the variable parameters, the error surface does not have a single minimum corresponding to the best (closest to the true model) solution. The periodic nature of the seismic wavelet generates a misfit surface with quasi-periodic local minima that correspond to combinations of VP and layer thickness (transit time) that mimic the kinematics of the true solution with the wavelet shifted by integer multiples of the dominant wavelength. This effect is especially clear here because the reservoir is only about one wavelength thick at the dominant frequency of 25 Hz. Increasing the wavelet frequency reduces the effect of wavelet aliasing (Figure 1d). For a seismic inversion that uses a deterministic, local gradient-descent method to find the misfit minimum, this implies that only a starting guess close to the correct solution will converge toward the true global minimum. Note that some of the neighboring local minima are nearly as deep (have similar misfit) as the global minimum.

For the SEAM test case, we use SEAM model location north 20 km and east 28 km, away from the salt body in an area with low dips (Figure 2a) and therefore appropriate for a 1D experiment. From this model we generated 1D synthetic full-waveform seismic data (Figure 2c). This model contains two Pleistocene oil reservoirs (SEAM geoindices 9.4 and 9.7; Figure 2b, marked by arrows in Figure 2c).
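The quasi-periodic error surface of Figure 1 is easy to reproduce with a toy version of this two-parameter experiment. Everything below is an assumption for illustration (a 2000 m/s constant-density background, a fixed reservoir top time, a true model with VP = 2500 m/s and thickness 80 m); it is not the Luva or SEAM parameterization, only the mechanism: layer velocity and thickness trade off through the two-way transit time, so local minima recur at wavelet-period shifts.

```python
import numpy as np

def ricker(f, dt=0.001, n=129):
    """Zero-phase Ricker wavelet with dominant frequency f (Hz)."""
    t = (np.arange(n) - n // 2) * dt
    a = (np.pi * f * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def synthetic(v, h, f=25.0, dt=0.001, nt=1000):
    """Toy normal-incidence trace: a single reservoir layer of velocity v
    (m/s) and thickness h (m) in a 2000 m/s background, constant density.
    Top and base spikes of opposite sign, base delayed by 2h/v."""
    v0 = 2000.0
    r = (v - v0) / (v + v0)          # normal-incidence reflection coeff.
    t_top = 0.4                      # fixed two-way time to the top (s)
    refl = np.zeros(nt)
    refl[int(round(t_top / dt))] = r
    refl[int(round((t_top + 2.0 * h / v) / dt))] = -r
    return np.convolve(refl, ricker(f, dt), mode="same")

# "Observed" data from the true model, then an exhaustive grid search
# over the two free parameters maps out the misfit surface.
d_obs = synthetic(v=2500.0, h=80.0)
vels = np.linspace(2200.0, 2800.0, 61)    # 10 m/s steps
thicks = np.linspace(40.0, 120.0, 41)     # 2 m steps
misfit = np.array([[np.sum((synthetic(v, h) - d_obs) ** 2) for v in vels]
                   for h in thicks])
```

Scanning a row of `misfit` at fixed thickness shows additional local minima away from the true velocity; only the cell at the true (v, h) reaches exactly zero.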
Figure 6. "Caterpillar" plots of inverted elastic model parameters: P-wave (a) and S-wave (b) velocities, and bulk density (c). The 900 allowable models were sampled with the "flattened" misfit criterion of Lomax and Snieder. Color-coded hit count per parameter bin, normalized by the number of models. A priori model space bounds are delineated in green and the true models are shown as red lines.
The oil-bearing reservoir is only 80 m thick and thus extends over roughly one wavelength of the 25-Hz Ricker wavelet used in this simulation. Recovery of a given velocity structure poses a challenge even in the case of synthetic data, where the seismic wavelet is assumed to be known, and where the impulse response (primaries plus all multiples) of a postulated 1D layered model is generated with a reflectivity code. Use of the same reflectivity code for both forward modeling and inversion clearly biases the results; this effect can be mitigated, but not eliminated, by using one reflectivity code to generate the synthetic data and another to carry out the forward modeling in the NA algorithm. It is important to have some prior knowledge of the ranges of the expected velocities and densities for a successful inversion: this allows us to put tight bounds on the model space to be explored.

We first concentrate on resolving the P-wave velocity structure of this reservoir, specifically the velocity contrast between the oil- and water-saturated parts of the reservoir, leaving just two parameters to be evaluated. The statistical evaluation of the stochastic inversion of this two-parameter test can then be visualized with greater ease (Figure 3). The posterior probability density function (PPD) shown in Figure 3 represents the probability that a model is correct given the observed data (see Ulrych et al., 2001). If the model is defined by n parameters, the resulting PPD is n-dimensional. Because an n-dimensional volume is difficult to visualize if n > 3, one often computes the so-called "marginal" PPDs, which are obtained from the n-dimensional PPD by integrating over all parameter ranges (i.e., between the prior lower and upper bounds specified for each model parameter) except for the one or two whose joint behavior one wishes to observe. For 2D marginals, one chooses two particular inverted parameters, say V1 and V2; the PPD plot of (V1, V2) is the result of integration over the remaining variables of the complete n-dimensional PPD. In this way, we obtain the 1D and 2D marginals shown in Figure 3. Such marginals are useful, for example, to study the resolving power of an inversion algorithm, i.e., its ability to provide reliable estimates of a given model parameter without interference or contamination by other model parameters. For the 2D marginals, a confidence contour delineates a region containing the true solution with the stated probability (60% or 90% in Figure 3). For higher-dimensional model spaces, such visualizations become increasingly difficult, if not impossible.

In the rest of this section, the model space comprises the three (isotropic) elastic parameters in the zone of interest outlined in Figure 2b, namely P-velocity, S-velocity, and density. This choice leads to 144 parameters, meaning that our model space is of dimension n = 144. Layer thicknesses are fixed to a chosen model depth sampling interval of 10 m. At this minimal layer thickness, inversion results from nearby layers tend to be highly correlated, meaning the velocity or density estimates in a given layer may not be independent of the velocity and density estimates of nearby layers. We therefore impose a three-point depth-average smoothing filter on each elastic parameter during the NA sampling.

Figure 7. (a) "Caterpillar" plot of 7000 VP models generated from NA sampling with the unmodified amplitude misfit criterion (Tarantola, 2005). Comparison of selected individual histograms from (b) the unmodified misfit and (c) the "flattened" misfit (Figure 6). True model shown as a red line.

Stochastic inversion will not usually give the best possible solution; each realization provides the best-fitting model in the ensemble of trial models that it generated (Figure 4). We show the results of stochastic inversions using two different forward modeling algorithms: full waveform (Kennett, 1983), and the much faster, but less accurate, convolutional approach (standard AVO modeling = convolution of a wavelet with the primary PP reflectivity calculated with Zoeppritz's equations). Both methods produce comparably good approximations of the true velocity model; the interval of interest is not contaminated with strong multiples, and therefore the convolutional approximation of modeling only primaries is justified.

In addition, the stochastic exploration of the (geologically plausible) model space can outline the range of models that are consistent with the seismic data within a given data and modeling error (acceptable models). Figure 5 shows the P-velocity depth curves for 900 forward full-waveform synthetics sampled from the bounded model space delineated by bold green lines. The figure shows only those models whose misfit is less than or equal to the data
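The marginalization described above can be approximated directly from any ensemble of sampled models. The sketch below is a plain importance-weighted histogram, not the NA-Bayes resampling of the Voronoi approximation used in the paper; the likelihood choice (log-likelihood = −misfit/2, i.e., Gaussian errors) and the bin count are assumptions for illustration.

```python
import numpy as np

def marginal_ppd(samples, log_like, axes, bins=30):
    """Approximate 1D or 2D marginal PPDs from an ensemble of models.
    samples  : (n, ndim) array of model vectors
    log_like : (n,) log-likelihood of each model, e.g. -0.5 * misfit
    axes     : tuple of one or two parameter indices to keep
    Integrating the PPD over the discarded parameters reduces, for a
    sampled ensemble, to a weighted histogram of the kept ones."""
    w = np.exp(log_like - np.max(log_like))   # stabilized weights
    pts = samples[:, list(axes)]
    if len(axes) == 1:
        return np.histogram(pts[:, 0], bins=bins, weights=w, density=True)
    h, xe, ye = np.histogram2d(pts[:, 0], pts[:, 1], bins=bins,
                               weights=w, density=True)
    return h, (xe, ye)
```

With `density=True` the 1D marginal integrates to one, so confidence intervals like those contoured in Figure 3 can be read off its cumulative sum.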
misfit that one would achieve by modeling only the PP reflectivity of the true model rather than the full elastic response. Not surprisingly, a large number of models satisfy the data equally well, occupying the entire area within the given P-velocity bounds (the results for the other elastic parameters are similar). At first sight, this result is rather disturbing, but a plot showing all models satisfying the data within a given misfit is misleading: not all parameter combinations result in acceptable models. Instead, we might want to know with what frequency models occur which fall within given P-velocity deviations from their true values.

Figures 6 and 7 show three collections of color-coded histograms plotted about their true values as a function of depth for VP, VS, and density ρ. These histograms are plotted in the form of horizontal color bars at 10-m depth intervals (the depth parameterization of the true model). The vertical color bar is a measure of the number of models falling within a particular 10-m/s bin of each histogram (normalized by the total number of models). For their construction, we used a criterion introduced by Lomax and Snieder (1994), which involves labeling all models whose misfit is at or below a predetermined misfit value (tolerance or acceptance level) with this same predetermined misfit value. This "flattened" misfit criterion in Figures 6 and 7c ensures that the models within the allowable model space are sampled uniformly: inside the acceptable space, the error surface is flat and all Voronoi cells are equally probable.

Using an unmodified amplitude misfit criterion will (by design) bias the NA sampling toward the regions of model space with the lowest misfit, which is the desired strategy when searching only for the "best" model (Figure 7). Inspection of the VP histograms now reveals that these often cluster around their true values. We see a similar, but less pronounced, effect for VS. On the other hand, density recovery is poor; subsurface density estimation from seismic data is a difficult task. A different, but equivalent, parameterization of the elastic properties, e.g., VP, VP/VS, and P-wave impedance, can sometimes give better results. Density recovery is often facilitated by the availability of longer-offset data, but in this case the maximum illumination angle of 56° is reached well within the synthesized offset range.

Figure 8. "Caterpillar" plot of the histogram difference between acceptable and all models at a tolerance level just above the unmodified misfit minimum reached by the NA ensemble. The gray scale codes the expression (a/n) − (b/m), where a = histogram height of models with misfit ≤ tolerance, b = histogram height of all models, n = 2 = number of models with misfit ≤ tolerance, m = 3240 = total number of models. Selected individual differential histogram for the layer at 2900 m depth on the right. True model in red.
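Both acceptance-based constructions, the "flattened" misfit that drives uniform sampling of the acceptable region and the differential histograms of Figure 8, are a few lines each. This is a sketch of the described procedure, not the authors' code; the array layouts and bin edges are assumptions.

```python
import numpy as np

def flatten_misfits(misfits, tol):
    """Lomax-and-Snieder-style criterion: every model whose misfit is at
    or below the tolerance is relabeled with the tolerance value itself,
    so the error surface is flat inside the acceptable region and the NA
    samples it uniformly instead of clustering at the minimum."""
    m = np.asarray(misfits, dtype=float)
    return np.where(m <= tol, tol, m)

def differential_histogram(values, misfits, tol, bin_edges):
    """Per-bin quantity (a/n) - (b/m) from Figure 8: the fraction of
    acceptable models (misfit <= tol) in each bin minus the fraction of
    all sampled models in that bin. Positive bars mark parameter values
    over-represented among acceptable models relative to the sampling."""
    values = np.asarray(values, dtype=float)
    misfits = np.asarray(misfits, dtype=float)
    acc = values[misfits <= tol]
    a, _ = np.histogram(acc, bins=bin_edges)
    b, _ = np.histogram(values, bins=bin_edges)
    return a / max(len(acc), 1) - b / len(values)
```

Stacking one such histogram per 10-m layer, normalized by the number of models, gives the "caterpillar" plots of Figures 6 through 8.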
Our unmodified misfit criterion, in effect, forces the result toward a unimodal distribution around the best-fitting solution (Figure 7b). The width of the distribution gives an indication of the uncertainty. By contrast, the multipeaked distribution resulting from the “flattened” misfit criterion in Figure 7c more clearly maps out alternative solutions that, most of the time, bracket the true solution in a narrower range than the width of the best-fitting unimodal quasi-Gaussian distributions that Figure 7b would indicate. We believe that these pictures, which we call “caterpillar” plots, can convey valuable information about the confidence we may have in the results produced by our present approach. We can emphasize the aspect of “acceptable range” by subtracting the normalized histograms (i.e., histogram height divided by number of models; each bar then represents the fraction of models in a particular bin) for all models (irrespective of misfit value) from the ones for the acceptable models
whose responses lie within a suitably low tolerance level (Figure 8). Misfits in this calculation are unmodified rather than the "flattened" misfits used to drive the NA inversion. This will pick out as positive bars those parameter bins that are more likely in acceptable models than in the a priori distribution, while the original distribution shows up as its negative mirror image in the same display. Setting the tolerance level as low as possible leaves just one or two of these positive bars at each depth level. These tend to be the ones at the outer edges of the acceptable model space, because the volume of a hypercube of more than five dimensions is increasingly concentrated at its boundary (for an accessible discussion of this counterintuitive phenomenon, see Hayes, 2011). In the case of the NA realization shown in Figure 8 (2500 models generated), the two best-fitting models fall into bins which bracket the true VP value for 70% of the layers (roughly analogous to the confidence contours in Figure 3), and 50% for VS. The five best-fitting models would bracket 100% of the true VP (and 85% of VS). Eventually it becomes too time-consuming to generate ever more models stochastically in the hope of a better-fitting result. It is more efficient to use a deterministic, quasi-Newton, local gradient-descent inversion with the small number of "good enough" models selected in the stochastic inversion step as starting models to generate the final model. This approach has been advocated for some time (e.g., Cary and Chapman, 1988).

Conclusions
Deterministic inversion provides a computationally efficient means to find a "best" model compatible with a seismic data set, given a good starting model. It is often hard to assess the suitability of this result, or the uncertainty in the resulting parameter estimates, because a range of possible models can be found that fit the data equally well.
Global stochastic inversion methods such as the Neighborhood Algorithm not only provide a single “best” model, but also illustrate the range of possible models that are compatible with the data, and provide a means of determining the confidence with which key inversion parameters may be determined. They offer the possibility to explore alternative scenarios where the data are
compatible with several significantly different Earth models. However, this benefit comes at the expense of increased computational cost, because significantly more forward model evaluations are required. Among the elastic parameters, velocities can generally be recovered reliably from the inversion of surface seismic data. The same is true for impedances, although we have not shown such calculations here.

References
Cary, P. W., and C. H. Chapman, 1988, Automatic 1-D waveform inversion of marine seismic refraction data: Geophysical Journal International, 93, no. 3, 527–546, http://dx.doi.org/10.1111/j.1365-246X.1988.tb03879.x.
Du, Z., and L. M. MacGregor, 2010, Reservoir characterization from joint inversion of marine CSEM and seismic AVA data using Genetic Algorithms: a case study based on the Luva gas field: 80th Annual International Meeting, SEG, Expanded Abstracts, 737–741, http://dx.doi.org/10.1190/1.3513888.
Hayes, B., 2011, An adventure in the nth dimension: American Scientist, 99, 442, http://dx.doi.org/10.1511/2011.93.442.
Kennett, B. L. N., 1983, Seismic wave propagation in stratified media: Cambridge University Press.
Lomax, A., and R. Snieder, 1994, Finding sets of acceptable solutions with a genetic algorithm with application to surface wave group dispersion in Europe: Geophysical Research Letters, 21, no. 24, 2617–2620, http://dx.doi.org/10.1029/94GL02635.
Sambridge, M., 1998, Exploring multidimensional landscapes without a map: Inverse Problems, 14, no. 3, 427–440, http://dx.doi.org/10.1088/0266-5611/14/3/005.
Sambridge, M., 1999, Geophysical inversion with a neighborhood algorithm—I. Searching the parameter space: Geophysical Journal International, 138, no. 2, 479–494, http://dx.doi.org/10.1046/j.1365-246X.1999.00876.x.
Tarantola, A., 2005, Inverse problem theory and methods for model parameter estimation: SIAM.
Ulrych, T. J., M. D. Sacchi, and A. Woodbury, 2001, A Bayes tour of inversion: a tutorial: Geophysics, 66, no. 1, 55–69, http://dx.doi.org/10.1190/1.1444923.
Acknowledgments: We thank Malcolm Sambridge for the Neighborhood Algorithm code and Zhijun Du for his previous work on the two-layer gas reservoir model.

Corresponding author: moritz.fl[email protected]