© 2006 Nanjing Institute of Geophysical Prospecting. Printed in the UK.
INSTITUTE OF PHYSICS PUBLISHING
JOURNAL OF GEOPHYSICS AND ENGINEERING
doi:10.1088/1742-2132/3/2/001
J. Geophys. Eng. 3 (2006) 101–113
Solutions, algorithms and inter-relations for local minimization search geophysical inversion

Stewart A Greenhalgh1, Zhou Bing1 and Alan Green2

1 Department of Physics, The University of Adelaide, Adelaide, Australia
2 Institute of Geophysics, ETH Zürich, Switzerland
Received 23 August 2005
Accepted for publication 16 February 2006
Published 21 March 2006
Online at stacks.iop.org/JGE/3/101

Abstract
This paper presents in a general form the most popular local minimization search solutions for geophysical inverse problems, namely the Tikhonov regularization solutions, the smoothest model solutions and the subspace solutions, from which the inter-relationships between these solutions are revealed. For the Tikhonov regularization solution, a variety of forms exist: the general iterative formula, the iterative linearized scheme, the Levenberg–Marquardt version, the conjugate gradient solver (CGS) and the local-search quadratic approximation CGS. It is shown here that the first three solutions are equivalent and are just a specified form of the gradient solution. The local-search quadratic approximation CGS is shown to be a more general form from which these three solutions emerge. The smoothest model solution (Occam's inversion) is a subset of the Tikhonov regularization solutions, for which a practical algorithm generally comprises two parts: (1) determine the subset of the Tikhonov regularization solutions which satisfy the desired tolerance of data fit; (2) choose the solution which best fits the data in the subset. The solution procedure is actually an adaptive Tikhonov regularization solution. The subspace solution is mathematically no more than a dimension-decreased transform of model parameterization in the Tikhonov regularization solution or the smoothest model solution. The crucial step is to choose the basis vectors which span the whole model space in the parameterization transform. A simple form of the transform is developed which yields greater flexibility for geophysical inverse applications.

Keywords: geophysical inversion, algorithms, Tikhonov regularization solution, subspace methods, smoothest model solution
1. Introduction

Geophysical inversion seeks to apply a model reconstruction algorithm to image the subsurface from observational data. In the mathematical literature, the inverse problem is formulated as an optimization scheme to find the model (medium structure) that gives the best fit between the synthetic and measured (geophysical and geological) data. Several excellent books, such as Menke (1984), Tarantola (1987), Scales and Smith (1994), Parker (1994) and Aster et al (2005), as well as many journal articles, treat geophysical inversion. The various inversion algorithms can be broadly classified into two classes.
One is based on a global minimization search, such as the genetic algorithm (Stoffa and Sen 1991) and the simulated annealing method (Dittmer and Szymenski 1995). The other is based on a local minimization search, such as the generalized least-squares algorithm (Tarantola and Vallette 1982), the iterative linearized algorithm (Pelton et al 1978, Tripp et al 1984, Smith and Vozoff 1984, Dabas et al 1994), the conjugate gradient algorithms (Ellis and Oldenburg 1994b, Zhang et al 1995), the smoothest model algorithms (Constable et al 1987, Ellis and Oldenburg 1994a, Oldenburg and Li 1994, LaBrecque et al 1996) and the subspace algorithms (Skilling and Bryan 1984, Kennett and Williamson 1988, Oldenburg et al 1993). Although the global minimization methods are
non-linear and independent of the initial model, they usually consume a large amount of computer time searching the random model space for the final model. They are generally applied only to simple models involving a restricted number of parameters (Chunduru et al 1995, 1996), because they are very expensive for large-scale model-parameter inversion. The local minimization search algorithms offer advantages in terms of effectiveness and computational efficiency: they can reconstruct an image of the subsurface model from a good initial guess (which can be drawn from the integration of basic geological and geophysical observations), and they easily incorporate a priori information to reduce the multiplicity of solutions. Thus, they are often applied to real (field data) situations. This paper is not concerned with the first kind (global minimization) but rather with the second kind (local minimization) of inversion algorithms. In particular, we give the general forms of solution for the five most popular algorithms: the general iterative formula, the iterative linearized scheme, the Levenberg–Marquardt version, the conjugate gradient solver (CGS) and the local-search quadratic approximation CGS for the Tikhonov regularization solution. We also present general algorithms for the smoothest model solution and the subspace solution. From these forms, the inter-relationships between the various solutions are established and found to be easily understood. To the best of our knowledge, such inter-relationships have not been previously published. The remainder of this paper is organized into four sections. The very next section defines the two most popular solutions for geophysical inversion: the Tikhonov regularization solution and the smoothest model solution. The following three sections develop the general forms of equations and algorithms for the two kinds of solutions, as well as the subspace solution.
The inter-relations for these three kinds of solutions and algorithms are presented and explained.
2. Definition of the inverse solution

In a practical sense, geophysical data are discretized and incomplete due to the limited number of sampling points in both space and time. Furthermore, the observations are contaminated by various forms of noise, so that the geophysical inverse problem is ill-posed (Kirsch 1996). In other words, the geophysical inverse problem has multiple solutions: there exist many models that produce a satisfactory fit within the uncertainty of the observed data. This must be taken into account when interpreting any geophysical measurements. In order to reduce the number of possible solutions and to find a realistic model, which at least incorporates some plausible geological or physical features (rather than being a pure mathematical solution), criteria that describe what kind of model or solution to seek, and which utilize a priori information on the model, must first be established. Then an optimization algorithm is employed to find the 'best' model, subject to these criteria. Normally, the criteria are covered by a misfit function (or objective function, or cost function) that gives a quantitative measure of the fit between the calculated and observed geophysical data and the roughness of the model
variation. In general, the lp (e.g. p = 1, 2, ∞)-norm may be used for the definition of the misfit function. It has been recognized that use of the l1-norm (least absolute deviation) has the merit of fitting most of the data while ignoring some of the outliers, but it leads to non-linear normal equations even when applied to a simple linear problem, and it is not uniquely solvable because of the loss of strict convexity of the norm (Scales and Smith 1994). Among all the norms, the least-squares (l2-) norm allows the easiest computations, so it is most frequently applied in geophysical inversion. However, it is particularly vulnerable to any outliers in the data. There are two l2-normed misfit functions for evaluating the goodness of fit in the data space and the model space, respectively:

\Phi_d(m) = \|d_0 - d(m)\|^2_{W_d} = \frac{1}{2}\sum_{i,j=1}^{N_d} [d_{0i} - d_i(m)]\, W^d_{ij}\, [d_{0j} - d_j(m)],   (1)

\Phi_m(m) = \|m - m_0\|^2_{W_m} = \frac{1}{2}\sum_{i,j=1}^{N_m} (m_i - m_{0i})\, W^m_{ij}\, (m_j - m_{0j}),   (2)
where d_0 and d(m) are vectors of the observed and computed (theoretically calculated) data. Here d(m) may be calculated with a numerical (simulation) method, such as the finite difference or finite element method, once the model m = (m_1, m_2, \ldots, m_{N_m}) is given. The quantities m and m_0 are the unknown (to be determined) model and the initial, primary estimated model-parameter vectors, respectively. W_d and W_m are two symmetric matrices, called the weighting matrices of the data set and the model parameters, respectively. They can contain a priori information, which is very useful in reducing the multiplicity of solutions. In practice, there are several choices for W_d and W_m according to the definition of the solution. For instance, W_d may take the form of a unit matrix, W_d = I (common least squares), or an inverse covariance matrix of the data, W_d = C_d^{-1} (generalized least squares), and W_m may be an inverse covariance matrix of the model parameters, e.g. W_m = C_m^{-1} (Tarantola and Vallette 1982). In order to obtain the least or minimum roughness of the model-parameter variation, the first or second difference operator may be chosen for W_m, that is, W_m = \partial^T\partial or W_m = (\partial\partial)^T(\partial\partial) (Menke 1984, Constable et al 1987). A general form of such an operator for two-dimensional (2D) inversion was given by Ellis and Oldenburg (1994a), who suggested a combination of the difference operators: W_m = \alpha_0 I + \alpha_x \partial_x^T\partial_x + \alpha_z \partial_z^T\partial_z, where \alpha_0, \alpha_x and \alpha_z are three constants, and \partial_x and \partial_z are the difference operators in the two spatial (x and z) directions. Sasaki (1994) proposed a flatness filter operator for W_m, namely W_m = C^T C, where C = \{C_1, C_2, \ldots, C_M\}^T is assembled in terms of a shifting four- or six-point flatness filter, e.g. \delta\hat m_i = C_i\,\delta m = \alpha_i(\delta m_i^E + \delta m_i^W + \delta m_i^S + \delta m_i^N - 4\,\delta m_i). The superscripts E, W, S and N denote the four immediate neighbours of the ith model parameter, and \alpha_i is called the gradient amplifying factor.
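The combined difference operator of Ellis and Oldenburg (1994a) is easy to assemble explicitly. The sketch below builds W_m = \alpha_0 I + \alpha_x \partial_x^T\partial_x + \alpha_z \partial_z^T\partial_z on a small rectangular grid; the grid size, the \alpha values and the x-fastest parameter ordering are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def diff_operator(n):
    """First-difference matrix of shape (n-1, n): (D x)[i] = x[i+1] - x[i]."""
    D = np.zeros((n - 1, n))
    for i in range(n - 1):
        D[i, i], D[i, i + 1] = -1.0, 1.0
    return D

def roughness_matrix(nx, nz, a0=1e-4, ax=1.0, az=1.0):
    """W_m = a0*I + ax*Dx^T Dx + az*Dz^T Dz on an nz-by-nx grid.

    The model vector is ordered with x varying fastest: m[k], k = iz*nx + ix
    (an illustrative convention)."""
    I_x, I_z = np.eye(nx), np.eye(nz)
    Dx = np.kron(I_z, diff_operator(nx))   # differences along x in each row
    Dz = np.kron(diff_operator(nz), I_x)   # differences along z in each column
    n = nx * nz
    return a0 * np.eye(n) + ax * Dx.T @ Dx + az * Dz.T @ Dz

Wm = roughness_matrix(4, 3)
assert np.allclose(Wm, Wm.T)                  # symmetric, as required
assert np.all(np.linalg.eigvalsh(Wm) > 0)     # invertible thanks to a0*I
```

The small a0*I term is what makes W_m invertible, a property that the iterative formulae developed later in the paper rely on.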
In addition, as an approach to performing an l1-normed inversion, the weighting matrices W_d and W_m can be specified as, e.g., W_d = diag\{|d_{0j} - d_j(m^{(k)})|^{-1}\} and W_m = diag\{|m_j^{(k)} - m_{0j}|^{-1}\} (Wolke and Schwetlick 1988, Farquharson and Oldenburg 1998, Aster et al 2005). Such l1-normed inversions can be accomplished with the algorithms for the l2-normed inversion, and they yield sharp boundaries, or so-called 'blocky models', of the subsurface. The l1-normed inversions are robust to outliers in the data, as mentioned previously. Such features may be very useful in practical applications (Loke et al 2003, Zhou and Dahlin 2003). Accordingly, the algorithms described in this paper are equally applicable to the l1-normed inversions, and from now on we do not discriminate between them. Apart from soft constraints introduced through the W_m matrix, one may also impose hard constraints on the allowable values of the model parameters by resetting each m_i after each iteration to either a lower limit (m_i = a_i if m_i < a_i) or an upper limit (m_i = b_i if m_i > b_i); otherwise, if the value returned by the inversion lies within the acceptable range, it is retained (m_i unchanged if a_i ≤ m_i ≤ b_i) (see Carrion 1989 and Zhou et al 1992). A popular definition of the objective function for solution of the ill-posed geophysical inverse problem is the Tikhonov regularization function (Tikhonov and Arsenin 1977):

\Phi(m) = \Phi_d(m) + \lambda\,\Phi_m(m).   (3)
Here λ is termed a regularization parameter; it depends on the error in d_0 and is normally chosen or set before one starts to compute the solution of equation (3). With the linearized approximation, it can be shown that the optimal regularization parameter λ depends on the bounds of the exact solution, but these are not generally known in advance (Kirsch 1996). It is also called the trade-off parameter or damping parameter, because it balances or 'trades off' the two undesirable properties of the model: when λ is very large, \Phi(m) effectively measures the model norm \Phi_m(m); when λ is very small, \Phi(m) is proportional to the data misfit \Phi_d(m). The aim of the inversion is to find the model by solving the optimization problem with a given λ:

\min\{\Phi(m)\} = \min\{\Phi_d(m) + \lambda\,\Phi_m(m)\}.   (4)

For the specification W_d = C_d^{-1}, W_m = C_m^{-1} and λ = 1, the solution of equation (4) is called the generalized least-squares solution (Tarantola and Vallette 1982). Taking different weighting operators with λ = 1 yields the weighted least-squares solution (Menke 1984). In general, the solution of equation (4) may be called the Tikhonov regularization solution.

Another popular definition of the inverse solution is the so-called smoothest model solution or Occam's inversion (Constable et al 1987, Ellis and Oldenburg 1994a, Oldenburg and Li 1994, LaBrecque et al 1996). It involves solving

\min\{\Phi_m(m)\} \quad \text{subject to} \quad \Phi_d(m) = \chi_d,   (5)

instead of equation (4). Here \chi_d is the desired tolerance of \Phi_d(m). The basic motivation for seeking the smoothest model is that one does not wish to be misled by features that appear in the model but are not essential in matching the noise-contaminated observations. In other words, of all the possible solutions, we seek the simplest (smoothest) one in the sense that it contains the least spurious features not required by the data. This is the principle of parsimony (Occam's razor). The advantage of inverting for a maximally smooth model is that we obtain a model whose features we have chosen, by taking a particular weighting matrix W_m in the model norm \Phi_m(m). This constrained optimization problem can be rewritten as an unconstrained optimization problem using a Lagrange multiplier \lambda^{-1}:

\min\{\Phi_m(m) + \lambda^{-1}[\Phi_d(m) - \chi_d]\}.   (6)

It should be noted that \lambda^{-1} is an unknown parameter and must be determined in solving equation (6), unlike the trade-off parameter λ, which is chosen in advance of attempting a solution of equation (4). So the two optimization problems (4) and (6) theoretically give two kinds of model solutions. The relationships and similarities between the practical algorithms for the two solutions will be shown in the following sections.

3. Tikhonov regularization solution

From the definitions of the data and model misfit functions (1) and (2), the Tikhonov function (3) is differentiable. The solution must be a stationary (or minimum) point having zero derivative, leading to

\lambda W_m(m - m_0) = (\partial d/\partial m)^T W_d [d_0 - d(m)].   (7)

Here \partial d/\partial m is the so-called sensitivity (Jacobian) matrix, formed by the Fréchet derivatives of the synthetic data d(m) with respect to the model parameters. The solution of equation (7) is difficult to find directly because of the non-linear nature of the function d(m); we can only evaluate the derivatives for the current (to-be-updated) model parameters. So an iterative scheme will be used to compute the solution.

3.1. General iterative formula

A general iterative approach to the solution of equation (7) can be constructed in the following simple way. Firstly, by adding the term (\partial d/\partial m)^T W_d (\partial d/\partial m)(m - m_0) to both sides of equation (7) and multiplying by W_m^{-1}, assuming the operator W_m is invertible (this is satisfied for most choices of W_m), it follows that

[W_m^{-1}(\partial d/\partial m)^T W_d (\partial d/\partial m) + \lambda I](m - m_0) = W_m^{-1}(\partial d/\partial m)^T W_d \{[d_0 - d(m)] + (\partial d/\partial m)(m - m_0)\}.   (8)

This is equivalent to equation (7). Next, considering the matrix [W_m^{-1}(\partial d/\partial m)^T W_d (\partial d/\partial m) + \lambda I] to be generally invertible (so long
as an appropriate positive value is chosen for the parameter λ), equation (8) can be rewritten as

m = m_0 + [W_m^{-1}(\partial d/\partial m)^T W_d (\partial d/\partial m) + \lambda I]^{-g}\, W_m^{-1}(\partial d/\partial m)^T W_d \{[d_0 - d(m)] + (\partial d/\partial m)(m - m_0)\}.   (9)

Finally, an iterative approach to solving equation (9) is suggested:

m_{k+1} = m_0 + [W_m^{-1}(\partial d/\partial m)_k^T W_d (\partial d/\partial m)_k + \lambda I]^{-g}\, W_m^{-1}(\partial d/\partial m)_k^T W_d \{[d_0 - d(m_k)] + (\partial d/\partial m)_k (m_k - m_0)\}, \quad (k = 0, 1, 2, 3, \ldots),   (10)

where the superscript -g denotes the generalized inverse matrix. By subtracting m_k from both sides of the above equation and then combining the terms (m_k - m_0), an alternative form of the solution is obtained:

m_{k+1} = m_k + [W_m^{-1}(\partial d/\partial m)_k^T W_d (\partial d/\partial m)_k + \lambda I]^{-g} \{W_m^{-1}(\partial d/\partial m)_k^T W_d [d_0 - d(m_k)] - \lambda(m_k - m_0)\}, \quad (k = 0, 1, 2, 3, \ldots).   (11)
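As a concrete illustration, iteration (11) can be sketched numerically. The toy forward operator, its Jacobian and all numerical values below are invented for illustration (with W_d = I and W_m = I); they are not from the paper.

```python
import numpy as np

# Sketch of iteration (11) with Wd = I and Wm = I for the invented toy
# forward d(m) = [m1^2 + m2, m1 - m2, m1*m2].
def fwd(m):
    return np.array([m[0]**2 + m[1], m[0] - m[1], m[0] * m[1]])

def jac(m):
    # Jacobian (sensitivity matrix) of fwd, derived by hand
    return np.array([[2 * m[0],  1.0],
                     [1.0,      -1.0],
                     [m[1],    m[0]]])

m_true = np.array([1.5, -0.5])
d0 = fwd(m_true)               # noise-free synthetic data
m0 = np.array([1.0, 0.0])      # initial guess
lam = 1e-6                     # small regularization (damping) parameter

m = m0.copy()
for k in range(50):
    J = jac(m)
    A = J.T @ J + lam * np.eye(2)                # coefficient matrix of (11)
    rhs = J.T @ (d0 - fwd(m)) - lam * (m - m0)   # right-hand side of (11)
    m = m + np.linalg.solve(A, rhs)

assert np.allclose(fwd(m), d0, atol=1e-5)        # data are fitted
```

With λ small, the fixed point of the iteration fits the data almost exactly; a larger λ would bias the answer toward m0, which is exactly the trade-off the text describes.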
The iterative formulae (10) and (11) are similar to the total inversion formula given by Tarantola and Vallette (1982), where W_d = C_d^{-1}, W_m = C_m^{-1} and λ = 0, and to equation (17) of Tarantola (1984), where W_d = C_d^{-1}, W_m = C_m^{-1} and λ = 1, who used fixed-point theory to obtain the generalized least-squares solution. They are also similar to the general solution for unconstrained non-linear inversion developed by Carrion (1989), equation (41), where W_d = C_d^{-1} and W_m = C_m^{-1}.

3.2. Iteratively linearized scheme

Another simple scheme for the Tikhonov regularization solution is the iteratively linearized inversion. Taking a Taylor series expansion and retaining only terms to first order, the synthetic data can be approximated by d(m) \approx d(m_0) + (\partial d/\partial m)_{m_0}(m - m_0). The stationary-point equation (7) now becomes

[(\partial d/\partial m)_{m_0}^T W_d (\partial d/\partial m)_{m_0} + \lambda W_m](m - m_0) = (\partial d/\partial m)_{m_0}^T W_d [d_0 - d(m_0)].   (12)

A recurrence procedure is suggested in terms of the above equation:

[(\partial d/\partial m)_k^T W_d (\partial d/\partial m)_k + \lambda W_m](m_{k+1} - m_k) = (\partial d/\partial m)_k^T W_d [d_0 - d(m_k)], \quad (k = 0, 1, 2, 3, \ldots).   (13)
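For diagonal weighting matrices, the linearized system (13) can be checked against the square-root-weighted rectangular system that the text introduces below as equation (15): the least-squares solution of the stacked system equals the solution of the normal equations. The Jacobian, weights and residual below are toy stand-ins.

```python
import numpy as np

# Sketch: for diagonal Wd and Wm, the least-squares solution of the stacked
# square-root-weighted system equals the solution of normal equations (13).
# J, Wd, Wm and res are toy stand-ins for the Jacobian, the weights and the
# data residual d0 - d(m_k) at iteration k.
rng = np.random.default_rng(2)
J = rng.normal(size=(15, 6))
res = rng.normal(size=15)
Wd = np.diag(rng.uniform(0.5, 2.0, size=15))
Wm = np.diag(rng.uniform(0.5, 2.0, size=6))
lam = 0.5

# normal-equation form (13)
dm13 = np.linalg.solve(J.T @ Wd @ J + lam * Wm, J.T @ Wd @ res)

# stacked rectangular form; np.sqrt acts elementwise, which is the matrix
# square root here only because Wd and lam*Wm are diagonal
A = np.vstack([np.sqrt(Wd) @ J, np.sqrt(lam * Wm)])
b = np.concatenate([np.sqrt(Wd) @ res, np.zeros(6)])
dm15 = np.linalg.lstsq(A, b, rcond=None)[0]

assert np.allclose(dm13, dm15)
```

As the text notes, the stacked form is numerically better behaved than forming the normal equations, at the cost of operating on a larger matrix.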
In the simple case W_d = I and W_m = I, equation (13) reduces to the Levenberg–Marquardt version, which has been widely used for geophysical inversion (Pelton et al 1978, Tripp et al 1984, Smith and Vozoff 1984, Dabas et al 1994). Another iteratively linearized approach may be chosen in the following way. Setting m_{k+1} = m_k + \delta m_k and d(m_{k+1}) \approx d(m_k) + (\partial d/\partial m)_k (m_{k+1} - m_k), and replacing m with m_{k+1} in (7), we have the following form:

[(\partial d/\partial m)_k^T W_d (\partial d/\partial m)_k + \lambda W_m](m_{k+1} - m_k) = (\partial d/\partial m)_k^T W_d [d_0 - d(m_k)] - \lambda W_m(m_k - m_0), \quad (k = 0, 1, 2, 3, \ldots).   (14)

There is a difference between formulations (13) and (14). The former gives a sequence of solutions whose variation \|W_m(m_{k+1} - m_k)\|^2 at each iteration is as small as possible under a given trade-off parameter; the latter yields another sequence of solutions, none of which strays far from the initial guess m_0, because for any k the quantity \|W_m(m_k - m_0)\|^2 is kept as small as possible under the current trade-off with the data misfit function. The \lambda W_m(m_k - m_0) term of equation (14) makes the resulting model smoother than the one produced by equation (13), for which the only constraint is that the changes (m_k - m_{k-1}) must be smooth. After several iterations the changes could add up, so that the overall model can be significantly rougher than that obtained via equation (14). To solve equation (13) or (14), linear-equation solvers such as singular value decomposition (SVD) can be employed. A better algorithm has been suggested for equation (13) (Lines and Treitel 1984, Sasaki 1994), because the solution of equation (13) is equivalent to the least-squares solution of the following rectangular system,

\begin{pmatrix} \sqrt{W_d}\,(\partial d/\partial m)_k \\ \sqrt{\lambda W_m} \end{pmatrix}(m_{k+1} - m_k) = \begin{pmatrix} \sqrt{W_d}\,[d_0 - d(m_k)] \\ 0 \end{pmatrix},   (15)

whose solution is well known to be more accurate than that obtained via equation (13). Equation (15) may be solved with SVD or conjugate gradient least squares (CGLS) (Scales and Smith 1994). However, it should be noted that solving equation (15) with any matrix method requires operations on a large-dimensional matrix when a large-scale geophysical inversion is involved; the matrix in equation (15) is much larger than the original one in equation (13). If W_m is invertible, the coefficient matrix in equation (13) or (14) is invertible too, provided the parameter λ is a large enough positive value. So we prefer to rewrite equations (13) and (14) in the following recurrence forms:

m_{k+1} = m_k + [W_m^{-1}(\partial d/\partial m)_k^T W_d (\partial d/\partial m)_k + \lambda I]^{-g}\, W_m^{-1}(\partial d/\partial m)_k^T W_d [d_0 - d(m_k)],   (16)
m_{k+1} = m_k + [W_m^{-1}(\partial d/\partial m)_k^T W_d (\partial d/\partial m)_k + \lambda I]^{-g} \{W_m^{-1}(\partial d/\partial m)_k^T W_d [d_0 - d(m_k)] + \lambda(m_0 - m_k)\}, \quad (k = 1, 2, \ldots).   (17)

One can see that equation (17) is the same as the general iterative formula (equation (11)) for the Tikhonov regularization solution. It follows that the general iterative solution (including Tarantola and Vallette's total inverse formula and Carrion's general solution) is just the iteratively linearized solution. Furthermore, Tarantola (1984) pointed out that the total inversion formula is a form of the steepest descent method, so the same conclusion can be reached; namely, the solution (11) or (17) is equivalent to that obtained by the steepest descent method

m_{k+1} = m_k - \hat\alpha_k \hat g_k,   (18)

whose maximum descent direction is given by the vector

-\hat g_k = -W_m^{-1}(\partial\Phi/\partial m)_k = W_m^{-1}(\partial d/\partial m)_k^T W_d [d_0 - d(m_k)] + \lambda(m_0 - m_k),   (19)

and the step lengths for correcting the model parameters are scaled by the operator

\hat\alpha_k = [W_m^{-1}(\partial d/\partial m)_k^T W_d (\partial d/\partial m)_k + \lambda I]^{-g}.   (20)

So, the general form of the steepest descent algorithm may be written as

m_{k+1} = m_k + \hat\alpha\,\{W_m^{-1}(\partial d/\partial m)_k^T W_d [d_0 - d(m_k)] + \lambda(m_0 - m_k)\},   (21)

where \hat\alpha is a step-length scaling operator. A simple case is \hat\alpha = \alpha I, where \alpha may be determined by the line search for \min\{\Phi(m_k - \alpha \hat g_k)\}. In practical applications, the line search for the optimal step length \alpha consumes much computer time in a large-scale inverse problem, because checking the descent of the objective function for a guessed value of \alpha requires the numerical calculation of the synthetic data d(m_k - \alpha \hat g_k) many times. However, the step-length scaling operator given by equation (20) efficiently finds the optimal value of the step length in the steepest descent method. Daily and Owen (1991) used a simple form of this formula (with λ = 0, W_d = I and W_m = I) in their experiments on crosshole resistivity tomography.

3.3. Conjugate gradient solver (CGS)

The conjugate gradient method is a common optimization method which has better convergence properties than the iteratively linearized or steepest descent methods (Bazaraa and Sherry 1979). Empirical evidence seems to favour the Polak–Ribiere algorithm over other versions of the general type (Luenberger 1984). The Polak–Ribiere algorithm is implemented by the following steps (hereafter referred to as algorithm-CG1):

Step 1. Given m_0, compute \hat g_0 = (\partial\Phi/\partial m)_0 = (\partial d/\partial m)_0^T W_d [d(m_0) - d_0] and set \hat d_0 = -\hat g_0.

Step 2. For k = 0, 1, 2, \ldots, n - 1:
(a) Set m_{k+1} = m_k + \alpha_k \hat d_k, where \alpha_k minimizes \Phi(m_k + \alpha_k \hat d_k).
(b) Compute \hat g_{k+1} = (\partial\Phi/\partial m)_{k+1} = (\partial d/\partial m)_{k+1}^T W_d [d(m_{k+1}) - d_0] + \lambda W_m(m_{k+1} - m_0).
(c) Stopping test |\hat g_{k+1}| \le \varepsilon. If satisfied, END; otherwise, unless k = n - 1, set \hat d_{k+1} = -\hat g_{k+1} + \beta_k \hat d_k, where \beta_k = \hat g_{k+1}^T(\hat g_{k+1} - \hat g_k)/(\hat g_k^T \hat g_k).

Step 3. Replace m_0 with m_n and go back to step 1.

Here n is the number of model parameters. The main advantages of this algorithm are the modest memory requirement (storage of the three vectors \hat g_{k+1}, \hat g_k and \hat d_k) and a significant reduction in computations, since the Hessian matrix (the second derivatives) does not need to be found at each iteration. The global convergence of the method is assured by a periodic restart, after n iterations, from the last point obtained. This algorithm has been applied to resistivity inversion (Ellis and Oldenburg 1994b) and acoustic waveform tomography (Reiter and Rodi 1996). Its drawback is that it is still expensive in computer time because, as in the general steepest descent method, a search for the optimal step factor \alpha_k is performed during each iteration. But the algorithm can be directly applied to solving a linear-equation system such as equations (13) and (14) (Luenberger 1984), in which case the search for the optimal \alpha_k is not required.

3.4. Quadratic approximation CGS

If \Phi(m) is twice differentiable and the second derivative of \Phi(m) can be computed in reasonable computer time, the line search in the CGS can be avoided in the manner of the quadratic approximation, because one can directly calculate the optimal step factor \alpha_k with the Hessian derivatives,

\partial^2\Phi/\partial^2 m = (\partial d/\partial m)^T W_d (\partial d/\partial m) + Q + \lambda W_m,   (22)

where

Q = (Q_{ij})_{n\times n}, \qquad Q_{ij} = \sum_{l,k=1}^{N_d} [d_l(m) - d_{0l}]\, W^d_{lk}\, \frac{\partial^2 d_k}{\partial m_i \partial m_j},   (23)

and \partial^2 d_k/\partial m_i \partial m_j is the second derivative of d(m). Using the quadratic approximation (which is superior to a linear one),

\Phi(m) \approx \Phi(m_k) + (m - m_k)^T (\partial\Phi/\partial m)_k + \frac{1}{2}(m - m_k)^T (\partial^2\Phi/\partial^2 m)_k (m - m_k),   (24)
and substituting the recurrence formula m = m_k + \alpha_k \hat d_k into the above, then setting the derivative with respect to \alpha_k to zero, we obtain the optimal step factor

\alpha_k = \frac{-(\partial\Phi/\partial m)_k^T \hat d_k}{\hat d_k^T (\partial^2\Phi/\partial^2 m)_k \hat d_k} = \frac{\hat r_k^T \hat r_k}{\hat q_k^T W_d \hat q_k + \hat d_k^T Q_k \hat d_k + \lambda \hat d_k^T W_m \hat d_k},   (25)

where \hat q_k = (\partial d/\partial m)_k \hat d_k and \hat r_k = -(\partial\Phi/\partial m)_k. Here we have used the identity \hat r_k^T \hat d_k = \hat r_k^T(\hat r_k + \beta_{k-1}\hat d_{k-1}) = \hat r_k^T \hat r_k, which follows from the conjugate property \hat r_k^T \hat d_{k-1} = 0. Using the approximation (24) again, we obtain the expressions

\hat r_{k+1} = \hat r_k - \alpha_k[(\partial d/\partial m)_k^T W_d \hat q_k + (Q_k + \lambda W_m)\hat d_k],   (26)

\beta_k = \hat r_{k+1}^T \hat r_{k+1} / (\hat r_k^T \hat r_k).   (27)
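For a purely quadratic objective \Phi(m) = \frac{1}{2}m^T A m - b^T m (a stand-in for the local model (24)), the updates built from the Hessian-based step (25), the residual recursion (26) and the factor (27) reduce to the classical linear conjugate gradient iteration, with no line search. A minimal sketch, with A and b as random toy data:

```python
import numpy as np

# Linear CG with the Hessian-based step of (25) on a quadratic toy objective
# phi(m) = 0.5 m^T A m - b^T m; A plays the role of the Hessian (22).
rng = np.random.default_rng(3)
n = 8
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite Hessian
b = rng.normal(size=n)

m = np.zeros(n)
r = b - A @ m                        # r_k = -grad phi(m_k)
d = r.copy()
for k in range(n):
    q = A @ d
    alpha = (r @ r) / (d @ q)        # step from the Hessian, cf. (25)
    m = m + alpha * d
    r_new = r - alpha * q            # residual update, cf. (26)
    if np.linalg.norm(r_new) < 1e-12:
        break
    beta = (r_new @ r_new) / (r @ r) # cf. (27)
    d = r_new + beta * d
    r = r_new

# converges in at most n steps for a quadratic problem
assert np.linalg.norm(A @ m - b) < 1e-8
```

The positive definiteness of A is what guarantees alpha > 0 and finite-step convergence, which is exactly the condition on the Hessian that the text discusses next.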
Replacing the formulae for \alpha_k, \hat g_{k+1} and \beta_k by equations (25), (26) and (27) in the Polak–Ribiere algorithm, the quadratic approximation CGS reduces to the following algorithm (referred to as algorithm-CG2):

Step 1. Given m_0, compute \hat r_0 = -\hat g_0 = (\partial d/\partial m)_0^T W_d [d_0 - d(m_0)] and set \hat d_0 = \hat r_0.

Step 2. For k = 0, 1, 2, \ldots, n - 1:
(a) Calculate \hat q_k = (\partial d/\partial m)_k \hat d_k, \alpha_k = \hat r_k^T \hat r_k/(\hat q_k^T W_d \hat q_k + \hat d_k^T Q_k \hat d_k + \lambda \hat d_k^T W_m \hat d_k) and m_{k+1} = m_k + \alpha_k \hat d_k.
(b) Compute \hat r_{k+1} = \hat r_k - \alpha_k[(\partial d/\partial m)_k^T W_d \hat q_k + (Q_k + \lambda W_m)\hat d_k].
(c) Stopping test |\hat r_{k+1}| \le \varepsilon. If satisfied, END; otherwise, unless k = n - 1, set \hat d_{k+1} = \hat r_{k+1} + \beta_k \hat d_k, where \beta_k = \hat r_{k+1}^T \hat r_{k+1}/(\hat r_k^T \hat r_k).

Step 3. Replace m_0 with m_n and go back to step 1.

An attractive feature of the algorithm is that no line search is required at any iterative stage. Also, the algorithm converges in a finite number of steps for a quadratic problem (Minoux 1986). The undesirable feature is that the method requires the evaluation of Q_k, which involves the calculation of the second derivatives of the data \partial^2 d/\partial m_i \partial m_j (see equation (23)). Therefore, a fast and efficient scheme for the second derivatives must be employed in the algorithm, otherwise it will spend much computer time on such computations. A practical method to compute the second derivatives \partial^2 d/\partial m_i \partial m_j for 2.5-D Helmholtz equation inversion (namely, crosshole resistivity and acoustic velocity inversion) is given by Zhou and Greenhalgh (1999).

Note that in the above procedure the first and second derivatives of the objective function, (\partial\Phi/\partial m)_k and (\partial^2\Phi/\partial^2 m)_k, need to be updated by computing (\partial d/\partial m)_k and (\partial^2 d/\partial m_i \partial m_j)_k during each iteration. This means that the algorithm searches for the minimum of \Phi(m) in the global model space subject to the quadratic approximation, and the local minimum is directly obtained with the optimal step length \alpha_k in the conjugate direction, namely m_{k+1} = m_k + \alpha_k \hat d_k. The Hessian matrix (\partial^2\Phi/\partial^2 m)_k must be positive definite at every kth iteration so as to successively find the local minimum sequence approaching the global minimum. In exact terms, this algorithm may be called the global-search quadratic approximation CGS. Strictly speaking, if \Phi(m) deviates from a standard quadratic, or if the positive definite property is violated, the algorithm is not globally convergent in this form. On the other hand, in practical applications the positive definite property is unknown in advance. Examination of the expression for the Hessian matrix (22) shows that appropriate choices of the regularization parameter λ and the operators W_d and W_m may help to ensure the required condition. At the very least, this is expected to improve the property of the Hessian matrix so that the algorithm converges to the global minimum. To this end, another iterative scheme based on the quadratic approximation is outlined below; that is, to improve the property of the Hessian matrix by proper choices of W_d, W_m and λ and thus successively find the local minimum sequence, we seek \{m_1^*, m_2^*, m_3^*, \ldots\} satisfying \Phi(m_1^*) > \Phi(m_2^*) > \Phi(m_3^*) > \ldots with the quadratic approximation, until a desired solution (\Phi(m_n^*) < \varepsilon) is obtained. We search for the local minimum around m_l^* (an instantaneous guess of the model) using the quadratic approximation (24) again and setting \partial\Phi/\partial m = 0. The local minimum becomes the solution of the following equation,

[W_m^{-1}(\partial d/\partial m)_{m_l^*}^T W_d (\partial d/\partial m)_{m_l^*} + W_m^{-1} Q_{m_l^*} + \lambda I](m - m_l^*) = W_m^{-1}(\partial d/\partial m)_{m_l^*}^T W_d [d_0 - d(m_l^*)] - \lambda(m_l^* - m_0),   (28)

from which one can see that the coefficient matrix on the left-hand side can be regularized by an appropriate λ, making it invertible for the solution m_{l+1}^*:

m_{l+1}^* = m_l^* + [W_m^{-1}(\partial d/\partial m)_{m_l^*}^T W_d (\partial d/\partial m)_{m_l^*} + W_m^{-1} Q_{m_l^*} + \lambda I]^{-g} \{W_m^{-1}(\partial d/\partial m)_{m_l^*}^T W_d [d_0 - d(m_l^*)] + \lambda(m_0 - m_l^*)\}, \quad (l = 0, 1, 2, \ldots).   (29)
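Update (29) can be sketched numerically for a toy problem with W_d = I and W_m = I, so that the W_m^{-1} factors drop out and the second-derivative term Q of (23) can be coded by hand. The forward operator and all numbers are illustrative inventions, not from the paper.

```python
import numpy as np

# Newton-type update (29) with Wd = I and Wm = I for the invented toy
# forward d(m) = [m1^2, m1*m2, m2^2], whose second derivatives are constant.
def fwd(m):
    return np.array([m[0]**2, m[0] * m[1], m[1]**2])

def jac(m):
    return np.array([[2 * m[0], 0.0],
                     [m[1],   m[0]],
                     [0.0, 2 * m[1]]])

# blocks d2 d_k / dm_i dm_j, one 2x2 matrix per datum k
H2 = [np.array([[2.0, 0.0], [0.0, 0.0]]),
      np.array([[0.0, 1.0], [1.0, 0.0]]),
      np.array([[0.0, 0.0], [0.0, 2.0]])]

def Qmat(m):
    # Q of (23) with Wd = I: sum_k [d_k(m) - d_0k] * d2 d_k
    r = fwd(m) - d0
    return sum(r_k * H_k for r_k, H_k in zip(r, H2))

m_true = np.array([2.0, 1.0])
d0 = fwd(m_true)
m0 = np.array([1.5, 1.5])
lam = 1e-8                       # tiny damping, kept only for invertibility

m = m0.copy()
for l in range(30):
    J = jac(m)
    A = J.T @ J + Qmat(m) + lam * np.eye(2)     # coefficient matrix of (29)
    rhs = J.T @ (d0 - fwd(m)) - lam * (m - m0)  # right-hand side of (29)
    m = m + np.linalg.solve(A, rhs)

assert np.allclose(m, m_true, atol=1e-6)
```

Dropping Qmat(m) from the coefficient matrix turns this into the Gauss-Newton-type update (11)/(17), which is exactly the reduction discussed in the text.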
This is nothing other than the pure form of Newton's iterative method (Minoux 1986). Comparing equation (29) with the general iterative solution (11) or the iteratively linearized solution (17), one can conclude that the local-search quadratic approximation solution (29) is a natural extension of the general iterative solution or the iteratively linearized solution, in which the effect of the Hessian matrix Q associated with the second derivatives of the data \partial^2 d_l/\partial m_i \partial m_j is taken into account. Consequently, expression (29) is the general form of the l2-norm Tikhonov regularization solution. Any numerical method to calculate the inverse matrix in equation (29), such as SVD or the LU decomposition algorithm, may be employed. But such matrix inversion algorithms consume much computer time and memory when the dimension of the matrix is very large. Actually, the local minimum m_{l+1}^* (the solution of equation (29)) can be found by applying
[Figure 1: a schematic linking the general iterative formula, the iteratively linearized scheme and the Levenberg–Marquardt version (a specified steepest descent solution of the Tikhonov regularization problem) with the conjugate gradient solver (CGS) and the local-search quadratic approximation CGS (an iterative Newton's method).]

Figure 1. Algorithms and inter-relations for the Tikhonov regularization solution.
the quadratic approximation CGS and the following iterative procedure (referred to as algorithm-CG3):

Step 1. Start at m_0^*. For l = 0, 1, 2, \ldots, l_{max}, compute \hat r_0 = -\hat g(m_l^*) = W_m^{-1}(\partial d/\partial m)_{m_l^*}^T W_d [d_0 - d(m_l^*)] + \lambda(m_0 - m_l^*) and set \hat d_0 = \hat r_0.

Step 2. For k = 0, 1, 2, \ldots, k_{max}:
(a) Calculate \alpha_k = \hat r_k^T \hat r_k/(\hat q_k'^T W_d \hat q_k + \hat d_k^T W_m^{-1} Q_{m_l^*} \hat d_k + \lambda \hat d_k^T \hat d_k), where \hat q_k' = (\partial d/\partial m)_{m_l^*} W_m^{-1} \hat d_k and \hat q_k = (\partial d/\partial m)_{m_l^*} \hat d_k, and set \hat m_{k+1} = \hat m_k + \alpha_k \hat d_k.
(b) Compute \hat r_{k+1} = \hat r_k - \alpha_k[W_m^{-1}(\partial d/\partial m)_{m_l^*}^T W_d \hat q_k + (W_m^{-1} Q_{m_l^*} + \lambda I)\hat d_k].
(c) Test |\hat r_{k+1}| \le \varepsilon. If satisfied, set m_{l+1}^* = \hat m_{k+1} and go to step 3; otherwise set \hat d_{k+1} = \hat r_{k+1} + \beta_k \hat d_k, where \beta_k = \hat r_{k+1}^T \hat r_{k+1}/(\hat r_k^T \hat r_k), and go to step 2 for the next k.

Step 3. Stopping test |\Phi(m_{l+1}^*)| < \varepsilon. If satisfied, stop the procedure; otherwise go back to step 1 for the next l.

Here l_{max} and k_{max} are the maximal numbers of iterations for l and k, respectively. The main difference between this algorithm and the quadratic approximation CGS is that the derivatives (\partial d/\partial m)_{m_l^*} and (\partial^2 d/\partial m_i \partial m_j)_{m_l^*} are not updated with the iteration over k, because only the local minimum m_{l+1}^* is sought in the neighbourhood of m_l^*. To discriminate this algorithm from the other, we call it the local-search quadratic approximation CGS. The convergence of the algorithm depends upon whether the local minimum sequence \{m_1^*, m_2^*, m_3^*, \ldots\} satisfying \Phi(m_1^*) > \Phi(m_2^*) > \Phi(m_3^*) > \ldots > \Phi(m_{l_{max}}^*) is successfully found or not. Fortunately, the regularization parameter λ enables us to control the procedure, because an appropriate positive value of λ makes the coefficient matrix invertible, so that there is only one non-zero solution to be found. The parameter can be determined by several tests, so that the misfit function \Phi(m_l^*) (l = 0, 1, 2, \ldots) keeps decreasing as the iterations proceed. If the effect of Q_{m_l^*} is ignored in the algorithm, the solution reverts
to the general iterative solution or the iteratively linearized solution, because equation (29) then reduces to equation (11) or (17). The main advantages of this algorithm are (1) easy control of the convergence by appropriate setting of λ, and (2) computational efficiency in obtaining the local minimum sequence. A simple form of the algorithm (W_m^{-1} = C_m, W_d = C_d^{-1} and Q_{m*_l} ≈ 0) was applied to nonlinear seismic travel-time tomography by Zhou et al (1992), who demonstrated the effectiveness and computational efficiency of the algorithm for a large, sparse matrix inverse problem. The above discussion has drawn out the connections or inter-relationships between these Tikhonov regularization solutions; they are shown in schematic form in figure 1. The diagram shows that the three solutions (general iterative, iteratively linearized and Levenberg–Marquardt) are equivalent to each other, and all of them may be considered as a specified steepest descent solution and solved by the conjugate gradient (CGS) algorithm. Meanwhile, the local-search quadratic approximation CGS is a pure form of Newton's iterative method and forms a generalized iterative algorithm for the Tikhonov regularization solution because, except for the solution by CGS, all the other solutions can be obtained by simply ignoring the second-derivative term, namely Q ≈ 0 (see equation (22)), in the procedure.
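In the Gauss–Newton limit just described (Q ≈ 0, Jacobian held fixed at m*_l), the inner loop of algorithm-CG3 amounts to a conjugate-gradient solution of the regularized normal equations. The following is a minimal numerical sketch of that idea, not the paper's implementation: it assumes a precomputed linearized Jacobian A = ∂d/∂m and small dense numpy matrices, and omits the W_m^{-1} preconditioning of the full algorithm.

```python
import numpy as np

def tikhonov_cg(A, Wd, Wm, d0, dm, m, m0, lam, kmax=200, tol=1e-10):
    """Conjugate-gradient solve of the regularized (Gauss-Newton, Q ~ 0)
    normal equations (A^T Wd A + lam Wm) x = A^T Wd (d0 - dm) + lam Wm (m0 - m),
    returning the updated model m + x."""
    H = A.T @ Wd @ A + lam * Wm                        # approximate Hessian
    g = A.T @ Wd @ (d0 - dm) + lam * (Wm @ (m0 - m))   # negative gradient
    x = np.zeros_like(m)
    r = g.copy()                                       # residual for x = 0
    p = r.copy()
    rs = r @ r
    if np.sqrt(rs) < tol:                              # already at the minimum
        return m
    for _ in range(kmax):
        q = H @ p
        alpha = rs / (p @ q)                           # exact line search (SPD H)
        x += alpha * p
        r -= alpha * q
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p                      # conjugate search direction
        rs = rs_new
    return m + x
```

For a positive λ and symmetric positive-definite W_m the matrix H is positive definite, so CG converges in at most N_m iterations, which is the property the local-search algorithm exploits.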
4. The smoothest model solution

As mentioned before, an alternative approach to geophysical inversion is to seek the smoothest model solution (Occam's inversion), that is, to solve the unconstrained optimization problem (6), whose solution must satisfy the stationarity conditions:

∂Φ/∂m = ∂Φ_m/∂m + λ^{-1} ∂Φ_d/∂m = 0,
∂Φ/∂λ^{-1} = Φ_d(m) − χ_d = 0.   (30)
Figure 2. Illustration of (a) the relationship between the Tikhonov regularization solution and the smoothest model solution, and (b) a general practical algorithm for the smoothest model solution.
This can be recast in the form

λ W_m (m − m_0) = (∂d/∂m)^T W_d [d_0 − d(m)],
[d_0 − d(m)]^T W_d [d_0 − d(m)] = χ_d.   (31)
By considering the effect of the Lagrange multiplier λ^{-1} as being equivalent to the Tikhonov regularization parameter, one finds that the first of the above equations is the same as equation (7), which defined the Tikhonov regularization solution. This shows that the smoothest model solution is just one subset of the Tikhonov regularization solutions, namely the subset that also satisfies the second equation in (31). In principle, the Lagrange multiplier λ^{-1} must first be determined from the two equations in (31). This differs from the Tikhonov regularization solution, where λ must be assigned a specific value in advance. Unfortunately, due to the non-linear property of d(m) it is very difficult to find an explicit form or exact value for λ from (31). But, by exploiting the close relationship of this solution to the Tikhonov regularization solution, a practical algorithm to find the smoothest model solution can be broken down into a two-part sequence: find the Tikhonov regularization solutions, then check the criterion of data fit (see figure 2(a)). Schemes of this kind were proposed by Constable et al (1987), Ellis and Oldenburg (1994a), Oldenburg and Li (1994) and LaBrecque et al (1996). In fact, a general procedure is suggested by figure 2(b). It comprises the following steps (hereafter referred to as algorithm-SMCG):

Step 1. Start at m*_l = m*_0, for l = 0, 1, 2, ..., l_max.

Step 2. (a) Search for an optimal λ_i ∈ [λ_min, λ_max], i = 1, 2, ..., N_λ.
(b) Solve λ_i W_m (m^{(l)}_{λ_i} − m_0) = (∂d/∂m)^T W_d [d_0 − d(m^{(l)}_{λ_i})] for m^{(l)}_{λ_i}.
(c) Calculate Φ_d(m^{(l)}_{λ_i}).

Step 3. Obtain the instantaneous solution: m*_l = min_i {Φ_d(m^{(l)}_{λ_i}) − χ_d}, i = 1, 2, ..., N_λ.

Step 4. Stopping test: Φ_d(m*_l) − χ_d ≤ ε; if satisfied, stop the procedure; otherwise go back to step 1 for the next l.
Here ε stands for the stopping tolerance. Obviously, the major part of the procedure (step 2) is to search for the optimal regularization parameter and to solve the same equation as in the Tikhonov regularization problem. Any of the one-dimensional minimization schemes, such as Brent's method and the golden section search, or the L-curve and cross-validation schemes (Farquharson and Oldenburg 2004), may be applied to this procedure, and any of the algorithms discussed in the previous sections for the Tikhonov regularization solution may be used directly to obtain m^{(l)}_{λ_i}. In effect, step 2 evaluates the misfit function of the data for the current model and the trial value λ_i. As the number N_λ of trial values is increased, the algorithm for the Tikhonov regularization solution must be run a correspondingly greater number of times. An efficient scheme is therefore to choose the trial values of λ so that step 2 consumes as little computer time as possible while the misfit function Φ_d(m_λ) is made as small as possible, so that it quickly reaches the desired tolerance χ_d. As mentioned earlier, due to the non-linear nature of d(m_λ) there is no analytical method for finding the range of λ. It depends upon the error level of the data, the practical choices made for the weighting operators W_d and W_m, the nature of the model parameterization, and the bounds on the true model variation. Consequently, the one-dimensional minimization scheme is necessary in step 2. Step 3 implements the constraint condition specified by equation (5) and yields the approximation to the smoothest model solution that has the minimal data misfit function. The solution procedure is thus like an adaptive Tikhonov regularization solution, because in each iteration an optimal regularization parameter λ, or Lagrange multiplier λ^{-1}, is determined.
From the above discussion, it can be concluded that the smoothest model solution is a subset of the Tikhonov regularization solutions, but it is perhaps more reasonable and practicable owing to the use of an optimal regularization parameter and the removal of the undesired solutions {m*_λ | Φ_d(m*_λ) > χ_d} in the procedure. The downside is the computer time consumed by the one-dimensional minimization search; that is, one must carry out multiple (N_λ) searches for the regularization parameter λ with the smoothest model solution, whereas the Tikhonov solution uses a single fixed λ for each iteration. There is thus a factor of N_λ difference in the computer time between the two approaches (see Constable et al 1987). From a methodology viewpoint, the algorithm for the Tikhonov regularization solution is the basis for obtaining the best approximation to the smoothest model solution.
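The outer λ-search of algorithm-SMCG can be sketched, for a linearized problem, as a golden-section search on log λ that drives the data misfit Φ_d(m_λ) toward the target χ_d. This is an illustrative toy, not the authors' code: the one-step linear `tikhonov_solve`, the dense numpy matrices and the search bounds are all assumptions, and the search relies on Φ_d(m_λ) increasing monotonically with λ.

```python
import numpy as np

def tikhonov_solve(A, Wd, Wm, d0, lam):
    # One linearized Tikhonov solve from m0 = 0 (hypothetical linear problem)
    return np.linalg.solve(A.T @ Wd @ A + lam * Wm, A.T @ Wd @ d0)

def smoothest_model(A, Wd, Wm, d0, chi_d, lam_lo=1e-6, lam_hi=1e3, n=60):
    """Golden-section search on log10(lambda) for the lambda whose data
    misfit Phi_d(m_lambda) is closest to the target chi_d (Occam outer loop)."""
    phi = 0.5 * (np.sqrt(5.0) - 1.0)              # golden-ratio factor ~0.618
    def misfit_gap(loglam):
        m = tikhonov_solve(A, Wd, Wm, d0, 10.0 ** loglam)
        r = d0 - A @ m
        return abs(r @ (Wd @ r) - chi_d)          # |Phi_d(m_lam) - chi_d|
    a, b = np.log10(lam_lo), np.log10(lam_hi)
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(n):
        if misfit_gap(c) < misfit_gap(d):         # keep [a, d]
            b, d = d, c
            c = b - phi * (b - a)
        else:                                     # keep [c, b]
            a, c = c, d
            d = a + phi * (b - a)
    lam = 10.0 ** (0.5 * (a + b))
    return lam, tikhonov_solve(A, Wd, Wm, d0, lam)
```

Each trial λ costs one full Tikhonov solve, which is exactly the factor-of-N_λ overhead noted above.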
5. Generalized subspace solution So far, we have discussed various algorithms for two kinds of solutions to geophysical inverse problems: the Tikhonov regularization solution and the smoothest model solution. It has been shown that the Tikhonov regularization solution incorporates the smoothest model solution and that most of the algorithms for the former can be directly applied to the latter. So, the most basic algorithm for local-search geophysical
inversion can be considered to be the Tikhonov regularization solution. There exists another method, called the subspace method or subspace solution. It was presented by Skilling and Bryan (1984), who maximized an entropy norm suitable for image reconstruction, and by Kennett and Williamson (1988) and Sambridge et al (1991) for seismic inversion. It was further developed by Oldenburg et al (1993) for large-scale inverse problems. The principal advantages of the method lie in the outstanding computational efficiency and the practical efficacy achieved by judicious choice of the basis vectors spanning the model subspace. Actually, this method is valid not only for the Tikhonov regularization solution, but also for the smoothest model solution. In earlier sections, we showed that the algorithm for the Tikhonov regularization solution is easily extended to the smoothest model solution, and that the local-search quadratic approximation CGS may be considered a general algorithm for the Tikhonov regularization solutions. For simplicity, we first present the generalized subspace method in the Tikhonov regularization case, then make the extension. The primary thrust of the method is to express the model perturbation δm = m − m_k ∈ R^{N_m} in a q-dimensional subspace R^q (⊂ R^{N_m}) spanned by the assigned basis vectors {b_i}, i = 1, 2, ..., q, that is,

δm = Σ_{i=1}^{q} α_i^{(k)} b_i = B α̂_k,   (32)
where B = (b_1, b_2, ..., b_q) and α̂_k = (α_1^{(k)}, α_2^{(k)}, ..., α_q^{(k)})^T. Substituting the model perturbation given by equation (32) into the quadratic approximation form of the Tikhonov function (24), we obtain
Φ(α̂_k) ≈ Φ(m_k) + (∂Φ/∂m)^T|_k B α̂_k + (1/2) α̂_k^T B^T (∂²Φ/∂m²)|_k B α̂_k.   (33)

With the subspace approach we seek the vector α̂_k by minimizing Φ(α̂_k). Thus, by setting ∂Φ/∂α̂_k = 0, we arrive at the equation

B^T (∂²Φ/∂m²)|_k B α̂_k = −B^T (∂Φ/∂m)|_k.   (34)

Substituting for the derivative terms in the above equation yields

(Ā_k^T W_d Ā_k + Q̄_k + λ W̄_m) α̂_k = Ā_k^T W_d [d_0 − d(m_k)] − λ B^T W_m (m_k − m_0),   (35)

where

Ā_k = (∂d/∂m)|_k B,   (36)
Q̄_k = B^T Q_k B,   (37)
W̄_m = B^T W_m B.   (38)
From equations (32) and (35) the Tikhonov regularization solution can be written in the subspace approximation form:

m_{k+1} = m_k + B (Ā_k^T W_d Ā_k + Q̄_k + λ W̄_m)^{−g} { Ā_k^T W_d [d_0 − d(m_k)] + λ B^T W_m (m_0 − m_k) }.   (39)
Obviously, when B = I (the identity transform) the subspace approximation reverts to the whole-space solution (29). From equations (36)–(38), we see that if the basis vectors {b_i}, i = 1, 2, ..., q, are properly chosen with a small q in a large-scale inversion situation (N_m very large), then the dimension (q × q) of the generalized inverse matrix in equation (39) may be much smaller than that of the (N_m × N_m) matrix in the original form (29), where the solution is sought in the whole model space R^{N_m}. The result (39) shows that the subspace formulation requires us to compute the generalized inverse of the matrix, which may be singular if the basis vectors are linearly dependent; it is therefore necessary to ortho-normalize the vectors before using them. After choosing the basis vectors, the dimensions of the transformed matrices Ā_k^T W_d Ā_k, Q̄_k and W̄_m in equation (39) become much smaller, and the inverse matrix may be obtained more efficiently by SVD with a given λ than in the whole-space solution (29); it can also be solved efficiently by the CGS algorithm. Recalling the general procedure for the smoothest model solution (previous section) and using the result (39), the subspace approach to finding the smoothest model solution reduces to the following steps (referred to as algorithm-SUBCG):

Step 1. Choose the basis vectors B = (b_1, b_2, ..., b_q) and start at m*_l = m*_0 for l = 0, 1, 2, ..., l_max.

Step 2. (a) Search for an optimal λ_i ∈ [λ_min, λ_max], i = 1, 2, ..., N_λ.
(b) Obtain the subspace approximation:

m^{(l)}_{λ_i} = m*_l + B [ B^T (∂d/∂m)^T|_l W_d (∂d/∂m)|_l B + B^T Q_l B + λ_i B^T W_m B ]^{−g} { B^T (∂d/∂m)^T|_l W_d [d_0 − d(m*_l)] + λ_i B^T W_m (m_0 − m*_l) }.

(c) Calculate Φ_d(m^{(l)}_{λ_i}).

Step 3. Obtain the instantaneous solution: m*_l = min_i {Φ_d(m^{(l)}_{λ_i}) − χ_d}, i = 1, 2, ..., N_λ.

Step 4. Stopping test: Φ_d(m*_l) − χ_d ≤ ε; if satisfied, stop the procedure; otherwise go back to step 1 for the next l.

It should be clear from the above analysis and discussion that the subspace solution is mathematically no more than the introduction of a dimension-reducing transform (32) into the formulation of the Tikhonov regularization and smoothest model solutions (see figure 3). The key step is to choose the basis vectors B = (b_1, b_2, ..., b_q). Oldenburg et al (1993) demonstrated a method of choosing the basis vectors which associates them with the gradients of the data and model misfit functions,

b_k = W_m^{-1} ∇Φ_{d_k}(m)   (k = 1, 2, ..., q_1),   (40)
b_k = W_m^{-1} ∇Φ_{m_k}(m)   (k = 1, 2, ..., q_2),   (41)
Figure 3. Illustration of the relationships for the three solutions: the Tikhonov regularization solution, the smoothest model solution and the subspace solution.
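The reduced system behind equation (39) can be illustrated numerically. The sketch below is not the paper's code: it assumes a precomputed Jacobian J = ∂d/∂m, takes Q ≈ 0 (Gauss–Newton), and uses a least-squares solve as the generalized inverse of the small q × q coefficient matrix.

```python
import numpy as np

def subspace_step(m, m0, d0, dm, J, B, Wd, Wm, lam):
    """One subspace Gauss-Newton update (Q ~ 0): project the Tikhonov
    normal equations onto span(B) and solve the small q x q system."""
    Ab = J @ B                                    # A_bar = (dd/dm) B, eq. (36)
    Wmb = B.T @ Wm @ B                            # W_bar_m, eq. (38)
    lhs = Ab.T @ Wd @ Ab + lam * Wmb              # q x q coefficient matrix
    rhs = Ab.T @ Wd @ (d0 - dm) + lam * (B.T @ (Wm @ (m0 - m)))
    alpha = np.linalg.lstsq(lhs, rhs, rcond=None)[0]  # generalized inverse
    return m + B @ alpha                          # update of eq. (39), Q ~ 0
```

With B = I the step reproduces the whole-space solution, which is the consistency property noted after equation (39).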
Figure 4. Schematic for reduction from the 4-block parameterization to the 2-block parameterization, in the subspace solution.
where Φ_{d_k} and Φ_{m_k} are the misfit functions of the data subset d_k ⊂ d_0 and the model subset m_k ⊂ m respectively, and q_1 + q_2 = q. The vectors given by (40) and (41) are actually the steepest ascent directions of the subsets. The segmental misfit functions make up the total misfit functions Φ_d(m) = Σ_{k=1}^{q_1} Φ_{d_k}(m) and Φ_m(m) = Σ_{k=1}^{q_2} Φ_{m_k}(m). Different choices may yield different solutions, so a judicious choice is the precondition for successful application of the method. A simple way to choose the basis vectors is to transform from a fine-grid model parameterization (normally used for forward modelling) to a coarse-grid model parameterization (for inversion). For instance, the 4-block parameterization may be reduced to the 2-block parameterization (see figure 4) if the sparse approximation is reasonable in the inversion. The relationship between the two sets of model parameters can be written in matrix form as

δm = (δm_1, δm_2, δm_3, δm_4)^T = B (δm'_1, δm'_2)^T,   where B = [1 0; 1 0; 0 1; 0 1].   (42)

From this simple example, a general form of B can be obtained:

B = [b_ij]_{N_m × q},   b_ij = Ω(δm_i) ∩ Ω(δm'_j) / Σ_{j'=1}^{q} Ω(δm_i) ∩ Ω(δm'_{j'}).   (43)

Here Ω(δm_i) stands for the spatial area of the block δm_i, and Ω(δm_i) ∩ Ω(δm'_j) is the area of overlap between fine block i and coarse block j. Obviously, the column vectors (the basis vectors) in B are independent of each other with this method. This simple subspace method may be useful for adapting the inversion algorithms to different real cases, because a fine model parameterization is normally employed in the forward modelling for reasons of accuracy, but the fine model parameterization is hardly ever reconstructed completely by the inversion algorithms due to the limited number of data
samples and the data error contamination (noise). In fact, the data have only limited spatial resolution of the model because of spatial and angular undersampling. So, in addition to applying the Tikhonov regularization or Occam's criterion, a proper model parameterization, which matches the number and quality of the data, may be employed in the inversion procedure (Loke and Barker 1996). The simple matrix (43) gives the transform from a fine model parameterization (used in forward modelling) to a proper model-parameter set for inversion. Another possible subspace transform is a polynomial interpolation scheme (Chunduru et al 1996), which allows a smooth variation of the model with a small number of model parameters.
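The fine-to-coarse transform of equations (42)–(43) can be sketched as follows. This is an illustrative construction, not code from the paper, and it handles only the simple case in which every fine cell lies wholly inside one coarse block, so the area-overlap weight of equation (43) reduces to 0 or 1; the `groups` argument is a hypothetical cell-to-block assignment.

```python
import numpy as np

def block_basis(groups, q):
    """Fine-to-coarse basis matrix B of eqs (42)-(43) for the case where
    fine cell i lies wholly inside coarse block groups[i]: the overlap
    weight b_ij of eq. (43) is 1 for j = groups[i] and 0 otherwise."""
    B = np.zeros((len(groups), q))
    for i, j in enumerate(groups):
        B[i, j] = 1.0
    return B

# the 4-block -> 2-block reduction of figure 4 / equation (42)
B = block_basis([0, 0, 1, 1], q=2)
dm_coarse = np.array([2.0, -1.0])   # coarse perturbation (dm'_1, dm'_2)
dm_fine = B @ dm_coarse             # fine perturbation dm = B dm'
```

By construction the columns of B are linearly independent, which is the property required of the basis vectors in the subspace solve.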
6. Imaging examples

From the discussion of the previous sections, it should be apparent that equation (39) and the corresponding algorithm (SUBCG) can be considered as the general expression and algorithm for local minimization search geophysical inversion. Such a solution has the features of the Tikhonov regularization solution, the smoothest model solution and the subspace solution, and is applicable to any geophysical inversion problem (be it using an l1- or l2-norm). In order to show the capability of this algorithm, we now give two imaging examples: one for a synthetic resistivity model (figure 5) and the other for a seismic crosshole field experiment (figure 6). In figure 5, the original 2D model (figure 5(a)) comprises a layered structure having an embedded high-resistivity layer (2300 Ω m) and a buried low-resistivity block (300 Ω m), which has a sharp vertical boundary in the middle of the section. For this model, we first calculated the apparent resistivities for three common electrode arrays: Wenner (Wn), Schlumberger (Sch) and dipole–dipole (DD), using 50 electrodes at a 1.5 m interval to fully cover the ground surface as shown. Gaussian noise of maximum amplitude 5% was added to the synthetic data (2223 data points) before an inversion was undertaken. Figure 5(b) shows the resistivity pseudosection of the noise-contaminated data. It is not possible to recognize the true structure from this plot: the lateral contrast is completely masked, and the overlying layers are also distorted. Three l2-norm inversions were carried out with different starting models: homogeneous (figure 5(c)), three-layered (figure 5(e)) and a complicated 2D model incorporating three wedges of alternating resistivity (figure 5(g)). The recovered final solutions are shown in figures 5(d), (f) and (h), respectively. In these inversions, we employed a cell size of 0.75 m × 0.75 m for the forward modelling, and variable cell sizes in depth (increasing by a factor of 1.4) for the inversion.
The golden section search method was used for the optimal selection of the regularization parameter. From these experiments one can see that the first and second starting models (figures 5(c) and (e)) yield satisfactory images of the subsurface (figures 5(d ) and (f )). The homogeneous starting model used a resistivity which was roughly the average of the entire model. The layered starting model had the right trend sequence of low, high and
Figure 5. Inversion results for a synthetic resistivity model experiment: (a) true model; (b) pseudosection of apparent resistivity for Wenner, Schlumberger and dipole–dipole arrays; (c) a homogeneous starting model; (d) inverted model from (c); (e) a three-layer starting model; (f) inverted model from (e); (g) a complicated starting model; (h) inverted model from (g).
[Figure 6: panels (a)–(f) show the six inverted velocity models (A–F) with the geological structure (matrix ore, mafics, porphyry, massive ore, ultramafics) superimposed; panel (g) plots the RMS data misfit against iteration number for the six models.]
Figure 6. Inversion results for a crosswell seismic tomography experiment: (a) starting model 4.5 km s−1, damping factor 0.1, velocity range 3.5–6.3 km s−1; (b) starting model 4 km s−1, damping factor 0.1, velocity range 3.5–6.3 km s−1; (c) starting model 5 km s−1, damping factor 0.1, velocity range 3.5–6.3 km s−1; (d) starting model 5 km s−1, damping factor 0.1, velocity range 4.5–6.5 km s−1; (e) starting model 4.5 km s−1, damping factor 0.2, velocity range 4.5–6.5 km s−1; (f) starting model 4 km s−1, damping factor 0.2, velocity range 4.5–6.5 km s−1; (g) convergence curves for the six models.
intermediate resistivities, although the actual values were quite different. But the third starting model (figure 5(g)), which
is two dimensional, produced a seriously distorted image of the subsurface. The deep parts of this starting model depart
significantly from the true structure. The l1-norm inversions were also undertaken and produced no improvement over the results shown in figure 5.

We now turn to the seismic example of a crosswell experiment to delineate nickel sulfide mineralization in mafic–ultramafic rocks (see Cao and Greenhalgh 1997). The survey entailed two wells 30 m apart, with shots fired at 2 m intervals in one well and hydrophone receivers placed every 1 m in the other, over the depth range 140 m to 200 m. First-arrival times from 2005 ray paths were inverted tomographically to recover the P-wave velocity distribution. Ultrasonic measurements made on rock core provided some idea of the expected velocity values. Tikhonov-regularized l2-inversions (algorithm-SUBCG) were carried out using a cell size of either 1 m or 2 m, different regularization parameters (λ in the range 0.01–0.2), various homogeneous starting models (having velocities 4 km s−1, 4.5 km s−1 and 5 km s−1), and a permitted velocity range (hard constraints) of 3.5–6.3 km s−1, 4.5–6.5 km s−1 or 5.8–6.5 km s−1. We found that two regularization parameters (0.1 and 0.2) and a cell size of 1 m yielded acceptable results in terms of data fit and convergence. Figure 6 shows the inverted models and the corresponding convergence curves (diagram (g)) for six different situations (labelled as models A to F):

• A: starting model 4.5 km s−1, damping factor 0.1, velocity range 3.5–6.3 km s−1;
• B: starting model 4 km s−1, damping factor 0.1, velocity range 3.5–6.3 km s−1;
• C: starting model 5 km s−1, damping factor 0.1, velocity range 3.5–6.3 km s−1;
• D: starting model 5 km s−1, damping factor 0.1, velocity range 4.5–6.5 km s−1;
• E: starting model 4.5 km s−1, damping factor 0.2, velocity range 4.5–6.5 km s−1;
• F: starting model 4 km s−1, damping factor 0.2, velocity range 4.5–6.5 km s−1.

The 'known' geology from the two wells plus a third borehole mid-way between is superimposed on the tomograms.
There are some differences between the inversion results, but the following conclusions can be drawn. The upper part of the section (mainly komatiite rocks) shows lower velocities of 4500 to 5000 m s−1 (coloured green and yellow) whereas the lower parts, mainly basalts, have higher velocities of 5300 to 6000 m s−1 (coloured red). The bottom of the model shows the velocity of the starting model (4500 m s−1) because no ray paths were available for calculating the velocity distribution below the base of the receiver array. The ore body, which is mainly massive sulfides, lies between the mafics and ultramafics. It is characterized by fairly high seismic velocity and is the dominant and consistent feature on all six images. Several faults which are believed to have displaced the ore, and another prominent one sloping down from left to right, have been interpreted by the geologists and are supported by the sudden changes in the velocity pattern. It is difficult to choose between the various inverted model results. We simply present all six to show the range of non-uniqueness inherent with any real seismic data inversion.
7. Conclusions We have presented the general iterative forms of the Tikhonov regularization solution, the smoothest model solution and the subspace solution for geophysical inversion. From the practical implementation viewpoint, the iterative local optimization algorithms for the three solutions have the following inter-relations: (a) the general iterative formula, the iteratively linearized scheme and the Levenberg–Marquardt version are three equivalent solutions for the Tikhonov regularization solution, all of which are a specified form of the steepest descent method; (b) the smoothest model solution is a subset of the Tikhonov regularization solutions, and a practical scheme for implementing the smoothest model solution consists in using the algorithm for the Tikhonov regularization solution plus incorporating the data-fit tolerance; (c) the subspace solution can be obtained from both the Tikhonov regularization solution and the smoothest model solution with the model-parameter transform so long as the chosen basis vectors span the whole model space, which is reasonable for inversion. The generalized iterative algorithms for all the three kinds of solutions may be obtained with the local quadratic approximation and the conjugate gradient method, which is a pure form of Newton’s method.
Acknowledgments This work was supported by grants from the Australian Research Council and the Swiss Federal Institute of Technology.
References

Aster R C, Borchers B and Thurber C H 2005 Parameter Estimation and Inverse Problems (Amsterdam: Elsevier)
Bazaraa M S and Shetty C M 1979 Nonlinear Programming: Theory and Algorithms (New York: Wiley)
Cao S and Greenhalgh S A 1997 Cross-well seismic tomographic delineation of mineralisation in a hard-rock environment Geophys. Prospect. 45 449–60
Carrion P M 1989 Generalized non-linear elastic inversion with constraints in model and data spaces Geophys. J. 96 151–62
Chunduru R, Sen M K, Stoffa P L and Nagendra R 1995 Non-linear inversion of resistivity profiling data for some regular geometrical bodies Geophys. Prospect. 43 979–1003
Chunduru R, Sen M K and Stoffa P L 1996 2-D resistivity inversion using spline parameterization and simulated annealing Geophysics 61 151–61
Constable S C, Parker R L and Constable C G 1987 Occam's inversion: a practical algorithm for generating smooth models from electromagnetic sounding data Geophysics 52 289–300
Dabas M, Tabbagh A and Tabbagh J 1994 3-D inversion in subsurface electrical surveying—I. Theory Geophys. J. Int. 119 975–90
Daily W and Owen E 1991 Cross-borehole resistivity tomography Geophysics 56 1228–35
Dittmer J K and Szymanski J E 1995 The stochastic inversion of magnetics and resistivity data using the simulated annealing algorithm Geophys. Prospect. 43 397–416
Ellis R G and Oldenburg D W 1994a Applied geophysical inversion Geophys. J. Int. 116 5–11
Ellis R G and Oldenburg D W 1994b The pole-pole 3-D resistivity inverse problem: a conjugate-gradient approach Geophys. J. Int. 119 187–94
Farquharson C G and Oldenburg D W 1998 Non-linear inversion using general measures of data misfit and model structure Geophys. J. Int. 134 213–27
Farquharson C G and Oldenburg D W 2004 A comparison of automatic techniques for estimating the regularization parameter in nonlinear inversion problems Geophys. J. Int. 156 411–25
Kennett B L N and Williamson P R 1988 Subspace methods for large-scale non-linear inversion Mathematical Geophysics: A Survey of Recent Developments in Seismology and Geodynamics ed N J Vlaar, G Nolet, M J R Wortel and S A L Cloetingh (Dordrecht: Reidel) pp 139–54
Kirsch A 1996 An Introduction to the Mathematical Theory of Inverse Problems (Basel: Springer)
LaBrecque D, Miletto M, Daily W, Ramirez A and Owen E 1996 The effects of noise on Occam's inversion of resistivity tomography data Geophysics 61 538–48
Lines L R and Treitel S 1984 Tutorial: a review of least-squares inversion and its application to geophysical problems Geophys. Prospect. 32 159–86
Loke M H and Barker R D 1996 Rapid least-squares inversion of apparent resistivity pseudosections using a quasi-Newton method Geophys. Prospect. 44 131–52
Loke M H, Dahlin T and Acworth I 2003 A comparison of smooth and blocky inversion methods in 2D electrical imaging surveys Explor. Geophys. 34 182–7
Luenberger D G 1984 Linear and Nonlinear Programming (Reading, MA: Addison-Wesley)
Menke W 1984 Geophysical Data Analysis: Discrete Inverse Theory (New York: Academic)
Minoux M 1986 Mathematical Programming: Theory and Algorithms (New York: Wiley)
Oldenburg D W and Li Y 1994 Inversion of induced polarization data Geophysics 59 1327–41
Oldenburg D W, McGillivray P R and Ellis R G 1993 Generalised subspace methods for large-scale inverse problems Geophys. J. Int. 114 12–20
Parker R L 1994 Geophysical Inverse Theory (Princeton, NJ: Princeton University Press)
Pelton W H, Rijo L and Swift C M Jr 1978 Inversion of two-dimensional resistivity and induced-polarization data Geophysics 43 788–803
Reiter D T and Rodi W 1996 Nonlinear waveform tomography applied to crosshole seismic data Geophysics 61 902–13
Sambridge M S, Tarantola A and Kennett B L N 1991 An alternative strategy for non-linear inversion of seismic waveforms Geophys. Prospect. 39 723–36
Sasaki Y 1994 3-D resistivity inversion using the finite element method Geophysics 59 1839–48
Scales J A and Smith M L 1994 Introductory Geophysical Inverse Theory (Golden, CO: Samizdat)
Skilling J and Bryan R K 1984 Maximum entropy image reconstruction: general algorithm Mon. Not. R. Astron. Soc. 211 111–24
Smith N C and Vozoff K 1984 Two-dimensional DC resistivity inversion for dipole–dipole data IEEE Trans. Geosci. Remote Sens. 22 21–8
Stoffa P L and Sen M K 1991 Nonlinear multiparameter optimization using a genetic algorithm Geophysics 56 1794–810
Tarantola A 1984 Inversion of seismic reflection data in the acoustic approximation Geophysics 49 1259–66
Tarantola A 1987 Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation (Amsterdam: Elsevier)
Tarantola A and Valette B 1982 Generalized non-linear inverse problems solved using the least-squares criterion Rev. Geophys. Space Phys. 20 219–32
Tikhonov A N and Arsenin V Y 1977 Solutions of Ill-Posed Problems (New York: Winston)
Tripp A C, Hohmann G W and Swift C M Jr 1984 Two-dimensional resistivity inversion Geophysics 49 1708–17
Wolke R and Schwetlick H 1988 Iteratively reweighted least squares: algorithms, convergence analysis, and numerical comparisons SIAM J. Sci. Stat. Comput. 9 907–21
Zhang J, Mackie R L and Madden T R 1995 3-D resistivity forward modeling and inversion using conjugate gradients Geophysics 60 1313–25
Zhou B and Dahlin T 2003 Properties and effects of measurement errors on 2D resistivity imaging surveying Near Surf. Geophys. 1 105–17
Zhou B and Greenhalgh S A 1999 Explicit expressions and numerical calculations for the Fréchet and second derivatives in 2.5-D Helmholtz equation inversion Geophys. Prospect. 47 443–68
Zhou B, Greenhalgh S A and Sinadinovski C 1992 Iterative algorithm for the damped minimum norm, least-squares and constrained problem in seismic tomography Explor. Geophys. 23 497–505