GEOPHYSICS, VOL. 62, NO. 3 (MAY-JUNE 1997); P. 992–1002, 12 FIGS.
Hopfield neural networks, and mean field annealing for seismic deconvolution and multiple attenuation
Carlos Calderón-Macías,∗ Mrinal K. Sen,‡∗ and Paul L. Stoffa∗
ABSTRACT
We describe a global optimization method called mean field annealing (MFA) and its application to two basic problems in seismic data processing: seismic deconvolution and surface-related multiple attenuation. MFA replaces the stochastic nature of the simulated annealing method with a set of deterministic update rules that act on the average value of the variables rather than on the variables themselves, based on the mean field approximation. As the temperature is lowered, the MFA rules update the variables in terms of their values at a previous temperature. By minimizing averages, it is possible to converge to an equilibrium state considerably faster than a standard simulated annealing method. The update rules are dependent on the form of the cost function and are obtained easily when the cost function resembles the energy function of a Hopfield network. The mapping of a problem onto a Hopfield network is not a precondition for using MFA, but it makes analytic calculations simpler. The seismic deconvolution problem can be mapped onto a Hopfield network by parameterizing the source and the reflectivity in terms of binary neurons. In this context, the solution of the problem is obtained when the neurons of the network reach their stable states. By minimizing the cost function of the network with MFA and using an appropriate cooling schedule, it is possible to escape local minima. A similar idea can also be applied to design an operator that attenuates surface-related multiple reflections from plane-wave transformed seismograms assuming a 1-D earth. The cost function for the multiple elimination problem is based on the criterion of minimum energy of the multiple-suppressed data.
INTRODUCTION

Mean field annealing (MFA) is a variant of the simulated annealing algorithm (e.g., Rothman, 1985) that has been applied with considerable success to many optimization problems. The principles of MFA come from a well-known approximation in statistical mechanics, the mean field approximation. This theory studies systems of spin elements that interact with each other in a magnetic field. Conceptually, the mean field approximation replaces a spin variable that occurs in an energy field by its expectation or mean value when evaluating the probability distribution of any other variable. This approximation allows the statistical mechanics of complicated magnetic systems to be described by a closed set of equations relating the expected values of the variables (Hopfield and Tank, 1985). MFA replaces the stochastic nature of simulated annealing with a set of deterministic update rules that need to be solved iteratively. This deterministic relaxation procedure exhibits fast convergence toward the solution for complicated optimization problems. The MFA method has been adapted to a variety of problems, such as the restoration of images and noise removal (Bilbro et al., 1989; Bilbro et al., 1991). Zhou et al. (1995) applied an edge detection algorithm based on MFA for restoring tomographic crosshole images. But in particular, MFA has received major attention in the field of artificial neural networks (ANNs). In fact, some of the most important neural network models are related very closely to the principles of MFA (Ingman and Merlis, 1991; Hopfield and Tank, 1985). The most successful application of these neural networks has been in
Manuscript received by the Editor February 5, 1996; revised manuscript received August 26, 1996.
∗Department of Geological Sciences and Institute for Geophysics, The University of Texas at Austin, 8701 North Mopac Expressway, Austin, Texas 78759.
‡Institute for Geophysics, The University of Texas at Austin, 8701 North Mopac Expressway, Austin, Texas 78759.
© 1997 Society of Exploration Geophysicists. All rights reserved.
solving optimization problems (Peterson and Söderberg, 1989; Peterson and Hartman, 1989; Cichocki and Unbehauen, 1993). In general terms, ANNs are dynamic systems built from a large number of interconnected units or variables that interact with each other in some prescribed way. In particular, attention has focused on ANNs with neurons that have nonlinear responses because of their capacity to make decisions or relate complex input-output data. In the pioneering work by Hopfield and Tank (1985), the traveling salesman problem is formulated on a highly interconnected neural network, the Hopfield network, in which the units of the network being "on" represent a certain decision to be made and correspond to the unknown variables of the problem. Minimizing the cost function of the network is equivalent to finding the lowest cost path. The state of each neuron changes in time as determined by the states of the neurons to which it is connected. An initial state of the neurons is required to start the minimization. Subsequent iterations can be carried out using a simple local update rule. Following the ideas of the traveling salesman problem, Wang and Mendel (1992) used a Hopfield network to solve the seismic deconvolution problem by mapping the prediction error into the cost function of a Hopfield network. In their work, the deconvolution problem is subdivided into two major parts, with a different Hopfield network associated with each part: a reflectivity estimator and a wavelet extractor. Both parts are combined to solve simultaneously for the source and the reflectivity with an initial wavelet as prior information. The major advantage of the Hopfield-network-based deconvolution method is that it requires only weak modeling assumptions; for example, the wavelet can be nonminimum phase (Wang and Mendel, 1992).
One of the main attractions of mapping a problem onto a Hopfield neural network is that these models are suitable for VLSI circuit design (Hopfield and Tank, 1985; Hertz et al., 1991) and thus can be implemented in hardware. To escape from local solutions, uncorrelated noise is added to the neurons and gradually decreased using a prescribed annealing schedule (Cichocki and Unbehauen, 1993). An alternative to this hardware design is the MFA method. Moreover, the analytical derivation of the mean field equations for the cost function of a Hopfield network is extremely simple. An additional advantage of MFA is its fast convergence toward the global solution compared to other global optimization methods (Bilbro et al., 1989). We document two applications in which it is possible to construct a Hopfield network from a given cost or error function and then apply MFA to reach nearly optimal solutions. First, following the approach of Wang and Mendel (1992), we illustrate how to map the deconvolution problem onto a Hopfield network and then minimize the cost function using both the Hopfield local update rule and the MFA algorithm. Second, we develop an approximate technique to design a filter that can be used to attenuate free-surface multiple reflections from plane-wave transformed seismograms. The method uses the minimum energy criterion to decide whether an optimum filter has been found. We show how to adapt this criterion to construct a Hopfield network and then use MFA to find one of the deepest minima of the energy function in relatively few iterations. We make a comparative analysis of MFA and simulated annealing for this particular problem.
In the following sections, we describe the principles of Hopfield networks and their relationship with the mean field annealing equations. We also describe how to obtain an estimate of the starting temperature, which is of key importance for fast convergence of the method.

OPTIMIZATION BY HOPFIELD NETWORKS
A Hopfield network is a single-layer feedback network whose dynamics are governed by a system of nonlinear ordinary differential equations and by an energy function. A basic artificial neuron has many inputs and weighted connections. If $x_i(t)$, $i = 1, 2, \ldots, n$, is defined as the network input, $\phi_i$ an externally supplied input to neuron $i$ (also called threshold or bias), and $w_{ij}$ the weight connection from neuron $i$ to neuron $j$, then the response of a particular neuron $x_i$ at time $(t+1)$ is given as

$$x_i(t+1) = H\left( \sum_{j=1,\, j\neq i}^{n} w_{ij}\, x_j(t) + \phi_i \right), \qquad (1)$$
where $x_i$ can take the values 1 (on) or 0 (off) and $H$ is the step function, known as the activation function of the network. This particular form of the Hopfield model is known as the discrete Hopfield network. The energy function of the discrete Hopfield model is defined as

$$E = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1,\, j\neq i}^{n} w_{ij}\, x_i x_j - \sum_{i=1}^{n} \phi_i x_i. \qquad (2)$$
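Before continuing, equations (1) and (2) are easy to exercise numerically. The following sketch (not from the paper) uses arbitrary random weights, made symmetric with a zero diagonal, and applies one asynchronous pass of the step update rule per sweep:

```python
import numpy as np

def energy(w, phi, x):
    # Equation (2): E = -1/2 sum_{i != j} w_ij x_i x_j - sum_i phi_i x_i
    return -0.5 * x @ w @ x - phi @ x

def hopfield_sweep(w, phi, x):
    # One asynchronous pass of equation (1): each neuron turns on (1)
    # when its net input is positive and off (0) otherwise.
    x = x.copy()
    for i in range(len(x)):
        x[i] = 1.0 if w[i] @ x + phi[i] > 0 else 0.0
    return x

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 8))
w = a + a.T                  # symmetric weights ...
np.fill_diagonal(w, 0.0)     # ... with zero diagonal
phi = rng.standard_normal(8)
x = rng.integers(0, 2, 8).astype(float)
energies = [energy(w, phi, x)]
for _ in range(10):
    x = hopfield_sweep(w, phi, x)
    energies.append(energy(w, phi, x))
# With symmetric, zero-diagonal weights and asynchronous updates,
# the energy sequence is nonincreasing.
```

With synchronous (all-at-once) updates the monotonicity guarantee is lost, which is why the sweep above updates one neuron at a time.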
The term energy function comes from a physical analogy to magnetic systems. In physics, equation (2) is described as the Ising Hamiltonian of $n$ interacting spins. The net input represents the field acting on a spin, the threshold acts as the external field, and the connection or weight matrix corresponds to the interactions between spins. Minimization of equation (2) through (1) describes the dynamics of the spins aligning with their local fields (Hertz et al., 1991). A central property of the Hopfield network is that, given a starting point for the neurons, the energy of equation (2) will never increase as the states of the neurons change, provided that $\mathbf{w}$ is a symmetric matrix with zero diagonal elements. Thus, one of the most important uses of a Hopfield network is in an optimization problem in which the cost function of the optimization problem is related to the energy function of equation (2) in such a way that one can identify the connection weights, neurons, and biases. Hopfield and Tank (1986) first explored this idea with the well-known traveling salesman problem. In another application, Wang and Mendel (1992) were able to map the seismic deconvolution problem by expressing the error function in a form in which binary neurons could be identified with the problem variables. The minimum energy of the network corresponded to the optimal configuration of the network and thus gave the solution of the deconvolution problem. The iterative update rule of equation (1) guarantees convergence only to one of the local minima, and only when the starting point is sufficiently close to the globally optimum solution will the step update rule be a good choice. One way of avoiding this dependency is to replace the update rule in equation (1) with a stochastic rule such as the Metropolis rule
in simulated annealing. That is, at any future time the output of a particular neuron will be given by

$$x_i(t+1) = \begin{cases} 1 & \text{with probability } P(h_i) \\ 0 & \text{with probability } 1 - P(h_i), \end{cases} \qquad (3)$$

where

$$P(h_i) = \frac{e^{-h_i/T}}{1 + e^{-h_i/T}}, \qquad (4)$$

and

$$h_i = \sum_{j=1}^{n} w_{ij}\, x_j + \phi_i. \qquad (5)$$

In equation (4), $T$ is the familiar control parameter known as temperature. The Metropolis criterion will determine the state of the neuron by drawing a random number from a uniform distribution and comparing it with the probability $P$. Instead of continuing with this procedure, we adopt the mean field approximation. In this new procedure, discrete-valued neurons are replaced with continuous outputs constrained between 0 and 1. By using this method, the quality of the final solution will be improved compared to the local update rule of equation (1), with only slightly more computational effort. In the following section, we summarize the fundamentals of this method. A complete description of the MFA method can be found in Peterson and Anderson (1987) or Sen and Stoffa (1995). A theoretical study relating MFA and simulated annealing can be found in the work by Bilbro et al. (1992).

MEAN FIELD ANNEALING

First assume that we have a set of $n$ neurons such that each neuron $x_i$ is allowed to assume values of 0 and 1, and assume that the vector $\mathbf{x}$ represents a particular configuration of neurons, or a state, with energy $E(\mathbf{x})$. Assume also that the probability of state $\mathbf{x}$ is given, as in the simulated annealing algorithm, by the following Gibbs probability distribution function

$$p(\mathbf{x}) = \frac{e^{-E(\mathbf{x})/T}}{\sum_{\mathbf{x}} e^{-E(\mathbf{x})/T}}, \qquad (6)$$

where the term in the denominator is the partition function, which requires summing over all possible neuron configurations. The key to deriving the mean field equations is the analysis of the partition function. To evaluate the partition function, it is necessary to have an estimate of the state of each variable (Peterson and Anderson, 1987; Peterson and Anderson, 1988), something that usually we want to avoid. We start by expanding the partition function in nested sums as follows:

$$Z = C \sum_{\mathbf{x}} e^{-E(\mathbf{x})/T} = C \sum_{x_1=0,1} \cdots \sum_{x_i=0,1} \cdots \sum_{x_n=0,1} e^{-E(\mathbf{x})/T}, \qquad (7)$$

where each element of vector $\mathbf{x}$ can take a value of 0 or 1 and the number of elements in the vector is given by the number of neurons $n$. Thus we have a total of $2^n$ states. Each discrete sum in equation (7) can be expressed by an integral over the continuous variable $v$ as

$$\sum_{x_i=0,1} e^{-E(\mathbf{x})/T} = \sum_{x_i=0,1} \int_{-\infty}^{\infty} e^{-E(\mathbf{x})/T}\, \delta(x_i - v)\, dv. \qquad (8)$$

Introducing the integral representation of the delta function, we have

$$\delta(v) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} e^{uv}\, du, \qquad (9)$$

and applying it to equation (8) we obtain

$$\sum_{x_i=0,1} e^{-E(\mathbf{x})/T} = \frac{1}{2\pi i} \sum_{x_i=0,1} \int_{-\infty}^{\infty} dv \int_{-i\infty}^{i\infty} du\; e^{-E(\mathbf{x})/T}\, e^{u(x_i - v)} = \frac{1}{\pi i} \int_{-\infty}^{\infty} dv \int_{-i\infty}^{i\infty} du\; e^{-E(\mathbf{x})/T}\, e^{-uv + \log(\cosh u)}. \qquad (10)$$

Equations (7), (8), and (10) lead to the new form of the partition function

$$Z = C \prod_i \int_{-\infty}^{\infty} \int_{-i\infty}^{i\infty} e^{-E'(\mathbf{v},\mathbf{u},T)}\, du_i\, dv_i, \qquad (11)$$

where $C$ is a complex constant and the new function $E'$ is given by

$$E'(\mathbf{v},\mathbf{u},T) = \frac{E(\mathbf{v})}{T} + \sum_i \left[ u_i v_i - \log(\cosh u_i) \right]. \qquad (12)$$

This new energy-like term [equation (12)] is smoother than $E$ because of the presence of the rightmost term. As a result, the probability of being trapped in local minima is reduced. The saddle-point method is used to analyze the integral in the partition function. The saddle points are determined by taking the derivatives of $E'$ with respect to the continuous variables $u_i$ and $v_i$ and equating them to zero to obtain

$$\frac{\partial E'}{\partial u_i} = v_i - \tanh u_i = 0, \qquad (13)$$

and

$$\frac{\partial E'}{\partial v_i} = \frac{1}{T} \frac{\partial E}{\partial v_i} + u_i = 0, \qquad (14)$$

which can be rewritten as

$$v_i = \tanh u_i \qquad (15)$$

and

$$u_i = -\frac{1}{T} \frac{\partial E(\mathbf{v})}{\partial v_i}. \qquad (16)$$

This is the general form of the mean field equations. Notice that the derivation of equations (15) and (16) does not depend on the functional form of the energy function. Applying the mean field equations to the energy function of a Hopfield network [equation (2)], we obtain

$$v_i(t+1) = \tanh\left[ \frac{\sum_{j=1,\, j\neq i}^{N} w_{ij}\, v_j(t) + \phi_i}{T} \right]. \qquad (17)$$
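Iterating equation (17) while lowering $T$ is the whole algorithm. A minimal Python sketch (not from the paper; the cooling factor, starting temperature, and weights below are arbitrary illustrative choices):

```python
import numpy as np

def mfa(w, phi, T0=10.0, decay=0.9, sweeps=300, seed=0):
    """Mean field annealing for the Hopfield energy of equation (2):
    repeatedly apply v_i = tanh((sum_j w_ij v_j + phi_i) / T), equation (17),
    while lowering the temperature T exponentially."""
    rng = np.random.default_rng(seed)
    n = len(phi)
    v = 0.1 * (rng.random(n) - 0.5)   # small random start near zero
    T = T0
    for _ in range(sweeps):
        for i in range(n):            # asynchronous pass over the neurons
            v[i] = np.tanh((w[i] @ v + phi[i]) / T)
        T *= decay                    # fast exponential cooling schedule
    return v

rng = np.random.default_rng(1)
a = rng.standard_normal((6, 6))
w = a + a.T                           # symmetric weights ...
np.fill_diagonal(w, 0.0)              # ... with zero diagonal
phi = rng.standard_normal(6)
v = mfa(w, phi)
# At low temperature the outputs saturate toward the limits of tanh.
```

As the temperature shrinks, the tanh steepens and each $v_i$ is driven to a saturated value, recovering a discrete configuration.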
Notice that the discrete variables $\mathbf{x}$ have now been replaced by the new continuous variables $\mathbf{v}$. Furthermore, notice that at $T = 0$ equation (17) reduces to the step function of equation (1). As in the simulated annealing method, the temperature is decreased gradually according to a specific cooling schedule. Decreasing $T$ is equivalent to increasing the slope of the sigmoid-shaped activation function of equation (15). The cooling time can be reduced greatly by choosing an initial temperature that is close enough to the
critical temperature. At the critical temperature, the continuous variables $\mathbf{v}$ start converging to the limits of this function. Peterson and Söderberg (1989) derived an approximate expression for the critical temperature using the energy function of equation (2). However, the computation of the critical temperature is not trivial, since it requires calculating the maximum eigenvalue of a Hessian matrix (the connection matrix), which can be considerably large for specific applications. In this work, we make a crude approximation of the starting temperature and then use a fast cooling schedule. Figure 1 describes the algorithm to minimize a Hopfield network energy function by MFA.

1) Initialize the temperature $T$ to the initial temperature $T_0$.
2) Initialize the mean field variables and add a random value: $v_i = 1/2 + \mathrm{rand}(i)$ for $i = 1, \ldots, n$.
3) Loop over $i = 1, \ldots, n$:
   $u_i = \sum_j w_{ij} v_j + \phi_i$
   $v_i = \tanh(u_i / T)$
4) Decrease $T$ and repeat step 3 until the error remains constant for a number of iterations.

FIG. 1. Mean field annealing algorithm.

HOPFIELD NETWORK MAPPING OF SEISMIC DECONVOLUTION

In this section, we follow Wang and Mendel (1992) closely to describe how a problem such as seismic deconvolution can be mapped onto a Hopfield network. In this problem, neurons are used to represent the amplitudes of the reflectivity and source wavelet at every time sample. The task of the method is to find an optimal configuration of the neurons for which the error is minimum. Consider a single seismic trace given by the convolution of a source wavelet $s(t)$ with a reflectivity $r(t)$ as

$$z(t) = \sum_{i=1}^{n} r(t_i)\, s(t - t_i), \qquad (18)$$

where $t_i$ is the time delay and $n$ the number of samples in the seismic trace. A definition of the misfit or error between the observed trace $d(t)$ and the computed trace $z(t)$ is given by

$$E = \frac{1}{2} \sum_{k=1}^{n} \left( d_k - \sum_{i=1}^{n} r_{k-i}\, s_i \right)^2. \qquad (19)$$

To use binary neurons to represent the source wavelet, assuming that we have some knowledge of the amplitude and phase distribution of the reflectivity, the following approximate binary representation is introduced:

$$s_i = \left( \sum_{j=1}^{m} \frac{x_{ij}}{2^{j-1}} \right) - 1, \qquad (20)$$

where $x_{ij}$ is either 0 or 1, $i = 1, 2, \ldots, n$, and $j = 1, 2, \ldots, m$, where $m$ gives the discretization or resolution in the range $[-1, 1]$. Using this discrete representation of the source in equation (19), we arrive at the following equation:

$$E = \frac{1}{2} \sum_{k=1}^{n} \left\{ d_k - \sum_{i=1}^{n} r_{k-i} \left[ \left( \sum_{j=1}^{m} \frac{x_{ij}}{2^{j-1}} \right) - 1 \right] \right\}^2, \qquad (21)$$

which can be simplified to the following form

$$E = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{k=1}^{n} \sum_{\ell=1}^{m} w_{ij,k\ell}\, x_{ij} x_{k\ell} - \sum_{i=1}^{n} \sum_{j=1}^{m} \phi_{ij}\, x_{ij} + \text{constant}, \qquad (22)$$

where

$$w_{ij,k\ell} = -\frac{1}{2^{j+\ell-2}} \sum_{b=1}^{n} r_{b-i}\, r_{b-k}, \qquad (23)$$

and

$$\phi_{ij} = \sum_{k=1}^{n} \left[ \frac{1}{2^{j-1}} \left( d_k + \sum_{b=1}^{n} r_{k-b} \right) r_{k-i} - \frac{1}{2(4^{j-1})}\, r_{k-i}^2 \right]. \qquad (24)$$

At this stage, the description of the problem in terms of a Hopfield network is complete. We can use either the update rule in equation (1) or equation (17) for optimization. Notice that we can also replace the reflectivity term with its binary representation and define a similar Hopfield network to act as a reflectivity estimator given a source function. This can be done by defining $r(t_i)$ in terms of a 0-1 sequence $y_i$ multiplied by an amplitude $a_i$, i.e., $r(t_i) = a_i y_i$. Wang and Mendel (1992) built a Hopfield network to solve first for $y_i$ by studying a single amplitude $\alpha$, $r(t_i) = \alpha y_i$, in equation (18). Once a set of likely events for this particular amplitude has been detected (those positions for which $y_i = 1$), a second Hopfield network refines the amplitudes $a_i$ at the detected positions. This second network has a form similar to the one described in equations (20) to (22), but in this case we solve for the reflectivity amplitudes $a_i$ instead of the wavelet amplitudes $s_i$. The next step of the algorithm is to subtract from the original trace the contribution of the estimated reflectivity by convolving the estimated amplitude sequence with the known source. This process is repeated for a sequence of decreasing $\alpha$'s, finishing above the noise level, which acts as a threshold. In this way, amplitudes that have already been extracted do not contribute to the minimization process. To summarize, the reflectivity estimator is defined by two Hopfield networks: the first network detects the time positions of a specific amplitude in the trace, and the second refines the amplitudes at these positions. The equations that describe these two Hopfield networks are very similar to those already given in equations (19) through (24). To illustrate how this reflectivity estimator works, we carry out a simple exercise. Figure 2a shows a wavelet that is convolved with a spike series (Figure 2b) to generate a synthetic trace (Figure 2c).
Random noise generated from a uniform distribution, with magnitude 8% of the maximum amplitude in the trace, has been added to this trace. The objective here is to estimate the spike series from the seismogram given that the source function is known. Figure 2d shows the spike locations
using the '+' symbol when solving for the spike positions $y_i$ for the maximum positive amplitude in the trace ($\alpha = 1$). The method used here to minimize the cost function is MFA. Figure 2e, using the '◦' symbol, shows the refined amplitudes for these particular locations, also obtained with MFA. The convolution of the source function with the computed amplitudes is subtracted from the original trace to obtain an updated trace that is used to detect the next set of reflectivity amplitudes. The algorithm is run in several passes with gradually decreasing values of $\alpha$, first locating spike positions and then refining the amplitudes at those positions. By removing the modeled part of the trace at every amplitude level, the cost function is rescaled to the next amplitude level, simplifying the search in model space. Figures 2f and 2g compare the final results, after sweeping amplitudes from −1 to 1 in increments of 0.2, when using MFA and the local update rule, respectively. The starting model for the local update rule is a time sequence with zero amplitude, while the starting model for the MFA method is a set of random numbers in the range [0, 1]. Superior results are obtained with MFA rather than the local update rule. We find that the cost function for the reflectivity estimator has many minima and that the final solution is highly dependent on the values of the starting variables when using the step update rule. Similar results are observed when trying to find a source wavelet given an initial reflectivity sequence.
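The detect-then-subtract sweep described above can be sketched as follows. This is only an illustration of the outer loop: plain crosscorrelation picking and a direct amplitude estimate stand in for the two Hopfield networks, and the detection tolerance is an arbitrary choice:

```python
import numpy as np

def sweep_reflectivity(d, s, levels, tol=0.05):
    """Sketch of the amplitude-sweep reflectivity estimator: for each
    amplitude level alpha, pick lags whose correlation-derived amplitude
    is close to alpha, then subtract the modeled contribution."""
    d = d.copy()
    n = len(d)
    r = np.zeros(n)
    for alpha in levels:
        # Amplitude estimate at every lag: (d crosscorrelated with s) / |s|^2.
        c = np.correlate(d, s, mode="full")[len(s) - 1:len(s) - 1 + n]
        c = c / np.dot(s, s)
        for i in np.where(np.abs(c - alpha) < tol)[0]:
            a = c[i]
            r[i] += a                        # record the detected spike
            j = np.arange(i, min(i + len(s), n))
            d[j] -= a * s[j - i]             # remove it from the trace
    return r

# Noise-free toy example: two spikes convolved with a short wavelet.
s = np.array([1.0, -0.5, 0.25])
true_r = np.zeros(25)
true_r[5], true_r[15] = 1.0, 0.6
d = np.convolve(true_r, s)[:25]
r = sweep_reflectivity(d, s, levels=[1.0, 0.8, 0.6, 0.4, 0.2])
```

On this clean toy trace the sweep recovers both spikes exactly; the point of the Hopfield/MFA formulation in the text is to make the same idea robust when events overlap and noise is present.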
FIG. 2. Synthetic experiment: (a) Wavelet; (b) reflectivity series; (c) synthetic seismogram from convolving (a) and (b) with 8% of noise added; (d) detected spike positions (+ symbol) for an amplitude of one using MFA. In solid line the true reflectivity is plotted and is the same in all subsequent plots; (e) computed amplitudes from (b) using MFA (◦ symbol); (f) computed reflectivity series obtained with MFA (◦ symbol) after sweeping amplitudes from [−1 1]; (g) final reflectivity series obtained with the Hopfield local update rule (∗ symbol).
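The $m$-bit amplitude coding of equation (20) (used again later in equation (29)) is easy to sketch. `encode` below is a hypothetical helper added for illustration only; it is not part of the paper's algorithm, which leaves the bit selection to the network:

```python
import numpy as np

def decode(bits):
    # Equation (20): s = sum_{j=1}^{m} x_j / 2**(j-1) - 1, with x_j in {0, 1}.
    j = np.arange(1, len(bits) + 1)
    return float(np.sum(bits / 2.0 ** (j - 1)) - 1.0)

def encode(s, m):
    # Hypothetical inverse: greedy binary expansion of s + 1 over the
    # weights 1, 1/2, ..., 1/2**(m-1); the resolution is 2**(1-m).
    bits = np.zeros(m)
    target = s + 1.0
    for j in range(m):
        if target >= 2.0 ** (-j):
            bits[j] = 1.0
            target -= 2.0 ** (-j)
    return bits
```

With $m = 6$ bits, any amplitude in $[-1, 1]$ is represented to within $2^{-5}$, which is the resolution meant by $m$ in the text.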
Real data example

With real data, we are required to compute both the reflectivity and the source function with little prior information. This prior information could be in the form of reflectivity information from well logs or knowledge of the source function, such as the phase of the wavelet. In the work of Wang and Mendel (1992), an initial source function was obtained using a least-squares method. From this initial source, the reflectivity estimator computes a reflectivity series based on the most prominent events. The computed reflectivity is used as feedback for the process of obtaining a new source function. This sequence is repeated until the amplitudes left in the trace can be interpreted as noise. Here we use MFA to minimize the cost function of the Hopfield networks. Although a priori information is crucial in resolving the source-reflectivity ambiguity, MFA makes it possible to escape from local minima in the minimization of the error. We tested the method with a data set collected at sea. This data set was provided as part of an experiment to test and evaluate different multiple elimination methods (EAEG multiple elimination workshop, 1995). The data consist of shot gathers of 240 traces, each with a 0.025 km receiver separation and a 4 ms sampling interval. The nearest receiver offset is 0.0275 km. The data were arranged in CMP gathers and transformed to the (τ-p) domain for processing. Figure 3 shows a transformed CMP gather from this data set that is used to test the deconvolution algorithm. The method is applied to each p-trace in the CMP gather, obtaining a source wavelet and a reflectivity series per trace. As prior information, we use an initial wavelet that is an average minimum-phase wavelet obtained from windowing the autocorrelation of each trace. The initial minimum-phase wavelet is presented in Figure 4a.
The algorithm is run in two iterations where one iteration includes the computation of the reflectivity and the use of the estimated reflectivity as the starting point to compute a new wavelet. In the first iteration, the same starting wavelet is used in all traces, while in the second iteration, the resulting wavelets from the first iteration are used as the starting wavelets. Figure 4b shows
FIG. 3. CMP gather in the (τ - p) domain used to test the deconvolution algorithm.
the result of averaging the output wavelets from the first iteration, and Figure 4c shows the final average wavelet to which the method converged after the second iteration. Although these wavelets are very similar, the final wavelets show more energy in the tail and an increase in amplitude in the second and third ripples. Figure 5 shows the computed reflectivity series, and Figure 6 shows the residuals obtained by subtracting the convolution of the final wavelets with this series from the data. The residuals mainly show incoherent energy with small amplitudes that can be interpreted as noise. Thus, the algorithm partitions the energy of the trace between the reflectivity series and the source wavelet based on the initial information. The algorithm is flexible in guiding the solution, since it can concentrate on the more prominent events and then gradually include more amplitude information. A practical application for the perfect impulses obtained by the method is to convolve this series with a short zero-phase wavelet and obtain new seismograms without the small amplitudes discarded by the method. Figure 7 shows the result of convolving a zero-phase wavelet, obtained from the trace autocorrelation of the original seismogram, with the spike series from Figure 5. The new data facilitate interpretation and provide a preprocessing step for more detailed analysis such as a waveform inversion study (e.g., Sen and Stoffa, 1996).
FIG. 4. (a) Starting minimum-phase wavelet obtained from averaging minimum-phase wavelets from the trace autocorrelations of the seismograms shown in Figure 3; (b) result after the first iteration of the MFA deconvolution method, where one iteration includes computation of the reflectivity and a new wavelet for each trace. The wavelet is an average of the computed p-wavelets; (c) final result obtained after two iterations of the method.

FIG. 5. Reflectivity series computed with the final wavelet results shown in the previous figure.

FIG. 6. Data residuals obtained by subtracting the convolution of the final source functions and the reflectivity series in Figure 5 from the original seismogram.

MULTIPLE ATTENUATION

The elimination of free-surface multiples remains a very important topic in seismic data processing. The predictive deconvolution method is commonly used for multiple reduction in marine seismic data. Part of its success relates to the fact that it is computationally fast compared to other methods and requires only minimal user interaction. Recently, wave-equation-based algorithms (e.g., Wiggins, 1988; Verschuur et al., 1992) have become popular. Here, we derive a method for surface-related multiple attenuation based on the form of the cost function of the Hopfield network. The method uses the fact that in the (τ-p) domain multiples are periodic with respect to intercept time, assuming a simplified 1-D
earth model. The method is applied as a single-channel process similar to predictive deconvolution. The aim is to remove the reverberations from the water layer. A first-order operator mapped onto a Hopfield network is obtained based on the criterion of minimum energy. The operator is defined in terms of a prediction distance, the size of the operator, and the water-bottom reflection coefficient. The energy is minimized using MFA. We derive the equations starting in the angular frequency and ray parameter, or (ω-p), domain. To derive the MFA multiple suppression method, we use the following equation to describe surface CMP data that include intrabed multiples but no surface-related multiples:

$$d_m(\omega, p) = r_d(\omega, p)\, s(\omega, p), \qquad (25)$$

where $r_d$ is the impulse response of the earth including intrabed multiples, and $s$ is the $p$-dependent source wavelet. We now introduce the interaction with the free surface and define the observed data as

$$d_{obs}(\omega, p) = \left[1 + r_d(\omega, p)\right]^{-1} d_m(\omega, p). \qquad (26)$$

In the (τ-p) domain, we compute the data without surface multiples as

$$d_m(\tau, p) = d_{obs}(\tau, p) + r_d(\tau, p) * d_{obs}(\tau, p), \qquad (27)$$

where the $*$ symbol represents convolution in time. At this point, we assume that the ghost reflections are part of the $p$-dependent source function and replace the reflection coefficient $r_d$ with the water-bottom reflection coefficients $r_{w_i}$. Based on the last equation, we now define a cost function that minimizes the energy of a trace as

$$E = \frac{1}{2} \sum_{k=1}^{n} \left[ d_{obs_k} + \sum_{i=1}^{q} r_{w_i}\, d_{obs_{k-i}} \right]^2, \qquad q \ll n, \qquad (28)$$

where $n$ is the total number of samples in the trace, and $q$ defines the size of the operator to be computed. From equation (28), it is possible to construct the set of weights $\mathbf{w}$ and the bias term $\boldsymbol{\phi}$ in a way that $r_w$ imitates the neurons of the Hopfield network. To do this, we first use the following binary representation for $r_{w_i}$, similar to equation (20):

$$r_{w_i} = \left( \sum_{j=1}^{m} \frac{z_{ij}}{2^{j-1}} \right) - 1, \qquad (29)$$

where the $z_{ij}$ are the variables to solve for. Using equation (29) and after some algebra, the cost function can be rewritten as

$$E = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{k=1}^{n} \sum_{\ell=1}^{m} w_{ij,k\ell}\, z_{ij} z_{k\ell} - \sum_{i=1}^{n} \sum_{j=1}^{m} \phi_{ij}\, z_{ij} + \text{constant}, \qquad (30)$$

where

$$w_{ij,k\ell} = -\frac{1}{2^{j+\ell-2}} \sum_{b=1}^{n} d_{obs_{b-i}}\, d_{obs_{b-k}}, \qquad (31)$$

and

$$\phi_{ij} = \sum_{k=1}^{n} \left[ \frac{1}{2^{j-1}} \left( \sum_{b=1}^{n} d_{obs_{k-b}} - d_{obs_k} \right) d_{obs_{k-i}} - \frac{1}{2(4^{j-1})}\, d_{obs_{k-i}}^2 \right]. \qquad (32)$$

Now that the energy function, weight matrix, and threshold have been defined, we can use the MFA equations to find an approximation to the global minimum. After solving for $z_{ij}$ in equation (30), a first-order operator of the form $(1 + r_w)$ is applied to the observed data.

Synthetic experiment
FIG. 7. Synthetic seismograms computed using the results from Figure 5 and a zero-phase wavelet extracted from a time window defined by the second zero crossing lag of the trace autocorrelation of the original seismogram.
To illustrate the method, we computed synthetic seismograms in the (τ-p) domain for the 1-D acoustic velocity model shown in Figure 8a. The model imitates a shallow sea floor and low-velocity sedimentary layers. Figure 8b shows a synthetic (τ-p) gather that includes primary as well as intrabed multiple events but no free-surface multiples. Figure 9a shows the synthetic data with the free surface included and the source and receiver positioned at the surface. In this figure, the data are strongly affected by surface multiples from the first layer. We applied our method as a single-channel process, computing one operator per trace separately. The information required initially by the method is the time of the first event at near-normal incidence, the velocity of the first layer, the number of samples of the operator, and the size of the data window that will be used in the inversion. In addition, we need to input the parameter that defines the resolution in the discretization of the coefficients to be computed, and a starting temperature. The entire record length is used to compute the operator, which is defined by eight samples. The method to obtain a starting temperature is explained in the following section.
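Applying the first-order operator $(1 + r_w)$ of equation (27) is simple to sketch. In the toy example below the water-layer period, reflection coefficient, and trace are invented values, and the operator coefficient is handed in directly rather than estimated by the MFA network:

```python
import numpy as np

def apply_operator(d_obs, rw):
    # Equation (27)/(28) form: out[k] = d_obs[k] + sum_i rw[i-1] * d_obs[k-i],
    # where rw[i-1] is the operator coefficient at lag i.
    out = d_obs.copy()
    for lag, c in enumerate(rw, start=1):
        out[lag:] += c * d_obs[:-lag]
    return out

# Toy water-layer reverberation: a primary spike at sample 3 followed by
# multiples of alternating sign every 5 samples (reflection coefficient 0.4).
r0, period, t0 = 0.4, 5, 3
trace = np.zeros(40)
for k in range(8):
    trace[t0 + k * period] = (-r0) ** k

rw = np.zeros(period)
rw[period - 1] = r0          # prediction distance = water-layer period
out = apply_operator(trace, rw)
# The reverberation collapses back to the primary spike; trace energy drops.
```

Each multiple of order $k$ is cancelled by the delayed, scaled copy of the order $k-1$ arrival, which is exactly the periodicity-in-$\tau$ assumption the text relies on.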
In addition, six bits ($m = 6$) are used to represent the amplitudes of the operator, resulting in 48 variables to estimate. Figure 9b shows the result of removing the free-surface multiples. Comparing the final seismograms with those of Figure 8b, we find that the energy contribution from the free surface has been attenuated and primary events can now be identified.

Estimation of the starting temperature

To determine how MFA compares to other optimization schemes, we solved the synthetic experiment described in the previous section using a simulated annealing algorithm known as very fast simulated annealing (VFSA) (e.g., Ingber, 1989; Sen and Stoffa, 1995). To apply VFSA, we developed a method that works in the frequency domain and is equivalent to the one presented in the previous section. The model parameters of the VFSA-based method correspond to phase shifts and amplitude coefficients. The search ranges, starting temperatures, and cooling schedules (an exponentially decreasing function) are the same for both methods. Figures 10a and 10b show the error history of each of these methods for $T_0 = 100$ and $T_0 = 1$, respectively. It is important to notice that the MFA error curve [equation (30)] represents the true energy of the computed trace [equation (28)] only at the end of the optimization, that is, when the neuron variables are two-valued with a 0 or 1 output. Nevertheless, these curves show the convergence of the model parameters toward the solution. Small discrepancies are observed in the final error between MFA and VFSA, with slightly faster convergence for the MFA method. Also, less computation time is required by MFA for a single error evaluation. Observe also that fewer iterations are required by both methods when starting at a low temperature (Figures 10a and 10b). The temperature associated with the high-to-low error change,
or phase transition, is described as the critical temperature of the system, and it corresponds to the temperature at which variables tend predominantly to the 0 or 1 states in the MFA algorithm. The changes in the variable averages cause the energy to decrease rapidly in the vicinity of the critical temperature. Starting right at the critical temperature will cause the system to reach equilibrium without the need of a cooling schedule, while starting at a very high temperature requires unnecessary error computations. Note that Basu and Frazer (1990) and Sen and Stoffa (1991) made several short simulated annealing runs to estimate the critical temperature, which is a computationally intensive approach. For some cases, it is possible to compute the critical temperature analytically from the MFA equations (Peterson and Söderberg, 1989). In our case, we make a crude approximation of the critical temperature and anneal the system with a fast exponential cooling schedule. The following assumptions are made to estimate the starting temperature (Hertz et al., 1991; Bilbro et al., 1989): 1) The weights are replaced by their average,
\[
\langle w_{i,j} \rangle = \frac{1}{n} \sum_{i,j} w_{i,j} , \qquad (33)
\]
where n is the total number of variables or neurons. 2) The effect of the threshold term (φ_i = 0) is negligible compared to that of the weight connections. 3) Setting V = ⟨v_i⟩ and following equation (17), we arrive at an approximate expression that describes the entire system:
\[
V = \tanh\!\left( \frac{\langle w_{i,j} \rangle V}{T} \right). \qquad (34)
\]
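The behavior of equation (34) is easy to explore numerically. The sketch below iterates the self-consistency equation to its fixed point at several temperatures; the value of ⟨w_{i,j}⟩ and the temperatures chosen are illustrative assumptions, not values from the paper.

```python
# Fixed-point iteration of the mean-field equation V = tanh(<w> V / T).
# Above the critical temperature only V = 0 survives; below it a
# nontrivial solution appears.  <w> = 1.0 is an illustrative value.
import math

def fixed_point(w_mean, T, v0=0.5, iters=200):
    """Iterate V <- tanh(w_mean * V / T) until (approximate) convergence."""
    v = v0
    for _ in range(iters):
        v = math.tanh(w_mean * v / T)
    return v

w_mean = 1.0  # plays the role of <w_ij>, so T_critical ~ 1.0 here
for T in (2.0, 1.5, 0.5, 0.1):
    print(T, fixed_point(w_mean, T))
```

For T above w_mean the iteration collapses to V = 0, while below it the fixed point moves toward a nonzero value, which is the transition described in the text.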
This equation has two types of solutions. When T > T_critical (i.e., ⟨w_{i,j}⟩/T ≤ 1), there is only the trivial solution V = 0. For
FIG. 8. (a) Acoustic velocity model; (b) synthetic seismograms without free-surface multiples computed in the (τ - p) domain. The data include intrabed multiples.
Calderón-Macías et al.
T < T_critical, a nontrivial solution appears, and the critical temperature is approximately T_critical ≈ ⟨w_{i,j}⟩. We use the absolute value of the weights to compute the mean. We have observed in several experiments that this value overestimates the critical temperature by, at most, one order of magnitude, which is desirable. This simple expression provides a systematic way of choosing an initial temperature. To determine how many iterations are required during optimization, two strategies are suggested. The first is to stop the optimization after a certain number of iterations if the error does not change by a significant amount. The second is to check whether the neuron variables have reached their equilibrium state, i.e., a 0 or 1 output.

Real data example

To test the MFA algorithm on real data, we use the marine data set that was described in the deconvolution section. Figure 11a shows a CMP gather transformed to the (τ - p) domain by means of a cylindrical slant stack, and Figure 11b
shows the results after applying the MFA operator. The error was minimized for approximately 200 iterations per trace before obtaining the final result. Figure 11c shows the residuals obtained from subtracting the computed data from the original data. As observed, a major part of the multiple energy has been attenuated by the MFA method. Figure 12a shows a CMP gather located at the end of the line, where the slope of the sea bottom increases; Figures 12b and 12c show the MFA results and final residuals, respectively. Comparing the results of Figures 11 and 12, we see that more energy is predicted in the first CMP gather, where the structure is nearly flat. Two factors not considered by our method that may result in less than optimal performance are the simplicity of a first-order operator and the deviation from the 1-D assumption. In addition, the algorithm presented here does not account for the depths of the source and receiver; they are not part of the optimization process. An appropriate deghosting operator applied prior to the multiple attenuation method may improve results.

CONCLUSIONS
When applying global optimization methods to solve geophysical problems, one is usually concerned with the long computation time. Two seismic processing problems that can be
FIG. 9. (a) Synthetic seismograms with free surface multiples included; (b) results using the MFA multiple attenuation method. The arrows mark the location of primary events.
FIG. 10. (a) Comparison of error histories between VFSA (discrete) and MFA (solid) for a starting temperature of 100; (b) comparison of error histories for a starting temperature of 1.
mapped and solved using a neural network model known as the Hopfield network are presented. Model parameters in this approach are described in terms of binary neurons. Minimization of the cost function is carried out using the mean field equations derived from the energy function of a Hopfield network. MFA provides a simple technique for avoiding local minima. The algorithm exhibits fast convergence toward the solution while preserving a quality of result similar to that of the simulated annealing method. We showed how to apply the mean field annealing algorithm to solve the deconvolution problem described by a set of Hopfield networks. The method is flexible in accommodating a
priori information in the form of reflectivity or source wavelet. MFA tries to find the best solution given the starting information and an estimate of the noise level. The lack of good a priori information can lead to a physically unrealistic estimation of the wavelet and reflectivity while still producing a low data error. The method could serve as a routine preprocessing step for prestack data or before a more quantitative waveform procedure. We also described an approximate technique for designing a first-order operator to remove free-surface multiple reflections from plane-wave transformed seismograms. We find that an error function for this problem can also be mapped onto a
FIG. 11. (a) CMP 228 in (τ - p) domain. The arrow marks the location of the first surface-related multiple; (b) results from the MFA multiple attenuation method; (c) data residuals between (a) and (b).
FIG. 12. (a) CMP 340 in (τ - p) domain; (b) result from the MFA multiple attenuation method; (c) data residuals between (a) and (b).
Hopfield network. The method is computationally fast and is applied on a single-trace basis. Good results are obtained when applying the method to synthetic data computed for a simulated shallow-water environment and to a marine data set with small lateral heterogeneities. The use of deterministic update rules and the possibility of approximating the critical temperature make the MFA method attractive for geophysical applications. However, the choice of the error function is crucial for proper implementation of the algorithm.

ACKNOWLEDGMENTS
This research was supported by National Science Foundation grant EAR-9304417. C. Calderón-Macías thanks CONACYT for the financial support received during the development of this work. Constructive reviews by Robert Essenreiter, Jerry Mendel, and an anonymous reviewer helped improve this manuscript. The University of Texas at Austin Institute for Geophysics contribution no. 1216.

REFERENCES

Basu, A., and Frazer, L. N., 1990, Rapid determination of critical temperature in simulated annealing inversion: Science, 249, 1409–1412.
Bilbro, G., Mann, R., Miller, T., Snyder, W., Van den Bout, D., and White, M., 1989, Optimization by mean field annealing: Advances in neural information processing systems, Morgan Kaufmann.
Bilbro, G., Snyder, W., Garmer, S., and Gault, J., 1992, Mean field annealing: A formalism for constructing GNC-like algorithms: IEEE Trans. on Neural Networks, 3, 131–138.
Bilbro, G., Snyder, W., and Mann, R., 1991, Mean-field approximation minimizes relative entropy: J. Opt. Soc. Am., 8, 290–294.
Cichocki, A., and Unbehauen, R., 1993, Neural networks for optimization and signal processing: John Wiley & Sons, Inc.
Hertz, J., Krogh, A., and Palmer, R. G., 1991, Introduction to the theory of neural computation: Addison-Wesley Publ. Co.
Hopfield, J. J., and Tank, D. W., 1985, Neural computations of decisions in optimization problems: Biological Cybernetics, 52, 141–152.
——— 1986, Computing with neural circuits: A model: Science, 233, 625–633.
Ingber, L., 1989, Very fast simulated reannealing: Math. Comput. Modelling, 12, 967–993.
Ingman, D., and Merlis, Y., 1991, Local minimum escape using thermodynamic properties of neural networks: Neural Networks, 4, 395–404.
Peterson, C., and Anderson, J. R., 1987, A mean field theory learning algorithm for neural networks: Complex Systems, 1, 995–1019.
——— 1988, Neural networks and NP-complete optimization problems; A performance study of the graph bisection problem: Complex Systems, 2, 59–89.
Peterson, C., and Hartman, E., 1989, Explorations of the mean field theory learning algorithm: Neural Networks, 2, 475–494.
Peterson, C., and Söderberg, B., 1989, A new method for mapping optimization problems onto neural networks: Int. J. Neural Sys., 1, 995–1019.
Rothman, D. H., 1985, Nonlinear inversion, statistical mechanics, and residual statics estimation: Geophysics, 50, 2784–2796.
Sen, M. K., and Stoffa, P. L., 1991, Nonlinear one-dimensional seismic waveform inversion using simulated annealing: Geophysics, 56, 1624–1638.
——— 1995, Global optimization methods in geophysical inversion: Elsevier Science Publ. Co.
——— 1996, Bayesian inference, Gibbs' sampler and uncertainty estimation in geophysical inversion: Geophys. Prosp., 44, 313–350.
Verschuur, D. J., Berkhout, A. J., and Wapenaar, C. P. A., 1992, Adaptive surface-related multiple elimination: Geophysics, 57, 1166–1177.
Wang, L. X., and Mendel, J. M., 1992, Adaptive minimum prediction-error deconvolution and source wavelet estimation using Hopfield neural networks: Geophysics, 57, 670–679.
Wiggins, J. W., 1988, Attenuation of complex water-bottom multiples by wave-equation-based prediction and subtraction: Geophysics, 53, 1527–1539.
Zhou, C., Cai, W., Luo, Y., Schuster, G. T., and Hassanzadeh, S., 1995, Acoustic wave-equation traveltime and waveform inversion of crosshole seismic data: Geophysics, 60, 765–773.