First Break Detection in Seismic Reflection Data with Fuzzy ARTMAP Neural Networks

Charalabos (Harry) D. DIMITROPOULOS, James F. BOYCE
Wheatstone Laboratory, King's College London, Strand, London WC2R 2LS, U.K.
Tel/Fax: +44 [71] 873 2894 / 873 2716
E-Mail: [email protected], [email protected]

Abstract. In this paper we investigate the use of a supervised, but self-organizing, Adaptive Resonance Theory type of neural network (Fuzzy-ARTMAP) for first break picking in seismic reflection data. First break picking is the accurate location of the leading energy pulse received by a geophone in response to a seismic shot. The performance of Fuzzy-ARTMAP is compared to our previous work with multi-layer perceptron and cascade-correlation neural nets [1]. Although the predictions of Fuzzy-ARTMAP are less accurate by 2–8% for this problem, it has many features that make it a desirable candidate for a neural net implementation of first break detection: it learns quickly, efficiently and flexibly; it can be used in both on-line and off-line settings; it is easy to use, with few parameters; it does not get trapped in local minima; and the fuzzy rules that map the input to the output can be extracted from the network.
1 Introduction

1.1 Objective
Neural nets are being applied to a variety of problems in geophysical exploration, and have been claimed to be highly successful for first-break prediction, which is necessary for the statics correction of seismic data. A comparison and assessment of the reported methods, as regards their efficacy with respect to traditional methods, the optimization of the neural net architecture, and the capacity both for learning and for generalization, was carried out in Dimitropoulos and Boyce [2]. Furthermore, in [1] we proposed a new method for first-break picking with neural nets which gave improved results. Our objective here is to assess the performance of Fuzzy-ARTMAP for first-break picking and to compare it with our previous work.
1.2 Statics Correction

The objective of statics correction is to transform a set of shot records into that which would have been recorded on a horizontal surface, the datum; the transformed data are consequently free of elevation effects. The early part of each shot record, a collection of time traces from a linear array of geophones, is composed of signals due to three modes of energy transfer from source to geophone: direct transmission, reflection from the shallowest seismic horizon, and partial transmission as a surface wave along that horizon. For small shot-geophone distances the direct mode yields the earliest energy, the first break. The significant information is the displacement at which one of the other modes begins to dominate, as evidenced by a change in slope of the first-break time as the shot-geophone displacement increases. As its name implies, first-break picking is the accurate location of the leading energy pulse received by a geophone in response to a seismic shot.

1.3 Reported Methods

First-break picking by neural nets has been reported by McCormack [3], Veezhinathan et al. [4], Kusuma and Brown [5], Murat and Rudman [6], and An and Epping [7]. In Dimitropoulos and Boyce [1], we achieved a performance of 100% on a dynamite data set with well-defined first breaks, and 75% on a Vibroseis set with noisy first breaks, by using a cascade-correlation net and a modified set of attributes.

2 Net Simulations

The nets were implemented on a Meiko transputer surface, which combines transputers and i860s, apart from the Fuzzy-ARTMAP nets, which had very short running times and for which a DECstation 5000/133 was sufficient.
2.1 The Neural Net Algorithms
Cascade-correlation was developed by Fahlman and Lebiere [8] to overcome some of the limitations of learning algorithms designed for multi-layered networks, such as the back-propagation algorithm. The main difference with cascade-correlation is that the topology of the network is not fixed: it starts with a minimal net and trains automatically, adding new hidden units one by one as they are needed. Each new unit, which forms a single-node hidden layer, receives a connection from each of the network's inputs and also from all pre-existing hidden units. Once a hidden unit has been created its input weights are frozen and only the output connections are trained. In this way, powerful high-order feature detectors are created.
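To make the selection of new hidden units concrete, the following is a minimal sketch, in Python, of the candidate-scoring step of cascade-correlation; it is our own illustration rather than part of [8], and the function and argument names are illustrative. Each candidate unit is scored by the magnitude of the covariance between its activation and the network's residual output errors over the training patterns, and the best-scoring candidate is installed and frozen.

```python
import numpy as np

def candidate_score(candidate_outputs, residual_errors):
    """Cascade-correlation candidate criterion (after Fahlman and Lebiere [8]):
    S = sum over outputs o of | sum over patterns p of (V_p - V_mean)(E_{p,o} - E_mean_o) |,
    i.e. the magnitude of the covariance between the candidate unit's activation V
    and the residual error E at each output unit."""
    V = np.asarray(candidate_outputs, dtype=float)   # shape (n_patterns,)
    E = np.asarray(residual_errors, dtype=float)     # shape (n_patterns, n_outputs)
    Vc = V - V.mean()
    Ec = E - E.mean(axis=0)
    return np.abs(Vc @ Ec).sum()

# Hypothetical usage: train a pool of candidates, then install the one with
# the largest score as the next (frozen) hidden unit.
# best = max(candidate_pool, key=lambda c: candidate_score(c.outputs, errors))
```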
[Figure 1: Fuzzy ARTMAP architecture. The ART(a) and ART(b) modules, each with fields F0, F1 and F2, are linked by the map field F(ab); match tracking raises the ART(a) vigilance ρ(a) when a predictive error occurs. The inputs are complement coded, A = (a, a^) and B = (b, b^).]
Fuzzy-ARTMAP is a neural network architecture that performs incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analogue or binary input vectors. It includes a pair of Fuzzy Adaptive Resonance Theory modules, ART(a) and ART(b), and a map field module, F(ab), as shown in Figure 1. For classification tasks, Fuzzy-ARTMAP formulates recognition categories of input patterns in the F2(a) field, and associates each category, through F(ab), with its respective prediction. Fuzzy-ARTMAP realizes a Minimax Learning Rule that conjointly minimizes predictive error and maximizes generalization. As a result, the system automatically creates the minimal number of recognition categories (hidden units) needed to meet accuracy criteria. Category proliferation is prevented by a normalization procedure called complement coding. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights correspond to increasing sizes of category 'boxes'. The reader is referred to [9] for a complete description of Fuzzy-ARTMAP.
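The sketch below illustrates, in Python, the complement coding and the basic category choice, vigilance test and learning steps of a single Fuzzy ART module, following the standard equations in [9]. The function names and default parameter values are our own, and the map field and match-tracking logic of the full Fuzzy-ARTMAP system are omitted.

```python
import numpy as np

def complement_code(a):
    """Complement coding: A = (a, 1 - a), so |A| is constant for any a in [0, 1]^M."""
    a = np.asarray(a, dtype=float)
    return np.concatenate([a, 1.0 - a])

def fuzzy_art_step(A, weights, alpha=0.001, beta=1.0, rho=0.4):
    """One Fuzzy ART presentation: choose a category, test vigilance, update weights.

    A        -- complement-coded input vector
    weights  -- list of category weight vectors (same length as A); modified in place
    Returns the index of the chosen (possibly newly committed) category.
    """
    if not weights:                        # commit the first category directly
        weights.append(A.copy())
        return 0

    # Category choice: T_j = |A ^ w_j| / (alpha + |w_j|), where ^ is the fuzzy AND
    # (component-wise minimum) and |.| is the L1 norm.
    T = [np.minimum(A, w).sum() / (alpha + w.sum()) for w in weights]

    for j in np.argsort(T)[::-1]:          # hypothesis testing, best match first
        match = np.minimum(A, weights[j]).sum() / A.sum()
        if match >= rho:                   # vigilance (resonance) test passed
            # Learning: w_j <- beta * (A ^ w_j) + (1 - beta) * w_j.
            # Weights can only decrease, which corresponds to the category 'box'
            # growing; beta = 1 is fast learning.
            weights[j] = beta * np.minimum(A, weights[j]) + (1.0 - beta) * weights[j]
            return j

    weights.append(A.copy())               # no existing category resonates: commit a new one
    return len(weights) - 1
```

In the full Fuzzy-ARTMAP system, a wrong prediction at the map field raises ρ(a) just above the current match value (match tracking), forcing the search above to continue with a stricter criterion.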
2.2 Seismic Data Sets
Two different types of seismic reflection data were considered¹: dynamite, a representative example of which is shown in Figure 4, and Vibroseis². The dynamite data, as is generally the case with impulsive source data, have well-defined first breaks, unlike the Vibroseis data with their emergent or noisy first breaks, typical of non-impulsive source data. In the data sets that we used, each trace consisted of 250 samples, with a sampling interval of 4 msec. There were 48 traces per shot in the dynamite data set, and 240 traces per shot in the Vibroseis set. For the dynamite data, we trained the nets on the 48 traces of a shot record and tested on two other shots (96 traces) from the same survey. For the Vibroseis data, we trained on half of the traces from a shot record (120 traces), and tested on the remaining 120 from the same shot.

¹ Data supplied by Simon Petroleum Technology Ltd.
² Trade mark of Conoco.
2.3 Coding the Data
In [2] we investigated three different methods of presenting the data to the net, namely the Amplitude, Peak Amplitude and Attribute methods. In [1] we presented four new methods, based on the Attribute method, and concluded that the best of the methods we investigated was the PTDA method, which is the only method considered in this paper.

With the PTDA method (Peaks-Troughs-Distances & Adjacent RMS) a sliding time window is applied to each trace. The window is scanned down each trace and the data within the window are pre-processed to extract attributes. Only windows centred on peaks in the trace are considered. The number of attributes extracted per window determines the number of input neurons (nodes) of the net; in our case 18 attributes. A single output neuron is required, as the net's response is an indication of whether the current peak is before (0) or after (1) the first break. The predicted first break for a given trace is the point at which the output changes from (0) to (1). Alternatively, an output of (1) can indicate the location of the first-break peak, with all other locations producing a (0) output response.

To extract the attributes needed for the PTDA method we proceed as follows. We take the window centred about a peak and use the value of that peak as an attribute, together with the values of the two peaks preceding and the two peaks following the current peak; this gives us 5 Peak attributes. We then use the amplitude values of the troughs between the peaks, which gives us 4 Trough attributes. We next include the relative distances between the peaks and troughs, which gives us 8 Distance attributes. Finally, we calculate the root-mean-square (rms) amplitude ratio on adjacent traces, which provides the net with a check for spatial coherence in the occurrence of first breaks. To calculate this attribute we add the two rms amplitude ratios of the adjacent traces, where the rms ratio is the ratio of the rms amplitude in a window of n samples before the centre peak to that in a window after the peak. For each window
\mathrm{rms\ amp} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n}} \qquad (1)
where x_i is the i-th data sample. In our data we used two n = 15 windows per trace.
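A minimal sketch of the attribute extraction just described is given below, in Python, assuming that peak and trough positions have already been located on each trace and that the centre peak has at least two peaks and n samples on either side; the helper names and the exact ordering of the 18 attributes are illustrative and are not meant to reproduce the exact implementation.

```python
import numpy as np

def rms_amp(x):
    """Equation (1): root-mean-square amplitude of a window of samples."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.sum(x ** 2) / len(x))

def extract_ptda_attributes(trace, adjacent_traces, peaks, troughs, k, n=15):
    """Build the 18 PTDA attributes for the window centred on peak index k.

    trace           -- one seismic trace (array of samples)
    adjacent_traces -- the two neighbouring traces, for the spatial-coherence check
    peaks, troughs  -- sample indices of peaks and of the troughs between them
    k               -- position of the current peak within `peaks`
    """
    centre = peaks[k]
    window_peaks = peaks[k - 2 : k + 3]        # current peak plus two on each side
    window_troughs = troughs[k - 2 : k + 2]    # the four troughs between them

    peak_attrs = [trace[p] for p in window_peaks]        # 5 Peak attributes
    trough_attrs = [trace[t] for t in window_troughs]    # 4 Trough attributes
    events = sorted(list(window_peaks) + list(window_troughs))
    dist_attrs = list(np.diff(events))                   # 8 Distance attributes

    def rms_ratio(tr, c):
        """rms amplitude in the n samples before the centre peak over that after it."""
        before = tr[c - n : c]
        after = tr[c + 1 : c + 1 + n]
        return rms_amp(before) / rms_amp(after)

    # 1 Adjacent-RMS attribute: sum of the rms ratios on the two adjacent traces.
    adjacent_attr = sum(rms_ratio(t, centre) for t in adjacent_traces)

    return np.array(peak_attrs + trough_attrs + dist_attrs + [adjacent_attr])
```

The 5 + 4 + 8 + 1 attributes of each window form one 18-component input vector for the net.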
3 Results

With cascade-correlation and back-propagation nets, training is done while validating, to avoid overtraining. For each network configuration a few runs were made, with different initial random weights. With Fuzzy-ARTMAP, only one run is required for each parameter configuration, since the initial weights have to be set to 1. There are three parameters which determine the dynamics of each ART module: a choice parameter α > 0; a learning rate parameter β ∈ [0, 1]; and a vigilance parameter ρ ∈ [0, 1].
In the conservative limit with fast learning (β = 1) and normalized inputs, one-shot stable learning occurs; that is, no weight change or search occurs in a Fuzzy ART module after each item of an input set has been presented just once. However, in Fuzzy-ARTMAP, where vigilance can vary when predictive errors are made, repeated input presentations can lead to new learning, so one-shot learning does not necessarily occur.
The limit α → 0 is called the conservative limit, because small values of α tend to minimize recoding during learning. The vigilance ρ(a) calibrates the minimum confidence that ART(a) must have in a recognition category, or hypothesis, activated by an input a in order for ART(a) to accept that category, rather than search for a better one through an automatically controlled process of hypothesis testing. Lower values of ρ(a) lead to broader generalization and higher code compression. A predictive failure at ART(b) increases ρ(a) by the minimum amount needed to trigger hypothesis testing at ART(a), using a mechanism called match tracking, which sacrifices the minimum amount of generalization necessary to correct a predictive error. Hypothesis testing leads to the selection of a new ART(a) category, which focusses attention on a new cluster of input features that is better able to predict b.

For the output module, ART(b), the choice parameter was set to a value close to zero, the learning rate was set to 1 for fast learning, and the vigilance to 1, as we wanted each output value, (0) or (1), to form a distinct ART(b) category. For ART(a), we tried various parameter combinations, as reported in the next section. It must be noted that for some traces more than one first break is picked per trace, in which case a simple post-processing step selects the most appropriate one: for example, the first of the picks for Fuzzy-ARTMAP, or the one with the highest confidence value (output value) for cascade-correlation.
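As an illustration of this decoding and post-processing step, here is a minimal sketch in Python, assuming the per-peak net outputs of one trace are available as a 0/1 sequence ordered in time; the function name and argument layout are illustrative.

```python
import numpy as np

def pick_first_break(outputs, peak_times, use_first_pick=True, confidences=None):
    """Select a single first-break time from the per-peak net outputs of one trace.

    outputs    -- 0/1 classification of each peak window (before / after the first break)
    peak_times -- sample index (or time) of each peak, in the same order
    If several candidate picks occur, either take the earliest one (as for
    Fuzzy-ARTMAP) or the one with the highest confidence value (as for
    cascade-correlation, whose real-valued output serves as a confidence).
    """
    outputs = np.asarray(outputs)
    candidates = [i for i in range(1, len(outputs))
                  if outputs[i - 1] == 0 and outputs[i] == 1]   # (0) -> (1) transitions
    if not candidates:
        return None                                             # no pick on this trace
    if use_first_pick or confidences is None:
        return peak_times[candidates[0]]
    return peak_times[max(candidates, key=lambda i: confidences[i])]
```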
3.1 Dynamite Data Sets

3.1.1 Fast Learning

With fast learning, the learning rate of ART(a) is set to 1. Three settings of the choice parameter α and five baseline vigilance values of ρ(a) were tested. The results are compared in Figure 2. The best ART(a) configuration was α = 0.1, β = 1, ρ(a) = 0.4, giving a performance of 98% correctly picked traces, in a few seconds; that is, about 2% less accurate than cascade-correlation. The same performance was achieved when α = 1 and ρ(a) = 0.4, but in 13 epochs, after creating 26 ART(a) categories. The number of categories formed in F2(a) ranged from just 3 (in 1 training epoch) when ρ(a) = 0 and α = 0.001, to 80 (in 2 epochs) with ρ(a) = 0.8 and α = 1. For baseline vigilance 0.8, we would also get 'I don't know' responses for a few patterns.

[Figure 2: Dynamite performance (percentage of correctly picked traces, 86–98%) against the ART(a) baseline vigilance (0–0.8), for α = 0.001, 0.1 and 1.]

3.1.2 Fast Commit/Slow Recode

The learning rate of ART(a) is now set to 1 the first time a node is committed, and thereafter to 0.8. This training option can sometimes be useful for the efficient coding of noisy input sets, but at the cost of an increase in the number of epochs required for training. For the dynamite case, the results obtained were very similar to those achieved with fast learning.

3.2 Vibroseis Data Sets

3.2.1 Fast Learning

The best configuration was α = 1, β = 1, ρ(a) = 0.6, giving a 67% performance, in about half a minute (Figure 3). In 10 epochs, 62 ART(a) categories were created. In [1], the best results were obtained with a cascade-correlation net: a 75% accuracy, but in 3 minutes. For the other ART configurations, the number of ART(a) categories ranged from 22 to 64, in 6–11 epochs.

[Figure 3: Vibroseis performance (percentage of correctly picked traces, 58–67%) against the ART(a) baseline vigilance (0–0.8), for α = 0.001, 0.1 and 1.]

3.2.2 Fast Commit/Slow Recode

On average, we achieved 3% less accuracy than with fast learning. Only in one instance did we obtain an increase of 2% in performance, but still under the 67% performance obtained with the best fast-learning configuration.

4 Discussion and Conclusions

Comparing Fuzzy-ARTMAP with the other nets, we can observe that it learns to make accurate predictions quickly, in the sense of using relatively little computer time; efficiently, in the sense of using relatively few training trials; and flexibly, in the sense that its stable learning permits continuous new learning, on one or more databases, without eroding prior knowledge, until the full memory capacity of the network is exhausted. Fuzzy-ARTMAP leads to favourable levels of code compression in both on-line and off-line settings. It is also easy to use, as it has a small number of parameters, requires no problem-specific system crafting or choice of initial weight values, and does not get trapped in local minima. Furthermore: (a) ARTMAP can be made to respond with an 'I don't know' reply if that is required, or with an 'educated guess' if an answer is forced; this is controlled by selecting an appropriate baseline vigilance. (b) It performs autonomous hypothesis testing, and therefore acts as a self-organizing expert system. (c) A voting strategy can also be used to assign probability estimates to competing predictions given small, noisy, or incomplete training sets, as demonstrated in [9], where in all cases voting performance was significantly better than the performance of any of the individual simulations. (d) Finally, the user has access to the feature templates created by the net for distinguishing between different categories. In addition, knowledge, in the form of fuzzy rules, can be derived from the network, as shown in [10].

[Figure 4: Dynamite picks.]

In conclusion, although Fuzzy-ARTMAP gives slightly lower performance than cascade-correlation, it has many attractive features which make it worth considering for this application.

The authors wish to thank Simon Petroleum Technology Ltd. for supplying the data for the investigation, and Mr. P. Haskey, Dr. R. L. Silva and Mr. R. Holden for many useful discussions.
References

[1] C.H. Dimitropoulos and J.F. Boyce. Applications of neural nets to seismic signal analysis. In Int. Conf. on Acoustic Sensing and Imaging, IEE No. 369, pages 62–67, 1993.

[2] C.H. Dimitropoulos and J.F. Boyce. Neural nets for first break detection in seismic reflection data. In 3rd Int. Conf. on Artificial Neural Networks, IEE No. 372, pages 153–157, 1993.

[3] M.D. McCormack. Seismic trace editing and first-break picking. In 60th Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, pages 321–324, 1990.

[4] J. Veezhinathan, D. Wagner, and J. Ehlers. First break picking using a neural network. In F. Aminzadeh and H. Simann, editors, Expert Systems in Exploration, chapter 8, pages 179–202. Society of Exploration Geophysicists, 1991. ISBN 0-56080-023-2.

[5] T. Kusuma and M.M. Brown. First break picking and trace editing using cascade-correlation learning architecture. Int. Conf. on Petroleum Exploration and Production, 1992.

[6] M.E. Murat and A.J. Rudman. Automated first arrival picking: a neural network approach. Geophysical Prospecting, 40:587–604, 1992.

[7] G. An and W.J.M. Epping. Seismic first-arrival picking using neural networks. In World Congress on Neural Networks, INNS, pages I-174–I-177, 1993.

[8] S.E. Fahlman and C. Lebiere. The cascade-correlation learning architecture. In D.S. Touretzky, editor, Advances in Neural Information Processing Systems 2, pages 524–532. Morgan Kaufmann, San Mateo, USA, 1990.

[9] G.A. Carpenter, S. Grossberg, N. Markuzon, J.H. Reynolds, and D.B. Rosen. Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analogue multidimensional maps. Technical Report CAS/CNS-TR-91-016, Boston University, 1991.

[10] G.A. Carpenter and A-H. Tan. Rule extraction, fuzzy ARTMAP, and medical databases. In World Congress on Neural Networks, INNS, pages I-501–I-506, 1993.