
Bulletin of the Seismological Society of America, Vol. 103, No. 2A, pp. 675–693, April 2013, doi: 10.1785/0120120058

Applying Waveform Correlation to Three Aftershock Sequences by Megan E. Slinkard, Dorthe B. Carr, and Christopher J. Young

Abstract

For nuclear explosion seismic monitoring, major aftershock sequences can be a significant problem because each event must be analyzed. Fortunately, the high degree of waveform similarity expected within aftershock sequences offers a way to more quickly and robustly process these events than is possible using traditional methods (e.g., short-term average/long-term average detection). We explore how waveform correlation can be incorporated into an automated event detection system to improve both the timeliness and the quality of the resultant bulletin. With our Waveform Correlation Detector we processed three aftershock sequences: the 1994 Northridge earthquake, the 2005 Kashmir earthquake, and the 2008 Wenchuan earthquake. Our system compared incoming waveform data to a library of known master events and identified incoming waveform data that correlated well with a master event as a repeating event. We break down our results to show how many master events found matches, the distribution in family size, and the effect of distance and fault characteristics on the results. Between 24% and 92% of the events in each sequence were recognized as similar events.

Introduction

A large earthquake aftershock sequence can increase the normal rate of occurrence of events observed by regional or global seismic monitoring systems severalfold, for days or weeks. For nuclear explosion monitoring, aftershocks are not of interest, but unfortunately each must be carefully processed to ensure that no signals from an event of interest are intermingled. Thus, the production of a comprehensive event bulletin is often significantly delayed during a major aftershock sequence, which is problematic because it is essential to identify potential nuclear events as quickly as possible. A more efficient way of processing aftershock sequences is therefore highly desirable.

Figure 1 shows a histogram of the number of seismic events worldwide from 29 April 2008 through 28 June 2008 reported in the Reviewed Event Bulletin (REB) from the International Data Center (IDC), which processes data from the global International Monitoring System (IMS). Both the IDC and the IMS are part of the protocol for the proposed Comprehensive Test-Ban Treaty (CTBT). There are typically between 50 and 100 events every day in the REB. The Ms 8.0 Wenchuan earthquake occurred on 12 May 2008, and its effect on the bulletin is easy to spot: almost 400 events were recorded from this earthquake and its aftershocks on that day. This increased the number of worldwide events on that date to 437, over four times the typical number of events seen in a day. The unexpected occurrence of this large sequence must have represented a significant challenge to both the IDC software and analysts.

Figure 1. The number of earthquakes per day from the IDC Reviewed Event Bulletin (REB) starting thirteen days before the 2008 Wenchuan earthquake. The earthquake occurred on 12 May 2008 (day 135) and increased the number of events in the REB on that day to over four times the usual number.

Aftershocks generally occur in close geographic proximity to the mainshock and to each other (Utsu, 1961). The size of the geographic region correlates with the magnitude of the mainshock: the larger the magnitude, the larger the region of aftershocks. Aftershocks are triggered by stress changes in the region due to the rupture of the mainshock; thus, their geographic distribution effectively outlines the fault rupture zone. Moreover, if the mainshock is large enough, it can affect nearby fault systems as well (Dieterich, 1994; Kilb et al., 2000; Felzer et al., 2004). The majority of the aftershocks should have similar source mechanisms because they are produced by the same overall stress changes. These characteristics of aftershock events (limited geographic sampling and source mechanism similarity) make them ideally suited for waveform correlation processing, because it has been well established that events from essentially the same source location, with the same source physics, recorded by the same receiver, have highly similar waveforms (e.g., Geller and Mueller, 1980; Israelsson, 1990). This is because the combination of the source mechanism and the path from source to receiver through the earth's geology effectively determines the received signal at a given station.

Waveform correlation processing (the mathematical comparison of incoming waveform data at one or more stations against an archive of known events) can be used for any type of waveform data, but it is particularly useful for seismic data processing because seismicity is so repetitive and geographically limited (e.g., Thorbjarnardottir and Pechmann, 1987; Harris, 1991; Withers et al., 1999). This simple approach is remarkably robust, simultaneously lowering both detection thresholds and false alarm rates (Schaff et al., 2004; Gibbons and Ringdal, 2006; Schaff and Waldhauser, 2010). In many active seismic regions, if a sufficient archive is available, a significant portion of the seismic events detected each day can be recognized as similar events, with waveforms closely matching those from a previous event (Schaff and Waldhauser, 2005). Ordinarily, building up the archive may take many years, but during an aftershock sequence, a large set of similar events for a focused region may occur very quickly, over the course of hours or days. Thus, the challenge in using waveform correlation to process an aftershock sequence is to develop a system that can "learn" the new events as they occur and immediately apply that knowledge to the incoming data. In this paper, we introduce a waveform correlation system to process aftershock sequences and evaluate its effectiveness against three major aftershock sequences.

Data

We looked for aftershock sequences that would challenge seismic monitoring systems, had been well characterized, and had easily available data at a number of stations around the epicenter of the mainshock. Based on these criteria, we selected the 1994 Northridge, 2005 Kashmir, and 2008 Wenchuan sequences. We used two earthquake catalogs as our ground truth. For the 1994 Northridge sequence, we used the Earthquake Data Report (EDR) from the United States Geological Survey (USGS). For the Kashmir and Wenchuan sequences we used the Reviewed Event Bulletin from the International Data Center (IDC-REB). Relevant information for each sequence is summarized below.

Northridge earthquake:
• occurred on 17 January 1994 in southern California;
• Mw 6.7;
• thrust-fault focal mechanism without surface rupture (Hauksson et al., 1995);
• produced the strongest ground motions ever instrumentally recorded in an urban setting at that time;
• aftershocks span an area of about 20 × 30 km² (Thio and Kanamori, 1996);
• aftershocks used in our analysis limited to a latitude–longitude box of 33.9–34.6° N and 118.8–118.2° W (Fig. 2); the most widely separated events were 72 km apart;
• according to the EDR catalog, there were 238 recorded events in the first 24 hours and 471 recorded events in the first 7 days;
• time period used in our study was 17 January, 12:30, to 21 January, 14:35, 1994; there were 412 aftershocks in the EDR catalog for this period;
• we retrieved data from over 20 stations in southern and northern California. The main stations used were PAS and MHD, 27 and 348 km away, respectively, from the mainshock.

Figure 2. Aftershocks from the 1994 Northridge earthquake in the time period of 17–21 January 1994 (412 total aftershocks from EDR catalog). We show the latitude–longitude box we used, and the mainshock (orange). Waveforms from stations PAS and MHD were processed.

Pakistan (Kashmir) earthquake:
• occurred on 8 October 2005 in northern Pakistan;
• Mw 7.6;
• 70-km, northwest-trending, thrust surface rupture (Kaneda et al., 2008);
• strongest earthquake in the area in 100 years;
• more than 75% of the aftershocks occur in a cluster around 30 km southwest of the strike of the main rupture (Bendick et al., 2007);
• aftershocks used in our study were limited to a latitude–longitude box of 33–35° N and 72–74° E (Fig. 3); the diameter of the cluster is ∼150 km;
• according to the IDC-REB catalog, there were 282 recorded events in the first 24 hours and 502 recorded events in the first 7 days;
• time period used in our study was 8 October, 02:00, to 13 October, 02:00, 2005; there were 462 aftershocks in the IDC-REB catalog for this period;
• we retrieved data from stations NIL and AAK, 99 and 907 km away, respectively, from the mainshock.

Wenchuan earthquake:
• occurred on 12 May 2008 in the Sichuan Province of China;
• Mw 7.9;
• thrust fault mechanism: 240-km-long surface rupture along the Beichuan fault and 72-km-long surface rupture along the Pengguan fault (Xu et al., 2009);
• most devastating earthquake in China in over three decades;
• aftershocks used in our study were limited to a latitude–longitude box of 30–34° N and 102–106° E (Fig. 4); the most widely separated events were 380 km apart;
• according to the IDC-REB catalog, there were 435 recorded events in the first 24 hours and 874 recorded events in the first 7 days;
• time period used in our study was 12 May, 06:00, to 17 May, 06:00, 2008; there were 802 aftershocks in the IDC-REB catalog for this period;
• we retrieved data from stations CD2 and XAN, 39 and 621 km away, respectively, from the mainshock.

The most important information about the earthquake aftershock sequences for our testing is summarized in Table 1. We collected waveforms for these events from the Incorporated Research Institutions for Seismology (IRIS) data center, the Northern California Earthquake Data Center (NCEDC), and Los Alamos National Laboratory (LANL).

Methods

Dendrogram Analysis

To verify the feasibility and potential value of large-scale waveform correlation detection, we first used existing tools and catalogs to establish the amount of repeated seismicity found in each aftershock sequence. The Dendro software previously developed by Sandia National Laboratories (Merchant, 2007) creates dendrograms based on waveform correlation coefficients to identify families of events with similar waveforms. With this software, we were able to determine what percentage of catalog events was part of correlated families and explore the effect of varying the correlation threshold that determines the families. The Dendro software requires a list of known events as input, so only events in the catalog are used; this is generally less than the total number of repeated events, as repeated events of low magnitude may not be in the catalog. Nevertheless, by using Dendro to process the catalog list of aftershocks, we can get a first estimate of the potential impact of waveform correlation techniques on an aftershock sequence.

Figure 3. Aftershocks from the 2005 Kashmir earthquake in the time period of 8–13 October 2005 (462 total aftershocks from IDC catalog). We show the latitude–longitude box we used and the mainshock (orange). Waveforms from stations NIL and AAK were processed.

Figure 4. Aftershocks from the 2008 Wenchuan earthquake in the time period of 12–17 May 2008 (802 total aftershocks from IDC catalog). Waveforms from stations CD2 and XAN were processed.

Table 1. Pertinent Information about the Three Earthquake-Aftershock Sequences Studied

                                     Northridge, 1994               Kashmir, 2005                Wenchuan, 2008
Mw of mainshock                      6.7                            7.6                          7.9
Time period for study                1/17 12:30 to 1/21 14:35       10/8 02:00 to 10/13 02:00    5/12 06:00 to 5/17 06:00
Latitude–longitude box for study     33.9–34.6° N, 118.8–118.2° W   33–35° N, 72–74° E           30–34° N, 102–106° E
Number of events in catalog          412                            462                          802
Stations (distance from mainshock)   PAS (27 km), MHD (348 km)      NIL (99 km), AAK (907 km)    CD2 (39 km), XAN (621 km)

The dendrogram for the Northridge data set shows the clustering relationships of all its events, based on correlation of 40-s waveform segments starting 5 s before the predicted P arrival times; in this case, the data are from station PAS. A portion of the full dendrogram is shown in Figure 5. The y axis lists the event identification numbers (orids) of all cataloged events; the x axis lists the correlation strength. The similarities between families are calculated using the single-link method: the correlation between families is equal to the highest correlation between any two members.
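As an illustration of the single-link rule just described, the sketch below groups events into families from a matrix of pairwise correlations. It is not the Dendro software: the function name `single_link_families`, the example matrix, and the use of a strict "greater than" comparison are assumptions made for this example. Cutting a single-link dendrogram at a threshold is equivalent to joining every pair of events whose correlation exceeds the threshold and taking the connected groups, which is what the code does.

```python
def single_link_families(corr, threshold):
    """Return families (lists of event indices) for a single-link cut.

    corr is a symmetric list-of-lists of pairwise correlations (hypothetical
    input format). Two events end up in the same family if they are connected
    by a chain of pairs whose correlation exceeds the threshold.
    """
    n = len(corr)
    parent = list(range(n))  # union-find forest, one tree per family

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i][j] > threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Only groups of two or more events count as families.
    families = [g for g in groups.values() if len(g) >= 2]
    return sorted(families, key=lambda g: (-len(g), g))


# Made-up correlations for five events; event 1 links events 0 and 2.
example_corr = [
    [1.00, 0.80, 0.30, 0.10, 0.20],
    [0.80, 1.00, 0.72, 0.10, 0.20],
    [0.30, 0.72, 1.00, 0.10, 0.20],
    [0.10, 0.10, 0.10, 1.00, 0.55],
    [0.20, 0.20, 0.20, 0.55, 1.00],
]
print(single_link_families(example_corr, 0.7))
print(single_link_families(example_corr, 0.5))
```

In this made-up example, events 0 and 2 correlate at only 0.30 but still share a family at the 0.7 threshold because each correlates strongly with event 1. This chaining is one reason lowering the threshold can produce families that span a larger geographic area.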

For any given pair of waveforms, the measured correlation can be found by tracing to the left until a vertical joining segment is found. Events with a correlation stronger than the correlation threshold (dashed vertical line) are considered to belong to the same (color-coded) family. A full description of how dendrograms are calculated and interpreted is available in Merchant (2007). For the dendrogram in Figure 5, the correlation threshold is 0.7. For this threshold value, 50 clusters were created, with the majority of clusters, 30, having only two events. The largest cluster had 19 events.

Figure 5. Portion of the dendrogram of Northridge events. The origin ids that correspond to each line on the dendrogram are listed on the right side of the plot. The dotted vertical line is the correlation threshold.

The dendrograms tell us how many events belong to a family of similar events, how large those families are, how similar the events are within a family, and how similar each family is to every other family. For correlation-based methods to be useful for event detection and identification, we would want to see that a high percentage of cataloged aftershock events are grouped together into families. Our dendrogram results for the three data sets are summarized in Table 2. We clustered the events using two correlation thresholds: 0.7 and 0.5. The percentage of events that belong to a family of similar events ranges from 83% (Northridge, PAS, 0.5 threshold) to < 1% (Wenchuan, XAN, either threshold). Our detailed examination of these clustering results suggests that the number of similar events depends on the shape of the fault (compact faults have more families), how close the recording station was to the events (closer is better), and the correlation threshold used to determine the family cutoffs (lowering the threshold resulted in more matches and bigger families that span a larger geographical area). For our purposes, these dendrogram clustering results suggest that, in most earthquake aftershock sequences, a significant percentage of events belong to a family of repeated events, and thus that waveform correlation processing is worth further investigation.

Table 2. Dendrogram Results

Sequence     Station   Threshold   Number of families   Number of events in largest family   Total number of origins matched   Percent matched
Northridge   PAS       0.7         47                   19                                   178                               43
Northridge   PAS       0.5         13                   318                                  342                               83
Northridge   MHD       0.7         25                   8                                    102                               25
Northridge   MHD       0.5         43                   38                                   198                               48
Kashmir      NIL       0.7         24                   8                                    91                                20
Kashmir      NIL       0.5         32                   156                                  243                               53
Kashmir      AAK       0.7         12                   4                                    26                                6
Kashmir      AAK       0.5         36                   15                                   102                               22
Wenchuan     CD2       0.7         5                    3                                    9                                 3
Wenchuan     CD2       0.5         9                    193                                  119                               43
Wenchuan     XAN       0.7         6                    3                                    3                                 0.4
Wenchuan     XAN       0.5         23                   5                                    5                                 0.6

Waveform Correlation Prototype System

Waveform Correlation Detector

Having verified that aftershock sequences typically include numerous repeated events, we proceeded to develop a prototype system to simulate using waveform correlation during an aftershock sequence. The waveform correlation detector (WCD) we developed compares incoming raw data to archived master waveforms to identify similar waveforms as they occur and tag them as repeated events. Our WCD flow is captured in the flow chart in Figure 6. Our algorithm operates on a single station, during a prescribed time period. The incoming raw data stream is filtered, windowed, and then correlated with each waveform in the Master Waveform Library. If a correlation is above the threshold, then we say a match is found and record information such as the start time of the data segment, the correlation strength(s), and the master waveform(s) that found the match. The incoming data stream is then advanced one sample, and the process repeats.

Figure 6. Waveform correlation detector system.

The incoming data stream is first filtered and windowed. To accentuate the P arrivals, data were band-pass filtered, keeping 0.8–3.5 Hz, using a third-order Butterworth filter. The incoming data were then windowed; the window length is chosen so that the segmented data include the P arrivals and the beginning of the S arrival. For our source-to-station distances, window lengths vary between 40 and 120 s. The length of the windowed data segments is the same as the length of the waveforms in the Master Waveform Library. Let $\mathbf{w}_N(t_o)$ represent the vector of N consecutive samples of the filtered, windowed data segment, starting at time $t_o$:

$$\mathbf{w}_N(t_o) = \left[\, w(t_o),\; w(t_o + \Delta t),\; w(t_o + 2\Delta t),\; \ldots,\; w(t_o + (N-1)\Delta t) \,\right].$$

Similarly, let $\mathbf{m}_N(t_m)$ represent the vector that is the master waveform:

$$\mathbf{m}_N(t_m) = \left[\, m(t_m),\; m(t_m + \Delta t),\; m(t_m + 2\Delta t),\; \ldots,\; m(t_m + (N-1)\Delta t) \,\right].$$

The correlation coefficient is a measure of similarity between two vectors and is defined as

$$C = \frac{\sum_{n=0}^{N-1} w(t_o + n\Delta t)\, m(t_m + n\Delta t)}{\sqrt{\sum_{n=0}^{N-1} w(t_o + n\Delta t)\, w(t_o + n\Delta t)\; \sum_{n=0}^{N-1} m(t_m + n\Delta t)\, m(t_m + n\Delta t)}}. \tag{1}$$

This equation returns values between −1 and 1, where 0 indicates completely uncorrelated waveforms and 1 indicates waveforms that are exact positive multiples of one another; for example, $w(t_o) = a\,m(t_m)$. To ensure that reflections are recognized, we use the absolute value, $|C|$, in our processing. Also note that, as an implementation detail, we do some additional processing to ensure that an event is only declared a match once, even as the raw data slide past the master waveforms in time and the peaks and troughs of the waveforms can realign, triggering multiple correlation values above the threshold. We allow only one detection within a 5-s interval, choosing the one with the highest correlation value.

The creation of the master waveform library is worth further discussion. Families cannot be monitored with waveform correlation until the first catalog representative has been found. Given that waveform correlation detection has been shown to be significantly more sensitive than traditional detection methods, it seems quite possible that the first catalog event in a family will not necessarily be the first event in the family that can be determined by waveform correlation (and thus might be recognized as an event by an analyst, according to our criteria). To get a more complete picture of the full set of events in each family, we therefore mined the catalog for event signatures that occurred in our full sample length and used them to prepopulate the master library; that is, we began the process with a library generated from catalog events from the entire sequence. In this way we stepped out of time and could find all similar events in the sequence, including those small events that occurred before their catalog match (now library) event occurred. This obviously does not simulate processing an aftershock sequence in real time, where it would not be possible to monitor for a given type of master event until it is available in the analyst-reviewed catalog, but it does give us a better understanding of repeated seismicity in a sequence. We compare these results with results from simulations of a real-time system in the conclusion.

Master waveform selection is a topic of ongoing research, and great strides in this topic occurred during the time period in which our research was performed. Harris and Dodge (2011), for example, describe a way of automatically building up a library of subspace detectors. For this research we took a very simple approach; we simulated a simple operational system by processing catalog events in chronological order. The first event went into the library. It was then correlated with all other events in the catalog; events it matched were considered to already be represented by the first event and discarded. We then repeated the process with the second event, and so on, and thus built up our library. A more sophisticated algorithm for choosing master waveforms would have resulted in fewer masters that did not find any matches, and possibly more matches.
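The detection and library-building steps described in this section can be sketched roughly as follows. This is a minimal illustration and not the authors' Matlab implementation. It assumes the data have already been band-pass filtered, that all waveforms are plain lists of samples, and that the 5-s exclusion interval has been converted to a sample count (`min_gap`) using a sampling rate that is not specified here. The function names `correlation`, `detect`, and `build_library` are hypothetical.

```python
import math


def correlation(w, m):
    """Correlation coefficient of equation (1) for two equal-length sequences."""
    num = sum(a * b for a, b in zip(w, m))
    den = math.sqrt(sum(a * a for a in w) * sum(b * b for b in m))
    return num / den if den > 0 else 0.0


def detect(stream, master, threshold, min_gap):
    """Slide the master over the stream one sample at a time.

    Returns (start_index, |C|) pairs. A detection that falls fewer than
    min_gap samples after the previously kept one replaces it if it is
    stronger and is dropped otherwise (a simple stand-in for the
    one-detection-per-interval rule).
    """
    n = len(master)
    picks = []
    for start in range(len(stream) - n + 1):
        c = abs(correlation(stream[start:start + n], master))
        if c <= threshold:
            continue
        if picks and start - picks[-1][0] < min_gap:
            if c > picks[-1][1]:
                picks[-1] = (start, c)
        else:
            picks.append((start, c))
    return picks


def build_library(events, threshold):
    """Greedy chronological selection of master events.

    events is a list of equal-length, time-aligned waveforms in chronological
    order (an assumption of this sketch). An event becomes a master unless it
    already matches an earlier master, in which case it is discarded.
    """
    masters = []
    for idx, wave in enumerate(events):
        if all(abs(correlation(wave, events[k])) <= threshold for k in masters):
            masters.append(idx)
    return masters


# Made-up data: a scaled copy and a polarity-reversed copy of the master
# are embedded in low-level noise.
master = [1.0, -2.0, 3.0, -1.0]
stream = ([0.1, -0.1, 0.05] + [2.0 * x for x in master]
          + [0.0, 0.1, 0.05, 0.02] + [-x for x in master] + [0.1])
print(detect(stream, master, threshold=0.9, min_gap=3))
print(build_library([master, [2.0 * x for x in master],
                     [1.0, 1.0, 1.0, 1.0], [-x for x in master]], 0.5))
```

Because the detector uses the absolute value of the correlation, the polarity-reversed copy is found alongside the scaled copy, and the amplitude scaling has no effect on the correlation value.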

Correlation Threshold Selection

We have shown, in our dendrogram discussion, that the number of families created changes significantly depending on the choice of correlation threshold. How does one determine a correlation threshold to use? How similar is similar enough? How confident are we that similar waveforms actually correspond to nearby seismic events? We decided that for our WCD system our correlation threshold selection would be guided by the following qualitative and quantitative guidelines.

• For nuclear explosion monitoring applications, we expect that any automatic processing results must be validated by an analyst. Thus, it is essential that the waveforms our system groups into a family be deemed similar by an analyst. This is a qualitative assessment, of course, but is helpful in rejecting thresholds that are clearly too low.
• Assuming that the catalog locations are fairly accurate, we expect that the events in families have to lie near one another. Theory says events should correlate when they are separated by less than ∼λ/4 (Geller and Mueller, 1980; Israelsson, 1990), where λ is the dominant wavelength. Using 7 km/s as the velocity of propagation, and 0.8 Hz (our filter band's lower limit) as the frequency, the wavelength is 7/0.8 = 8.75 km, so λ/4 works out to about 2.2 km. Thus, we expect our families to consist of closely grouped events. In practice, the catalog mislocation (Schaff and Richards, 2011) was significantly larger than 2.2 km, and we did not find this to be a viable method by which to set our threshold, though we did check the distance between catalog locations within a family to ensure that they were not significantly larger than the catalog error would allow.
• We require that background noise be mistaken for an event less than once a year.

For our third criterion, we turn to Wychecki-Vergara et al. (2001), which provides a rigorous method of evaluating the probability of mistaking noise for an event. The correlation threshold for a given probability of error is calculated based on the background noise at the sensor and the time-bandwidth product of the master waveforms. Caution is required when using this method to choose thresholds, as background noise is not the only source of false matches (nonseismic transients often occur); it is still, however, a useful statistical method to characterize our system and obtain a lower bound on an acceptable correlation threshold. Wychecki-Vergara et al. (2001) show that the false alarm rate can be calculated as

$$\mathrm{FAR} = 1 - \left[ 0.5 + 0.5\,\mathrm{betainc}\!\left( x^2,\; \tfrac{1}{2},\; \tfrac{n}{2} - 1 \right) \right], \tag{2}$$

$$n = \frac{1}{\mathrm{var}(R)} + 1, \tag{3}$$

where x is the proposed correlation threshold, betainc is the incomplete beta function, R is the cross correlation of the master event with sensor background noise, and n is the degrees of freedom. Essentially, we are correlating the master waveform with a long stream of background noise, measuring how different the master waveform is from background noise, and thereby determining how likely we are to match with noise by accident. The number of years between false alarms as obtained from the calculated FAR is shown for station AAK in Figure 7. This shows the effect of varying the threshold on the false alarm (false match) rate. Because the false alarm rate depends on the time-bandwidth product of a master waveform, we plot the results for the best, median, and worst (as defined by calculated FAR) master waveforms in the data set to see the range of results expected for this data set. Note that our chosen threshold of 0.5 results in a false match due to background noise less than once every 100 years even for the worst master waveform, comfortably meeting our stated goal of less than one false match per year.

Figure 7. The best, worst, and median false alarm rates for the master waveforms at station AAK. In the worst case a threshold of 0.5 will give one false alarm every 100 years.

Software Implementation

Our algorithm was implemented in Matlab, using the Signal Processing and Parallel Computing toolboxes.
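For readers without Matlab, the false alarm rate of equations (2) and (3) can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the code used in the study. The helper names are hypothetical. The regularized incomplete beta function is evaluated with a standard continued-fraction expansion, written as `reg_inc_beta(x, a, b)` with the same argument order as Matlab's `betainc(x, a, b)`. The placement of the arguments 1/2 and n/2 − 1 follows the standard distribution of a sample correlation coefficient and is an assumption about the original equation. The population variance is likewise an assumed choice for var(R). Converting the per-comparison rate into years between false alarms would also require the number of independent comparisons per year, which is not given here.

```python
import math
import statistics


def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) >= tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) >= tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) >= tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(x, a, b):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b


def degrees_of_freedom(noise_corr):
    """Equation (3): n = 1/var(R) + 1, from master-versus-noise correlations."""
    return 1.0 / statistics.pvariance(noise_corr) + 1.0


def false_alarm_rate(x, n):
    """Equation (2): chance that noise correlates above threshold x (n > 2)."""
    return 1.0 - (0.5 + 0.5 * reg_inc_beta(x * x, 0.5, n / 2.0 - 1.0))


# Sanity check: for n = 4 the correlation of noise is uniform on [-1, 1],
# so a threshold of 0.5 is exceeded a quarter of the time.
print(false_alarm_rate(0.5, 4))
```

A larger n, which corresponds to a larger time-bandwidth product of the master waveform, drives the false alarm rate down sharply at a fixed threshold. This is consistent with the spread between the best and worst master waveforms described for Figure 7.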

Results of Aftershock Sequence Processing

We used our WCD to process all three aftershock sequences. Processing parameters are listed in Table 3. Before we discuss the results, we first need to define some terms that will be used in our analysis (see Table 4).

Table 3. WCD Parameters for the Different Aftershock Sequences

Parameter                                  Northridge                     Kashmir              Wenchuan
Filtering                                  0.8–3.5 Hz                     0.8–3.5 Hz           0.8–3.5 Hz
Window length (master and incoming data)   40 s                           40 s, 120 s          40 s, 90 s
Latitude–longitude box                     33.9–34.6° N, 118.8–118.2° W   33–35° N, 72–74° E   30–34° N, 102–104° E

Parameter descriptions:
Filtering: Third-order, band-pass, Butterworth filter (zero phase).
Window length (master and incoming data): Length of data segments used in correlation.
Latitude–longitude box: Only cataloged events whose origins were within the latitude–longitude box were added to the library.

Table 4. Terms Used in This Analysis

Catalog event: An event listed in the EDR (Northridge) or IDC-REB (Kashmir, Wenchuan) catalog during the time period of the WCD run.
Observable event: Event that is above the noise floor at a given station, as judged by an analyst. All catalog events were visually verified by an analyst to determine if they were seen.
Master event: Waveform included in the archive; waveform to which other signals are matched.
Match: Signal that correlates above a threshold of 0.5 with a master event.
New signal: A match that is not listed in the catalog.
Catalog match: A match that is listed in the catalog.
Family: A master event and its matches.
Unsuitable event: An event that would otherwise be a master but is corrupted by another arrival or a data dropout.

Northridge Earthquake

The WCD found many similar events in the data from the Northridge earthquake. Figure 8 shows examples of families created when running the WCD on signals from stations PAS and MHD. We will discuss these families in detail to familiarize the reader with our approach, results, and vocabulary before proceeding to a general overview of the results. Station PAS is located only 27 km from the main earthquake, so all the aftershock signals we see fit easily in the 40-s window. MHD is 348 km from the main earthquake, and the 40-s time window shows the P arrival and P coda up to the start of the Sn arrival; because the window includes the start of the Sn, and therefore encodes distance information, we consider the 40-s window a long enough window length. The master event for both families is orid number 81741 and is plotted as the top trace. Below the master event are the traces from the three matches in each family that correlated best with the master event, in order of correlation value.

Figure 8. (a) The master event (top) and the three best matches in a 10-event family created for the Northridge aftershock sequence as observed at station PAS. (b) Same, but for an 8-event family as observed at station MHD. Both families have the same master event, orid number 81741.

For PAS (Fig. 8a), the first trace under the master event is a new signal (correlation 0.83); the next two traces are catalog matches, orids number 81855 (correlation 0.80) and number 81851 (correlation 0.67). Nine signals in all matched the master event in this family: three catalog matches and six new signals, making this a family of 10. Using the locations listed for the catalog matches in the EDR catalog, the events are between 0.2 and 0.7 km apart. Assuming that magnitude is proportional to $\log(A_{Lg}/T_{Lg})$, where $A_{Lg}$ is the Lg peak-to-peak amplitude and $T_{Lg}$ is the Lg period, we can compare these values within the family to get an idea of the magnitude range spanned by the family. The values span almost 2.5 magnitude units.

Station MHD also formed a family around the waveform it recorded for orid number 81741 (Fig. 8b). The three signals that had the best correlation values were three catalog matches: number 81855 (correlation 0.83), number 81851 (correlation 0.83), and number 81934 (correlation 0.73). There were seven signals in all that matched this master event: six catalog matches and one new signal. The catalog matches at MHD include the three catalog matches from the PAS family, plus three additional catalog matches. The locations of the six catalog matches and master event are all around 34.36° N and 118.71° W, except for orid number 81934, which has a location of 34.33° N and 118.72° W in the EDR catalog (3.44 km away). The arrival time for the new signal in this family indicates that it is coming from the same source as one of the new signals in the family formed at PAS. Seeing the same catalog matches and new signals in the families formed at both PAS and MHD gives us confidence that the WCD is working as expected. The correlation values for this MHD family tended to be higher than those at PAS; this is likely due to the attenuation of higher frequencies as the signal travels to MHD. We used the log method described above to determine the magnitude range of these events. The values were lower than at PAS, as expected given the increased distance, and spanned 1.74 magnitude units.

The WCD results for the two Northridge earthquake stations are summarized in Figure 9. The three pie charts offer insight into the way our WCD was run, the percentage of events that belong to a family, and the effectiveness of the WCD in finding small-magnitude, new signals. Recall that our master events were culled from all catalog events to represent distinct waveform templates. During WCD processing, incoming data were compared against these master templates. Most masters found matches and created a family. Some masters represent unique events and did not find a match. We chose to exclude some catalog masters from the WCD as being unsuitable; this usually occurred when another event arrived in the viewing window of the event, thereby corrupting the master template. Figures 9a and 9d show the distribution of master events for PAS and MHD. For insight, we show masters that formed families consisting of cataloged matches, masters that formed families only by matching new signals, unique masters, and unsuitable masters. The pie charts show the number of events in each category both as an absolute number and as a percentage. Next, we expand the pie chart to include all catalog events (Fig. 9b,e). All catalog events are either a master event (from Fig. 9a,d) or a catalog match. It is easy to determine the percentage of catalog events that belonged to a family, as the color scheme identifies all members of a family as a shade of blue or purple. From this chart we can determine the number and percentage of events that belong to a family. Two numbers are shown: the dashed line arrow and number (94% for PAS) counts the master events only if they find a catalog match; the dotted line and number (83% for PAS) also includes the master events that match only new signals. For the Northridge sequence, our WCD found a number of new signals and formed more families than would occur just grouping catalog data; thus, the second percentage is larger. The final charts (Fig. 9c,f) include the new signals, 942 at PAS and 55 at MHD, to illustrate the effectiveness of WCD in finding new, low-amplitude events. From these charts it is apparent that WCD is very effective at finding families of similar events at these stations.

Figure 9. Northridge master waveforms and events.

Having established the number of events that belong to a family, the next item of interest is the makeup of the families. Figure 10 shows the number of families at (a) PAS and (b) MHD compared to the number of events in each family. The histograms let us note whether the families tend to be large or small. For viewing ease, we only show families with 50 or fewer members. At PAS we had a total of 1322 events belonging to a family (this includes masters that found a match, catalog matches, and new signals).
The WCD created 150 different families (of two or larger), for an average family size of 8.8. At MHD, which had a total of 282 events belonging to a family, 131 families were created, for an average family size of 2.2. The number of families, each created around a distinct waveform template and, theoretically, a distinct geographical location, did not vary considerably from PAS to MHD. This indicates that PAS and MHD identified the aftershock clusters similarly. PAS, however, tended to create much larger families, which included many more new, low-magnitude signals.
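The family-building step described above rests on comparing a master waveform against incoming data and keeping windows whose correlation exceeds a threshold. A minimal sketch of that comparison is given here; it is a generic normalized cross-correlation in NumPy, not the WCD implementation itself. The function names, the synthetic waveform, and the noise level are illustrative assumptions; only the 0.5 threshold comes from the text.

```python
import numpy as np

def normalized_cross_correlation(master, data):
    """Correlation coefficient of `master` against every same-length
    window of `data`; each value lies in [-1, 1]."""
    master = np.asarray(master, dtype=float)
    data = np.asarray(data, dtype=float)
    n = master.size
    m = master - master.mean()
    m_norm = np.sqrt(np.sum(m * m))
    cc = np.zeros(data.size - n + 1)
    for i in range(cc.size):
        w = data[i:i + n]
        w = w - w.mean()
        denom = m_norm * np.sqrt(np.sum(w * w))
        # Guard against flat (zero-energy) windows such as data dropouts.
        cc[i] = np.sum(m * w) / denom if denom > 0 else 0.0
    return cc

def detect(master, data, threshold=0.5):
    """Return (sample index, correlation) for every window at or above threshold."""
    cc = normalized_cross_correlation(master, data)
    return [(int(i), float(cc[i])) for i in np.flatnonzero(cc >= threshold)]

if __name__ == "__main__":
    # Hypothetical data: a decaying sinusoid stands in for a master waveform,
    # and a scaled-down copy is buried in noise as a "new, low-amplitude" event.
    rng = np.random.default_rng(0)
    t = np.arange(200)
    master = np.sin(2 * np.pi * 0.05 * t) * np.exp(-t / 40.0)
    data = 0.01 * rng.standard_normal(1000)
    data[300:500] += 0.2 * master
    cc = normalized_cross_correlation(master, data)
    print("best match at sample", int(np.argmax(cc)))
```

Because the coefficient is normalized by the energy of both windows, a much smaller copy of the master still correlates strongly, which is why correlation can pick up low-amplitude events that an energy detector would miss.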

Kashmir Earthquake

We used stations NIL and AAK to perform waveform correlation on the Kashmir earthquake aftershocks. Figure 11 shows examples of representative families created at those stations. Station NIL is located 99 km from the main earthquake, and a 40-s window works well there: both P and Sn arrivals are seen in that window (Fig. 11a). AAK is 907 km from the main earthquake, so the 40-s window only shows the Pn arrival and part of the Pg arrival. Increasing the time window to 120 s allows us to include the full Pg arrival, the P coda, and the Sn arrival (Fig. 11b).

The master event for the NIL family shown is orid number 3417213. The master event for the AAK family is orid number 3417594. These represent the same cluster: the master event for AAK was the best correlating match for the master event of NIL. Figure 11a shows the master event from NIL and the three highest correlating traces in the family. The signal beneath the master event is a catalog match (orid number 3417594, the master event for the family from AAK; Fig. 11b), with a correlation value of 0.77. The next two signals seen in Figure 11a are another catalog match and a new signal, with correlation values of 0.58 and 0.57, respectively. All four signals look very similar, satisfying our criterion that our correlation threshold results in waveforms that look similar to an analyst. There were seven total signals that matched the master event in this family: three known aftershocks and four new signals. The correlation values range from 0.52 to 0.77. Instead of the Lg amplitude, we took the log of the Sn amplitude divided by the period of the Sn signal to estimate the size range of the events. For this family the events span a range of 2.8 magnitude units.

The family formed around the waveform from orid number 3417594 at station AAK is shown in Figure 11b. This family, based on a 120-s correlation window, consists of a master event plus two catalog matches, orid numbers 3417753 and 3417213. The correlation values for the matches are 0.63 and 0.51, respectively, and there are clear similarities between the aftershocks and the master event, especially when the Sn arrival comes in. Using the log of the amplitude of the Sn wave over the period of the Sn signal, we estimate the range of event sizes to cover 1.0 magnitude units.

The WCD results for the two Kashmir earthquake stations are summarized in Figure 12. The top pie charts (Fig. 12a,d) show the distribution of master events for NIL (40-s window) and AAK (120-s window). Next, the distribution of all observed cataloged events is shown (Fig. 12b,e). Not all catalog events were visible at the stations; an analyst reviewed all of the data to determine if each event was visible, above the noise floor, at each station. At NIL, 440 out of a theoretical 462 events were observed; at AAK, 360. All observed catalog events end up as master events, unsuitable events, or matches. From these charts it is easy to determine the percentage of observed catalog events that belonged to a family, guided by the arrows; recall that the color scheme identifies members of a family as a shade of blue or purple. The dotted arrow counts the master events only if they find a catalog match; the dashed arrow also counts master events if they match only new signals. The final charts (Fig. 12c,f) include the new signals, to illustrate the effectiveness of WCD in finding new, low-amplitude events. Note that the closer station (NIL) found many more new signals than the more distant station (AAK). This is consistent with the results for the Northridge sequence and is likely due both to the decreased signal-to-noise ratio at the farther station and to the fact that we kept our correlation threshold constant even though the time-bandwidth product of our master waveforms increased when we lengthened the time window.

Figure 13 shows the number of families at (a) NIL and (b) AAK compared to the number of events in each family. We show families up to 20 members. At NIL, which had a total of 1081 events belonging to a family, WCD created 219 different families. At AAK there were 98 family events, and 46 families were formed. The largest family at AAK had only five members. The nearer station found substantially more new signals and formed larger families. Again, this is consistent with the Northridge results.

Figure 10. The number of families at (a) PAS and (b) MHD compared to how many events are in the family for families with up to 50 members. The families at PAS tended to be much larger than at MHD, as they included many new, low-amplitude events.

Figure 11. (a) The master event (top) and the top three correlated signals that matched for a family created by running the WCD on signals from the Kashmir aftershock sequence observed at station NIL. (b) The master event (top) and the origin that matched for a family created running the WCD on signals from the Kashmir aftershock sequences observed at station AAK. The master event for the AAK family (3417594) is the top correlated signal to the master event (3417213) for the NIL family.
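The size-range estimate used for the Kashmir families (the log of the Sn amplitude divided by the Sn period) can be sketched as below. This is only an illustration: a base-10 logarithm is assumed, the function names are hypothetical, and the amplitude and period values are made-up placeholders rather than measurements from the study.

```python
import math

def relative_size(amplitude, period):
    """Relative event size as log10(amplitude / period).

    Both arguments must be positive; units only need to be consistent
    across the events of one family, because only differences are used.
    """
    return math.log10(amplitude / period)

def size_range(measurements):
    """Spread, in magnitude units, of a family given (amplitude, period) pairs."""
    sizes = [relative_size(a, t) for a, t in measurements]
    return max(sizes) - min(sizes)

if __name__ == "__main__":
    # Placeholder (amplitude, period) pairs for the members of one family.
    family = [(1200.0, 0.8), (35.0, 0.7), (2.1, 0.9)]
    print(f"family spans {size_range(family):.1f} magnitude units")
```

Since every member of a family shares nearly the same path to the station, path and station corrections cancel in the difference, which is why a range can be quoted without computing absolute magnitudes.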

Wenchuan Earthquake

Figure 14 shows examples of families created when running the WCD on signals from the Wenchuan sequence at stations CD2 and XAN. The master event for both families is orid number 4751536. Station CD2 is a Chinese National station; we received the CD2 data from Los Alamos National Laboratory (R. Stead and D. Yang, personal comm., 2010). This station is only 39 km from the mainshock of the Wenchuan earthquake, but there are so many data dropouts that we only see 262 of the 802 known aftershocks from the IDC catalog. Again, our analyst visually verified every event. Every event that was not compromised by a dropout was observed, so had it not been for the data quality problems, we expect we would have observed all the events. Because the station is located close to the aftershocks, we see both the Pg and Lg arrivals in the 40-s window. Figure 14a shows a master event (top trace) and the three best correlated signals from the 16 members of the family at station CD2. The best correlated signals are two catalog matches (correlation values of 0.68 and 0.62) and one new signal (correlation value 0.62). Overall, there are six catalog matches and nine new signals in this family. Using the log of the Lg amplitude divided by the period of the Lg signal, we estimate a magnitude range of 3.1 for this family. Figure 14b shows the family formed around the same master event at XAN, a station 621 km from the mainshock (Table 5). We used a 90-s window to capture the P and Sn arrivals. Only one catalog match was found (correlation value 0.54).

The WCD results for the two Wenchuan earthquake stations are summarized in Figure 15. The top pie charts (Fig. 15a,d) show the distribution of master events for CD2 (40-s window) and XAN (90-s window). Figure 15b,e shows the distribution of all observed cataloged events. Due to the data quality issues, only 262 events were observed at CD2. At XAN, which was much farther away but did not have data dropouts, 752 were observed. All observed catalog events end up as master events, unsuitable events, or matches. From these charts, it is easy to determine the percentage of observed catalog events that belonged to a family, guided by the arrows; recall that the color scheme identifies members of a family as a shade of blue or purple. The dotted arrow counts the master events only if they find a catalog match; the dashed arrow also counts master events if they match only new signals. The final charts (Fig. 15c,f) include the new signals, to illustrate the effectiveness of WCD in finding new, low-amplitude events. Note that once again the closer station (CD2) found a multitude of new signals, whereas the more distant station (XAN) found fewer.

The number of families at (a) CD2 and (b) XAN compared to the number of events in each family is shown in Figure 16. We only show families with 20 or fewer members. The WCD created 101 different families at CD2 from 451 events. Station XAN had no data dropouts; its 443 family events, a total similar to that at CD2, were grouped into 160 families. Given the data dropouts in CD2, the increase from 101 families to 160 families is not surprising. As expected, CD2’s closer proximity to the mainshock resulted in more new signals and larger families. Results from all three data sets are tabulated in Table 5.

Figure 12. Master events, catalog events, and new signals for Kashmir data set.

Discussion

When comparing the results from the different aftershock sequences, we see that three factors influence how effective waveform correlation processing is for finding aftershock events: (1) the distance the signals travel from the aftershock sequence to the station, (2) the geometry and mechanism of the fault, and (3) the length of the window over which the correlation is calculated.

Combining information from the events, Figure 17 shows the percentage of catalog events belonging to a family as a function of station distance, with the closest station, PAS, on the left and the station farthest away, AAK, on the right. It is apparent that closer stations see a much higher percentage of similar events for our threshold value. The two best stations are PAS and NIL, with percentages of 92 and 77, respectively. These two stations are both close to the aftershock regions, but also, these aftershock regions (Northridge and Kashmir) are more localized than the Wenchuan aftershock region. The Northridge earthquake had a thrust-fault focal mechanism that did not rupture the surface (Hauksson et al., 1995). Thio and Kanamori (1996) found that the Northridge aftershocks spanned an area of about 20 × 30 km², and our farthest apart events were 72 km apart. The Kashmir earthquake also occurred on a thrust fault (Kaneda et al., 2008), although this earthquake did cause surface rupture. Its cluster had a diameter of about 150 km. Both PAS and NIL also had the highest numbers of new signals: 942 at PAS and 740 at NIL.

Figure 13. The number of families at (a) NIL and (b) AAK compared to how many events are in the family for families with up to 20 members. The largest family at AAK only had 5 members.

CD2 attracts attention in Figure 17 for having a surprisingly low percentage of family events, only 58% despite being only 39 km from the mainshock. We believe that the percentage would probably be higher without the data problems, but we also believe this to be related to the fault geometry. The Wenchuan earthquake was a thrust fault mechanism over two different faults: a 240-km-long surface rupture on the Beichuan fault and a 72-km-long surface rupture along the Pengguan fault (Xu et al., 2009). Figure 4 shows how the aftershocks from the Wenchuan earthquake follow the trend of the faults; our farthest apart events were 380 km apart. Although station CD2 is close to the aftershocks at the southern end of the fault, the aftershocks at the northern end of the fault can be more than 200 km away. The path differences and attenuation influence how well the signals correlate. Signals from the southern end of the fault do not correlate well with signals from the northern end of the fault, and we find this reflected in the families that are created with WCD. The more tightly clustered the aftershocks, the more matches the WCD will find.

Our results also bring to light the importance of choosing a suitable window length over which to do the correlation. We saw that choosing a window that included Pn and Lg or S arrivals yielded better results than a window that just included Pn and Pg. Including secondary arrivals would also seem to lower the false alarm rate, because including multiple distinct phases in the waveform causes it to only find matches from equidistant source events. There are consequences for using a long time window, however. First, a longer window means more processing, potentially problematic in a real-time system, because correlation processing time scales linearly with window length.
Second, if the aftershocks are occurring close together in time, then there are more opportunities for spurious signals from other events to occur in the time window of the master signal, which requires the master to be discarded. If a longer window is used, then correlation threshold selection should be re-evaluated, because the probability of false alarm returned by the technique of Wychecki-Vergara et al. (2001), which is based on the time-bandwidth product, shows that longer windows can use lower thresholds.

The generally high percentage of events belonging to families encouraged us to simulate an operational system. In an operational system, the library of master events would grow as events occur in the sequence. Any similar events occurring prior to the detection of the first “type” event for a family would not be found. These master events could come directly from automatic processing without analyst review, making them available sooner (Harris and Dodge, 2011), or they could be added after traditional processing and analyst review, ensuring better quality but adding a significant delay. In our operational system simulation, we move forward through time and add all catalog events that were not detected via waveform correlation to the master library (we assume all catalog events would have been built by traditional processing methods). Thus, the master library starts with no waveforms and immediately adds one as soon as the first catalog event is encountered. Subsequent windows of incoming data are then compared against this event. When the second catalog event moves into the correlation window, it either correlates with the first event and is added to that family or it is added to the library as a new master event. This process repeats for the duration of the processing. This dynamic library method returned 2%–3% fewer matches than the prepopulated library results described in this paper, so apparently relatively few similar events occur prior to the first “type” event in the catalog. This suggests that, in a given area of the rupture zone, the larger aftershocks occur earlier in the sequence. Regardless, this difference in WCD performance is small enough that we are confident that an operational system would be of value to the monitoring community.

In operational use, the WCD could quickly identify likely repeated events, allowing quicker processing of repeated events and allowing the analysts to spend more time processing unfamiliar events. Workload reduction is shown for each station at the bottom of Table 5. Workload reduction is meant to demonstrate the potential time savings for an analyst processing all catalog events. New signals do not factor into this calculation. Workload reduction is calculated assuming that a family of events can be processed in the same amount of time as one event; because the additional events probably do require some processing time, this gives an upper bound on workload reduction. Workload reduction is calculated as

$$\text{Workload Reduction} = \frac{\text{catalog matches}}{\text{catalog size}}.$$

Workload reduction varies significantly between data sets, determined by the number of catalog events that grouped into a family. Creating larger families is the most straightforward way to increase workload reduction, meaning that there is a trade-off between workload reduction and geospatial precision for a family.

The next steps toward building an operational system require taking what we have learned about station distance, fault geometry, and window length to identify when the WCD would be useful and how to automatically select parameters for a new earthquake sequence. We are currently working on having the WCD automatically select a suitable window length for each master waveform. It uses a user-defined acceptable probability of error (false correlation), rather than having the user specify a correlation threshold, and from this determines a suitable master-waveform-specific threshold (which depends on the time-bandwidth product of the master events). This should improve performance and simplify setup. Other challenges toward building an operational system include expanding the concept from one station to incorporating the results from multiple stations and figuring out how to integrate WCD with a traditional detector. Given the huge potential benefits of WCD during a large aftershock sequence, we think these are research areas worth pursuing.

Figure 14. (a) The master event (top) and the three top correlating signals for a family created by running the WCD on signals from the Wenchuan aftershock sequence observed at station CD2. (b) The master event (top) and the origin that matched for a family created running the WCD on signals from the Wenchuan aftershock sequences observed at station XAN. Both families have the same master event, orid number 4751536.

Figure 15. Results from Wenchuan earthquake sequence.
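The dynamic-library simulation and the workload-reduction ratio described in the Discussion can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: each catalog event is assumed to be already cut into an equal-length, time-aligned window, so only the zero-lag correlation is computed (the WCD scans continuous data instead). All function names are hypothetical; the 0.5 default is the threshold used in the study.

```python
import numpy as np

def zero_lag_correlation(a, b):
    """Normalized correlation of two equal-length, pre-aligned windows."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def simulate_dynamic_library(event_waveforms, threshold=0.5):
    """Walk through catalog events in time order.

    An event that correlates with an existing master joins that master's
    family; otherwise it is added to the library as a new master.
    Returns {master index: [indices of matched catalog events]}.
    """
    families = {}
    for idx, waveform in enumerate(event_waveforms):
        best_master, best_cc = None, threshold
        for master_idx in families:
            cc = zero_lag_correlation(event_waveforms[master_idx], waveform)
            if cc >= best_cc:
                best_master, best_cc = master_idx, cc
        if best_master is None:
            families[idx] = []            # library grows by one master
        else:
            families[best_master].append(idx)
    return families

def workload_reduction(families, catalog_size):
    """Catalog matches divided by catalog size (an upper bound, per the text)."""
    catalog_matches = sum(len(members) for members in families.values())
    return catalog_matches / catalog_size
```

In this scheme the library starts empty, so an event that precedes its family's first master can never be matched; that is the mechanism behind the small loss in matches reported for the dynamic library relative to the prepopulated one.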

Data and Resources

Data from stations NIL and AAK came from the IRIS/IDA network. Project IDA currently operates a global network of broadband and very broadband seismometers for the IRIS Consortium. Project IDA is based at the Cecil and Ida Green Institute of Geophysics and Planetary Physics, Scripps Institution of Oceanography, University of California, San Diego, http://ida.ucsd.edu/ (last accessed September 2011). Data from station XAN are from the New China Digital Seismograph Network. Station PAS data are from the TERRAscope network, which is part of the Southern California Seismic Network operated by Caltech and USGS, http://www.scsn.org (last accessed September 2011). The facilities of the IRIS Data Management System, and specifically the IRIS Data Management Center, were used to access the waveforms from NIL, AAK, XAN, and PAS, http://www.iris.edu (last accessed September 2011).



Table 5. Results from All Events Studied

Quantity | Northridge: PAS | Northridge: MHD | Pakistan: NIL | Pakistan: AAK | Wenchuan: CD2 | Wenchuan: XAN
Station distance (km) | 27 | 348 | 99 | 907 | 39 | 621
Correlation threshold | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5
Window length (s) | 40 | 40 | 40 | 120 | 40 | 90
Number of events in catalog | 412 | 412 | 462 | 462 | 802 | 802
Number of seen catalog events | 412 | 371 | 440 | 360 | 262 | 752
Number of master events | 176 | 275 | 301 | 275 | 185 | 624
Number of seen catalog events in a family (with other catalog events) | 343 | 178 | 223 | 79 | 113 | 130
Percentage of seen catalog events in a family (with other catalog events) | 83% | 48% | 51% | 22% | 43%* | 17%
Number of seen catalog events in a family (with other catalog events OR new signals) | 380 | 212 | 341 | 88 | 151 | 225
Percentage of seen catalog events in a family (with other catalog events OR new signals) | 92% | 57% | 78% | 24% | 58%* | 30%
Number of new signals identified | 942 | 55 | 740 | 10 | 300* | 218
Workload reduction | 56% | 22% | 28% | 12% | 20%* | 9%

Percent of events belonging to a family, number of new events identified, and workload reduction are highlighted. Recall that CD2 had data quality issues likely reducing the effectiveness of waveform correlation.
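The percentage rows of Table 5 follow from its count rows, and the family totals quoted in the text for PAS, NIL, AAK, CD2, and XAN (1322, 1081, 98, 451, and 443) equal the catalog events in a family plus the new signals. A short sketch of that arithmetic is given below; the counts are copied from Table 5, while the dictionary keys and function names are illustrative choices.

```python
# Counts copied from Table 5: seen catalog events, seen catalog events in a
# family (with other catalog events only, or with catalog events OR new
# signals), and the number of new signals identified.
TABLE5 = {
    "PAS": {"seen": 412, "family_catalog_only": 343, "family_any": 380, "new": 942},
    "MHD": {"seen": 371, "family_catalog_only": 178, "family_any": 212, "new": 55},
    "NIL": {"seen": 440, "family_catalog_only": 223, "family_any": 341, "new": 740},
    "AAK": {"seen": 360, "family_catalog_only": 79, "family_any": 88, "new": 10},
    "CD2": {"seen": 262, "family_catalog_only": 113, "family_any": 151, "new": 300},
    "XAN": {"seen": 752, "family_catalog_only": 130, "family_any": 225, "new": 218},
}

def percent_in_family(row, key):
    """Percentage of seen catalog events that fall in a family."""
    return 100.0 * row[key] / row["seen"]

def total_family_events(row):
    """Catalog events in a family plus new signals matched to a master."""
    return row["family_any"] + row["new"]

if __name__ == "__main__":
    for station, row in TABLE5.items():
        print(
            station,
            f"{percent_in_family(row, 'family_catalog_only'):.1f}%",
            f"{percent_in_family(row, 'family_any'):.1f}%",
            total_family_events(row),
        )
```

Note that the percentages use the number of seen catalog events as the denominator, not the full catalog size, which matters for stations such as CD2 where dropouts hid most of the catalog.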

Figure 17. The percentage of known events that matched other known events or new signals at each station studied. The distances from the station to the mainshock of the earthquake are along the top of the plot, and the distances increase from left to right. Station PAS has the highest percentage of matches, whereas station AAK has the lowest.

Figure 16. The number of families at (a) CD2 and (b) XAN compared to how many events are in the family for families with up to 20 members. At CD2 there are families of many different sizes, while at XAN most of the families have fewer than 3 members.

Station MHD waveforms were obtained from the Northern California Seismic Network through the Northern California Earthquake Data Center, a joint project of the University of California Berkeley Seismological Laboratory and the USGS, http://www.ncedc.org (last accessed October 2010). The waveforms from station CD2 were obtained via personal communication with Los Alamos National Laboratory and are not generally available to the public.


Acknowledgments

This work was performed under the auspices of the U.S. Department of Energy by Sandia National Laboratories under Award Number DE-AC04-94AL85000. The authors would like to thank our colleagues at LANL for providing the CD2 data. In addition, we would like to thank David Schaff, Paul Richards, Charlotte Rowe, Steven Gibbons, and William Junek for helpful conversations and support. Last, we would like to thank two anonymous reviewers for their helpful comments.

References

Bendick, R., R. Bilham, M. A. Khan, and S. F. Khan (2007). Slip on an active wedge thrust from geodetic observations of the 8 October 2005 Kashmir earthquake, Geology 35, 267–270.
Dieterich, J. (1994). A constitutive law for rate of earthquake production and its application to earthquake clustering, J. Geophys. Res. 99, 2601–2618.
Felzer, K. R., R. E. Abercrombie, and G. Ekstrom (2004). A common origin for aftershocks, foreshocks and multiplets, Bull. Seismol. Soc. Am. 94, 88–98.
Geller, R. J., and C. S. Mueller (1980). Four similar earthquakes in central California, Geophys. Res. Lett. 7, 821–824.
Gibbons, S., and F. Ringdal (2006). The detection of low magnitude seismic events using array-based waveform correlation, Geophys. J. Int. 165, 149–166.
Harris, D. B. (1991). A waveform correlation method for identifying quarry explosions, Bull. Seismol. Soc. Am. 81, 2395–2418.
Harris, D. B., and D. A. Dodge (2011). An autonomous system for grouping events in a developing aftershock sequence, Bull. Seismol. Soc. Am. 101, 763–774.
Hauksson, E., L. Jones, and K. Hutton (1995). The 1994 Northridge earthquake sequence in California: Seismological and tectonic aspects, J. Geophys. Res. 100, 12,335.
Israelsson, H. (1990). Correlation of waveforms from closely spaced regional events, Bull. Seismol. Soc. Am. 80, 2177–2193.
Kaneda, H., R. Nakata, H. Tsutsumi, H. Kondo, N. Sugito, Y. Awata, S. S. Akhtar, A. Majid, W. Khattak, A. A. Awan, R. S. Yeats, A. Hussain, M. Ashraf, S. G. Wesnousky, and A. B. Kausar (2008). Surface rupture of the 2005 Kashmir, Pakistan earthquake and its active tectonic implications, Bull. Seismol. Soc. Am. 98, 521–557.
Kilb, D., J. Gomberg, and P. Bodin (2000). Triggering of earthquake aftershocks by dynamic stresses, Nature 408, 570–574.
Merchant, B. J. (2007). The GNEMRE Dendro Tool, SAND Report #2007-6439, http://www.osti.gov/bridge/purl.cover.jsp?purl=/926809-yvpPPE/ (last accessed April 2012).
Schaff, D., and P. Richards (2011). On finding and using repeating seismic events in and near China, J. Geophys. Res. 116, B03309.
Schaff, D., and F. Waldhauser (2005). Waveform cross-correlation-based differential travel-time measurements at the Northern California Seismic Network, Bull. Seismol. Soc. Am. 95, 2446–2461.
Schaff, D., and F. Waldhauser (2010). One magnitude unit reduction in detection threshold by cross correlation applied to Parkfield (California) and China seismicity, Bull. Seismol. Soc. Am. 100, 3224–3238.
Schaff, D., G. Bokelmann, W. Ellsworth, E. Zanzerkia, F. Waldhauser, and G. Beroza (2004). Optimizing correlation techniques for improved earthquake location, Bull. Seismol. Soc. Am. 94, 705–721.
Thio, H. K., and H. Kanamori (1996). Source complexity of the 1994 Northridge earthquake and its relation to aftershock mechanisms, Bull. Seismol. Soc. Am. 86, S84–S92.
Thorbjarnardottir, B. S., and J. C. Pechmann (1987). Constraints on relative earthquake locations from cross-correlation of waveforms, Bull. Seismol. Soc. Am. 77, 1626–1634.
Utsu, T. (1961). A statistical study on the occurrence of aftershocks, Geophysics 30, 521–605.
Withers, M., R. Aster, and C. Young (1999). An automated local and regional seismic event detection and location system using waveform correlation, Bull. Seismol. Soc. Am. 89, 657–669.
Wychecki-Vergara, S., H. Gray, and W. Woodward (2001). Statistical development in support of CTBT monitoring, DSWA01-98-C-0131.
Xu, X., X. Wen, G. Yu, G. Chen, Y. Klinger, J. Hubbard, and J. Shaw (2009). Coseismic reverse- and oblique-slip surface faulting generated by the 2008 Mw 7.9 Wenchuan earthquake, China, Geology 37, 515–518.

Sandia National Laboratories P.O. Box 5800 MS 0404 Albuquerque, New Mexico 87185-0404 [email protected] Manuscript received 17 February 2012
