Empirical Evaluation of Aleatory and Epistemic Uncertainty in Eastern Ground Motions
by Gail Atkinson
ABSTRACT Horizontal-component response spectra data for ground motions recorded on hard-rock sites in eastern North America (ENA) are used to explore the aleatory and epistemic uncertainty in ground-motion prediction equations (GMPEs). An all-station sigma, expressing the total calculated scatter of values about a GMPE, ranges from 0.25 to 0.29 log10 unit. Single-station sigmas, in which the scatter is evaluated station by station relative to a regional GMPE, average in the range of 0.23–0.28. The scatter of observations about site-specific GMPEs (GMPEs developed from multiple events recorded at a single station), which comes the closest to measuring the actual aleatory variability, has average values of 0.22–0.26. Overall, aleatory variability of ground motions in ENA is no larger than that for California, at least for moderate events recorded on hard-rock sites. Epistemic uncertainty is considered by looking at the standard deviation of GMPEs developed separately for each station (i.e., the scatter of predictions rather than the scatter of observations). This exercise suggests that the overall epistemic uncertainty in ENA GMPEs should be at least 0.15 log unit (as a standard deviation of the median GMPEs) in the magnitude–distance range in which the prediction equations can be anchored by empirical data (magnitude < 5.5, distances > 50 km). It should be larger than 0.15 unit at large magnitudes and close distances.
INTRODUCTION Aleatory uncertainty in ground-motion prediction is the random scatter of observations about a median ground-motion prediction equation (GMPE) and is an important component in probabilistic seismic hazard assessment (e.g., McGuire, 2004; Al Atik et al., 2010). A question of interest is whether aleatory uncertainty for earthquake ground motions in eastern North America (ENA) is similar to that of other regions such as California or whether it might be larger because of the possibly greater variability in stress drop and other such factors. The epistemic uncertainty (uncertainty in what is the correct median GMPE) is also of interest. This note investigates uncertainty in GMPEs in ENA from an empirical perspective. I used a database of ENA response spectra data (Assatourians and Atkinson, 2010; www.seismotoolbox.ca) to develop GMPEs for small to moderate events and explored their aleatory and epistemic uncertainty. I focused first on the Charlevoix region, for which we have sufficient data in magnitude–distance space to develop a reasonable GMPE for small to moderate events (M < 6). The data are not sufficient at large magnitudes to develop GMPEs that are applicable to the magnitude–distance range that controls seismic hazard analyses. However, the GMPEs that can be developed for moderate events can be used to explore both aleatory and epistemic uncertainty.
Seismological Research Letters
Volume 84, Number 1
ALEATORY UNCERTAINTY IN ENA GMPES To evaluate aleatory uncertainty (sigma) in ENA ground motions, I developed a simple GMPE for moderate events recorded on seismograph sites in the Charlevoix region and compared its computed variability (sigma, the standard deviation of observations about the mean) with a similar GMPE developed for southern California. I followed the methodology outlined by Atkinson (2006). In that study, I used moderate-event data recorded in the Los Angeles basin (downloaded from the U.S. Geological Survey’s ShakeMap web site) to develop an overall GMPE for a reference ground condition (B/C boundary) for sites in southern California. I then evaluated the mean residual and its standard deviation on a site-by-site basis to compare single-station sigma values with the overall sigma values. That study (Atkinson, 2006) provides a baseline of typical computed aleatory variability for southern California sites for moderate events for both the overall aleatory variability (sigma) and the single-station sigma. To evaluate ENA variability in a similar way to that used by Atkinson (2006), I started with a database of pseudoacceleration response spectra (PSA; 5% damped) observations obtained from six 3-component seismographs in the Charlevoix zone operated by the Geological Survey of Canada. The reader is referred to Assatourians and Atkinson (2010) for details of the record compilation and processing; these data are catalogued at www.seismotoolbox.ca. Figure 1 shows the locations of the Charlevoix stations in relation to earthquakes for which we have recordings. All six Charlevoix stations are believed to be sited on hard rock, although this study will show that one of the stations, A21, has a significant site response and is therefore probably not actually a hard-rock site. The Charlevoix zone is selected because it provides a reasonable January/February 2013
doi: 10.1785/0220120096
distribution of earthquake magnitudes and distances for development of an empirical GMPE. The data distribution in magnitude–distance space for the Charlevoix stations is shown in Figure 2, in which the magnitude variable is Nuttli magnitude, M_N. M_N is used as it is available for all events and is appropriate for the purposes of this study; the distance variable is hypocentral distance (Rhypo). The PSA data from Charlevoix are sufficient to develop empirical GMPEs for the magnitude range M_N 3.5 to ∼5.5 at distances of Rhypo ∼ 20 to 600 km. A simple functional form is selected (to be applied to each of several frequencies) following that used for the California study of Atkinson (2006):
$$\log_{10} \mathrm{PSA} = c_0 + c_1 M_N + c_2 M_N^2 + c_3 \log R_{\mathrm{hypo}} + c_4 R_{\mathrm{hypo}} \qquad (1)$$
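Because equation (1) is linear in its coefficients, it can be fit by ordinary least squares. The following is only a minimal sketch under stated assumptions, not the regression code used in the study: the function names, the synthetic data, and the "true" coefficients are hypothetical; the logarithm of Rhypo is assumed to be base 10; and the degrees-of-freedom correction used for sigma is one possible choice.

```python
import numpy as np

def design_matrix(m_n, r_hypo):
    """Columns follow equation (1): 1, M_N, M_N^2, log10(Rhypo), Rhypo."""
    m_n = np.asarray(m_n, dtype=float)
    r_hypo = np.asarray(r_hypo, dtype=float)
    # Assumption: "log Rhypo" in equation (1) is a base-10 logarithm.
    return np.column_stack(
        [np.ones_like(m_n), m_n, m_n**2, np.log10(r_hypo), r_hypo]
    )

def fit_gmpe(m_n, r_hypo, log_psa):
    """Ordinary least squares fit; returns (c0..c4, sigma of residuals)."""
    X = design_matrix(m_n, r_hypo)
    y = np.asarray(log_psa, dtype=float)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coeffs
    # Assumption: sigma is corrected for the five fitted coefficients.
    sigma = residuals.std(ddof=X.shape[1])
    return coeffs, sigma

# Synthetic data only (hypothetical coefficients and scatter, not ENA results).
rng = np.random.default_rng(0)
m_n = rng.uniform(3.5, 5.5, 200)
r_hypo = rng.uniform(20.0, 600.0, 200)
true_c = np.array([-4.0, 1.5, -0.05, -1.0, -0.002])
log_psa = design_matrix(m_n, r_hypo) @ true_c + rng.normal(0.0, 0.25, 200)

coeffs, sigma = fit_gmpe(m_n, r_hypo, log_psa)
print("coefficients:", np.round(coeffs, 3))
print("sigma (log10 units):", round(float(sigma), 3))
```

The residual standard deviation returned here plays the role of the all-station sigma when the input pools recordings from several stations.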
As in the California study, a least squares regression is used. This simple regression approach is sufficient for the purposes of this study; the data distribution with distance is relatively uniform with magnitude, and we have no plans to use these GMPEs beyond this limited magnitude–distance range. Thus, we are not particularly concerned with magnitude–distance trade-offs or potential bias in the equations themselves. The log(base 10) of the geometric mean of the two horizontal components of PSA is the regression variable. The distribution of residuals (defined as log [observed PSA/predicted PSA]) for this regression is shown for a sample frequency in Figure 3. An interesting observation is that the effects of smoothing by the simple functional form through complications in propagation, such as the Moho bounce effects (Burger et al., 1987), can be seen in the residuals. These effects manifest in a trilinear shape in the residuals, which is why hinged trilinear forms have been proposed for more detailed regressions of attenuation shape (e.g., Atkinson, 2004). Overall, the residuals tend to be slightly positive at Rhypo < 30 km, slightly negative at Rhypo 30–100 km, and neutral at Rhypo > 200 km. Despite this apparent deficiency in the functional form, the overall variability (sigma) is actually quite small, especially considering that this variability includes site-to-site variability and model misfit components (and is thus an overestimate of the true aleatory variability due to source and path effects). Table 1 compares the values of sigma for the Charlevoix GMPE with those obtained by Atkinson (2006) for the corresponding regression of ShakeMap data in southern California. The slightly smaller ENA sigma relative to southern California may reflect more homogeneous path and site conditions in the east or possibly better calibration and quality control of the stations and processing procedures.
If the sigmas are averaged by station (as in the study of Atkinson [2006]), we can obtain single-station sigmas for the six Charlevoix stations. The average single-station value (for the six Charlevoix stations) is shown in Table 1 in comparison with average single-station values for California. It appears that sigma (and thus the aleatory variability) is slightly less for ENA than that for California for both the global and single-station sigma cases. An F-test of the significance of the differences between the values of standard deviation reveals that we cannot reject (at the 95% confidence level) the hypothesis that the values of sigma in the two regions are the same. Therefore, making the common assumption that the aleatory variability is equal in the two regions is reasonable. Finally, when averaging the residuals by station, I noted that station A21 has a significant average residual, > 0.3 log unit (factor of 2) at most frequencies, indicating a significant site response. This suggests that A21 is probably not actually sited on hard rock.
▴
Figure 1. Location of Charlevoix stations (triangles) and recorded earthquakes (circles, with size of circle representative of size of event).
▴
Figure 2. Distribution of Charlevoix PSA data in magnitude–distance space.
▴
Figure 3. Regression residuals for horizontal-component (geomean) PSA at 1 Hz for all Charlevoix station data.
Table 1
Comparison of Aleatory Variability (Sigma in Log10 Units) in Charlevoix (This Study) to that Obtained for Southern California (Atkinson, 2006)
                  All Station                Individual Station (Averaged)
Frequency (Hz)    Charlevoix   California    Charlevoix   California
0.3               0.291        0.279         0.280        0.260
1                 0.273        0.288         0.258        0.268
3.3               0.258        0.323         0.225        0.295
PGA               0.266        0.307         0.227        0.268
PGV               0.252        0.296         0.231        0.277
The subdivision of residuals by station reveals a significant site response peak (about 0.38 log unit at 5 Hz) for station A21. PGA, peak ground acceleration; PGV, peak ground velocity.
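The difference between an all-station sigma and a single-station sigma can be illustrated with a small sketch. It assumes that residuals about a regional GMPE are already available and tagged by station; the station labels and residual values below are hypothetical, and the use of a sample standard deviation (ddof=1) is an assumption rather than a detail given in the text.

```python
import numpy as np

def all_station_sigma(residuals):
    """Scatter of all residuals about the regional GMPE, pooled over stations."""
    return float(np.std(np.asarray(residuals, dtype=float), ddof=1))

def single_station_stats(residuals, stations):
    """Per-station (mean residual, scatter about that station's own mean)."""
    residuals = np.asarray(residuals, dtype=float)
    stations = np.asarray(stations)
    stats = {}
    for name in np.unique(stations):
        r = residuals[stations == name]
        stats[str(name)] = (float(r.mean()), float(r.std(ddof=1)))
    return stats

# Hypothetical log10 residuals: station S2 carries a +0.3 site term.
residuals = [-0.2, 0.0, 0.2, 0.1, 0.3, 0.5]
stations = ["S1", "S1", "S1", "S2", "S2", "S2"]

per_station = single_station_stats(residuals, stations)
mean_single = float(np.mean([s for _, s in per_station.values()]))
pooled = all_station_sigma(residuals)

print(per_station)
print("average single-station sigma:", round(mean_single, 3))
print("all-station sigma:", round(pooled, 3))
# A mean residual of 0.3 log10 unit corresponds to a factor of 10**0.3.
print("amplitude factor for 0.3 log unit:", round(10 ** 0.3, 2))
```

In this toy case the pooled sigma exceeds the averaged single-station sigma because the station offset is counted as scatter when the stations are pooled.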
SITE-SPECIFIC GMPES We can explore aleatory uncertainty in more detail, and partially investigate epistemic uncertainty, by developing site-specific GMPEs for each of the Charlevoix stations. There are also three stations in western Quebec (GAC, MNT, and ALFO) and one station east of Charlevoix in the St. Lawrence region (ICQ) that have sufficient data (including data at < 100 km) to explore this approach. In total then, there are 10 stations for which I seek single-station GMPEs. Station distributions of data are shown in Figure 4 for representative stations; in general, there are 35–40 events in the range of M_N 3.5–5.8 for each station. The idea is to develop a site-specific GMPE for each station, effectively removing the site-to-site (intraevent) component of variability from the ground-motion model. Each of these site-specific GMPEs will have different coefficients and result in different predicted median motions versus magnitude and distance. The relative discrepancies between the prediction equations (for a given magnitude and distance) provide some measure of epistemic uncertainty in developing GMPEs from the database. However, it should be emphasized that not all components of epistemic uncertainty in ENA GMPEs are necessarily represented within the calculated epistemic variability. Specifically, the interevent component of variability due to source variability might be larger for ENA in general than that for the earthquakes in the area considered (Charlevoix and the surrounding region). Similarly, the variability in path effects for ENA in general might be larger than that for the study region. Thus, the actual epistemic uncertainty in ENA GMPEs might be larger than that indicated by variability of the site-specific GMPEs as determined in this study.
Each GMPE will have an associated aleatory variability, calculated as the standard deviation of the residuals for the data used to develop it. This sigma is a good measure of the actual aleatory variability of motions that will be received at a site, due
▴
Figure 4. Data distribution for site-specific GMPEs for several representative stations; GAC is in western Quebec; other stations are in Charlevoix.
to earthquakes of various magnitudes at a range of distances, because it includes the interevent components of variability due to source and path effects, but not the intraevent component (e.g., Al Atik et al., 2010). The magnitudes used here are small (M_N 3.5–5.8) but span nearly the same width in magnitude space as is used for typical GMPEs; specifically, we span 2.3 magnitude units as we go from 3.5 to 5.8, whereas most GMPEs span about 2.5 magnitude units as they go from 5 to 7.5. The data used here are sparse, in addition to being at small magnitudes. Nevertheless, they provide a reasonable sample that offers some potentially useful insights into uncertainty in predicting ground motions at a site. For each station, I performed a regression (for the horizontal geomean) to a simplified version of equation (1). I simplified equation (1) by omitting the quadratic term in magnitude, and so I obtained a site-specific GMPE for each
▴
Figure 5. Example of data and predictions for site-specific GMPE for station A21. PSA observations for station A21 are shown as circles, with the symbol size scaling with magnitude. The bars show corresponding predicted amplitudes (from the regression) for a few selected magnitude values that span the regression range to provide a sense of the GMPE for that site.
of the four selected frequencies (0.5, 1, 5, and 10 Hz) using
$$\log_{10} \mathrm{PSA} = c_0 + c_1 M_N + c_3 \log R_{\mathrm{hypo}} + c_4 R_{\mathrm{hypo}} \qquad (2)$$
I decided to omit the additional term in M_N^2 (in equation 1) because of the paucity of data that results when only a single station is being used to develop a GMPE. I checked the results obtained by including the quadratic magnitude term (in M_N^2) and found that this changed the predicted ground motions very little, with just a tiny reduction in sigma (< 0.01). I therefore chose the more stable and simple linear term in magnitude, but it may be noted that further small reductions in sigma would result if a quadratic term were added (i.e., the true aleatory variability is ∼0.01 unit smaller than that calculated). A further simplification was introduced for frequencies 0.5 and 1 Hz because preliminary regressions indicated that for these frequencies, the coefficient c4 is usually either positive (curvature trending up instead of down) or statistically insignificant. Thus, for frequencies 0.5 and 1 Hz, the term c4 is set equal to zero.
The residuals of these site-specific GMPEs have zero mean (by definition) and a standard deviation, sigma. This sigma includes an event component of the aleatory variability (as each recording is actually a different event) and a path component (as each recording follows a different path and is at a different distance); there is no site component, as all recordings used in a regression are at the same site. The sigma from the site-specific GMPEs is a reasonable representation of the aleatory variability that will be realized at a specific site, although it differs from the usual definition of aleatory variability that is obtained from a multisite (and often multiregion) empirical GMPE development (the latter tends to overestimate the actual aleatory variability). It is slightly more site specific than the single-station sigma defined by Atkinson (2006), in which sigma was calculated with reference to a global GMPE developed for the study region by breaking down the sigma from a composite GMPE by station. By contrast, the site-specific sigma defined here will be with reference to a site-specific GMPE.
Figure 5 shows typical regression results of equation (2) for an example station (A21). In the figure, the observations (geomean, horizontal component) are compared with the prediction equations determined by the regression for that station. The estimate of aleatory variability (standard deviation of residuals) is given in Table 2 for each station for which a regression was performed. Average values range from 0.26 to 0.22, decreasing as frequency increases. This is our best representation of the actual aleatory variability in ground motion received at a site for an event with known M_N. In comparing the GMPEs between stations, I noted that the amplitudes are relatively large for station A21, especially at high frequencies. This is probably why A21 clipped during the M_N 5.5 Riviere du Loup event, although some closer stations did not; the clipped signal was removed from the PSA database and so does not appear in Figure 5. Station A21 also shows greater variability than do the other stations.
Table 2 shows that the computed aleatory variability tends to decrease with increasing frequency. It should be noted that for events with a known moment magnitude (instead of known M_N), the aleatory variability would be expected to trend in the opposite direction, being smallest at low frequencies and larger at high frequencies. This is because M_N is a high-frequency magnitude, whereas moment is a low-frequency magnitude. An interesting implication is that sigma for use in probabilistic seismic hazard analysis could be minimized by developing the entire analysis (from magnitude recurrence statistics through to GMPEs) using an appropriate magnitude scale.
This was originally pointed out by Atkinson and Hanks (1995), who proposed use of a high-frequency magnitude scale to improve the characterization of events in terms of their high-frequency ground-motion generation. In other words, to conduct a hazard analysis for low-frequency motions,
moment magnitude should be used, whereas to conduct a hazard analysis for high frequencies, a high-frequency magnitude should be used if one wishes to minimize aleatory uncertainty.
Epistemic Uncertainty
The epistemic uncertainty in ENA GMPEs can be evaluated by looking at the spread of predicted values from the various site-specific GMPEs developed for Charlevoix, western Quebec (GAC, MNT, and ALFO) and the lower St. Lawrence (ICQ) sites. Figure 6 shows an example of the predicted PSA values from the regression results for each site for the magnitude range M_N 5.0–5.5. Figure 6 can be thought of as a scatter plot of predictions (or epistemic uncertainty) rather than a scatter plot of observations (aleatory uncertainty). The use of a 0.5-unit magnitude range implies that we are looking at epistemic uncertainty over this breadth of magnitude.
Table 2
Aleatory Variability as Computed for Site-Specific GMPEs by Station (Log10 Units)
                                                                 Sigma
Station No.  Name  Latitude  Longitude  No. of Observations  0.5 Hz  1 Hz   5 Hz   10 Hz
1            A11   47.243    −70.198    34                   0.259   0.283  0.201  0.213
2            A16   47.471    −70.006    35                   0.262   0.258  0.191  0.204
3            A21   47.704    −69.690    36                   0.410   0.396  0.324  0.332
4            A54   47.457    −70.413    37                   0.209   0.214  0.228  0.234
5            A61   47.693    −70.090    37                   0.274   0.236  0.190  0.195
6            A64   47.826    −69.892    39                   0.246   0.222  0.159  0.182
7            ALFO  45.628    −74.884    16                   0.258   0.240  0.281  0.227
8            GAC   45.703    −75.478    42                   0.187   0.184  0.261  0.297
9            ICQ   49.522    −67.272    20                   0.259   0.211  0.135  0.131
10           MNT   45.503    −73.623    20                   0.213   0.197  0.189  0.196
Average                                                      0.258   0.244  0.216  0.221
GMPE, ground-motion prediction equation.
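The Average row of Table 2 can be reproduced directly from the per-station sigma values. The short sketch below transcribes the Table 2 sigmas and takes an unweighted mean per frequency; the unweighted averaging is an assumption, although it does match the tabulated averages to three decimals.

```python
from statistics import mean

FREQS_HZ = (0.5, 1, 5, 10)

# Site-specific sigma (log10 units) transcribed from Table 2,
# ordered as in FREQS_HZ.
SIGMA = {
    "A11": (0.259, 0.283, 0.201, 0.213),
    "A16": (0.262, 0.258, 0.191, 0.204),
    "A21": (0.410, 0.396, 0.324, 0.332),
    "A54": (0.209, 0.214, 0.228, 0.234),
    "A61": (0.274, 0.236, 0.190, 0.195),
    "A64": (0.246, 0.222, 0.159, 0.182),
    "ALFO": (0.258, 0.240, 0.281, 0.227),
    "GAC": (0.187, 0.184, 0.261, 0.297),
    "ICQ": (0.259, 0.211, 0.135, 0.131),
    "MNT": (0.213, 0.197, 0.189, 0.196),
}

# Unweighted mean over the 10 stations at each frequency.
averages = [
    round(mean(row[i] for row in SIGMA.values()), 3)
    for i in range(len(FREQS_HZ))
]

for freq, avg in zip(FREQS_HZ, averages):
    print(f"{freq} Hz: average sigma = {avg}")
```

The same dictionary can be filtered by station name to examine how much a single station, such as A21, contributes to the average.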
▴
Figure 6. Predictions of the site-specific GMPEs for each station for events in the magnitude range M_N 5.0–5.5 at four frequencies (symbols). The predictions are plotted for the magnitude–distance combinations corresponding to the observations. For comparison, a generic low GMPE for M_N 5.0 (M 4.5) and a high GMPE for M_N 5.5 (M 5.0) for rock sites in ENA are also plotted (lines).
For comparison with the site-specific GMPE predictions, a typical range of epistemic uncertainty believed to apply to regional ENA GMPEs (for the same magnitude range) is also plotted in Figure 6. This range is obtained from the recommended range (from low to high GMPEs) for the 2012 Canadian national hazard map computations (G. Atkinson and J. Adams, unpublished manuscript, 2012). To summarize, this range was obtained by taking the mean log amplitudes plus/minus a standard deviation from five alternative ENA GMPEs (AB06′ and A08′ from Atkinson and Boore [2011], Pezeshk et al. [2011], and Silva et al. [2002], single- and double-corner models). The mean amplitudes define a central GMPE, whereas the plus/minus one standard deviation values define a low and high GMPE representing epistemic uncertainty in the best GMPE. The epistemic uncertainty band (from plus/minus one standard deviation) was further widened by 0.1 log10 unit at close distances to reflect greater near-source epistemic uncertainty that may apply in ENA (relative to that provided by the standard deviation of the alternative equations). To represent this range in Figure 6, I plotted the
▴
Figure 7. Site-specific GMPEs for M_N 4.5 (dashed) and M_N 5.5 (solid); station A21 is indicated with plus symbols. Heavy solid line shows high regional GMPE from G. Atkinson and J. Adams, unpublished manuscript (2012) for M_N 5.5 (M 5.0; all for hard rock).
low GMPE curve for the lower bound magnitude and the high GMPE curve for the upper bound magnitude. The ENA GMPEs are for moment magnitude (M), so to plot values corresponding to M_N, a conversion of −0.5 magnitude unit to go from M_N to M was assumed, based on the empirical conversion of Sonley and Atkinson (2005). The ENA GMPEs are for B/C boundary, whereas the site-specific GMPEs are for a hard-rock condition; thus, a small correction of the GMPEs from B/C to hard rock was made, based on the average factors for M 5 as deduced from Atkinson and Boore (2006). This correction is not critical because it is relatively small (∼0.1 unit is subtracted from the GMPEs for B/C to obtain equivalent hard-rock GMPEs) and because our main interest in this note is in the relative width of the uncertainty band as opposed to its absolute level.
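The construction of the low, central, and high GMPEs, together with the two adjustments described above, can be sketched as follows. This is an illustrative sketch only: the five log-amplitude values are hypothetical placeholders rather than outputs of the cited models, the use of a sample standard deviation is an assumption, and applying the extra widening to each side of the band is one possible reading of the text.

```python
import numpy as np

def epistemic_band(log_amps, extra_widening=0.0):
    """Return (low, central, high) in log10 units from alternative GMPEs.

    log_amps: predictions of the alternative models, one entry (or row)
    per model, for the same magnitude and distance(s).
    """
    log_amps = np.asarray(log_amps, dtype=float)
    central = log_amps.mean(axis=0)
    # Assumption: sample standard deviation across models, with the
    # extra widening added to each side of the band.
    half_width = log_amps.std(axis=0, ddof=1) + extra_widening
    return central - half_width, central, central + half_width

def mn_to_m(m_n):
    """Conversion of -0.5 magnitude unit from M_N to M, as assumed in the text."""
    return m_n - 0.5

def bc_to_hard_rock(log_amp_bc):
    """Subtract about 0.1 log10 unit to go from B/C boundary to hard rock."""
    return log_amp_bc - 0.1

# Hypothetical log10 PSA values from five alternative models.
models = [1.0, 1.1, 1.2, 1.3, 1.4]
low, central, high = epistemic_band(models)
low_near, _, high_near = epistemic_band(models, extra_widening=0.1)

print("M for M_N 5.5:", mn_to_m(5.5))
print("central (hard rock):", round(float(bc_to_hard_rock(central)), 3))
print("band width:", round(float(high - low), 3))
print("band width at close distance:", round(float(high_near - low_near), 3))
```

Because the hard-rock correction shifts the low, central, and high curves by the same amount, it leaves the band width unchanged, which is consistent with the remark that the correction is not critical for this comparison.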
The overall impression from Figure 6 is that the proposed epistemic uncertainty bounds for ENA GMPEs (G. Atkinson and J. Adams, unpublished manuscript, 2012) are in reasonable accord with the range of prediction models from site-specific GMPEs. It is also noteworthy that the GMPEs for station A21 (station number 3) are always high relative to the corresponding amplitude predictions for other stations. This suggests that noticeably high amplitudes that are sometimes observed in ground-motion amplitude datasets and that typically map into the scatter attributed to aleatory variability may be a systematic site-response issue. This highlights the importance of modeling site response in ground-motion predictions. In ENA, in particular, typical generic factors based on site class or shear-wave velocity may not be sufficient to accommodate site-response issues, particularly when soft soils overlie hard rock. It is important to consider such site-response issues when deciding on generic conversion factors to go from one site condition to another.
An overall assessment of the epistemic uncertainty in developing the GMPEs can be made by quantifying the spread of prediction equations for a given M_N and Rhypo. Figure 7 shows the developed GMPEs for M_N 4.5 and 5.5. Note that these equations are reasonably well constrained by data at distances > 20 km and poorly constrained at closer distances; ground motions within 10 km of the epicenter are not constrained at all by data and should not be accorded any credence. All stations except A21 (the Charlevoix station showing large site response) have GMPEs that fall below the high GMPE for ENA from G. Atkinson and J. Adams, unpublished manuscript (2012). It is observed that the site-specific GMPEs are very similar to each other, especially at high frequencies. They are relatively tightly clustered for distance ranges that are well constrained by data; not surprisingly, the scatter becomes larger as the GMPEs are extended to close distances for which they are not constrained. The lack of data constraints at close distances (and large magnitudes) is the key reason why GMPEs for ENA hazard applications are developed from model-based methods that do not rely on empirical regression. The standard deviation of the site-specific GMPEs about their mean, expressing epistemic uncertainty in regression for the given dataset, is plotted in Figure 8 for the range of magnitudes and distances considered. The spread in standard deviations for different magnitudes reflects the overall variability in the calculated epistemic uncertainty; the distance dependence is attributable to the fact that the data constrain the GMPEs better at regional distances than at close distances.
▴
Figure 8. Standard deviation of site-specific GMPEs.
This plot suggests that the epistemic uncertainty in median GMPEs for moderate events is quite low at regional distances, about 0.15–0.20 log unit. At intermediate distances, it is in the range 0.2–0.25 unit. At close distances, the epistemic uncertainty indicated in Figure 8 is not meaningful because the site-specific GMPEs are not at all constrained by data. It is important to recognize that the epistemic uncertainty indicated in Figure 8 is not a true measure of the total epistemic uncertainty in ENA GMPEs in general, but rather just an expression of epistemic uncertainty in the site-specific GMPEs for moderate-magnitude events as derived here via empirical regression. Figure 8 does suggest, however, that the overall epistemic uncertainty in ENA GMPEs should be at least 0.15 log unit (as a standard deviation of the median GMPEs) in the magnitude–distance range in which the prediction equations can be anchored by empirical data. By implication, it should be larger than 0.15 unit at large magnitudes and close distances. Note that while this study places lower limits on epistemic uncertainty, it cannot constrain its upper limits.
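The quantity plotted in Figure 8, the standard deviation of the site-specific GMPE predictions about their mean at a given M_N and Rhypo, can be sketched as below. The station labels and coefficients are hypothetical placeholders (the regression coefficients are not listed in this note), the logarithm of Rhypo is assumed to be base 10, and a sample standard deviation is assumed.

```python
import numpy as np

def predict_log_psa(coeffs, m_n, r_hypo):
    """Equation (2): c0 + c1*M_N + c3*log10(Rhypo) + c4*Rhypo."""
    c0, c1, c3, c4 = coeffs
    r_hypo = np.asarray(r_hypo, dtype=float)
    return c0 + c1 * m_n + c3 * np.log10(r_hypo) + c4 * r_hypo

def gmpe_spread(station_coeffs, m_n, r_hypo):
    """Standard deviation across stations of predicted log10 PSA.

    Returns one value per distance in r_hypo.
    """
    preds = np.array(
        [predict_log_psa(c, m_n, r_hypo) for c in station_coeffs.values()]
    )
    # Assumption: sample standard deviation (ddof=1) across stations.
    return preds.std(axis=0, ddof=1)

# Hypothetical coefficients (c0, c1, c3, c4) for two stations that differ
# only by a constant offset of 0.2 log10 unit.
coeffs = {
    "S1": (-3.0, 1.0, -1.2, -0.001),
    "S2": (-2.8, 1.0, -1.2, -0.001),
}
distances = np.array([50.0, 100.0, 300.0])
spread = gmpe_spread(coeffs, 5.0, distances)
print("spread (log10 units) at each distance:", np.round(spread, 3))
```

With real site-specific coefficients, the spread would vary with distance, growing where the regressions are poorly constrained by data.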
ACKNOWLEDGMENTS This study evolved from stimulating discussions with John Adams over the appropriate values for aleatory and epistemic uncertainty in eastern North America. I thank two anonymous reviewers for their constructive suggestions. The financial support of the Natural Sciences and Engineering Research Council is also acknowledged.
REFERENCES
Al Atik, L., N. Abrahamson, F. Cotton, F. Scherbaum, J. Bommer, and N. Kuehn (2010). The variability of ground-motion prediction models and its components, Seismol. Res. Lett. 81, 794–801.
Assatourians, K., and G. Atkinson (2010). Database of processed time series and response spectra for Canada: An example application to study of the 2005 M_N 5.4 Riviere du Loup, Quebec earthquake, Seismol. Res. Lett. 81, 1013–1031.
Atkinson, G. (2004). Empirical attenuation of ground motion spectral amplitudes in southeastern Canada and the northeastern United States, Bull. Seismol. Soc. Am. 94, 1079–1095.
Atkinson, G. (2006). Single-station sigma, Bull. Seismol. Soc. Am. 96, 446–455.
Atkinson, G., and D. Boore (2006). Ground motion prediction equations for earthquakes in eastern North America, Bull. Seismol. Soc. Am. 96, 2181–2205.
Atkinson, G., and D. Boore (2011). Modifications to existing ground-motion prediction equations in light of new data, Bull. Seismol. Soc. Am. 101, 1121–1135.
Atkinson, G., and T. Hanks (1995). A high-frequency magnitude scale, Bull. Seismol. Soc. Am. 85, 825–833.
Burger, R., P. Somerville, J. Barker, R. Herrmann, and D. Helmberger (1987). The effect of crustal structure on strong ground motion attenuation relations in eastern North America, Bull. Seismol. Soc. Am. 77, 420–439.
McGuire, R. (2004). Seismic Hazard and Risk Analysis, Vol. 6, EERI Monograph MNO-10, Earthq. Eng. Res. Inst., Oakland, California, 150 pp.
Pezeshk, S., A. Zandieh, and B. Tavakoli (2011). Ground-motion prediction equations for eastern North America from a hybrid empirical method, Bull. Seismol. Soc. Am. 101, 1859–1870.
Silva, W. J., N. J. Gregor, and R. Darragh (2002). Development of regional hard rock attenuation relations for central and eastern North America, Technical Report, Pacific Engineering and Analysis, El Cerrito, California, 80 pp., www.pacificengineering.org.
Sonley, E., and G. Atkinson (2005). Empirical relationships between moment magnitude and Nuttli magnitude for small earthquakes in southeastern Canada, Seismol. Res. Lett. 76, 752–755.
Department of Earth Sciences University of Western Ontario London, Ontario N6A 5B7, Canada
[email protected]