
Originally published as: Holschneider, M., Zöller, G., Hainzl, S. (2011): Estimation of the Maximum Possible Magnitude in the Framework of a Doubly Truncated Gutenberg-Richter Model. Bulletin of the Seismological Society of America, 101, 4, 1649-1659, DOI: 10.1785/0120100289

Estimation of the Maximum Possible Magnitude in the Framework of a Doubly-Truncated Gutenberg-Richter Model

Matthias Holschneider¹, Gert Zöller², and Sebastian Hainzl³

Abstract We discuss to what extent a given earthquake catalog and the assumption of a doubly-truncated Gutenberg-Richter distribution for the earthquake magnitudes allow for the calculation of confidence intervals for the maximum possible magnitude M. We show that, without further assumptions like the existence of an upper bound of M, only very limited information may be obtained. In a frequentist formulation, for each confidence level α the confidence interval diverges with finite probability. In a Bayesian formulation, the posterior distribution of the upper magnitude is not normalizable. We conclude that the common approach to derive confidence intervals from the variance of a point estimator fails. Technically, this problem can be overcome by introducing an upper bound M̃ for the maximum magnitude. Then, the Bayesian posterior distribution can be normalized and its variance decreases with the number of observed events. However, since the posterior depends significantly on the choice of the unknown value of M̃, the resulting confidence intervals are essentially meaningless. The use of an informative prior distribution accounting for pre-knowledge of M is also of little use, because the prior is only modified in the case of the occurrence of an extreme event. Our results suggest that the maximum possible magnitude M should be better replaced by MT, the maximum expected magnitude in a given time interval T, for which the calculation of exact confidence intervals becomes straightforward. From a physical point of view, numerical models of the earthquake process adjusted to specific fault regions may be a powerful alternative to overcome the shortcomings of purely statistical inference.

¹ Institute of Mathematics, University of Potsdam, Karl-Liebknecht-Str. 24, 14476 Potsdam, Germany; Email: [email protected]
² Institute of Mathematics, University of Potsdam, Karl-Liebknecht-Str. 24, 14476 Potsdam, Germany; Email: [email protected]
³ GFZ German Research Centre for Geosciences, Telegrafenberg, 14473 Potsdam; Email: [email protected]

1 Introduction

The estimation of the maximum possible magnitude M in a tectonic setting is a crucial point for seismic hazard assessment. Such estimates are usually derived from historic earthquakes. The methods that are used to constrain M can be roughly divided into empirical and statistical approaches. In the simplest empirical approach, the magnitude of the largest historic event is considered to be the maximum possible magnitude in a seismic zone (Reiter, 1991). Apart from high uncertainties in the magnitude determination, this approach ignores the possibility of both missing earthquakes in the past and stronger earthquakes in the future. A simple way to address this problem is to add a constant increment to the maximum observed magnitude. Other empirical methods constrain M by means of geological and paleoseismic data or by tectonic analogs, i.e. improving the database by using records from regions that share tectonic features with the study region. An overview is given by Wheeler (2009).

Statistical approaches use point estimators of M based on an extrapolation of the frequency-size distribution. Imposing the Gutenberg-Richter law (Gutenberg and Richter, 1956) truncated at M, the standard maximum likelihood estimation results again in the maximum observed magnitude. Using theoretical results from statistics, the best point estimators in terms of low variance can be constructed (Pisarenko, 1991). For example, Kijko (2004) presents point estimations of M for a frequency-size distribution that deviates moderately from the Gutenberg-Richter law, as well as a nonparametric approach, where no frequency-size distribution has to be specified. While different point estimators have pros and cons, the calculation of confidence intervals is mostly missing or only carried out approximately. Bayesian estimations as discussed by Cornell (1994) allow for a more straightforward calculation of confidence intervals; there is, however, a dependence on the imposed prior distribution. In the following, these approaches will be discussed in more detail.

Assuming a family of frequency-size distributions for earthquake magnitudes, e.g. a doubly-truncated Gutenberg-Richter (DTGR) distribution depending on parameters like the Richter b value and the upper magnitude bound M, and given a number of independent observations, a Bayesian approach allows us to address the question: what do we actually learn from the data under the DTGR modeling assumption? A different approach is based on point estimators of the underlying parameters. It is a common belief that the “best estimator” might be given by the least-variance unbiased estimator (Kolmogorov, 1950). Indeed, when interested in a point estimation, this is a valuable concept of optimality (Cornell, 1994; Pisarenko et al., 1996; Kijko, 2004). Although such a point estimator may be constructed, it is not clear for which practically relevant question this estimator is optimal. In the present work, however, we study the following question, which may be of direct practical relevance, e.g. for the insurance of critical infrastructures: Given a confidence level α, say α = 5%, can we calculate a magnitude m∗ from an earthquake catalog such that the true maximum magnitude, whatever its value is, exceeds m∗ with a probability less than α?

After briefly summarizing the DTGR model, we study this question using both the frequentist and the Bayesian approaches. Both approaches show that the question above cannot be answered solely on the basis of the DTGR law and a given earthquake catalog. In the frequentist approach we shall see that there is a lower limit of the confidence level α, depending on the earthquake sample and the Richter b value, down to which confidence intervals can be calculated. In many realistic situations, this limit is, however, still very high, and therefore estimations of the maximum possible magnitude are not reliable for practical purposes. If an upper bound of the maximum magnitude is assumed, the arising problems can be solved technically, but the results depend crucially on this unknown parameter. In the context of Bayesian estimations of maximum possible magnitudes, it is frequently considered as a key question to find an appropriate informative prior distribution. Finally, we demonstrate that an informative prior distribution of the maximum possible magnitude does not necessarily improve the estimate for typical earthquake catalogs.

2 The Doubly-Truncated Gutenberg-Richter (DTGR) Model

The Gutenberg-Richter relation (Gutenberg and Richter, 1956) serves as a well-established law to predict earthquake probabilities as a function of the magnitude. While the validity of this law is commonly accepted for small and intermediate earthquakes, the behavior for large magnitudes is controversial. Physical models for regional seismicity sometimes favor the characteristic earthquake distribution, including a large “characteristic” event that occurs more frequently than predicted by the Gutenberg-Richter law. However, model selection based on statistical testing is difficult because of the small number of large events. In the present work, we impose the Gutenberg-Richter law between a lower and an upper magnitude bound. The methodology is, however, not restricted to this distribution. Replacing the Gutenberg-Richter model by a different model, e.g. a characteristic earthquake model including scaling behavior for small and intermediate earthquakes and a properly parameterized characteristic earthquake “bump”, can be done in a straightforward manner. In Section “Bayesian Estimation of Maximum Magnitude in the DTGR Law” we will demonstrate, however, that the main result of our study does not depend on the details of the imposed frequency-size distribution.

In the doubly-truncated Gutenberg-Richter model the probability density function of observed magnitudes m0 ≤ m ≤ M reads

    p_{\beta M}(m) = \frac{\beta \exp(-\beta m)\,\chi_{[m_0,M]}(m)}{\exp(-\beta m_0) - \exp(-\beta M)},    (1)

where β = log(10) · b represents the Richter b value and χ_{[m_0,M]}(m) is the characteristic function, which takes the value 1 on the interval [m_0, M] and 0 elsewhere. The lower magnitude bound m0 is subject to catalog completeness in a given region and is therefore assumed to be known. The upper magnitude bound M and the Richter b value are unknown parameters to be estimated. All parameters m0, M and β are assumed to be constant in time. The distribution function of the DTGR model reads

    F_{\beta M}(m) = \int_{m_0}^{m} p_{\beta M}(x)\,dx = \frac{\exp(-\beta m_0) - \exp(-\beta m)}{\exp(-\beta m_0) - \exp(-\beta M)}, \qquad m_0 \le m \le M.    (2)
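For concreteness, Eq. (2) can be inverted in closed form, which gives a simple way to simulate DTGR catalogs by inverse-transform sampling. The following is a minimal sketch (Python with NumPy assumed; the function name is ours and chosen for this illustration):

```python
import numpy as np

def sample_dtgr(n, beta, m0, M, seed=None):
    """Draw n magnitudes from the DTGR density of Eq. (1) by
    inverting the distribution function of Eq. (2)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)
    # F(m) = u  <=>  exp(-beta*m) = exp(-beta*m0) - u*[exp(-beta*m0) - exp(-beta*M)]
    c = np.exp(-beta * m0) - np.exp(-beta * M)
    return -np.log(np.exp(-beta * m0) - u * c) / beta

# Example: 1000 events with m0 = 5, M = 8, b = 1 (beta = b * ln 10)
mags = sample_dtgr(1000, np.log(10), 5.0, 8.0, seed=42)
print(mags.min(), mags.max())  # all values lie in [5, 8)
```

Synthetic catalogs of this kind mimic the Monte-Carlo simulations used later in the paper.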

We now consider n earthquakes with magnitudes mi (i = 1, . . . , n). Each magnitude is drawn independently from the same Gutenberg-Richter distribution. Therefore the likelihood function of this sample of n independent and identically distributed (i.i.d.) random numbers, {mi} = {m1, m2, . . . , mn}, is

    L(\{m_1, \ldots, m_n\} \,|\, \beta, M) = \chi_{[m_0,M]}(\mu)\, \frac{\beta^n \exp(-\beta n \langle m \rangle)}{[\exp(-\beta m_0) - \exp(-\beta M)]^n},    (3)

where

    \mu = \max_{1 \le i \le n} \{m_i\}    (4)

is the maximum observed magnitude and

    \langle m \rangle = \frac{1}{n} \sum_{i=1}^{n} m_i    (5)

is the sample mean of the observed magnitudes. For later use, we mention that the probability distribution of the random variable µ for n independent observations is

    \Pr(\mu \le z) = [F_{\beta M}(z)]^n.    (6)

Since the likelihood function depends on the observations only through n, µ and ⟨m⟩, the Fisher-Neyman factorization theorem (Fisher, 1922) then shows that n, µ and ⟨m⟩ are sufficient statistics, i.e. these three quantities contain all information about the unknown parameters M and b. We emphasize once more that the observations should represent seismicity with uncorrelated magnitudes, i.e. the mi are i.i.d. random numbers. In practice, magnitude data can arise either from historic earthquakes or from a recent earthquake catalog. In the latter case, the data may be declustered (Gardner and Knopoff, 1974; Reasenberg, 1985) in order to fulfil the requirement of independence to a certain degree. We note, however, that declustering techniques primarily remove spatiotemporal earthquake clusters rather than magnitude clusters. It is sometimes more convenient to use n, βobs and µ as sufficient statistics, where

    \beta_{\mathrm{obs}} = \frac{1}{\langle m \rangle - m_0}    (7)

(Aki, 1965). In this case, the likelihood function reads

    L(\{m_1, \ldots, m_n\} \,|\, \beta, M) = \chi_{[m_0,M]}(\mu)\, f(\beta, M, n, m_0, \beta_{\mathrm{obs}}),    (8)

with

    f(\beta, M, n, m_0, \beta_{\mathrm{obs}}) = \frac{\beta^n \exp(-\beta n (m_0 + 1/\beta_{\mathrm{obs}}))}{[\exp(-\beta m_0) - \exp(-\beta M)]^n}.    (9)
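As a minimal sketch (Python/NumPy; the function name is ours), the sufficient statistics and the Aki estimate of Eq. (7) can be computed directly from a catalog, e.g. the synthetic one generated above:

```python
import numpy as np

def sufficient_statistics(mags, m0):
    """Reduce a catalog to n, mu and <m> (Eqs. 4-5) together with
    the Aki estimate beta_obs = 1/(<m> - m0) of Eq. (7)."""
    mags = np.asarray(mags)
    n = mags.size
    mu = mags.max()        # maximum observed magnitude, Eq. (4)
    mean_m = mags.mean()   # sample mean, Eq. (5)
    return n, mu, mean_m, 1.0 / (mean_m - m0)

n, mu, mean_m, beta_obs = sufficient_statistics(mags, 5.0)
print(f"b_obs = {beta_obs / np.log(10):.2f}")  # Aki b-value estimate
```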

3 Confidence Intervals

In this Section, we address the following question: Given any confidence level α, 0 < α ≤ 1, can we infer from an earthquake catalog and a DTGR law a magnitude m∗ such that the probability for M > m∗ is at most α? At first, we consider the frequentist approach. For the sake of simplicity we assume that β is actually known. Suppose nothing is known about the seismogenic zone under consideration apart from the fact that the observed seismicity follows the DTGR law with known β and some unknown upper bound M. Suppose a very large, virtually infinite, collection of catalogs is given, each containing n events, and each being obtained from such a DTGR law, where the upper truncation limit may vary from one catalog to the other. In any case, this parameter is completely unknown. The problem is now to give a function from the observed statistics to the real numbers,

    \psi: [m_0, \infty)^n \to \mathbb{R}, \qquad (m_1, \ldots, m_n) \mapsto \psi(m_1, \ldots, m_n),    (10)

such that the assertion M < ψ(m1, . . . , mn) is wrong in at most a fraction α of cases. This should hold regardless of how the original very large set of catalogs was obtained. Nothing is assumed about the distribution of the true maximum magnitudes in this setting. In mathematical terms we must have

    \mathbb{P}(\psi(m_1, \ldots, m_n) < M \,|\, M) < \alpha \quad \text{for all } M.    (11)

The interval [µ, ψ] is called the frequentist or Neyman-Pearson confidence interval for M. Since n is held fixed, we have a sufficient statistic, µ = max{mi}, and thus we may suppose that

    \psi = \psi(\mu(m_1, \ldots, m_n))    (12)

and consider confidence intervals of the form

    [\mu, \psi(\mu)].    (13)

The main statement of this paper, which will be derived mathematically in Appendix B, reads in the frequentist formulation as follows: For fixed level α ∈ (0, 1) there is a critical threshold mc,

    m_c = m_0 - \beta^{-1} \log(1 - \alpha^{1/n}),    (14)

such that for µ < mc we must necessarily have

    \psi(\mu) \ge m_0 - \frac{1}{\beta} \log\!\left( \frac{\exp(-\beta(\mu - m_0)) - 1}{\alpha^{1/n}} + 1 \right).    (15)

The right hand side of Eq. (15) defines the smallest possible confidence interval, in agreement with the findings of Pisarenko (1991). As the observed maximum magnitude approaches mc, the confidence interval for the actual maximum magnitude diverges to ∞. For µ ≥ mc there no longer exists a frequentist confidence interval. The singularity in Eq. (15) is illustrated in Fig. 1(a). Now, we address the question: How often does it happen that the observed maximum magnitude µ is above the critical threshold mc, so that we cannot give any constraints for M? This clearly depends on the true M. For M = mc it never happens, but as M increases, this probability quickly increases to 1 − α (see Fig. 1(b)). To see this, we have to insert mc from Eq. (14) into Eqs. (2) and (6):

    \mathbb{P}(\mu \ge m_c \,|\, \beta, M) = 1 - [F_{\beta M}(m_c)]^n.    (16)
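A short numerical sketch of Eqs. (14)-(16) (Python/NumPy; function names are ours and chosen for this illustration):

```python
import numpy as np

def frequentist_ci(mu, m0, beta, n, alpha):
    """Critical threshold m_c of Eq. (14) and, where it exists,
    the smallest upper CI limit psi(mu) of Eq. (15)."""
    a = alpha ** (1.0 / n)
    m_c = m0 - np.log(1.0 - a) / beta
    if mu >= m_c:
        return m_c, np.inf  # no finite confidence interval exists
    psi = m0 - np.log((np.exp(-beta * (mu - m0)) - 1.0) / a + 1.0) / beta
    return m_c, psi

def prob_no_ci(m0, beta, M, n, alpha):
    """Probability of Eq. (16) that the catalog maximum exceeds m_c."""
    a = alpha ** (1.0 / n)
    m_c = m0 - np.log(1.0 - a) / beta
    F = (np.exp(-beta * m0) - np.exp(-beta * m_c)) / \
        (np.exp(-beta * m0) - np.exp(-beta * M))
    return 1.0 - F ** n

# m0 = 5, b = 1, n = 10, alpha = 0.05, true M = 9
print(prob_no_ci(5.0, np.log(10), 9.0, 10, 0.05))  # approx. 0.95
```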

As an example, using typical values m0 = 5, b = 1, n = 10 and α = 0.05, we get P(µ ≥ mc | β, M) ≈ 0.95 for sufficiently large M. Thus in 95% of the cases we are in the unfortunate situation where M cannot be inferred from catalogs modeled with the DTGR law. Pisarenko (1991) uses Eq. (15) to construct the so-called fiducial distribution ϕµ(M): For a given true value of M and a sample {mi} represented by the sufficient statistic µ, the probability that M ≤ z is

    \mathbb{P}(M \le z \,|\, \mu) = \varphi_\mu(z) = 1 - \alpha = 1 - \left[ \frac{\exp(-\beta m_0) - \exp(-\beta \mu)}{\exp(-\beta m_0) - \exp(-\beta z)} \right]^n.    (17)

In terms of Eq. (2) the fiducial distribution ϕµ(M) reads

    \varphi_\mu(M) = 1 - [F_{\beta M}(\mu)]^n.    (18)

In the fiducial formulation, the problem of diverging confidence intervals discussed above (Eq. 14 and Fig. 1) is expressed by the fact that ϕµ(M) tends to a constant value less than unity when M → ∞:

    \varphi_\mu(\infty) = 1 - \{1 - \exp[-\beta(\mu - m_0)]\}^n = 1 - \alpha_0;    (19)

that is, the fiducial probability density is not normalized with respect to z. To deal with this problem, Pisarenko (1991) formally allows M = ∞ with the finite probability α0, corresponding to the case µ ≥ mc in Eqs. (15) and (16). He considers this situation to have insufficient sample size n and therefore excludes it from further consideration. In his view, case studies are assumed to be reliable when M = ∞ is unlikely, in other words when α0 is sufficiently small (e.g. α0 < 10^{-2}). In our study, we argue that a formal redefinition of M as described above only rephrases the problem of the diverging confidence interval, but does not solve it. In practical situations, the upper limit of the confidence interval of M has to be estimated based on a given confidence level α and a given earthquake catalog containing n events in the magnitude range [m0; µ]. Without further information, two cases have to be distinguished:

1. µ < m0 − β^{-1} log(1 − α^{1/n}): the (1 − α) confidence interval is

    \left[ \mu,\; m_0 - \frac{1}{\beta} \log\!\left\{ \frac{\exp[-\beta(\mu - m_0)] - 1}{\alpha^{1/n}} + 1 \right\} \right];

2. µ ≥ m0 − β^{-1} log(1 − α^{1/n}): no information on M can be extracted from the data, because the upper limit of the confidence interval diverges.

In the framework of the DTGR law and a given earthquake catalog, the problem of the diverging confidence interval cannot be overcome. Technically, finite confidence intervals are achieved if an absolute upper bound M̃ for the magnitude is assumed,

    M \le \widetilde{M}.    (20)

From a physical point of view, this assumption can be justified, because limitations in the fault size or available energy will prevent earthquakes from becoming infinitely large. To our knowledge, no earthquake with magnitude m ≥ 9.5 has been reported so far. Using this additional constraint will modify Eq. (15) to

    \psi(\mu) = \min\!\left\{ m_0 - \frac{1}{\beta} \log\!\left( \frac{\exp(-\beta(\mu - m_0)) - 1}{\alpha^{1/n}} + 1 \right),\; \widetilde{M} \right\},    (21)

leading to finite confidence intervals for all types of earthquake catalogs and all confidence levels α. However, because the results will clearly depend on M̃, which is an unknown parameter, Eq. (21) is not useful for calculating confidence intervals in practical situations. This difficulty is intuitively clear: the upper cutoff of the distribution can only be explored using large events, which are very rare. So from a finite number of observations only very limited information about this parameter can be obtained based on pure statistics. In the next section, the same kind of conclusion will be obtained from a Bayesian point of view.

4 Bayesian Estimation of Maximum Magnitude in the DTGR Law

The central questions in this Section are: What can be learned about M and b from the observational data in a Bayesian framework (Bernardo and Smith, 1994; Zöller et al., 2010)? And to what extent, if at all, can small events in a catalog be used to better estimate M? More precisely, what inferences can be drawn from the posterior joint probability density P(β, M | {m1, . . . , mn}) given the observations? This will, of course, depend on what we know a priori about the parameters β and M. Suppose nothing is known about M, so we assume a flat prior for M, and some proper prior distribution with density p(β) for β (or b = β/log(10)). The joint prior distribution for (β, M) is then the improper prior with density p(β) dβ dM. The posterior density after the observations {mi} (i = 1, . . . , n) have been made then reads (up to a constant normalization factor)

    \mathbb{P}(\beta, M \,|\, \{m_i\}) \sim h(\beta, M, m_0, n, \mu, \langle m \rangle) = \frac{\beta^n \exp(-\beta n \langle m \rangle)\, \chi_{[m_0,M]}(\mu)}{[\exp(-\beta m_0) - \exp(-\beta M)]^n}\, p(\beta), \qquad \beta > 0.    (22)

For later use, we rewrite this expression as

    h = g(\beta)\, [\exp(-\beta m_0) - \exp(-\beta M)]^{-n}\, \chi_{[m_0,M]}(\mu) \quad \text{with} \quad g(\beta) = \beta^n \exp(-\beta n \langle m \rangle)\, p(\beta).    (23)

Note that the data only enter through the sufficient statistics n, ⟨m⟩ and µ, as they should. This expression contains all information about β and M in the light of our modeling assumptions together with our prior belief about the parameters. We now consider various inferences about M and β from their posterior distribution. The function M ↦ h(β, M, m0, n, µ, ⟨m⟩) decreases monotonically for any choice of the other parameters, from g(β)[exp(−βm0) − exp(−βµ)]^{−n} at M = µ to g(β) exp(nβm0) as M → ∞. When considering the marginal distribution for M,

    \mathbb{P}(M \,|\, \{m_i\}) = \int \mathbb{P}(\beta, M \,|\, \{m_i\})\, d\beta,    (24)

we obtain again a monotonically decreasing function of M. From this it follows that, whatever we know a priori about β, the most probable maximum magnitude is simply the maximum observed magnitude µ (Pisarenko et al., 1996):

    \hat{M}_{\max}^{\text{posterior}} = \mu,    (25)

which would be the maximum posterior estimator for M. If, however, we would like to construct some posterior confidence intervals around this value, we encounter the following problem. Since the posterior distribution tends towards some positive constant, it cannot be normalized, i.e.

    \int \mathbb{P}(M \,|\, \{m_i\})\, dM = \infty.    (26)
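A small numerical illustration of this divergence (Python/NumPy; the parameter values are assumed for demonstration only):

```python
import numpy as np

# Unnormalized posterior of Eq. (22) as a function of M, with beta fixed
beta, m0, n, mean_m = np.log(10), 5.0, 10, 5.3

def h(M):
    return beta**n * np.exp(-beta * n * mean_m) / \
           (np.exp(-beta * m0) - np.exp(-beta * M))**n

for M in (7.0, 9.0, 12.0, 20.0):
    print(M, h(M))  # values flatten to a positive constant instead of
                    # decaying, so the integral over M diverges
```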

This seemingly purely mathematical statement has important practical consequences. Without any suitable prior assumptions about M, we cannot infer any information about M in the following sense: given an interval of magnitude values, the posterior probability for M to be in that interval cannot be computed. Therefore, assuming a DTGR law within a Bayesian setting does not allow us to gain information about the maximum magnitude from a historic earthquake catalog alone, even if the catalog has high quality. This does not mean that earthquake catalogs are essentially useless in this respect; we rather need additional information or additional model assumptions in order to constrain M in terms of a posterior distribution that can be normalized. This question will be addressed in the next two Sections. Furthermore, one might suspect that the divergence of the posterior distribution to a constant value might be related to the sharp truncation of the probability distribution at M. We show that our result is independent of the particular shape of the distribution at large magnitudes and also holds, e.g., for smoothly decaying distributions like the tapered Pareto distribution (also called the modified Gutenberg-Richter law) discussed by Kagan and Schoenberg (2001). The reason is illustrated in Fig. 2: For M ≫ µ, all distributions pβM(m) for different values of M become essentially identical on the interval [m0; µ], where earthquakes are observed. In mathematical terms: On [m0; µ] the density pβM(m) converges uniformly to a limit distribution p∗β(m) as M → ∞. Consequently, all values of M become equally likely in this limit, and therefore the posterior distribution of M converges to a positive constant for M → ∞ and cannot be normalized, independent of the details of pβM(m) at large magnitudes. As an example, we refer to the tapered Pareto distribution: the divergence of the likelihood function is shown in Fig. 2 of Kagan and Schoenberg (2001).

The previous statement seems to be in contrast to common approaches to produce point estimates of M from observations {mi}. Indeed, based on earlier findings (Rao, 1945; Blackwell, 1947; Kolmogorov, 1950), Pisarenko et al. (1996) derive the unbiased point estimator M̂ of the maximum magnitude M with the lowest possible variance. It is expressed using the Gutenberg-Richter density pβM(m) from Eq. (1) as

    \hat{M} = \mu + [n\, p_{\beta\mu}(\mu)]^{-1} = \mu + \frac{1}{n\beta}\, [\exp(\beta(\mu - m_0)) - 1].    (27)
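A minimal sketch of Eq. (27) (Python/NumPy; β is assumed known here, e.g. from the Aki estimate above):

```python
import numpy as np

def pisarenko_estimator(mags, m0, beta):
    """Minimum-variance unbiased point estimate of M, Eq. (27)."""
    mags = np.asarray(mags)
    n, mu = mags.size, mags.max()
    return mu + (np.exp(beta * (mu - m0)) - 1.0) / (n * beta)

# With the synthetic catalog from above (true M = 8):
print(pisarenko_estimator(mags, 5.0, np.log(10)))
```

As the text explains next, a point value of this kind cannot be converted into a confidence interval, because its variance (Eq. 28) depends on the unknown M itself.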

The variance of the estimator M̂ is

    \operatorname{var}[\hat{M}] = \int_{m_0}^{M} \left( x + \frac{1}{n\beta} [\exp(\beta(x - m_0)) - 1] \right)^2 dF_{\beta M}^n(x) - M^2.    (28)

From this equation it is seen that the variance depends on the unknown value M. In Kijko (2004), an alternative estimator has been proposed based on an iterative procedure. The final result, however, is again a point estimator whose variance depends on M. Consequently, the common way of using a fixed multiple of the standard deviation of the estimator around its observed value as an error bar fails, and in particular Eq. (28) cannot be used to provide a confidence interval. The estimation of the variance itself as in Pisarenko et al. (1996) does not resolve the problem either, since the estimator of the variance has a variance of its own that should also be taken into account. This “closure” problem cannot be resolved, which is again a consequence of the non-normalizability of the posterior. In fact, there exists no Bayesian confidence interval and, as we have shown, the frequentist confidence interval diverges with finite probability.

5 Estimation of Maximum Magnitude M Given that M has an Upper Bound M̃

The problem that P(β, M | {mi}) in Eq. (22) cannot be normalized does not occur if some suitable a priori information about M is available. As in the frequentist formulation in Section “Confidence Intervals”, we suppose that there is an upper bound M̃ to M, so that the density of the prior distribution satisfies p(β, M) = 0 for M ≥ M̃. For reasonable estimates in the region of M under consideration, different values M̃ = 8.0, 9.0, 10.0, . . . should all be suitable and should provide similar results. We emphasize that these values of M̃ do not represent realistic earthquake scenarios; we rather use them to make the posterior distribution normalizable and then ask the question: How sensitive is the posterior distribution to this unknown value? If this dependence is weak, the introduction of M̃ will allow for a reasonable Bayesian estimate of the maximum magnitude M. However, the inferences we make crucially depend on precisely this upper bound M̃. For this reason the assumption of a physically possible magnitude does not help to overcome the problems of estimating M within a Bayesian framework, as can be seen in the following.

We will suppose for simplicity that the prior for β is flat between two bounds [βmin; βmax]. The posterior density is then

    \mathbb{P}(\beta, M \,|\, \{m_i\}) \sim f(\beta, M, n, m_0, \beta_{\mathrm{obs}})\, \chi_{[\mu, \widetilde{M}]}(M)\, \chi_{[\beta_{\min}, \beta_{\max}]}(\beta),    (29)

with f from Eq. (9), where n, m0, and βobs are the number, the minimum magnitude and the Aki-estimated b value (= βobs/log(10)) of the observational data. We now want to analyze and visualize these posterior densities. One way is to generate Monte-Carlo samplings of observations. Each such sample would produce a different posterior distribution. The ensemble of such posteriors would, however, be difficult to visualize. One could compute expectation values of these posteriors, which would yield quantities that are in spirit similar to the Fisher information. Here, instead, we will look at the outcome of typical samples, which can be done without Monte-Carlo simulation using the parameterization of the posterior distributions through n, m0, βobs, µ and M̃ as in Eq. (9). In a first example, we choose values for n and m0. This resembles a sample of n historic earthquakes with m ≥ m0. Imposing additionally a Gutenberg-Richter distribution log10(N) = a − bobs·m with bobs = βobs/log(10) results in a dependence of µ on bobs. Let us consider that the maximum observed magnitude µ corresponds to N ≈ 1 in a Gutenberg-Richter distributed sample, which leads to µ = m0 + (1/bobs) log10(n). In Fig. 3, we show results of the profiled posterior for fixed β = βobs, obtained by restricting the posterior as given by Eq. (29) to the line β = b log(10) and normalizing suitably:

    \mathbb{P}(M \,|\, \{m_i\}, \beta = \beta_{\mathrm{obs}}) = \frac{f(\beta, M, n, m_0, \beta_{\mathrm{obs}})\, \chi_{[\mu, \widetilde{M}]}(M)}{\int_{\mu}^{\widetilde{M}} f(\beta, M, n, m_0, \beta_{\mathrm{obs}})\, dM}.    (30)

f; the We use n = 20, m0 = 6.0, and calculate the profiled posterior for three different values of M

resulting values of µ and hmi are provided in the caption of Fig. 3. The three panels in in Fig. 3 provide results for different values of β = βobs (= 0.8 · log (10) in panel a, 1.0 · log (10) in panel b, f = 8.0, M f = 9.0, 1.2 · log (10) in panel c). Each panel includes three curves corresponding to M

f = 10.0, respectively. The vertical lines represent the 50% confidence interval above µ, i.e. ∆m M

13

with

µ+∆m Z

P(M dM |{mi }, β = βobs ) = 0.5.

(31)

µ
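A sketch of this calculation (Python/NumPy; a simple grid normalization is assumed, and the function name is ours):

```python
import numpy as np

def profiled_delta_m(n, m0, beta_obs, mu, M_tilde, level=0.5, k=20000):
    """Normalize the profiled posterior of Eq. (30) on [mu, M_tilde]
    numerically and return the width Delta_m of Eq. (31)."""
    M = np.linspace(mu, M_tilde, k)
    # Eq. (9) as a function of M alone; M-independent factors cancel
    logf = -n * np.log(np.exp(-beta_obs * m0) - np.exp(-beta_obs * M))
    w = np.exp(logf - logf.max())
    cdf = np.cumsum(w)
    cdf /= cdf[-1]
    return M[np.searchsorted(cdf, level)] - mu

mu = 6.0 + np.log10(20)        # mu = m0 + (1/b) log10(n) with b = 1, n = 20
for Mt in (8.0, 9.0, 10.0):    # the three choices of M_tilde in Fig. 3
    print(Mt, profiled_delta_m(20, 6.0, np.log(10), mu, Mt))
```

The computed widths grow markedly with M̃, which is the dependence discussed next.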

In all cases, the posterior density depends clearly on the choice of M̃. Higher b values, indicating a small ratio of large to small earthquakes, are characterized by a more rapid decay of the posterior distribution. Since the b value is unknown for a given set of observations, it is reasonable to calculate the marginal density of Eq. (29) associated with M instead of the profiled density.

The results given in Fig. 3 are related to specific examples of real earthquake samples. However, we expect that the number of observations n as well as the maximum observed magnitude µ will be the most important parameters driving the posterior density of the true maximum magnitude M. Therefore we focus on the 50% confidence interval introduced above as a function of these two parameters. We choose a large upper magnitude bound M̃ = 10.0 and calculate the marginal density for βmin = log(10) · 0.5 and βmax = log(10) · 1.5:

    \mathbb{P}(M \,|\, \{m_i\}) = \int_{\beta_{\min}}^{\beta_{\max}} \mathbb{P}(M, \beta \,|\, \{m_i\})\, d\beta.    (32)

The result is given in Fig. 4(a). We emphasize that a point in this plot does not provide a general confidence interval for given values of n and µ; rather, it represents a confidence interval for a typical example of an earthquake sample with these parameters. The maximum observed magnitude is m0 < µ ≤ 9.5. The plot is based on calculations with m0 = 6.0. As an example, n = 20, µ = 6.5 means that 20 earthquakes with magnitudes 6 ≤ mi ≤ 6.5 have been recorded. First, we note that for a large number of observed earthquakes in a small magnitude range, the posterior density is concentrated close to the maximum observed event (black region). In this case, the number of events is sufficient to make the truncation “visible”. Consequently, the width of the confidence interval will tend to zero for all values of µ if n → ∞. Second, for a smaller number of events covering a broader range of magnitudes, the posterior is also broader and the truncation becomes less certain. If the range of observed magnitudes increases further and µ approaches the maximum possible value M̃, the width of the distribution decreases due to the upper bound of M.

The results presented in Fig. 4 refer to observed earthquakes with magnitudes above m0 = 6 and thus reflect historic earthquake data. In modern earthquake catalogs, the magnitude of completeness is much lower, leading to the question whether additional information is gained if smaller events are also taken into account. While very small earthquakes, with magnitudes much smaller than the observed maximum, are not expected to provide a contribution to the posterior density, the effect of intermediate earthquakes may be important. Therefore, the results for n observed events with m0 = 6 (in Fig. 4) are compared with 10n observed events with m0 = 5, assuming b = 1. We find, for example, that by using 20 events with m ≥ 5 instead of 2 events with m ≥ 6 (µ = 6.5 in both cases), ∆m decreases by only 1%. For higher values of n and µ the information gain due to smaller earthquakes drops rapidly to zero. We conclude that the distribution of M depends mainly on the maximum observed magnitude and events of comparable size.

To summarize, given a physical limit M̃ of the maximum magnitude, the clearest constraints for the true maximum magnitude M ≤ M̃ are provided in two situations. First, if the number of observed earthquakes is high in relation to the range of their magnitudes, the truncation becomes significant. Second, if the maximum observed magnitude is close to the maximum possible magnitude, the range of the posterior density is small, leading to a narrow function owing to the normalization constraint.

6 Estimation of Maximum Magnitude M Given an Informative Prior

In Bayesian analysis, the use of uninformative, i.e. non-normalizable, prior distributions is possible if the resulting posterior distribution becomes normalizable. In the situation described in Section “Bayesian Estimation of Maximum Magnitude in the DTGR Law”, an uninformative prior distribution leads to a non-normalizable posterior distribution. However, if additional knowledge is accounted for in the prior, the posterior distribution may become normalizable. In general, the posterior will depend on the available prior information. With large sample volume, however, the influence of the prior in general diminishes. This dependence on the “right” prior information is often considered as the key problem for the Bayesian estimation of the maximum possible magnitude (Cornell, 1994). We note that the posterior distribution depends on the sufficient statistics n and µ, rather than on all the details of the earthquake catalog. While for a fixed prior distribution the number n determines the width of the posterior distribution, the observed maximal magnitude, µ, enters in the cutoff condition (see Eq. 3) and therefore acts as a location parameter for the posterior. In fact, depending on the maximum observed magnitude, the posterior may be almost identical to the prior. If not, there is an information gain; but since this happens only for large µ (a rare event), we usually stick with the prior information. As a concrete illustrative example, we mention the Parkfield segment in California. Imposing a surface rupture length of L = (35 ± 1) km, the relation of Wells and Coppersmith (1994),

    M = A \log(L) + B,    (33)

with A = 1.16 ± 0.07, B = 5.08 ± 0.10, predicts a maximum possible magnitude of

    M = 6.87 \pm 0.21.    (34)

In this context, we emphasize that our approach of estimating M is restricted to a single-segment rupture, assuming that an earthquake cannot jump to an adjacent fault segment. Considering the possibility of multi-segment ruptures would clearly increase the maximum magnitude. Schmedes et al. (2005) address this problem for the Lower Rhine Embayment, Germany, by assuming finite probabilities for a rupture to jump the barrier to an adjacent segment when constructing a prior distribution of M. Although this method introduces new parameters with additional uncertainties, it is easy to implement and a feasible way to account for the occurrence of multi-segment ruptures. The prior and the posterior distribution will then cover a broader range of magnitudes; later we will see that the main result of this section also holds for multi-segment ruptures on a broader range of magnitudes. Using a prior distribution located around the value in Eq. (34) and a sample likelihood function based on the Parkfield catalog with µ = 6 and b = 0.89 results in a posterior distribution that is essentially identical to the prior distribution. However, if an earthquake with

magnitude µ = 6.8 occurs, the posterior distribution will change drastically due to the cutoff at µ. In the present situation, where the actual maximum observed magnitude is below the lower bound of the prior distribution, the prior is not affected by the likelihood function. The earthquake catalog with maximum observed magnitude µ provides almost no information gain, because we do not know which of the two alternatives explains the absence of an earthquake with magnitude higher than µ: 1. such an event is possible and will occur after sufficient time has elapsed; 2. it is impossible, because it is beyond the true (unknown) maximum possible magnitude M. This problem vanishes only where the maximum observed event is not rare, i.e. many earthquakes in a small magnitude range have been observed, such that an earthquake with magnitude greater than the maximum observed event would be overdue in a sufficiently large sample.

The situation in Parkfield described above can be illustrated using synthetic data drawn from a DTGR distribution with known parameters. A prior distribution for M is generated in the following way: Using the relation in Eq. (33), we generate a distribution of magnitudes by drawing 100,000 values of A, B, and L from a uniform distribution within the error ranges; a code sketch of this construction is given at the end of this section. This function is fitted by a normal distribution, which is truncated and renormalized, i.e. the function is zero for magnitudes smaller than the minimum and higher than the maximum of the 100,000 random values. The truncation removes unrealistically high or low values of M. Next we produce “observations” by Monte-Carlo simulations resembling Parkfield seismicity using Gutenberg-Richter parameters b = 0.89 and a = 3.64. For a catalog containing n = 25 earthquakes, the minimum magnitude is then m0 = 3.39. The maximum magnitude to be estimated is chosen here as the maximum value of the prior distribution: M = 1.23 log(L) + 5.18 = 7.09. Fig. 5 shows the prior and the posterior distribution for two different Monte-Carlo simulations resembling Parkfield seismicity, Fig. 5(a) with µ = 6.11 and Fig. 5(b) with µ = 6.87. It is demonstrated that the occurrence of the rare event in Fig. 5(b) completely changes the picture. While Fig. 5(a) resembles qualitatively the current situation in Parkfield discussed above, in Fig. 5(b) the value of M becomes constrained by the event with µ = 6.87 in the catalog (lower bound) and the upper bound of the prior distribution M = 7.09. Since both figures, however, represent realizations of the same process, we conclude

that the Bayesian estimation of M leads to a gain of information only if the maximum observed earthquake matches the prior distribution. Otherwise, the reason for the absence of a rare large event is unknown. In most cases, Bayesian inference based on the DTGR law is not useful for estimating M, because the earthquake catalog enters the calculation only through a single rare event. If multi-segment ruptures are taken into account, the distributions in Fig. 5 will be broader; the finding that the information gain mainly depends on the maximum observed event remains, however, unchanged. We conclude that the search for an appropriate prior distribution is an important question in its own right; however, in most cases Bayesian inference based on the DTGR law will lead to almost no information gain, because typical earthquake catalogs provide only a small amount of information within this framework.
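The prior construction described above can be sketched as follows (Python/NumPy; the uniform sampling within the stated error ranges follows the text, while treating log in Eq. (33) as log10, consistent with Wells and Coppersmith, 1994, is our assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
# Draw A, B, L uniformly within the error ranges of Eqs. (33)-(34)
A = rng.uniform(1.16 - 0.07, 1.16 + 0.07, N)
B = rng.uniform(5.08 - 0.10, 5.08 + 0.10, N)
L = rng.uniform(34.0, 36.0, N)   # L = (35 +/- 1) km
M_prior = A * np.log10(L) + B
print(M_prior.mean(), M_prior.max())  # about 6.87 and close to 7.09
```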

7 Discussion and Conclusions

Earthquakes with maximum possible magnitude are usually rare events. It is intuitively clear that statistical inference on such rare events from a finite data set will be limited. In this study, we have derived a quantitative description of these limits using both the frequentist and the Bayesian approaches. We assume the validity of the doubly-truncated Gutenberg-Richter (DTGR) model for earthquake magnitudes and focus on confidence intervals for the maximum possible magnitude (the upper bound of the DTGR law). We find that the frequentist confidence interval becomes singular in most realistic cases. In the Bayesian approach, the profiled posterior density is not normalizable, since its density is monotonically decreasing but does not tend to 0 for large magnitudes. If additionally an upper bound for the maximum magnitude is known, confidence intervals are always finite, the posterior distribution is normalizable, and inference from past events is possible. Such a maximum magnitude is justified since it accounts for the physical limitation of the earthquake size. However, calculations imposing such an upper bound show that the posterior distribution depends significantly on this unknown quantity. Similar observations have been reported for the tapered Pareto distribution (Kagan and Schoenberg, 2001).

The maximum possible magnitude in a seismic zone is a key ingredient of seismic hazard assessment. Historic earthquakes may provide qualitative constraints for the earthquake sizes; in the context of Bayesian statistics, as well as in the frequentist view, the calculation of confidence intervals for the estimation of the maximum magnitude is impossible, even for large observational records. Even in the presence of an informative prior distribution of the maximum magnitude, the posterior distribution is mostly of little use, because it is dominated by the maximum observed event. The rarity of the largest events in a typical earthquake catalog prohibits evaluation of their proximity to the true maximum possible magnitude. We conclude that the maximum magnitude as used in seismic hazard assessment is a questionable quantity, because from a statistical point of view, a limited data set does not allow one to estimate a magnitude that is maximal for all times. As a reasonable alternative, we suggest using MT, the maximum expected magnitude in a finite time window of length T. Imposing a stationary Poisson process in the time interval [0; T] with rate Λ of earthquakes with magnitudes between m0 and M, the distribution function of MT is given by

    F_{\Lambda\beta M}(M_T) = \sum_{n=0}^{\infty} \frac{\Lambda^n}{n!} \exp(-\Lambda)\, [F_{\beta M}(M_T)]^n, \qquad m_0 \le M_T \le M,    (35)

with FβM from Eq. (2). Rewriting Eq. (35) leads to FΛβM (MT ) = exp [−Λ(1 − FβM (MT ))];

m0 ≤ MT ≤ M

(36)

with the probability density function pΛβM (MT ) = Λ pβM (MT ) exp [−Λ(1 − FβM (MT ))];

m0 ≤ MT ≤ M.

(37)

This equation still includes the unknown value M. There are two alternatives to deal with this problem: 1. in practical exercises, questions like the following may be addressed: “Which magnitude is expected in the next 50 years, given an upper bound of M = 8?”; 2. the untruncated Gutenberg-Richter law (M → ∞) may be explored. In any case, the distribution of MT is normalizable.
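Because Eq. (36) can be inverted in closed form, exact quantiles of MT are straightforward to compute. A minimal sketch (Python/NumPy; identifying Λ with 10^(a − b·m0) for the window in question is an assumption of this illustration):

```python
import numpy as np

def mt_quantile(q, lam, beta, m0, M=np.inf):
    """q-quantile of M_T obtained by inverting Eq. (36);
    M = inf corresponds to the untruncated Gutenberg-Richter law."""
    denom = np.exp(-beta * m0) - (0.0 if np.isinf(M) else np.exp(-beta * M))
    F = 1.0 + np.log(q) / lam   # F_GR(M_T) solving q = exp(-lam*(1 - F))
    return -np.log(np.exp(-beta * m0) - F * denom) / beta

# Parkfield-like example: a = 3.64, b = 0.89, m0 = 4
lam = 10 ** (3.64 - 0.89 * 4.0)
beta = 0.89 * np.log(10)
print(mt_quantile(0.975, lam, beta, 4.0))         # untruncated
print(mt_quantile(0.975, lam, beta, 4.0, M=8.0))  # truncated at M = 8
```

The two printed values are nearly identical, consistent with the small deviations between the truncated and untruncated cases reported next.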

In a simple approach, the rate Λ in Eq. (36), which is equivalent to the Richter a value, can be taken from an earthquake catalog to compute FΛβM(MT). Performing Monte-Carlo simulations which are again adjusted to the Parkfield segment (m0 = 4, a = 3.64, b = 0.89), we generate 10,000 catalogs using M = 8 and compare the 95% confidence interval with the case M → ∞ (untruncated Gutenberg-Richter model). We find that for M = 8 the upper limit of the confidence interval deviates by only 1% from the untruncated Gutenberg-Richter law; for M = 9 the relative difference to M → ∞ is 0.5%, suggesting that the untruncated Gutenberg-Richter law is appropriate for calculations of MT. In a next step, the uncertainties of Λ will be estimated in the framework of a Bayesian setting. A comprehensive discussion of MT is left for future studies.

As a second alternative, which takes advantage of recent developments in modeling the earthquake process, we suggest considering physical models with reduced complexity that are adjusted to a specific fault region and allow simulation of thousands of years of earthquake history. As an example of such an “earthquake simulator”, we refer to the model used in Ben-Zion and Rice (1993) and Zöller et al. (2008), which has been adjusted to the Parkfield segment with respect to fault dimension and average displacement rate. A simulation over 41,000 years including 1,800,000 earthquakes with m ≥ 3.5 leads to a maximum magnitude M = 7.04, which is in very good agreement with the maximum magnitude M = 7.09 calculated from the relation of Wells and Coppersmith (1994) with L = 35 km. Other models based on rate- and state-dependent friction (Dieterich, 1994) may also be useful, e.g. to account for earthquakes extending over two or more fault segments. An important advantage of physics-based models is that they are not restricted to the DTGR law. The frequency-size distribution is rather a result of the imposed physics. The problem of model selection for a specific fault region is not at all straightforward. In the light of the limitations of purely statistical inference, however, physics-based models seem to provide a reliable ground for the estimation of seismological parameters.


Data and Resources

No data were used in this paper.

Acknowledgments

This work was supported by the German Research Society (HA4789/2-1) and the Potsdam Research Cluster for Georisk Analysis, Environmental Change and Sustainability (PROGRESS). We thank Associate Editor Morgan Page and two anonymous reviewers for comments that helped to improve the manuscript.

References

Aki, K. (1965). Maximum likelihood estimation of b in the formula log N = a − bM and its confidence limits, Bull. Earthquake Res. Inst. Tokyo Univ., 43, 237-239.

Ben-Zion, Y., and J. R. Rice (1993). Earthquake failure sequences along a cellular fault zone in a three-dimensional elastic solid containing asperity and nonasperity regions, J. Geophys. Res., 98, 14,109-14,131.

Bernardo, J. M., and A. F. M. Smith (1994). Bayesian Theory, Wiley, Chichester.

Blackwell, D. (1947). Conditional expectation and unbiased sequential estimation, Ann. Math. Statist., 18, 105-110.

Cornell, C. A. (1994). Statistical analysis of maximum magnitudes, in Johnston, A. C., Coppersmith, K. J., Kanter, L. R., and Cornell, C. A. (eds), The Earthquakes of Stable Continental Regions, Vol. 1: Assessment of Large Earthquake Potential, Electric Power Research Institute, Palo Alto, California, 5-27.

Dieterich, J. H. (1994). A constitutive law for rate of earthquake production and its application to earthquake clustering, J. Geophys. Res., 99, 2601-2618.

Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics, Philos. Trans. R. Soc. Lond. A, 222, 309-368.

Gardner, J. K., and L. Knopoff (1974). Is the sequence of earthquakes in Southern California, with aftershocks removed, Poissonian?, Bull. Seism. Soc. Am., 64, 1363-1367.

Gutenberg, B., and C. Richter (1956). Earthquake magnitude, intensity, energy and acceleration, Bull. Seismol. Soc. Am., 46, 105-145.

Kagan, Y. Y., and F. Schoenberg (2001). Estimation of the upper cutoff parameter for the tapered Pareto distribution, J. Appl. Probab., 38A, 901-918.

Kijko, A. (2004). Estimation of the maximum earthquake magnitude mmax, Pure Appl. Geophys., 161, 1655-1681.

Kolmogorov, A. (1950). Unbiased estimators, Izvestiya AN SSSR, Ser. Math., 303-326. English translation in Selected Works of A. N. Kolmogorov, Vol. 2: Probability Theory and Mathematical Statistics, A. N. Shiryayev (Editor), Kluwer Academic Publishers, Dordrecht/Boston/London, 1992, 597 pp.

Pisarenko, V. F. (1991). Statistical evaluation of maximum possible earthquakes, Izvestiya Phys. Solid Earth, 27, 757-763.

Pisarenko, V. F., A. A. Lyubushin, V. B. Lysenko, and T. V. Golubeva (1996). Statistical estimation of seismic hazard parameters: maximum possible magnitude and related parameters, Bull. Seismol. Soc. Am., 86, 691-700.

Rao, C. R. (1945). Information and accuracy attainable in estimation of statistical parameters, Bull. Calcutta Math. Soc., 7, 81-91.

Reasenberg, P. (1985). Second-order moment of Central Californian seismicity, 1969-1982, J. Geophys. Res., 90, 5479-5495.

Reiter, L. (1991). Earthquake Hazard Analysis: Issues and Insights, Columbia University Press, New York.

Schmedes, J., S. Hainzl, S.-K. Reamer, F. Scherbaum, and K.-G. Hinzen (2005). Moment release in the Lower Rhine Embayment, Germany: seismological perspective of the deformation process, Geophys. J. Int., 160, 901-909, doi 10.1111/j.1365-246X.2005.02525.x.

Wells, D. L., and K. J. Coppersmith (1994). New empirical relationships among magnitude, rupture length, rupture width, rupture area, and surface displacement, Bull. Seism. Soc. Am., 84, 974-1002.

Wheeler, R. L. (2009). Methods of Mmax estimation east of the Rocky Mountains, USGS Open-File Report 2009-1018.

Zöller, G., S. Hainzl, and M. Holschneider (2008). Recurrent large earthquakes in a fault region: What can be inferred from small and intermediate events?, Bull. Seismol. Soc. Am., 98, 2641-2651, doi 10.1785/0120080146.

Zöller, G., S. Hainzl, and M. Holschneider (2010). Recurrence of large earthquakes: Bayesian inference from catalogs in the presence of magnitude uncertainties, Pure Appl. Geophys., 167, 845-853, doi 10.1007/s00024-010-0078-0.


Figure captions

Figure 1: (a) Illustration of Eq. (15) with m0 = 5, b = 1, n = 10 and three values of α. For any given confidence level α, the confidence interval diverges as soon as the maximum observed magnitude exceeds mc from Eq. (14). The probability is Pr(µ ≥ mc) ≈ 0.95 for α = 0.05, Pr(µ ≥ mc) ≈ 0.99 for α = 0.01, and Pr(µ ≥ mc) ≈ 1.00 for α = 0.001. (b) Probability Pr(µ ≥ mc) that the maximum observed magnitude µ exceeds the critical threshold mc as a function of the true maximum magnitude M (Eq. 16). The three curves belong to the three confidence levels α shown in (a).

Figure 2: Illustration of the divergence of the Bayesian posterior distribution of M. For large values of M (M ≫ µ), all models represented by probability densities pβM(m) with different values of M (in the sketch pβM1(m) and pβM2(m)) describe the observations in the interval [m0; µ] equally well and are thus equally likely. The Bayesian posterior distribution with respect to M converges for M → ∞ to a positive constant and is therefore not normalizable. This result is independent of the shape of pβM(m) at large magnitudes m.

Figure 3: Profiled posterior P(M | {mi}, β = βobs) (Eq. 30) for three choices of the fixed b value: (a) b = 0.8, (b) b = 1.0, (c) b = 1.2. The corresponding values of µ follow from µ = m0 + (1/bobs) log10(n): (a) µ = 7.60, (b) µ = 7.30, (c) µ = 6.87; the corresponding mean magnitudes are: (a) ⟨m⟩ = 6.87, (b) ⟨m⟩ = 6.43, (c) ⟨m⟩ = 6.29. In all cases results for M̃ = 8.0, 9.0, 10.0 are shown; the number of observations is n = 20, the magnitude of completeness is m0 = 6.0. Vertical lines denote the length of the 50% confidence interval ∆m (see Eq. 31).

Figure 4: Length of the 50% confidence interval ∆m (color coded) as a function of the number n of observations and the maximum observed magnitude µ, for the marginal density P(M | {mi}) (Eq. 32) with m0 = 6, b = 1, and M̃ = 10.


Figure 5: Bayesian estimation of M for two samples of magnitudes m, each drawn from a DTGR distribution with m0 = 4.32, M = 7.09, n = 25, b = 0.89. The histograms give the prior distribution calculated from 100,000 realizations of Eq. (33) with randomly varying values of A and B as described in the text. The histogram is fit by a doubly-truncated normal distribution (dashed line). The Bayesian posterior distribution P(β, M | {mi}) is shown as a solid line. The maximum “observed” earthquake has magnitude µ = 6.11 in the first realization (a), while this value is µ = 6.87 in the second one (b).


Figures

Figure 1: [two-panel plot; (a) upper limit of the confidence interval versus the maximum observed magnitude for α = 0.05, 0.01, 0.001; (b) Pr(µ ≥ mc) versus the true maximum magnitude M]

Figure 2: [sketch of log pβM(m) versus m, showing the observations on [m0, µ] and two truncation points M1, M2]

Figure 3: [three panels of the profiled posterior versus M for M̃ = 8.0, 9.0, 10.0]

Figure 4: [color-coded map of ∆m versus the number of observations and the maximum observed magnitude]

Figure 5: [two panels of prior and posterior probability density versus M on 6.6 ≤ M ≤ 7.2]

Appendix A

Here we list the most important parameters used in this study. Commonly used functions are not given.

{mi}: set of observed magnitudes
n: sample size of observations
µ: maximum observed magnitude (Eq. 4)
⟨m⟩: sample mean of observed magnitudes
m0: lower magnitude bound (subject to catalog completeness)
M: maximum magnitude for all times (true value, to be estimated)
MT: maximum magnitude in a time window of length T (used only in Section “Discussion and Conclusions”)
M̂: point estimation of M
M̃: upper limit of the maximum magnitude bound (used only in Section “Estimation of Maximum Magnitude M Given that M has an Upper Bound M̃”)
b: Richter b value
β: b log(10)
a: Richter a value
Λ: rate of earthquakes in a given magnitude range
βobs: estimation of β based on Aki's formula (Eq. 7)
βmin and βmax: lower and upper bound of β for the calculation of the marginal probability density with respect to β (used only in Section “Estimation of Maximum Magnitude M Given that M has an Upper Bound M̃”)
[µ, ψ(µ)]: frequentist confidence interval with respect to the estimation of M
α: confidence level: the probability for M to lie within the confidence interval is at least 1 − α
m∗: estimated magnitude such that the true value M exceeds m∗ with a probability less than a given value of α
mc: critical magnitude threshold: for µ ≥ mc the frequentist confidence interval diverges
θ: general expression for a parameter to be estimated

Appendix B

To show Eqs. (14) and (15), let Fθ denote the distribution function, with all parameters held fixed except θ = M, which is the unique parameter we want to estimate. The sufficient statistic µ has the distribution function F^n_θ(µ). We are looking for a confidence interval of the form [µ, ψ(µ)] with a strictly monotonically increasing function ψ. Being a confidence interval of degree α reads

    \mathbb{P}(\psi(\mu) \le \theta \,|\, \theta) \le \alpha \quad \Leftrightarrow \quad \mathbb{P}(\mu \le \psi^{-1}(\theta) \,|\, \theta) \le \alpha.    (B1)

Here we have used the monotonicity to ensure the existence of an inverse for ψ on its range. The last probability simply reads

    F_\theta^n(\psi^{-1}(\theta)) \le \alpha \quad \Leftrightarrow \quad F_\theta(\psi^{-1}(\theta)) \le \alpha^{1/n} \quad \Leftrightarrow \quad \psi^{-1}(\theta) \le F_\theta^{-1}(\alpha^{1/n}).    (B2)

Setting h: θ ↦ F_θ^{-1}(α^{1/n}), we may write

    \psi(\mu) \ge h^{-1}(\mu).    (B3)

Using now the explicit expression for Fθ from Eq. (2), the formulas follow.
