BAYESIAN ESTIMATION FOR A PARAMETRIC MARKOV RENEWAL ...

BAYESIAN ESTIMATION FOR A PARAMETRIC MARKOV RENEWAL MODEL APPLIED TO SEISMIC DATA ILENIA EPIFANI, LUCIA LADELLI, AND ANTONIO PIEVATOLO

arXiv:1301.6494v2 [stat.ME] 30 Jan 2013

Abstract. This paper presents a complete methodology for the Bayesian inference of a semi-Markov process, from the elicitation of the prior distribution to the computation of posterior summaries, including guidance for its JAGS implementation. The inter-occurrence times (conditional on the transition between two given states) are assumed to be Weibull-distributed. We examine the elicitation of the joint prior density of the shape and scale parameters of the Weibull distributions, deriving in a natural way a specific class of priors, along with a method for the determination of hyperparameters based on “historical data” and moment existence conditions. This framework is applied to data on earthquakes of three types of severity (low, medium and high size) that occurred in the central Northern Apennines in Italy and were collected by the CPTI04 (2004) catalogue.

2010 Mathematics Subject Classification. Primary: 60K20, 62F15, 62M05, 86A15; Secondary: 65C05.
Key words and phrases. Bayesian inference, Gibbs sampling, Markov Renewal process, Multi-State model, Predictive distribution, Weibull distribution.

1. Introduction

Markov renewal processes, or their semi-Markov representation, have been considered in the seismological literature because they allow the distribution of the inter-occurrence times between earthquakes to depend on the last and the next earthquake and to be not necessarily exponential. The time predictable and the slip predictable models studied in Grandori Guagenti and Molina (1986), in Grandori Guagenti et al. (1988) and in Betrò et al. (1989) are special cases of Markov renewal processes. These models are capable of interpreting the predictable behavior of strong earthquakes in some seismogenetic areas. In these processes the magnitude is a deterministic function of the inter-occurrence time. A stationary Markov renewal process with Weibull inter-occurrence times has been studied from a classical statistical point of view in Alvarez (2005). The Weibull model allows for monotonic hazard rates; it contains the exponential model as a special case, which gives a Markov Poisson point process. In that paper the parameters of the model are fitted to the large earthquakes in the North Anatolian Fault Zone through maximum likelihood, and the Markov Poisson point process assumption is tested. In order to capture a non-monotonic behavior in the hazard, in Garavaglia and Pavani (2009) the model of Alvarez (2005) is modified and a Markov renewal process with inter-occurrence times that are mixtures of an exponential and a Weibull distribution is fitted to Turkish data. In Masala (2012) a parametric semi-Markov model with generalized Weibull distribution for the inter-occurrence times is considered and fitted to Italian earthquakes. This semi-Markov model with generalized Weibull distributed times was first used in Foucher et al. (2009) to study the evolution of HIV infected patients. Votsi et al. (2012) consider a semi-Markov model for the seismic hazard assessment in the Northern Aegean sea and they estimate the quantities of interest (semi-Markov kernel, Markov renewal functions, etc.) through a nonparametric method.

While a wide literature exists on classical inference for Markov renewal models for earthquake forecasting, to our knowledge the Bayesian approach has seen limited use in this context. Patwardhan et al. (1980) consider a semi-Markov model with log-normal distributed discrete inter-occurrence times, which they apply to the large earthquakes in the circum-Pacific belt. They stress the fact that using Bayesian techniques is relevant when prior knowledge is available and is fruitful even if the sample size is small. Marín et al. (2005) have also employed semi-Markov models in the Bayesian framework, but applied to a completely different area: sow farm management. They use WinBUGS to perform computations (but without giving details) and elicit their prior distributions on parameters from knowledge on farming practices.
From a probabilistic viewpoint, a Bayesian statistical treatment of a semi-Markov process amounts to modeling the data as a mixture of semi-Markov processes, where the mixing measure is supported on the parameters, by means of their prior laws. A complete characterization of such a mixture is given in Epifani et al. (2002).

In this paper we develop a parametric Bayesian analysis for a Markov renewal process modeling earthquakes in an Italian seismic region. The magnitudes are classified into three categories according to their severity, and these categories represent the states visited by the process. Following Alvarez (2005), the inter-occurrence times are assumed to be Weibull random variables. The prior distribution of the parameters of the model is elicited using a “historical dataset”. The time of the last occurrence in the historical dataset constitutes the origin for the “current sample”, which is formed by the sequences of earthquakes and the corresponding inter-occurrence times collected up to some time T to form the likelihood function. When T does not coincide with an earthquake, the last observed inter-occurrence time is censored. The posterior distribution of the parameters is obtained through Gibbs sampling and the following summaries are estimated: the transition probabilities, the shape and scale parameters of the Weibull waiting times for each transition, and the so-called cross state-probabilities (CSPs). The transition probabilities indicate whether the strength of the next earthquake is in some way dependent on the strength of the last one; the shape parameters of the waiting time indicate whether the hazard rate between two earthquakes of given magnitude classes is decreasing or increasing; the CSPs give the probability that the next earthquake occurs at or before a given time and is of a given magnitude, conditionally on the waiting time from the last earthquake and on its magnitude.

The paper is organized as follows. In Section 2 we illustrate the dataset. It is one of the sequences analyzed by Rotondi (2010). Section 3 introduces the parametric Markov renewal model. Three magnitude classes are considered and a Weibull distribution is assumed for the inter-occurrence times. Section 4 deals with the elicitation of the prior, which is based on a generalized inverse Gamma distribution, considering also the case of scarce prior information. Section 5 contains the data analysis with the estimation of the above-mentioned summaries. The matter of whether the observed earthquakes reflect a time predictable or a slip predictable model is also discussed. Section 6 is devoted to some concluding remarks. An appendix contains the detailed derivation of the full conditional distributions and the JAGS implementation of the Gibbs sampler.

Figure 1. Map of Italy (longitude 6°E to 18°E, latitude 36°N to 46°N) with dots indicating earthquakes with magnitude Mw ≥ 4.5 belonging to macroregion MR3 (Rotondi (2010)). Inclusion in the macroregion was based on the association between events and seismogenetic sources; the region contour has only an aesthetic function.

2. A test dataset

We test our method on a sequence of seismic events chosen among those examined by Rotondi (2010), which the author provided to us. The sequence collects events that occurred in a tectonically homogeneous macroregion, identified as MR3 by Rotondi (2010) and corresponding to the central Northern Apennines in Italy. The subdivision of Italy into eight (tectonically homogeneous) seismic macroregions can be found in the DISS (2007) and the data are collected by the CPTI04 (2004) catalogue. Considering earthquakes with magnitude Mw ≥ 4.5, the sequence is complete from year 1838: using a lower magnitude would make the completeness of the series questionable, especially in its earlier part. The map of these earthquakes, marked by dots, appears in Figure 1. As a lower threshold for the class of strong earthquakes we choose Mw = 5.3, as suggested by Rotondi (2010). Then a magnitude state space with three states is obtained by indexing an earthquake by 1, 2 or 3 if its magnitude belongs to the intervals [4.5, 4.9), [4.9, 5.3), [5.3, +∞), respectively. Magnitude 4.9 is just the midpoint between 4.5 and 5.3. Rotondi (2010) considers a nonparametric Bayesian model for the distribution of the inter-occurrence time between strong earthquakes (i.e. Mw ≥ 5.3), after a preliminary data analysis which rules out Weibull, Gamma and log-normal distributions, among other frequently used holding time distributions. On the other hand, with a Markov renewal model, the sequence of all the inter-occurrence times is subdivided into shorter ones according to the magnitudes, so we think that a parametric distribution is a viable option. In particular, we focussed on the macroregion MR3 because the Weibull distribution seems to fit the inter-occurrence times better than in other macroregions. This judgement is based on qq-plots. The qq-plots for MR3 are shown in Figure 2. The plot for transitions from 1 to 3 shows a sample quantile that is considerably larger than expected. The outlying point corresponds to a long inter-occurrence time of about 9 years, between 1987 and 1996, while 99 percent of the inter-occurrence times are below 5 years. Obviously, the classification into macroregions influences the way the earthquake sequence is subdivided.
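The indexing of earthquakes by magnitude class can be sketched in a few lines of Python. This is only an illustration of the three intervals given above; the function name `magnitude_state` and the constant `CLASS_EDGES` are hypothetical and do not come from the paper.

```python
import bisect

# Lower bounds of the three classes in the text: [4.5, 4.9), [4.9, 5.3), [5.3, +inf)
CLASS_EDGES = [4.5, 4.9, 5.3]


def magnitude_state(mw):
    """Return state 1, 2 or 3 for a moment magnitude mw >= 4.5 (hypothetical helper)."""
    if mw < CLASS_EDGES[0]:
        raise ValueError("magnitude below the completeness threshold 4.5")
    # bisect_right counts how many lower bounds are <= mw, which is the class index
    return bisect.bisect_right(CLASS_EDGES, mw)
```

Because the intervals are closed on the left, a magnitude exactly equal to 4.9 or 5.3 is assigned to the higher class.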

3. Markov Renewal Model

Let us observe over a period of time $[0, T]$ a process in which different events occur, with random inter-occurrence times. Let us suppose that the possible states of the process are the points of the finite set $E = \{1, \dots, s\}$ and suppose that the process starts in state $j_0$. Let us denote by $\tau$ the number of times the process changes state in the time interval $[0, T]$ and let $s_i$ denote the time of the $i$-th change of state. Hence, $0 < s_1 < \dots < s_\tau \le T$. Let $j_0, j_1, \dots, j_\tau$ be the sequence of states visited by the process and $x_i$ the sojourn time in state $j_{i-1}$, for $i = 1, \dots, \tau$. Then

(1)   $x_i = s_i - s_{i-1}$   for $i = 1, \dots, \tau$,

with $s_0 := 0$. Furthermore, let $u_T$ be the time passed in $j_\tau$, i.e.

(2)   $u_T = T - s_\tau$.

Figure 2. Weibull qq-plots of earthquake inter-occurrence times (central Northern Apennines) classified by transition between magnitude classes. [Nine panels, one for each transition "i to j" with i, j = 1, 2, 3; horizontal axis: theoretical quantiles; vertical axis: sample quantiles.]

Time $u_T$ is a right-censored time. Finally, our data are collected in the vector $(j, x, u_T)$, where $(j, x) = (j_n, x_n)_{n=1,\dots,\tau}$. In what follows, we assume that the data are the result of the observation, on the time interval $[0, T]$, of a homogeneous Markov renewal process $(J_n, X_n)_{n \ge 0}$ starting from $j_0$. This means that the sequence $(J_n, X_n)_{n \ge 0}$ satisfies

(3)   $P(J_0 = j_0) = 1, \qquad P(X_0 = 0) = 1$

and for every $n \ge 0$, $j \in E$ and $t \ge 0$:

(4)   $P(J_{n+1} = j, X_{n+1} \le t \mid (J_k, X_k)_{k \le n}) = P(J_{n+1} = j, X_{n+1} \le t \mid (J_n, X_n)) = p_{J_n j} F_{J_n j}(t)$,

where $p = (p_{ij})_{i,j \in E}$ is a transition matrix and $(F_{ij})_{i,j \in E}$ is an array of distribution functions on $\mathbb{R}^+ = (0, +\infty)$. For more details on Markov renewal processes see, for example, Limnios and Oprisan (2001). We just recall that, under Assumptions (3) and (4):
– the process $(J_n)_{n \ge 0}$ is a Markov chain, starting from $j_0$, with transition matrix $p$,
– the sojourn times $(X_n)_{n \ge 1}$, conditionally on $(J_n)_{n \ge 0}$, form a sequence of independent positive random variables, $X_n$ having distribution function $F_{J_{n-1} J_n}$.
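To make the two properties above concrete, the following Python sketch simulates one trajectory on $[0, T]$: the next state is drawn from the row of $p$ for the current state, and the sojourn time is then drawn from the distribution attached to that transition. The Weibull form of the sojourn times is the one the paper adopts in (6); the function name, the 0-based state coding and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np


def simulate_markov_renewal(p, alpha, theta, j0, T, rng):
    """Simulate the visited states, the sojourn times and the censored time u_T on [0, T].

    p, alpha, theta are (s x s) arrays indexed by (current state, next state);
    states are coded 0, ..., s-1 here (the paper uses 1, ..., s).
    """
    states = [j0]
    sojourns = []
    clock = 0.0  # time of the last change of state, s_tau
    while True:
        i = states[-1]
        j = int(rng.choice(len(p), p=p[i]))          # J_{n+1} drawn from row i of p
        x = theta[i, j] * rng.weibull(alpha[i, j])   # X_{n+1} ~ Weibull(alpha_ij, theta_ij)
        if clock + x > T:
            # the next event falls after T: the last sojourn is right-censored
            return states, sojourns, T - clock
        clock += x
        states.append(j)
        sojourns.append(float(x))
```

The returned triple corresponds to the data vector $(j, x, u_T)$: the number of recorded sojourn times is $\tau$, and the sojourn times plus $u_T$ add up to $T$.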

We assume the functions $F_{ij}$ are absolutely continuous with respect to the Lebesgue measure, with density $f_{ij}$. Hence, the likelihood function of the data $(j, x, u_T)$ is:

(5)   $L(j, x, u_T) = \left( \prod_{i=0}^{\tau-1} p_{j_i j_{i+1}} f_{j_i j_{i+1}}(x_{i+1}) \right)^{\mathbf{1}(\tau > 0)} \times \sum_{k \in E} p_{j_\tau k} \bar{F}_{j_\tau k}(u_T)$,

where, for every $x$, $\bar{F}_{ij}$ is the survival function: $\bar{F}_{ij}(x) = 1 - F_{ij}(x) = P(X_{n+1} > x \mid J_n = i, J_{n+1} = j)$. Furthermore, we assume that each inter-occurrence time density $f_{ij}$ is a Weibull density with shape parameter $\alpha_{ij}$ and scale parameter $\theta_{ij}$:

(6)   $f_{ij}(x) = \frac{\alpha_{ij}}{\theta_{ij}} \left( \frac{x}{\theta_{ij}} \right)^{\alpha_{ij} - 1} \exp\left\{ -\left( \frac{x}{\theta_{ij}} \right)^{\alpha_{ij}} \right\}, \qquad x > 0,\ \alpha_{ij} > 0,\ \theta_{ij} > 0$.

For conciseness, let $\alpha = (\alpha_{ij})_{i,j \in E}$ and $\theta = (\theta_{ij})_{i,j \in E}$. In order to write the likelihood in a more convenient way, let us introduce the following natural statistics. We will say that the process visits the string $(i, j)$ if a visit to state $i$ is followed by a visit to state $j$, and denote:
– $x^{\rho}_{ij}$ the time spent in state $i$ at the $\rho$-th visit to string $(i, j)$,
– $N_{ij}$ the number of visits to the string $(i, j)$.
Then, assuming $\tau > 0$, from Equations (5) and (6) we obtain the following expression for the likelihood function:

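(7)   $L(j, x, u_T \mid p, \alpha, \theta) = \prod_{i,k \in E} p_{ik}^{N_{ik}} \times \prod_{i,k \in E} \left[ \alpha_{ik}^{N_{ik}} \frac{1}{\theta_{ik}^{\alpha_{ik} N_{ik}}} \left( \prod_{\rho=1}^{N_{ik}} x^{\rho}_{ik} \right)^{\alpha_{ik} - 1} \exp\left\{ -\frac{1}{\theta_{ik}^{\alpha_{ik}}} \sum_{\rho=1}^{N_{ik}} (x^{\rho}_{ik})^{\alpha_{ik}} \right\} \right] \times \sum_{k \in E} p_{j_\tau k} \exp\left\{ -\left( \frac{u_T}{\theta_{j_\tau k}} \right)^{\alpha_{j_\tau k}} \right\}$.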
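As a numerical companion to the likelihood just written, the Python sketch below evaluates its logarithm directly from the sequence of states, the sojourn times and the censored time. It works transition by transition, which gives the same value as the grouped form with the statistics $N_{ik}$ and $x^{\rho}_{ik}$. The function name, the 0-based state coding and the NumPy array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def log_likelihood(states, sojourns, u_T, p, alpha, theta):
    """Log-likelihood of (j, x, u_T) under Weibull sojourn times (states coded 0..s-1)."""
    ll = 0.0
    for n, x in enumerate(sojourns):
        i, k = states[n], states[n + 1]
        a, th = alpha[i, k], theta[i, k]
        # log p_ik plus the log of the Weibull density (6) evaluated at x
        ll += np.log(p[i, k]) + np.log(a / th) + (a - 1.0) * np.log(x / th) - (x / th) ** a
    last = states[-1]
    # right-censored term: sum over k of p_{j_tau k} * exp(-(u_T / theta_{j_tau k})^alpha_{j_tau k})
    survival = np.exp(-(u_T / theta[last]) ** alpha[last])
    ll += np.log(np.sum(p[last] * survival))
    return float(ll)
```

With all shapes equal to 1 the sojourn times are exponential, which gives a simple hand check: each observed transition contributes $\log p_{ik} - \log\theta_{ik} - x/\theta_{ik}$.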

Our purpose is now to perform a Bayesian analysis for $p$, $\alpha$ and $\theta$ which allows us to introduce prior knowledge on the parameters. As we will show, this analysis is possible, using a Gibbs sampling approach, via the introduction of an auxiliary variable that makes the right-censored datum easy to treat.

4. Bayesian analysis

4.1. The prior distribution. In our Bayesian analysis, we assume that $p$ is independent of $\alpha$ and $\theta$. In particular, the rows of $p$ are $s$ independent vectors with Dirichlet distribution with parameters $\gamma_1, \dots, \gamma_s$ and total mass $c_1, \dots, c_s$, respectively. This means that, for $i = 1, \dots, s$, the prior density of the $i$-th row is

$\pi_{1;i}(p_{i1}, \dots, p_{is}) = \frac{\Gamma(c_i)}{\prod_{j=1}^{s} \Gamma(\gamma_{ij})} \prod_{j=1}^{s} p_{ij}^{\gamma_{ij} - 1}$

on $\mathcal{T} = \{(p_{i1}, \dots, p_{is}) \mid p_{ij} \ge 0,\ \sum_j p_{ij} = 1\}$, where $\gamma_i = (\gamma_{i1}, \dots, \gamma_{is})$ and $c_i = \sum_{j=1}^{s} \gamma_{ij}$.
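A draw from this prior on the transition matrix can be sketched as follows: each row is sampled independently from its own Dirichlet distribution. The function name and the example values of the hyperparameters are hypothetical; the paper does not report the $\gamma_{ij}$ in this section.

```python
import numpy as np


def sample_transition_matrix(gamma, rng):
    """Draw p with independent rows, row i ~ Dirichlet(gamma[i, :])."""
    gamma = np.asarray(gamma, dtype=float)
    return np.vstack([rng.dirichlet(row) for row in gamma])


def prior_mean(gamma):
    """Prior mean of p: gamma_ij / c_i, with c_i the total mass of row i."""
    gamma = np.asarray(gamma, dtype=float)
    return gamma / gamma.sum(axis=1, keepdims=True)
```

For instance, `sample_transition_matrix(np.ones((3, 3)), np.random.default_rng(0))` draws each row uniformly on the simplex, an assumption chosen here only to have a runnable example.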

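As far as $\alpha$ and $\theta$ are concerned, we will assume that the $\theta_{ij}$'s, given the $\alpha_{ij}$'s, are independent with generalized inverse Gamma density:

$\pi_{2;ij}(\theta_{ij} \mid \alpha) = \pi_{2;ij}(\theta_{ij} \mid \alpha_{ij}) = \frac{\alpha_{ij}\, b_{ij}(\alpha_{ij})^{m_{ij}}}{\Gamma(m_{ij})}\, \theta_{ij}^{-(1 + m_{ij}\alpha_{ij})} \exp\left\{ -\frac{b_{ij}(\alpha_{ij})}{\theta_{ij}^{\alpha_{ij}}} \right\}, \qquad \theta_{ij} > 0$,

where $m_{ij} > 0$ and $b_{ij}(\alpha_{ij})$ is a positive function of $\alpha_{ij}$. In other terms, a priori $\theta_{ij}^{-\alpha_{ij}}$, given $\alpha_{ij}$, has a Gamma density with shape $m_{ij}$ and scale $1/b_{ij}(\alpha_{ij}) > 0$ (that is, rate $b_{ij}(\alpha_{ij})$). In symbols, $\theta_{ij} \mid \alpha_{ij} \sim GIG(m_{ij}, b_{ij}(\alpha_{ij}), \alpha_{ij})$.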
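The Gamma representation gives a direct way to simulate from the generalized inverse Gamma law, sketched below in Python for a fixed $\alpha$. With the density written with exponent $-b/\theta^{\alpha}$, the variable $\theta^{-\alpha}$ is Gamma with shape $m$ and scale $1/b$, so one can draw it and transform back. The closed form $E(\theta^2 \mid \alpha) = b^{2/\alpha}\,\Gamma(m - 2/\alpha)/\Gamma(m)$, valid for $\alpha > 2/m$, follows from the Gamma moments and is used here only as a check. Function names are illustrative.

```python
import numpy as np
from math import gamma as gamma_fn


def sample_gig(m, b, alpha, size, rng):
    """Draw theta ~ GIG(m, b, alpha) via Y = theta^(-alpha) ~ Gamma(shape m, scale 1/b)."""
    y = rng.gamma(shape=m, scale=1.0 / b, size=size)
    return y ** (-1.0 / alpha)


def gig_second_moment(m, b, alpha):
    """E(theta^2 | alpha) = b^(2/alpha) * Gamma(m - 2/alpha) / Gamma(m), for alpha > 2/m."""
    if alpha <= 2.0 / m:
        raise ValueError("second moment is not finite when alpha <= 2/m")
    return b ** (2.0 / alpha) * gamma_fn(m - 2.0 / alpha) / gamma_fn(m)
```

The condition $\alpha > 2/m$ in the second function is the one that motivates the lower bound $\alpha_0 = 2/m$ used later for the support of the shape parameter.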

Finally, we will assume that the components of $\alpha$ are independent and have a log-concave density $\pi_{3;ij}$ with support bounded away from zero. As discussed in Gilks and Wild (1992), the log-concavity of $\pi_{3;ij}$ is necessary in the implementation of the Gibbs sampler (see also Berger and Sun (1993)), although adjustments exist for the non-log-concave case (see Gilks et al. (1995)). A support bounded away from zero, in turn, ensures the existence of the posterior moments of the $\theta_{ij}$'s. We will specify $\pi_{3;ij}$ in detail in the next section.

4.2. Elicitation of the hyperparameters. In this section we focus our attention on the prior of $(\theta_{ij}, \alpha_{ij})$, for $i, j$ fixed. Following an approach developed in Bousquet (2006), we provide a marginal prior distribution $\pi_{3;ij}$ for the shapes $\alpha_{ij}$ along with a statistical justification of the choice of the generalized inverse Gamma prior. An interpretation of the hyperparameters is also provided. First of all, for the sake of clarity, let us drop the indices $i, j$ in the notation and quantities introduced in the previous section. Suppose now that a vector $\mathbf{x}_{-m} = (x_{-m}, \dots, x_{-1})$ is available, i.e. a past “historical dataset” of $m$ sojourn times in state $i$ followed by a visit to state $j$. Moreover, let us agree to use as prior density for $(\alpha, \theta)$ its posterior density given $\mathbf{x}_{-m}$, when we start from the improper prior:

(8)   $\tilde{\pi}(\alpha, \theta) \propto \theta^{-1} \left( 1 - \frac{\alpha_0}{\alpha} \right)^{c} \mathbf{1}(\theta \ge 0)\, \mathbf{1}(\alpha \ge \alpha_0)$,

with suitable $c \ge 0$ and $\alpha_0 \ge 0$. Then it is immediate to see that this amounts to assuming as prior distribution for $\theta$ given $\alpha$ the density $\pi_2(\theta \mid \alpha) = GIG(m, b(\alpha, \mathbf{x}_{-m}), \alpha)$ and as prior density of $\alpha$:

(9)   $\pi_3(\alpha) \propto \frac{\alpha^{m-1-c} (\alpha - \alpha_0)^{c}}{b^{m}(\alpha, \mathbf{x}_{-m})} \exp\left\{ \frac{m\alpha}{\beta(\mathbf{x}_{-m})} \right\} \mathbf{1}(\alpha \ge \alpha_0)$,

with $b(\alpha, \mathbf{x}_{-m}) = \sum_{i=1}^{m} x_{-i}^{\alpha}$ and $\beta(\mathbf{x}_{-m}) = m \left[ \sum_{i=1}^{m} \ln(x_{-i}) \right]^{-1}$.

Remark 1. We notice that the prior for $(\alpha, \theta)$ we propose has a simple hierarchical structure: $\pi_2(\theta \mid \alpha)$ is a generalized inverse Gamma density with the first parameter equal to the size of a historical dataset from the same process, whereas the second parameter is a deterministic function of $\alpha$. Moreover, the improper prior (8) we start with depends on two parameters $\alpha_0$ and $c$ that can be chosen in such a way that the proper prior distribution of $\theta$ derived from it has finite moments of sufficiently high order. Finally, it is easy to see that, if $m > 1$, then the prior $\pi_3$ of $\alpha$ is proper.

4.3. Simplifying the prior. In the previous section we considered a prior with a simple hierarchical structure, obtained as the posterior coming from a Bayesian inference on historical data. In order to simplify the expression of the posterior distributions and to ensure that posterior moments are finite, we replace the function $b(\alpha, \mathbf{x}_{-m}) = \sum_{i=1}^{m} x_{-i}^{\alpha}$ in $\pi_2(\theta \mid \alpha)$ and $\pi_3(\alpha)$ by an easier convex function of $\alpha$. More precisely, we replace such a function with

$b_q(\alpha) = t_q^{\alpha} \left[ (1 - q)^{-1/m} - 1 \right]^{-1}$,

where $t_q$ is a positive number and $q \in (0, 1)$. Then one can see that, for such a choice of $b_q(\alpha)$, $t_q$ turns out to be the quantile of order $q$ of the marginal distribution of the inter-occurrence times between a visit to state $i$ followed by a visit to $j$. See Bousquet (2010). Hence, for a fixed $q \in (0, 1)$, we can estimate $t_q$ using the historical data. Denote by $\hat{t}_q$ such an estimate and let $\hat{b}_q(\alpha, \mathbf{x}_{-m}) = \hat{t}_q^{\alpha} \left[ (1 - q)^{-1/m} - 1 \right]^{-1}$. Thus, we propose for our Bayesian analysis the following priors:

(10)   $\pi_2(\theta \mid \alpha) = GIG(m, \hat{b}_q(\alpha, \mathbf{x}_{-m}), \alpha)$

and

(11)   $\pi_3(\alpha) \propto \alpha^{m-1-c} (\alpha - \alpha_0)^{c} \exp\left\{ -m\alpha \left( \ln \hat{t}_q - \frac{\sum_{i=1}^{m} \ln x_{-i}}{m} \right) \right\} \mathbf{1}(\alpha \ge \alpha_0)$.

For $m > 1$, $\pi_3(\alpha)$ is a proper prior if $q$ is chosen in such a way that

$\ln \hat{t}_q - \frac{\sum_{i=1}^{m} \ln x_{-i}}{m} > 0$,

and it is always log-concave for any $c \ge 0$. Notice that such a choice is possible for almost all datasets.
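The quantile interpretation of $t_q$ can be checked numerically. Integrating $\theta \sim GIG(m, b, \alpha)$ out of the Weibull survival function gives the marginal survival $(1 + t^{\alpha}/b)^{-m}$ (a short derivation from the Gamma law of $\theta^{-\alpha}$, stated here as a supporting step rather than quoted from the paper). Plugging in $b = b_q(\alpha)$ then returns exactly $1 - q$ at $t = t_q$, whatever the value of $\alpha$. The Python sketch below uses hypothetical function names.

```python
def b_q(alpha, t_q, q, m):
    """b_q(alpha) = t_q^alpha * [(1 - q)^(-1/m) - 1]^(-1)."""
    return t_q ** alpha / ((1.0 - q) ** (-1.0 / m) - 1.0)


def marginal_cdf(t, alpha, b, m):
    """P(X <= t | alpha) once theta ~ GIG(m, b, alpha) is integrated out of the Weibull."""
    return 1.0 - (1.0 + t ** alpha / b) ** (-m)
```

For example, with $q = 0.5$ the value $t_q$ plays the role of a prior median of the inter-occurrence time, which is the reading used for the case $m = 1$ in Section 4.4.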

Finally, let us elicit the hyperparameters $\alpha_0$ and $c$ so that the posterior second moment of $\theta$ is finite. Since

$\tilde{E}(\theta^2) = E\left[ \left( \hat{b}_q(\alpha, \mathbf{x}_{-m}) \right)^{2/\alpha} \frac{\Gamma(m - 2/\alpha)}{\Gamma(m)} \right] \le K\, E\left( \Gamma(m - 2/\alpha) \right)$

for a suitable constant $K$, one can see that if $\alpha_0 = 2/m$ and $c \ge 1$, then $\tilde{E}(\theta^2) < +\infty$, and hence the posterior second moment of $\theta$ is finite too. As noticed above, $\pi_3(\alpha)$ is a log-concave density for any $c \ge 0$. In particular, for $c = m - 1$, $\pi_3(\alpha)$ is an $\alpha_0$-shifted Gamma density.

4.4. Scarce prior information. The construction of the prior distribution of $(\alpha, \theta)$ has to be modified for those pairs of states between which no more than two transitions were observed, that is, $m \le 2$. If $m = 2$ then $\alpha_0 = 1$, which rules out decreasing hazard rates. In the absence of additional specific prior information, this is an arbitrary restriction, so an $\alpha_0$ value smaller than 1 must be chosen. Then, the prior second moment of $\theta$ is not finite anymore. For the posterior second moment to be finite we need $\alpha > 2/(2 + N)$, where $N$ is the number of transitions between the two concerned states in the (current) sample. Thus the second moment of $\theta$ can stay non-finite, even a posteriori, if $2/(2 + N) > \alpha_0$. This would show that the data add little information for that specific transition. To avoid this, we may let $\alpha_0 = 2/3$, corresponding to the smallest prior sample size such that $\alpha_0 < 1$. The value of $c$ can still be $m - 1 = 1$.

If $m = 1$, the single historical observation $x_{-1}$ determines $\hat{b}_q(\alpha, \mathbf{x}_{-m})$. As $\hat{t}_q = x_{-1}$ for any $q$, it seems reasonable to use $q = 0.5$, so $x_{-1}$ would represent the prior opinion on the median sojourn time. With $m = 1$ the argument of the exponential in $\pi_3(\alpha)$ in (11) is zero, so that $\pi_3(\alpha)$ is improper; we make it proper by restricting its support to an interval $[\alpha_0, \alpha_1]$. The value $\alpha_1 = 10$ is suitable for all practical purposes. As before, the choice $\alpha_0 = 2/m = 2$ would be too restrictive, so we select again $\alpha_0 = 2/3$. The rule $c = m - 1$ would give $c = 0$ in this case, so we keep $c = 1$.

If $m = 0$, the prior information on the number of transitions is that there have been no transitions, but there is no information on the sojourn time. Then we introduce a single random fictitious observation $\tilde{t}_q$, which is uniformly distributed between the smallest and the largest sojourn times associated with all the transitions in the historical dataset, and we use $\tilde{t}_q$ to obtain $\hat{b}_q(\alpha, \tilde{t}_q)$. Thus we fall in the previous case by substituting $m = 1$ for $m = 0$ and by assuming $c = 1$ and $[\alpha_0, \alpha_1] = [2/3, 10]$. For clarity, we summarize the hyperparameter selection for priors (10)-(11) in Table 1.

              m > 2          m = 2          m = 1        m = 0
$t_q$         $\hat{t}_q$    $\hat{t}_q$    $x_{-1}$     $\tilde{t}_q$
$c$           $m - 1$        1              1            1
$\alpha_0$    $2/m$          $2/3$          $2/3$        $2/3$

Table 1. Hyperparameter selection as the prior sample size m varies
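The selection rule in Table 1 is easy to encode. The Python sketch below returns $c$, $\alpha_0$ and, when the support of $\alpha$ is truncated as described for $m \le 1$, the upper bound $\alpha_1 = 10$. The function name and the convention of returning `None` for an untruncated support are assumptions of this illustration.

```python
def select_hyperparameters(m):
    """Return (c, alpha0, alpha1) for a prior sample size m, following Table 1.

    alpha1 is None when the support of alpha is [alpha0, +inf).
    """
    if m < 0:
        raise ValueError("the prior sample size m must be non-negative")
    if m > 2:
        return m - 1, 2.0 / m, None
    if m == 2:
        return 1, 2.0 / 3.0, None
    # m = 1, and m = 0 treated as m = 1 with a fictitious observation:
    # the support of alpha is restricted to [2/3, 10]
    return 1, 2.0 / 3.0, 10.0
```

The value used for $t_q$ (an estimate from the historical data, the single observation $x_{-1}$, or the fictitious $\tilde{t}_q$) depends on the data and is therefore left outside this helper.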

5. Analysis of the central Northern Apennines sequence

In this section we analyze the macroregion MR3 sequence, using the semi-Markov model. Model fitting, model validation and an attempt at forecasting involve the following steps:
(1) the historical dataset $\mathbf{x}_{-m}$ is chosen for the elicitation of the prior distribution;
(2) model fit is assessed by comparing observed inter-occurrence times (grouped by transition) to posterior predictive intervals;
(3) cross state-probabilities are estimated, as an indication of the most likely magnitude of the next event and of the time to it, given information up to the present time;
(4) an interpretation in terms of slip predictable or time predictable model is provided.

For the elicitation of the prior distribution we cut the observations into two parts, using the first part, called the historical data, for this purpose, and the second part, called the current data, for the likelihood. A look at the matrix of the observed transition frequencies in the whole dataset from year 1838 to 2002, displayed in Table 2(a), shows that our 195-event series contains enough data to allow the assignment of at least three events for any transition to the historical dataset. This choice avoids the need to adjust the prior as summarized in Table 1 for $m_{ij} \le 2$. The cut-point corresponding to this strategy is the 82nd event, which occurred in 1916, with frequency matrix as in Table 2(b).

(a)
i \ j    1     2     3
1       65    30    17
2       32    15     7
3       15     9     4

(b)
i \ j    1     2     3
1       27    14     4
2       14     8     4
3        4     3     3

Table 2. Number of observed transitions in the whole (a) and historical (b) datasets.
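The frequency matrices of Table 2 can be obtained from a coded sequence of states by counting consecutive pairs, and the current-sample counts follow by difference. The Python sketch below shows both steps; the helper name `transition_counts` is hypothetical, while the two matrices are copied from Table 2.

```python
import numpy as np


def transition_counts(states, s=3):
    """Count transitions i -> k in a sequence of states coded 1, ..., s."""
    counts = np.zeros((s, s), dtype=int)
    for i, k in zip(states[:-1], states[1:]):
        counts[i - 1, k - 1] += 1
    return counts


# Table 2(a) and 2(b): whole dataset (195 events) and historical dataset (82 events)
WHOLE = np.array([[65, 30, 17], [32, 15, 7], [15, 9, 4]])
HISTORICAL = np.array([[27, 14, 4], [14, 8, 4], [4, 3, 3]])

# Transitions left for the likelihood (the current sample)
CURRENT = WHOLE - HISTORICAL
```

The totals are consistent with the text: 194 transitions for 195 events, 81 for the first 82 events, and every historical count is at least three. The difference also shows that a single (3, 3) transition remains in the current sample.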

Let us consider the predictive check mentioned above. Fig. 3 shows posterior predictive 95 percent probability intervals of the inter-occurrence time for every transition, with the observed inter-occurrence times superimposed. These are empirical intervals computed by generating stochastic inter-occurrence times from their relevant distributions at every iteration of the Gibbs sampler. (The Gibbs sampler itself is fully described in the appendix.) Possible outliers, represented as triangles, are those times with Bayesian p-value (that is, the predictive tail probability) less than 2.5 percent. The percentage of these extreme points, along with the predictive expected value of the inter-occurrence times, is shown in Table 3. Most points fall within the predictive intervals. Those few inter-occurrence times which are really extreme, such as the small values observed in transitions (1, 1), (2, 1), (1, 2) and the large one in transition (1, 3), unsurprisingly match the outlying points in the corresponding qq-plots in Fig. 2. (In some qq-plots there are more outlying points than in Fig. 3 because the qq-plots comprise the entire series.) This fact could be regarded as a lack of fit of the Weibull model, but we are more inclined to attribute it to an imperfect assignment of some events to the macroregion or to an insufficient filtering of secondary events. The shape parameters $\alpha_{ij}$ are particularly important, as they reflect an increasing hazard if larger than 1, a decreasing hazard if smaller than 1 and a constant hazard if equal to 1.
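The construction of the empirical predictive intervals, and the reading of the shape parameter in terms of hazard, can be sketched in Python as follows. The arrays `alpha_draws` and `theta_draws` stand for posterior draws of one pair $(\alpha_{ij}, \theta_{ij})$ produced by a sampler that is not shown here; they, and the function names, are assumptions of this illustration and not the authors' JAGS code.

```python
import numpy as np


def predictive_interval(alpha_draws, theta_draws, rng, level=0.95):
    """Generate one Weibull time per posterior draw and return (lower, upper, mean)."""
    alpha_draws = np.asarray(alpha_draws, dtype=float)
    theta_draws = np.asarray(theta_draws, dtype=float)
    # one simulated inter-occurrence time for each iteration of the sampler
    times = theta_draws * rng.weibull(alpha_draws)
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(times, [tail, 1.0 - tail])
    return float(lower), float(upper), float(times.mean())


def hazard_trend(alpha):
    """Weibull hazard (alpha/theta) * (x/theta)^(alpha-1): trend as a function of alpha."""
    if alpha > 1.0:
        return "increasing"
    if alpha < 1.0:
        return "decreasing"
    return "constant"
```

An observed time would then be flagged as a possible outlier when it falls in one of the two predictive tails of probability 2.5 percent, that is, outside the interval returned with `level=0.95`.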

[Plot titled "95% posterior predictive intervals of the intertimes". Horizontal axis: couple of states, in the order (1,1), (2,1), (3,1), (1,2), (2,2), (3,2), (1,3), (2,3), (3,3). Vertical axis: intertimes (days), posterior predictive 2.5% and 97.5% quantiles, on a logarithmic scale from 2 to 2000. Legend: mean, median.]

Figure 3. Posterior predictive 95 percent credible intervals of the mean inter-occurrence times in days, with actual times denoted by (blue) solid dots. Suspect outliers are denoted by (red) triangles. The (green) dotted line shows the posterior mean and the (violet) dashed line the posterior median.

i \ j    1                 2                 3
1        253.28 (10.5%)    272.27 (18.8%)    340.98 (7.7%)
2        273.45 (16.7%)    314.67 (14.3%)    293.35 (3.3%)
3        204.48 (0.0%)     237.54 (16.7%)    312.73 (0.0%)

Table 3. Summaries of the posterior distributions of the mean inter-occurrence times for each transition. Posterior means are followed by (percentage of points having posterior p-value less than 2.5 percent).

i \ j    1                2                3
1        1.182 (0.102)    1.981 (0.252)    0.828 (0.103)
2        1.318 (0.173)    1.218 (0.232)    1.716 (0.473)
3        1.026 (0.179)    0.996 (0.150)    2.098 (0.618)

Table 4. Summaries of the posterior distributions of the shape parameter α. Posterior means are followed by (standard deviations).

Table 4 displays the posterior means of these parameters (along with their posterior standard deviations). Finally, Table 5 shows the posterior means of the transition probabilities.

i \ j    1       2       3
1        0.57    0.27    0.16
2        0.58    0.28    0.14
3        0.52    0.32    0.16

Table 5. Posterior means of the transition matrix p.

Cross state-probability plots are an attempt at predicting what type of event is most likely to occur next, and when. A cross state-probability (CSP) $P^{ij}_{t_0 | \Delta x}$ represents the probability that the next event will be in state $j$ within a time interval $\Delta x$, under the assumption that the previous event was in state $i$ and $t_0$ time units have passed since its occurrence:

(12)   $P^{ij}_{t_0 | \Delta x} = P(J_{n+1} = j, X_{n+1} \le t_0 + \Delta x \mid J_n = i, X_{n+1} > t_0) = \frac{p_{ij} \left( \bar{F}_{ij}(t_0) - \bar{F}_{ij}(t_0 + \Delta x) \right)}{\sum_{h \in E} p_{ih} \bar{F}_{ih}(t_0)}$.

Fig. 4 displays the CSPs with time origin on 31 December 2002, the closing date of the CPTI04 (2004) catalogue. At this time, the last recorded event had been in class 2 and had occurred 965 days earlier (so $t_0$ is about 32 months). From these plots we can read out the probability that an event of any given type will occur before a certain number of months. For example, after 24 months, the sum of the mean CSPs in the three graphs indicates that the probability that an event will have occurred is more than 90 percent, with a larger probability assigned to an event of type 2, followed by type 1 and type 3. The posterior means of the CSPs are also reported in Table 6.
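For given values of the parameters, the cross state-probabilities in (12) are a direct computation with the Weibull survival functions. The Python sketch below returns the CSPs towards all destination states at once; the function name, the 0-based state coding and the array layout are illustrative assumptions. Since the paper reports posterior summaries of the CSPs, one would evaluate such a function at each draw of the Gibbs sampler and then summarize, rather than plug in point estimates.

```python
import numpy as np


def cross_state_probabilities(p, alpha, theta, i, t0, dx):
    """CSPs (12) from state i (0-based) to every state j, given t0 elapsed and horizon dx."""

    def survival(t):
        # Weibull survival functions F-bar_{ih}(t) for all destination states h
        return np.exp(-(t / theta[i]) ** alpha[i])

    numerator = p[i] * (survival(t0) - survival(t0 + dx))
    denominator = np.sum(p[i] * survival(t0))
    return numerator / denominator
```

Summing the returned vector over the destination states gives the probability that some event occurs within the horizon `dx`, which is the quantity read off from the three panels of Fig. 4 in the example above.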

[Three panels titled "90% posterior predictive intervals of CSP from 2 to 1", "90% posterior predictive intervals of CSP from 2 to 2" and "90% posterior predictive intervals of CSP from 2 to 3". Horizontal axis: months ahead, from 1 to 48. Vertical axis: probability. Legend: mean (x), median (●).]

Figure 4. Posterior mean and median of CSPs with time origin on 31 December 2002 up to 48 months ahead, along with 90 percent posterior credible intervals. Transitions are from state 2 to state 1, 2 or 3 (first to third panel, respectively).

        1 Month  2 Months  3 Months  4 Months  5 Months  6 Months  Year    2 Years  3 Years  4 Years
to 1    0.070    0.122     0.171     0.210     0.243     0.270     0.364   0.410    0.418    0.420
to 2    0.059    0.105     0.150     0.187     0.221     0.249     0.363   0.444    0.468    0.477
to 3    0.014    0.024     0.034     0.041     0.048     0.054     0.075   0.088    0.092    0.093

Table 6. CSPs with time origin on 31 December 2002, as represented in Fig. 4.

The predictive capability of our model can be assessed by marking the time of the next event on the relevant CSP plot. In our specific case, the first event in 2003 which can be assigned to the macroregion MR3 happened in the Forlì area on 26 January and was of type 1, with a CSP of 7 percent. This is a low probability, but a single case is not enough to judge our model, which would be a bad one if repeated comparisons did not reflect the pattern represented by the CSPs. Therefore we repeated the same comparison by re-estimating the model using only the data up to 31 December 2001, 31 December 2000, and so on backwards down to 1992. For this particular exercise the historical data for prior elicitation are such that the frequency of each transition is at least two (not three as done so far) and the hyperparameters are those in the second column of Table 1. The reason is that, by going backwards, the only (3, 3) transition is removed from the current dataset, so that the CSP for (3, 3) would rely on prior information only, resulting in an excessively large posterior variance of the parameters. The results are shown in Table 7. The boxed numbers correspond to the observed events, and it is a good sign that they do not always correspond to very high or very low CSPs, as this would indicate that events occur too late or too early compared to the estimated model. An exception is represented by the last four arrays in the table, which are all related to the period from the last event in July 1987 to the event in October 1996, during which no events with Mw ≥ 4.5 occurred. This is the longest inter-occurrence time in the whole series and is the outlying point in the top right panel of Fig. 2; the associated transition is a relatively rare one, so it is not surprising that the boxed CSPs are small.

The examination of the posterior distributions of transition probabilities and of the predictive distributions of the inter-occurrence times can give some insight into the type of energy release and accumulation mechanism. We consider two such mechanisms, the time predictable model (TPM) and the slip predictable model (SPM). In the TPM, when a maximal energy threshold is reached, some fraction of it (not always the same) is released; therefore, the waiting time for the next event is longer if the current event is stronger. So, the waiting time distribution depends on the current event type, but

End of catalogue: 31/12/2001; previous event type: 2; waiting time: 600 days
       1 Month  2 Months  3 Months  4 Months  5 Months  6 Months    1 Year  392 days   2 Years   3 Years   4 Years
to 1     0.053     0.095     0.135     0.169     0.199     0.224     0.318     0.373     0.375     0.384     0.387
to 2     0.034     0.063     0.092     0.118     0.143     0.166     0.272     0.383     0.388     0.432     0.456
to 3     0.017     0.031     0.043     0.054     0.063     0.070     0.098     0.115     0.115     0.119     0.121

End of catalogue: 31/12/2000; previous event type: 2; waiting time: 235 days
       1 Month  2 Months  3 Months  4 Months  5 Months  6 Months    1 Year   2 Years  757 days   3 Years   4 Years
to 1     0.061     0.112     0.162     0.205     0.245     0.280     0.417     0.502     0.505     0.518     0.522
to 2     0.022     0.040     0.059     0.076     0.092     0.106     0.176     0.248     0.252     0.279     0.293
to 3     0.017     0.031     0.047     0.061     0.074     0.085     0.132     0.159     0.159     0.164     0.165

End of catalogue: 31/12/1999; previous event type: 1; waiting time: 177 days
       1 Month  2 Months  3 Months  4 Months  130 days  5 Months  6 Months    1 Year   2 Years   3 Years   4 Years
to 1     0.062     0.115     0.166     0.210     0.222     0.251     0.286     0.430     0.525     0.544     0.548
to 2     0.038     0.074     0.110     0.142     0.151     0.172     0.197     0.283     0.305     0.305     0.305
to 3     0.013     0.023     0.033     0.042     0.045     0.050     0.058     0.091     0.122     0.135     0.140

End of catalogue: 31/12/1998; previous event type: 3; waiting time: 280 days
       1 Month  2 Months  3 Months  4 Months  5 Months  6 Months  188 days    1 Year   2 Years   3 Years   4 Years
to 1     0.062     0.109     0.153     0.189     0.221     0.247     0.252     0.339     0.389     0.400     0.402
to 2     0.037     0.067     0.095     0.119     0.141     0.160     0.164     0.234     0.290     0.307     0.313
to 3     0.037     0.068     0.100     0.126     0.150     0.170     0.175     0.238     0.268     0.274     0.276

End of catalogue: 31/12/1997; previous event type: 3; waiting time: 442 days
       1 Month  2 Months   85 days  3 Months  4 Months  5 Months  6 Months    1 Year   2 Years   3 Years   4 Years
to 1     0.074     0.131     0.176     0.184     0.227     0.264     0.294     0.402     0.463     0.475     0.479
to 2     0.054     0.096     0.131     0.138     0.173     0.205     0.232     0.343     0.428     0.455     0.466
to 3     0.010     0.016     0.021     0.021     0.025     0.028     0.030     0.037     0.041     0.042     0.043

End of catalogue: 31/12/1996; previous event type: 3; waiting time: 77 days
       1 Month  2 Months  3 Months  4 Months  5 Months  6 Months    1 Year  450 days   2 Years   3 Years   4 Years
to 1     0.073     0.131     0.186     0.231     0.271     0.304     0.422     0.447     0.483     0.493     0.496
to 2     0.043     0.075     0.106     0.132     0.155     0.174     0.248     0.268     0.300     0.314     0.319
to 3     0.012     0.027     0.049     0.076     0.105     0.129     0.172     0.175     0.178     0.179     0.179

End of catalogue: 31/12/1995; previous event type: 1; waiting time: 3100 days
       1 Month  2 Months  3 Months  4 Months  5 Months  6 Months  288 days    1 Year   2 Years   3 Years   4 Years
to 1     0.173     0.304     0.420     0.511     0.589     0.651     0.798     0.861     0.964     0.982     0.985
to 2     0.001     0.002     0.002     0.003     0.003     0.004     0.004     0.004     0.005     0.005     0.005
to 3     0.001     0.003     0.004     0.004     0.005     0.005     0.007     0.007     0.008     0.008     0.008

End of catalogue: 31/12/1994; previous event type: 1; waiting time: 2735 days
       1 Month  2 Months  3 Months  4 Months  5 Months  6 Months    1 Year  653 days   2 Years   3 Years   4 Years
to 1     0.169     0.295     0.410     0.502     0.580     0.643     0.857     0.954     0.964     0.982     0.986
to 2     0.001     0.002     0.002     0.003     0.003     0.004     0.005     0.005     0.005     0.005     0.005
to 3     0.001     0.002     0.003     0.004     0.004     0.005     0.007     0.007     0.008     0.008     0.008

End of catalogue: 31/12/1993; previous event type: 1; waiting time: 2370 days
       1 Month  2 Months  3 Months  4 Months  5 Months  6 Months    1 Year   2 Years 1018 days   3 Years   4 Years
to 1     0.166     0.289     0.403     0.495     0.573     0.636     0.854     0.964     0.981     0.982     0.986
to 2     0.001     0.002     0.003     0.003     0.004     0.004     0.005     0.005     0.005     0.005     0.005
to 3     0.001     0.002     0.003     0.003     0.004     0.004     0.006     0.007     0.007     0.007     0.007

End of catalogue: 31/12/1992; previous event type: 1; waiting time: 2005 days
       1 Month  2 Months  3 Months  4 Months  5 Months  6 Months    1 Year   2 Years   3 Years 1383 days   4 Years
to 1     0.161     0.283     0.395     0.486     0.564     0.628     0.849     0.963     0.983     0.986     0.987
to 2     0.001     0.002     0.003     0.003     0.004     0.004     0.005     0.006     0.006     0.006     0.006
to 3     0.001     0.002     0.002     0.003     0.003     0.004     0.005     0.006     0.006     0.006     0.006

Table 7. CSPs as the end of the catalogue shifts back by one-year steps. The numbers in boxes are the probability that the next observed event has occurred at or before the time when it occurred and is of the type that has been observed.

not on the next event type, that is, we expect $F_{ij}(t) = F_{i\cdot}(t)$, $j = 1, 2, 3$. The strength of an event does not depend on the strength of the previous one, because every time the same energy level has to be reached for the event to occur, so we expect $p_{ij} = p_{\cdot j}$, $j = 1, 2, 3$, that is, a transition matrix with equal rows (which is indeed the case, as seen from Table 5). The CSPs (12) would factorize as follows,

\[
P^{ij}_{t_0|\Delta x}
 = \frac{p_{ij}\left[\bar F_{ij}(t_0) - \bar F_{ij}(t_0+\Delta x)\right]}{\sum_{h\in E} p_{ih}\,\bar F_{ih}(t_0)}
 = \frac{p_{\cdot j}\left[\bar F_{i\cdot}(t_0) - \bar F_{i\cdot}(t_0+\Delta x)\right]}{\bar F_{i\cdot}(t_0)},
\tag{13}
\]

so that, given $i$, they are proportional to each other as $j = 1, 2, 3$ for any $\Delta x$. In the SPM, after an event the energy falls to a minimal threshold and then increases until the next event, after which it starts to accumulate again from the same threshold. Here, the magnitude of an event depends on the length of the waiting time, but not on the magnitude of the previous one, because energy always accumulates from the same threshold. In this case again $p_{ij} = p_{\cdot j}$, but $F_{ij}(t) = F_{\cdot j}(t)$, so

\[
P^{ij}_{t_0|\Delta x}
 = \frac{p_{\cdot j}\left[\bar F_{\cdot j}(t_0) - \bar F_{\cdot j}(t_0+\Delta x)\right]}{\sum_{h\in E} p_{\cdot h}\,\bar F_{\cdot h}(t_0)}.
\tag{14}
\]
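The proportionality implied by (13) lends itself to a direct numerical check. The sketch below is only an illustration under assumed values: it takes a transition matrix with equal rows and compares the ratio of two CSPs from the same state over several horizons, once with Weibull parameters that depend on the current state only (a TPM-like configuration) and once with parameters that depend on both states. All parameter values and function names are hypothetical.

```python
import numpy as np

def csp(i, t0, dx, p, alpha, theta):
    """CSPs from state i (index i) to every next state, Weibull holding times."""
    surv = lambda t: np.exp(-(t / theta[i]) ** alpha[i])
    return p[i] * (surv(t0) - surv(t0 + dx)) / np.sum(p[i] * surv(t0))

def ratio_curve(i, j, k, horizons, p, alpha, theta, t0=0.0):
    """Ratio CSP(i,j)/CSP(i,k) at each horizon, with time origin t0."""
    values = [csp(i, t0, dx, p, alpha, theta) for dx in horizons]
    return np.array([v[j] / v[k] for v in values])

# Equal rows: p_ij = p_.j (hypothetical values).
p = np.array([[0.5, 0.3, 0.2]] * 3)
horizons = np.array([30.0, 180.0, 365.0, 3650.0])  # days

# TPM-like case: shape and scale depend on the current state i only.
alpha_tpm = np.array([[1.2] * 3, [1.0] * 3, [0.9] * 3])
theta_tpm = np.array([[300.0] * 3, [500.0] * 3, [800.0] * 3])

# General case: shape and scale depend on both i and j.
alpha_gen = np.array([[1.3, 1.0, 0.9], [1.1, 1.2, 1.0], [0.9, 1.0, 1.3]])
theta_gen = np.array([[200.0, 400.0, 600.0], [300.0, 350.0, 700.0], [500.0, 450.0, 900.0]])

tpm_ratio = ratio_curve(0, 0, 1, horizons, p, alpha_tpm, theta_tpm)
gen_ratio = ratio_curve(0, 0, 1, horizons, p, alpha_gen, theta_gen)
print("p_11 / p_12      :", p[0, 0] / p[0, 1])
print("TPM-like ratios  :", tpm_ratio)
print("general ratios   :", gen_ratio)
```

In the TPM-like configuration the ratio is flat at $p_{ij}/p_{ik}$ for every horizon, while in the general configuration it approaches that value only once the horizon is long enough for the survival functions to be negligible.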

Under the SPM, the CSPs, given $j$, would therefore be equal to each other as $i = 1, 2, 3$ for any $\Delta x$. As the a posteriori transition probability matrix has essentially equal rows, the CSPs should be examined with regard to the properties (13) and (14) to decide whether our sequence is closer to a TPM or to an SPM. An additional feature that can help discriminate is the tail of the waiting time distribution: for a TPM, the tail of the waiting time distribution is thinner after a weak earthquake than after a strong one; for an SPM, the tail of the waiting time is thinner before a weak earthquake than before a strong one. However, this additional job proves unnecessary after examination of Figures 5 and 6, which represent the posterior means of the ratios of the CSPs as a function of $\Delta x$ for $t_0 = 0$. In Fig. 5, the ratio of two CSPs, say $P^{ij}_{t_0|\Delta x}/P^{ik}_{t_0|\Delta x}$, starts to be equal to $p_{ij}/p_{ik}$ only when $\Delta x$ is large enough for the survival functions $\bar F_{ij}(t_0 + \Delta x)$ and $\bar F_{ik}(t_0 + \Delta x)$ to be close to zero, whereas (13) would require proportionality for any $\Delta x$. A similar behaviour is observed in Fig. 6. The conclusion is that neither an SPM nor a TPM is supported by the posterior distributions of the parameters, because of the behavior of the waiting times.

6. Concluding Remarks

We have presented a complete methodology for the Bayesian inference of a semi-Markov process, from the elicitation of the prior distribution to the computation of posterior summaries, including guidance for its JAGS implementation. In particular, we have examined in detail the elicitation of the joint prior density of the shape and scale parameters of the Weibull-distributed holding times (conditional on the transition between two given states), deriving in a natural way a specific class of priors, along with a method for the determination of hyperparameters based on “historical data” and moment existence conditions. This framework has been applied to the analysis of seismic data, but it can be adopted for inference on any system for which a Markov renewal process model is plausible. A possible and not yet explored application is the modelling of voltage sags (or voltage dips) in power engineering: the state space would be formed by different classes of voltage, starting from voltage around its nominal value, down to progressively deeper sags. In the engineering literature, the dynamic aspect of this problem is in fact disregarded, while it could help bring additional insight into this phenomenon.

With regard to the seismic data analysis, other uses of our model can be envisaged. The model can be applied to areas with less complex tectonics, such as Turkey, by replicating for example Alvarez’s analysis. Outliers, such as those appearing in Figure 3, could point to events whose assignment to a specific seismogenetic source should be re-discussed. The analysis of earthquake occurrence can support decision making related to the risk of future events. We have not examined this issue here, but a methodology is outlined by Cano et al. (2011).
A final note concerns the more recent Italian seismic catalogue CPTI11 (2011), including events up to the end of 2006. Every new release of the catalogue involves numerous changes in the parameterization of earthquakes; as the DISS event classification by macroregion is



[Figure 5: three panels, (a) to (c), titled "Ratios of couples of CSPs from 1", "Ratios of couples of CSPs from 2" and "Ratios of couples of CSPs from 3". For the starting state i of each panel, the curves are the ratio of CSP (i,1) to (i,2), of CSP (i,1) to (i,3) and of CSP (i,2) to (i,3). Vertical axis: Ratio of CSPs. Horizontal axis: time ahead, from 1M to 9Y.]

Figure 5. Posterior means of ratios of CSPs $P^{ij}_{0|\Delta x}/P^{ik}_{0|\Delta x}$ with time origin on 0 up to 10 years ahead. Transitions are from state 1, 2 and 3 to state 1, 2 or 3.

[Figure 6: three panels, (a) to (c), titled "Ratios of couples of CSPs to 1", "Ratios of couples of CSPs to 2" and "Ratios of couples of CSPs to 3". For the arrival state j of each panel, the curves are the ratio of CSP (1,j) to (2,j), of CSP (1,j) to (3,j) and of CSP (2,j) to (3,j). Vertical axis: Ratio of CSPs. Horizontal axis: time ahead, from 1M to 9Y.]

Figure 6. Posterior means of ratios of CSPs $P^{ij}_{0|\Delta x}/P^{kj}_{0|\Delta x}$ with time origin on 0 up to 10 years ahead. Transitions are from state 1, 2 and 3 to state 1, 2 or 3.

not yet available for events in this catalogue, we could not use this more recent source of data.

Appendix A. Gibbs sampling

Here we derive the full conditional distributions needed to implement a Gibbs sampler and give indications on its JAGS (Plummer 2010) implementation.

A.1. Full conditional distributions. Without loss of generality we can assume that the last holding time is censored, i.e. $u_T > 0$. In this case, in order to obtain simple full conditional distributions and hence an efficient Gibbs sampler, we introduce the auxiliary variable $j_{\tau+1}$, which represents the unobserved state following $j_\tau$. Thus the state space of the Gibbs sampler is $(p, \alpha, \theta, j_{\tau+1})$, and the following full likelihood, derived from (7),

\[
L(j, x, u_T, j_{\tau+1} \mid p, \alpha, \theta)
 = \prod_{i,k\in E} p_{ik}^{N_{ik}}
 \times \prod_{i,k\in E} \left[ \frac{\alpha_{ik}^{N_{ik}}}{\theta_{ik}^{\alpha_{ik} N_{ik}}}
   \left( \prod_{\rho=1}^{N_{ik}} x^{\rho}_{ik} \right)^{\alpha_{ik}-1}
   \exp\left\{ -\frac{1}{\theta_{ik}^{\alpha_{ik}}} \sum_{\rho=1}^{N_{ik}} \left(x^{\rho}_{ik}\right)^{\alpha_{ik}} \right\} \right]
 \times p_{j_\tau j_{\tau+1}} \exp\left\{ -\left( \frac{u_T}{\theta_{j_\tau j_{\tau+1}}} \right)^{\alpha_{j_\tau j_{\tau+1}}} \right\}
\tag{15}
\]

is multiplied by the prior and used to determine the full conditionals. For every $i$ and $j$ let

\begin{align*}
p_{(-i)} &= \text{the transition matrix } p \text{ without the } i\text{-th row},\\
\alpha_{(-ij)} &= \left(\alpha_{hk},\ h, k \in E,\ (h,k) \neq (i,j)\right),\\
\theta_{(-ij)} &= \left(\theta_{hk},\ h, k \in E,\ (h,k) \neq (i,j)\right),\\
\tilde N_{ij} &= N_{ij} + 1\left((j_\tau, j_{\tau+1}) = (i,j)\right), \qquad \tilde N_i = \left(\tilde N_{ij},\ j = 1, \dots, s\right),\\
\tilde M_{ij}(\alpha_{ij}) &= \sum_{\rho=1}^{N_{ij}} \left(x^{\rho}_{ij}\right)^{\alpha_{ij}} + u_T^{\alpha_{ij}}\, 1\left((j_\tau, j_{\tau+1}) = (i,j)\right),\\
C_{ij} &= \prod_{\rho=1}^{N_{ij}} x^{\rho}_{ij}.
\end{align*}

Furthermore, let $m_{ij}$ be the number of visits to the string $(i, j)$ in the historical dataset, let $x^{-\rho}_{ij}$, for $\rho = 1, \dots, m_{ij}$, denote the corresponding inter-occurrence times, and let $x^{-m_{ij}} = (x^{-m_{ij}}_{ij}, \dots, x^{-1}_{ij})$.
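The quantities just defined are simple functions of the observed path. The following is a minimal sketch of how they could be accumulated, assuming the path is stored as a list of 0-based state indices together with the list of uncensored holding times; the function name, argument layout and the choice of storing $\log C_{ij}$ rather than $C_{ij}$ (to avoid overflow of long products) are assumptions of this illustration, not part of the paper.

```python
import numpy as np

def sufficient_statistics(states, times, u_T, j_next, alpha, s):
    """Accumulate N, N-tilde, M-tilde(alpha) and log C for a path.

    states : visited states j_0, ..., j_tau (0-based indices)
    times  : uncensored holding times x_1, ..., x_tau (len(states) - 1 values)
    u_T    : censored time elapsed since the last observed event
    j_next : current value of the auxiliary (unobserved) next state
    alpha  : s x s array of Weibull shape parameters
    """
    N = np.zeros((s, s))
    M = np.zeros((s, s))
    log_C = np.zeros((s, s))
    for a, b, x in zip(states[:-1], states[1:], times):
        N[a, b] += 1                    # number of (a, b) transitions
        M[a, b] += x ** alpha[a, b]     # sum of x^alpha over those transitions
        log_C[a, b] += np.log(x)        # log of the product of the times
    last = states[-1]
    N_tilde = N.copy()
    N_tilde[last, j_next] += 1          # indicator for the censored transition
    M_tilde = M.copy()
    M_tilde[last, j_next] += u_T ** alpha[last, j_next]
    return N, N_tilde, M_tilde, log_C

# Hypothetical toy path with three states and unit shape parameters.
N, N_tilde, M_tilde, log_C = sufficient_statistics(
    states=[0, 1, 0, 2], times=[10.0, 20.0, 5.0],
    u_T=3.0, j_next=1, alpha=np.ones((3, 3)), s=3)
```

Because $\tilde M_{ij}$ depends on $\alpha_{ij}$ and on $j_{\tau+1}$, it has to be recomputed whenever either of them is updated within a sweep of the sampler.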

The following result on the full conditional distributions of the Gibbs sampling holds.

Proposition A.1. Let the prior on $(p, \alpha, \theta)$ be as in Section 4. Then

(a) the conditional distribution of $p_i$, given $j$, $x$, $u_T$, $j_{\tau+1}$, $p_{(-i)}$, $\alpha$ and $\theta$, is a Dirichlet distribution with parameter $\tilde N_i + \gamma_i$;

(b) the conditional distribution of $\theta_{ij}^{\alpha_{ij}}$, given $j$, $x$, $u_T$, $j_{\tau+1}$, $p$, $\alpha$ and $\theta_{(-ij)}$, is an inverse-Gamma distribution with parameters $m_{ij} + N_{ij}$ and $\hat b_q(\alpha_{ij}, x^{-m_{ij}}) + \tilde M_{ij}(\alpha_{ij})$;

(c) the conditional density of $\alpha_{ij}$, given $j$, $x$, $u_T$, $j_{\tau+1}$, $p$, $\alpha_{(-ij)}$ and $\theta$, is log-concave and, up to a multiplicative constant, is equal to

\[
\pi_{3;ij}(\alpha_{ij}) \times \left( C_{ij}\, t^{ij}_{q_{ij}}\, \theta_{ij}^{-m_{ij}-N_{ij}} \right)^{\alpha_{ij}} \alpha_{ij}^{N_{ij}+1}
 \exp\left\{ -\frac{\hat b_q(\alpha_{ij}, x^{-m_{ij}}) + \tilde M_{ij}(\alpha_{ij})}{\theta_{ij}^{\alpha_{ij}}} \right\},
\tag{16}
\]

where $t^{ij}_{q_{ij}}$ is determined from Table 1;

(d) the conditional density of the unseen state $J_{\tau+1}$, given $j$, $x$, $u_T$, $p$, $\alpha$ and $\theta$, is

\[
\frac{p_{j_\tau j} \exp\left\{ -\left( u_T / \theta_{j_\tau j} \right)^{\alpha_{j_\tau j}} \right\}}
     {\sum_{k\in E} p_{j_\tau k} \exp\left\{ -\left( u_T / \theta_{j_\tau k} \right)^{\alpha_{j_\tau k}} \right\}};
\]

(e) if $m_{ij} > 0$ then $t^{ij}_{q_{ij}}$ is a known constant (see Table 1). If $m_{ij} = 0$, then the conditional distribution of $\tilde t^{ij}_{q_{ij}}$, given $j$, $x$, $u_T$, $p$, $\alpha$ and $\theta$, is a double-truncated Weibull with shape $\alpha_{ij} + 1$, scale $\theta_{ij}$ and truncation points given by the smallest and the largest inter-occurrence times in the whole historical dataset.
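Points (a), (b) and (d) correspond to standard distributions, so the associated updates can be sketched directly. The code below is a minimal illustration with NumPy, not the JAGS implementation used in the paper. It assumes that "inverse-Gamma with parameters a and b" means a density proportional to $y^{-a-1} e^{-b/y}$, so that $1/y$ is Gamma distributed with shape a and rate b, and it takes the value of $\hat b_q(\alpha_{ij}, x^{-m_{ij}})$ as a given number, since its definition belongs to Section 4. Function names and numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_p_row(N_tilde_i, gamma_i):
    """(a) Row p_i ~ Dirichlet(N_tilde_i + gamma_i)."""
    return rng.dirichlet(N_tilde_i + gamma_i)

def draw_theta(alpha_ij, m_ij, N_ij, b_hat, M_tilde_ij):
    """(b) Draw theta_ij by sampling theta_ij^alpha_ij from an inverse-Gamma.

    1 / theta^alpha ~ Gamma(shape = m + N, rate = b_hat + M_tilde);
    NumPy parameterizes the Gamma by shape and scale = 1 / rate.
    Requires m_ij + N_ij > 0.
    """
    rate = b_hat + M_tilde_ij
    precision = rng.gamma(shape=m_ij + N_ij, scale=1.0 / rate)
    return (1.0 / precision) ** (1.0 / alpha_ij)

def draw_next_state(j_tau, u_T, p, alpha, theta):
    """(d) Unseen next state, weights p_jk * exp{-(u_T / theta_jk)^alpha_jk}."""
    w = p[j_tau] * np.exp(-(u_T / theta[j_tau]) ** alpha[j_tau])
    return int(rng.choice(len(w), p=w / w.sum()))

# Hypothetical inputs for one sweep over these three blocks (3 states).
p = np.array([[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]])
alpha = np.full((3, 3), 1.2)
theta = np.full((3, 3), 400.0)

new_row = draw_p_row(np.array([3.0, 1.0, 0.0]), np.array([1.0, 1.0, 1.0]))
new_theta = draw_theta(alpha_ij=1.2, m_ij=2, N_ij=3, b_hat=4.0e3, M_tilde_ij=6.0e3)
new_state = draw_next_state(j_tau=0, u_T=150.0, p=p, alpha=alpha, theta=theta)
```

The update of $\alpha_{ij}$ in point (c) is not of a standard form; its log-concavity is what makes adaptive rejection sampling (Gilks and Wild 1992) applicable.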

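Proof. Notice that the row $p_i$, given the data and $j_{\tau+1}$, is conditionally independent of $(p_{(-i)}, \alpha, \theta)$. Hence

\[
\pi(p_i \mid j, x, u_T, j_{\tau+1}, p_{(-i)}, \alpha, \theta)
 \propto L(j, x, u_T, j_{\tau+1} \mid p, \alpha, \theta) \times \pi_{1;i}(p_i)
 \propto \prod_{j\in E} p_{ij}^{\tilde N_{ij}} \times \prod_{j\in E} p_{ij}^{\gamma_{ij}-1}
\]

and point (a) of the statement follows. With regard to the full conditional distribution of $\theta_{ij}$ given $(j, x, u_T, j_{\tau+1}, p, \alpha, \theta_{(-ij)})$, we have

\begin{align*}
\pi_{2;ij}(\theta_{ij} \mid j, x, u_T, j_{\tau+1}, p, \alpha, \theta_{(-ij)})
 &\propto L(j, x, u_T, j_{\tau+1} \mid p, \alpha, \theta) \times \pi_{2;ij}(\theta_{ij} \mid \alpha_{ij})\\
 &= \left[ \prod_{h,k\in E} p_{hk}^{\tilde N_{hk}} \times \prod_{h,k\in E} \frac{\alpha_{hk}^{N_{hk}}\, C_{hk}^{\alpha_{hk}-1}}{\theta_{hk}^{\alpha_{hk} N_{hk}}}\, e^{-\tilde M_{hk}(\alpha_{hk}) / \theta_{hk}^{\alpha_{hk}}} \right]
   \times \frac{\alpha_{ij}\, \hat b_q(\alpha_{ij}, x^{-m_{ij}})^{m_{ij}}}{\Gamma(m_{ij})}\, \theta_{ij}^{-(1+m_{ij}\alpha_{ij})}
   \exp\left\{ -\frac{\hat b_q(\alpha_{ij}, x^{-m_{ij}})}{\theta_{ij}^{\alpha_{ij}}} \right\}\\
 &\propto \theta_{ij}^{-[1+\alpha_{ij}(m_{ij}+N_{ij})]}
   \exp\left\{ -\frac{\hat b_q(\alpha_{ij}, x^{-m_{ij}}) + \tilde M_{ij}(\alpha_{ij})}{\theta_{ij}^{\alpha_{ij}}} \right\}.
\end{align*}

As one can see, the last function corresponds, for $\theta_{ij}^{\alpha_{ij}}$, to the kernel of an inverse-Gamma distribution with parameters $m_{ij} + N_{ij}$ and $\hat b_q(\alpha_{ij}, x^{-m_{ij}}) + \tilde M_{ij}(\alpha_{ij})$, and point (b) of the statement follows. A similar reasoning yields a full conditional density $\pi_{3;ij}(\alpha_{ij} \mid j, x, u_T, j_{\tau+1}, p, \alpha_{(-ij)}, \theta)$ which is proportional to (16). Furthermore, concerning its log-concavity, from (16) it follows that, up to an additive constant,

\[
\log \pi(\alpha_{ij} \mid j, x, u_T, j_{\tau+1}, p, \alpha_{(-ij)}, \theta)
 = \log \pi_{3;ij}(\alpha_{ij}) + \alpha_{ij} \left[ \log\left(C_{ij}\, t^{ij}_{q_{ij}}\right) - (m_{ij} + N_{ij}) \log \theta_{ij} \right]
 + (N_{ij} + 1) \log \alpha_{ij}
 - \frac{\hat b_q(\alpha_{ij}, x^{-m_{ij}}) + \tilde M_{ij}(\alpha_{ij})}{\theta_{ij}^{\alpha_{ij}}}.
\]

All terms of this expression are concave in $\alpha_{ij}$. In particular, $\pi_{3;ij}$ is log-concave by construction and the last term is a sum of concave functions of the kind $\alpha_{ij} \mapsto -z^{\alpha_{ij}}$, $z > 0$. Hence the log-concavity follows from the property that the sum of concave functions is concave. Finally, (4) implies the following:

\begin{align*}
P(J_{\tau+1} = j \mid j, x, u_T, p, \alpha, \theta)
 &= \frac{P(J_{\tau+1} = j,\ X_{\tau+1} > u_T \mid j, x, p, \alpha, \theta)}{\sum_{k\in E} P(J_{\tau+1} = k,\ X_{\tau+1} > u_T \mid j, x, p, \alpha, \theta)}\\
 &= \frac{p_{j_\tau j} \exp\left\{ -\left( u_T / \theta_{j_\tau j} \right)^{\alpha_{j_\tau j}} \right\}}
        {\sum_{k\in E} p_{j_\tau k} \exp\left\{ -\left( u_T / \theta_{j_\tau k} \right)^{\alpha_{j_\tau k}} \right\}}.
\end{align*}

The same type of argument can be used for proving (e).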
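For point (b), the passage from the kernel in $\theta_{ij}$ to the inverse-Gamma law of $\theta_{ij}^{\alpha_{ij}}$ is a change of variable. A short sketch, with the shorthand $a = m_{ij} + N_{ij}$, $B = \hat b_q(\alpha_{ij}, x^{-m_{ij}}) + \tilde M_{ij}(\alpha_{ij})$ and $y = \theta_{ij}^{\alpha_{ij}}$ (symbols introduced only for this illustration), reads:

```latex
\begin{align*}
\pi(\theta_{ij} \mid \cdots) &\propto \theta_{ij}^{-(1+\alpha_{ij} a)} \exp\{-B\,\theta_{ij}^{-\alpha_{ij}}\},\\
\theta_{ij} = y^{1/\alpha_{ij}} &\ \Longrightarrow\ \frac{d\theta_{ij}}{dy} = \frac{1}{\alpha_{ij}}\, y^{1/\alpha_{ij}-1},\\
\pi_Y(y \mid \cdots) &\propto y^{-(1+\alpha_{ij} a)/\alpha_{ij}}\; y^{1/\alpha_{ij}-1}\; e^{-B/y}
 = y^{-a-1}\, e^{-B/y}.
\end{align*}
```

The last expression is the kernel of an inverse-Gamma density with shape $a$ and scale $B$, under the convention that such a density is proportional to $y^{-a-1} e^{-B/y}$.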



A.2. JAGS implementation. Proposition A.1 implies that JAGS should be able to run an exact Gibbs sampler. The model description we have adopted in the JAGS language is based on the full likelihood (15). It is not important that the model description matches the actual model which generated the data, as long as the full conditional distributions remain unchanged. In particular, for any $i$, $j_\tau$ and $\sum_k N_{ik}$ can be considered as fixed. Then, given $p_i$, $(N_{i1}, \dots, N_{is})$ is regarded as a sample from a multinomial distribution with probability vector $p_i$ and $\sum_k N_{ik}$ trials; furthermore, the factor $p_{j_\tau j_{\tau+1}}$ in the full likelihood is regarded as the prior for $J_{\tau+1}$ as it ranges from 1 to $s$; finally, upon multiplication by the Dirichlet prior for $p_i$ we re-obtain the correct full conditional distribution of Proposition A.1 (a). The uncensored inter-occurrence times are described as Weibull distributed with shape and rate parameters that depend on $j$. The right-censored observation $u_T$ is handled by a special instruction provided by the JAGS language, which should represent the actual likelihood term correctly, so that the correct form of the full conditional of $J_{\tau+1}$ is preserved.

The description of the prior distributions is routine. We have mentioned the Dirichlet prior for $p_i$; $\alpha_{ij}$ has a shifted Gamma distribution if $m_{ij} > 1$, as specified at the end of Section 4.3; $a_{ij} := 1/\theta_{ij}^{\alpha_{ij}}$ is Gamma distributed, conditionally on $\alpha_{ij}$, for any value of $m_{ij}$ and $t^{ij}_{q_{ij}}$; $\tilde t^{ij}_{q_{ij}}$ is uniformly distributed if $m_{ij} = 0$, otherwise it is constant. The only nontrivial case arises for the prior of $\alpha_{ij}$ if $m_{ij} \le 1$, which reduces to $\tilde\pi_3(\alpha_{ij}) \propto (1 - \alpha_0/\alpha)\, 1_{\alpha_0 \le \alpha \le \alpha_1}$ (see Section 4.4). This distribution is coded using the so-called zeros trick: a fictitious zero observation from a Poisson distribution with mean $\phi = -\ln \tilde\pi_3(\alpha_{ij})$ is introduced; a uniform prior over $[\alpha_0, \alpha_1]$ is assigned to $\alpha_{ij}$; then the likelihood factor corresponding to the zero observation is $\exp(-\phi)$, which multiplied by the uniform prior gives the desired factor in the joint distribution of the data and the parameters.
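The zeros trick can be verified numerically outside JAGS. The sketch below is an illustration in Python, with hypothetical truncation bounds: it evaluates, on a grid, the product of the uniform prior and the Poisson probability of a zero count with mean $\phi$, and compares it with the target kernel $(1 - \alpha_0/\alpha)$ after normalization. Since a Poisson mean must be non-negative, the construction relies on the kernel not exceeding one, which holds here.

```python
import numpy as np

alpha0, alpha1 = 1.0, 5.0   # hypothetical truncation bounds

def target_kernel(a):
    """Unnormalized prior (1 - alpha0 / a) on [alpha0, alpha1], zero elsewhere."""
    return (1.0 - alpha0 / a) * ((a >= alpha0) & (a <= alpha1))

# Grid over (alpha0, alpha1]; alpha0 itself is skipped because the kernel
# vanishes there and phi would be infinite.
grid = np.linspace(alpha0, alpha1, 2001)[1:]

phi = -np.log(target_kernel(grid))          # Poisson mean, non-negative here
zero_likelihood = np.exp(-phi)              # P(Z = 0 | phi) for Z ~ Poisson(phi)
uniform_prior = np.full_like(grid, 1.0 / (alpha1 - alpha0))
joint = uniform_prior * zero_likelihood     # prior times fictitious likelihood

# After normalization on the grid the two shapes coincide.
normalized_joint = joint / joint.sum()
normalized_target = target_kernel(grid) / target_kernel(grid).sum()
print(np.max(np.abs(normalized_joint - normalized_target)))
```

The same device applies to any prior kernel bounded by one; a kernel with larger values would first have to be rescaled by a constant so that $\phi$ stays non-negative.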


Acknowledgments

We are grateful to Renata Rotondi for providing us with data and the map of Italy along with her very helpful comments. We are solely responsible for any remaining inaccuracy.

References

Alvarez, E.E. (2005). Estimation in stationary Markov Renewal Processes, with application to earthquake forecasting in Turkey. Methodol. Comput. Appl. Probab., Vol. 7, pp. 119–130.

Berger, J.O. and Sun, D. (1993). Bayesian Analysis for the Poly-Weibull Distribution. J. Amer. Statist. Assoc., Vol. 88, pp. 1412–1418.

Betrò, B., Garavaglia, E., Grandori Guagenti, E., Rotondi, R. and Tagliani, A. (1989). Sulla distribuzione dei tempi di intercorrenza fra eventi sismici in alcune zone italiane (On the distribution of the inter-occurrence times between seismic events in some Italian areas). Proceedings del 4 Conv. Naz. “L’Ingegneria Sismica in Italia”, ANIDIS - Politecnico di Milano, Milano, Ed. Patron Bologna, Vol. 1, pp. 135–144.

Bousquet, N. (2006). A Bayesian analysis of industrial lifetime data with Weibull distributions. Rapport de recherche INRIA n.6025, pp. 1–21.

Bousquet, N. (2010). Elicitation of Weibull priors. http://arxiv.org/abs/1007.4740v2.

Cano, J., Moguerza, J.M. and Ríos Insua, D. (2011). Bayesian analysis for semi-Markov processes with applications to reliability and maintenance. Technical report, Madrid, Universidad Rey Juan Carlos.

CPTI Working Group (2004). Catalogo Parametrico dei Terremoti Italiani, version 2004 (CPTI04), Ist. Naz. di Geofis. e Vulcanol., Bologna, Italy. http://emidius.mi.ingv.it/CPTI04/.

DISS Working Group (2007). Database of Individual Seismogenic Sources (DISS), version 3.0.2: A compilation of potential sources for earthquakes larger than M 5.5 in Italy and surrounding areas. http://diss.rm.ingv.it/diss/.

Epifani, I., Fortini, S. and Ladelli, L. (2002). A characterization for mixtures of Semi Markov processes. Statist. Probab. Lett., Vol. 60, pp. 445–457.

Foucher, Y., Mathieu, E., Saint-Pierre, P., Durand, J.F. and Daurès, J.P. (2009). A Semi-Markov Model Based on Generalized Weibull Distribution with an Illustration for HIV Disease. Biometrical Journal, Vol. 47, pp. 825–833.

Garavaglia, E. and Pavani, R. (2012). About Earthquake Forecasting by Markov Renewal Processes. Methodol. Comput. Appl. Probab., Vol. 13, pp. 155–169.

Gilks, W.R., Best, N.G. and Tan, K.K.C. (1995). Adaptive rejection Metropolis sampling. Applied Statistics, Vol. 44, pp. 455–472.

Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, Vol. 41, pp. 337–348.

Grandori Guagenti, E. and Molina, C. (1986). Semi-Markov Processes in seismic risk analysis. In Semi-Markov Models: Theory and Applications, Ed. Janssen, J., pp. 487–503, Plenum Press, New York.

Grandori Guagenti, E., Molina, C. and Mulas, G. (1988). Seismic risk analysis with predictable models. Earthquake Eng. Struct. Dynam., Vol. 16, pp. 343–359.

Hougaard, P. (1999). Multi-state models: a review. Lifetime Data Anal., Vol. 5, pp. 239–264.

Limnios, N. and Oprisan, G. (2001). Semi-Markov Model Processes. Birkhäuser, Boston.

Masala, G. (2012). Earthquakes occurrences estimation through a parametric semi-Markov approach. J. Appl. Statist., Vol. 39, pp. 81–96.

Marín, J., Plà, L. and Ríos Insua, D. (2005). Forecasting for Some Stochastic Process Models Related to Sow Farm Management. J. Appl. Statist., Vol. 32, pp. 797–812.

Patwardhan, A.S., Kulkarni, R.B. and Tocher, D. (1980). A semi-Markov model for characterizing recurrence of great earthquakes. Bull. Seismol. Soc. Am., Vol. 70, pp. 323–347.

Peruggia, M. and Santner, T. (1996). Bayesian analysis of time evolution of earthquakes. J. Amer. Statist. Assoc., Vol. 91, pp. 1209–1218.

Plummer, M. (2010). JAGS Version 3.1.0: Just Another Gibbs Sampler. URL http://mcmc-jags.sourceforge.net/.

Rotondi, R. (2010). Bayesian nonparametric inference for earthquake recurrence time distributions in different tectonic regimes. J. Geophys. Res., Vol. 115, B01302, DOI: 10.1029/2008JB006272.

Rovida, A., Camassi, R., Gasperini, P. and Stucchi, M. (Eds.) (2011). CPTI11, the 2011 version of the Parametric Catalogue of Italian Earthquakes. Milano, Bologna, Italy. http://emidius.mi.ingv.it/CPTI.

Votsi, I., Limnios, N., Tsaklidis, G. and Papadimitriou, E. (2012). Estimation of the Expected Number of Earthquake Occurrences Based on Semi-Markov Models. Methodol. Comput. Appl. Probab., Vol. 14, pp. 685–703.

E-mail address: [email protected]
Politecnico di Milano, piazza Leonardo da Vinci 32, I-20133, Milano, Italy

E-mail address: [email protected]
Politecnico di Milano, piazza Leonardo da Vinci 32, I-20133, Milano, Italy

E-mail address: [email protected]
CNR IMATI, Via Bassini 15, I-20133 Milano, Italy
