
Likelihood Based Inference for the Multivariate Renewal Hawkes Process


Tom Stindl, Department of Statistics, UNSW Sydney, and Feng Chen, Department of Statistics, UNSW Sydney

Abstract The recent introduction of the renewal Hawkes (RHawkes) process has extended the modeling capabilities of the classical Hawkes self-exciting process by allowing the immigrant arrival times to follow a general renewal process rather than a homogeneous Poisson process. This paper considers the multivariate extension of the RHawkes process, which allows different event types to interact through self- and cross-excitation effects; we term this the multivariate renewal Hawkes (MRHawkes) process model. We propose a recursive algorithm to directly compute the likelihood of the model, which forms the basis of statistical inference. A modified algorithm for likelihood evaluation that reduces computational time is also developed. Our algorithm also yields a procedure to compute independent uniform residuals, which serve as the basis for goodness-of-fit assessment of the temporal patterns of the events and of the distribution of event types. The plug-in predictive density function for the next event time and methods to make future predictions using simulations are presented. Simulation studies show that our likelihood evaluation algorithms and prediction procedures perform as expected. To illustrate the proposed methodology, we analyze data on earthquakes in two Pacific island countries, Fiji and Vanuatu, and trade-through data for the stock BNP Paribas on the Euronext Paris stock exchange.

Keywords: finance, forecasting, maximum likelihood, model assessment, point process, seismology


This research includes computations using the Linux computational cluster Katana supported by the Faculty of Science, UNSW Sydney. Stindl was supported by an Australian Government Research Training Program Scholarship. Chen was partly supported by a UNSW SFRGP grant.

1 Introduction

The class of self-exciting point processes provides a flexible framework for modeling a wide range of events that occur over time and whose intensity, or rate of arrival, is influenced by both exogenous and endogenous effects. An attractive feature of this class of models is their ease of interpretation, and as such they have appeared in a wide variety of application domains. The most commonly applied model of this type is the Hawkes process (Hawkes, 1971), in which exogenous events, termed immigrants, arrive according to a homogeneous Poisson process. This model allows straightforward calculation of the likelihood and therefore makes likelihood-based inference easy to implement. However, in many applications the Hawkes model fails to provide an adequate fit to the data, and many works in the literature are devoted to extending the Hawkes process. One approach is to allow the model parameters to vary over time while still assuming that the immigrants arrive according to a (possibly inhomogeneous) Poisson process (Mohler et al., 2011; Chen and Hall, 2013; Roueff et al., 2016; Godoy et al., 2016). Another approach relies on the branching Poisson process interpretation of the Hawkes process (Hawkes and Oakes, 1974) and generalizes the immigrant arrival process. An example is the renewal Hawkes (RHawkes) process (Wheatley et al., 2016; Chen and Stindl, 2017), in which the immigrant arrival process is allowed to be a general renewal process rather than a homogeneous Poisson process. The background event intensity of the RHawkes process is no longer a deterministic constant or function, as in the classical Hawkes process or its time-varying extensions, and thus the likelihood function of the RHawkes process is nontrivial to evaluate. Convinced that the likelihood of the RHawkes process takes exponential time to evaluate, Wheatley et al. (2016) proposed two Expectation-Maximization (EM) type algorithms (Dempster et al., 1977) that compute the maximum likelihood estimator (MLE) without evaluating the likelihood, using two different choices of the missing data. They applied the RHawkes process to model the mid-quote price changes of E-mini S&P 500 futures contracts and found that the RHawkes model provided a better fit to the data than the Hawkes model. Chen and Stindl (2017) proposed a recursive method to directly evaluate the likelihood of the RHawkes process in quadratic time, and showed that the likelihood can be directly optimized to obtain the MLEs of the model parameters and their standard errors. They also proposed computationally efficient methods for goodness-of-fit assessment and prediction, and applied their methodology to earthquake occurrence modeling and to foreign exchange data analysis. They found that the RHawkes process provided a better fit to the earthquake data than the classical Hawkes process and was able to give reasonably accurate predictions of future earthquakes.

In many areas, researchers also encounter multi-type event sequence data. For example, in earthquake modeling the data may contain earthquakes from several neighboring regions. In finance, the tick history data on a specific stock typically record both the times of trades and the times of quotes, and order book data record the arrival times and other features of limit and market orders, such as the side of the trade. In these applications, it is of interest to study not only the interactions among events of the same type but also the interactions between events of different types. Therefore, multivariate point processes in which different components are allowed to interact with each other are needed. A multivariate point process model for this purpose is the multivariate Hawkes process (Hawkes, 1971). Bowsher (2007) modeled the timing of trades and mid-quote price changes for an NYSE stock using a generalized bivariate Hawkes process that allows the baseline event rate to vary with time. Embrechts et al. (2011) fitted the bivariate Hawkes process to daily data on the negative and positive exceedances of certain threshold levels by the Dow Jones Industrial Average index. Bacry et al. (2013) showed that the multivariate Hawkes process can exhibit the Epps effect and the lead-lag effect observed in financial data.
When interpreted as branching Poisson processes, both the multivariate Hawkes process and the generalization considered by Bowsher (2007) assume the immigrant arrival processes to be Poisson, and therefore do not allow over- or under-dispersion of the numbers of immigrants, or serial correlation of the numbers of immigrants in non-overlapping time intervals, even for events of the same type. Such assumptions unnecessarily restrict the modeling capabilities of the multivariate Hawkes process. In this paper we consider a point process model that extends the renewal Hawkes process by allowing the events of the process to be of different types and, in addition to the self-excitation effect among events of the same type, allowing events of each type to affect the future occurrence rates of events of other types through the mutual excitation mechanism adopted in multivariate Hawkes processes. The model also extends the multivariate Hawkes process in that the immigrant events of different types can arrive according to general renewal processes, rather than the Poisson processes of the classical multivariate Hawkes process. This implies that the numbers of immigrant events of the same type in non-overlapping time intervals are allowed to be serially correlated and to be over- or under-dispersed relative to the Poisson distribution. We naturally call this model the multivariate renewal Hawkes process, or MRHawkes process for short. As with the RHawkes process, the MRHawkes process can be efficiently simulated using its branching process interpretation; also as with the RHawkes process, it does not have an easily evaluated likelihood function. We derive an algorithm to efficiently evaluate the likelihood of the MRHawkes process model, using an approach analogous to that of Chen and Stindl (2017). We demonstrate the feasibility of fitting the MRHawkes process model to data by likelihood maximization, on both simulated and real-life data. The time and space complexities of the algorithm for MRHawkes likelihood evaluation are both polynomial in the number of observed events, and therefore the algorithm can be fairly slow on large data sets. To overcome this issue, we propose a simple modification of the algorithm that yields a good approximation of the likelihood in quadratic time and linear storage space. We also provide an approach to assess the goodness-of-fit of the MRHawkes model based on the Rosenblatt residuals (Rosenblatt, 1952), and propose a simulation-based approach to predict future event occurrences.

The rest of the paper is structured as follows. Section 2 introduces the MRHawkes process model. Section 3 derives the algorithm to evaluate the likelihood of the MRHawkes process. The method to evaluate the goodness-of-fit is presented in Section 4 and the method for predicting future events in Section 5. Results of our simulation studies are presented in Section 6, together with methods to simulate the process and an assessment of the predictive performance of the model. Applications in seismology and finance are presented in Section 7, with an analysis of earthquakes occurring in two Pacific island countries, Fiji and Vanuatu, and of a data set of trade-throughs for the stock BNP Paribas on the Euronext Paris stock exchange.

2 Model and Notation

Let {(τi, zi), i = 1, 2, . . . } be a realization of a multivariate point process, where τ1 < τ2 < · · · are distinct and interpretable as the occurrence times of the events, and zi ∈ {1, . . . , M} indicates the type of the ith event. Let the associated M-variate counting process be $N_t = (N_t^1, \ldots, N_t^M)$, where $N_t^m$ is the number of type-m events in (0, t]. Denote the unobservable immigrant or offspring status indicator of the ith event by Mi, where Mi = 0 if the event is an immigrant and Mi = 1 if it is an offspring event. To specify the intensity of the MRHawkes process we need the unobservable index of the most recent type-m immigrant arrival before time t, which we denote by $I_m(t) = \max\{i : \tau_i < t,\ M_i = 0,\ z_i = m\}$, with $I_m(t) = 0$ if no type-m immigrant has arrived before t. We collect these indices into the M-dimensional vector $I(t) = (I_1(t), \ldots, I_M(t))$ of last immigrant indices at time t. The natural filtration of the multivariate point process is denoted by F = {Ft; t ≥ 0}, where $\mathcal F_t = \sigma\{N_s;\ s \le t\}$. The conditional intensity vector λ(t) fully specifies the multivariate point process model. The intensity λm(t), t ≥ 0, of the mth component of the multivariate renewal Hawkes (MRHawkes) process relative to the enlarged filtration $\tilde{\mathcal F}_t = \sigma\{N_s, I(s);\ s \le t\}$, t ≥ 0, takes the following form

$$\lambda_m(t) = \frac{\mathbb{E}\big[dN^m(t) \mid \tilde{\mathcal F}_{t-}\big]}{dt} = \mu_m\big(t - \tau_{I_m(t)}\big) + \sum_{j:\,\tau_j < t} \eta_{m,z_j}\, h_{m,z_j}(t - \tau_j), \tag{1}$$

where µm(·) ≥ 0 is the hazard rate function of the i.i.d. (independent and identically distributed) waiting times between successive type-m immigrants. The constant ηm,zj ≥ 0 is called the branching ratio and gives the average number of type-m children of an event of type zj. The function hm,zj(·) > 0 is called the offspring density; it is the density of the birth time of a child relative to its parent, given that there is at least one child. The product ηm,zj hm,zj(·) is known as the excitation function and describes the excitation effect on component m due to an event of component zj. It is assumed that the functions µm(·) all integrate to infinity and that the branching ratios ηm,zj are strictly smaller than unity. Furthermore, we assume that the spectral radius (largest eigenvalue) of the branching matrix H := (ηjk; j, k ∈ {1, . . . , M}) is strictly smaller than unity. This paper is concerned with the estimation of model parameters when the µm(·)'s and hm,n(·)'s are specified parametrically up to a finite-dimensional parameter. The i.i.d. waiting times between type-m immigrant arrivals form a renewal process for the mth component, whereas in the multivariate Hawkes process type-m immigrants arrive according to a Poisson process. Consequently, the multivariate Hawkes process is the special case of the MRHawkes process in which every immigrant renewal process has an exponential inter-renewal distribution, so that the hazard functions µm(·) are constants. A commonly used form of the inter-renewal hazard function is

$$\mu_m(t) = \frac{\kappa_m}{\beta_m}\left(\frac{t}{\beta_m}\right)^{\kappa_m - 1}, \quad t \ge 0,$$

in which case the corresponding distributions are Weibull, and κm and βm are referred to as the shape and scale parameters respectively. When the shape parameters are unity, the hazard functions are constant and the immigrant arrival processes are Poisson. We can therefore conduct hypothesis tests on the shape parameters to check whether any of them differ significantly from unity, in which case the MRHawkes process deviates from the multivariate Hawkes process. Figure 1 presents a realization of a bivariate renewal Hawkes process, displaying the intensity of a simulated path with Weibull immigrant inter-arrival distributions. The shape parameters for the two components are κ1 = 3 and κ2 = 1/3, and the offspring densities are exponential with mean waiting time ten. The figure also displays the intensities λ1(t) and λ2(t) of the individual components, which add up to the total intensity of the process. Because of the positive cross-excitation effect, the two component intensities track one another fairly closely: an event of either component can trigger a burst of self-excitation in the other. The figure also presents a barcode plot for each component together with the pooled event times. The first component exhibits fairly evenly spaced event times because of its large shape parameter, while the second component has a small shape parameter, so its event times cluster heavily. This choice of parameters illustrates the flexibility in the specification of the immigration process that the MRHawkes model offers compared with the original Hawkes (1971) model, as such processes can range from highly over-dispersed to highly regularly spaced.

Figure 1: A specific realization of a bivariate renewal Hawkes process. The figure displays the intensity function λ(t) and the associated component intensities λ1 (t) and λ2 (t). The figure also presents the barcode plot where bars in the first row indicate events from the first component, the second row from the second component and the last row is the pooled event times.
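To make the intensity (1) concrete, the following minimal Python sketch evaluates λm(t) from the observed history and a hypothesized vector of last immigrant indices. It assumes, for illustration only, Weibull immigrant hazards and exponential offspring densities hm,n(s) = exp(−s/γmn)/γmn; the function names and argument layout are hypothetical. Types are coded 0, ..., M − 1 as Python indices, and last-immigrant indices are 1-based event numbers with 0 meaning that no immigrant of that type has arrived yet (in which case the renewal clock is assumed to start at time 0).

```python
import math

def weibull_hazard(s, kappa, beta):
    """Weibull inter-renewal hazard mu(s) = (kappa / beta) * (s / beta)**(kappa - 1)."""
    return (kappa / beta) * (s / beta) ** (kappa - 1)

def mrhawkes_intensity(t, m, times, types, last_imm, kappa, beta, eta, gamma):
    """Intensity (1) of component m at time t given the last-immigrant index vector.

    Illustrative assumptions: Weibull immigrant hazards, exponential offspring
    densities h_{m,n}(s) = exp(-s / gamma[m][n]) / gamma[m][n], types coded
    0..M-1, last_imm[m] a 1-based event index (0 = no type-m immigrant yet,
    in which case the renewal clock is taken to start at time 0).
    """
    tau_last = 0.0 if last_imm[m] == 0 else times[last_imm[m] - 1]
    lam = weibull_hazard(t - tau_last, kappa[m], beta[m])   # immigrant (renewal) part
    for tj, zj in zip(times, types):                         # self- and cross-excitation
        if tj < t:
            g = gamma[m][zj]
            lam += eta[m][zj] * math.exp(-(t - tj) / g) / g
    return lam
```

Only the immigrant term depends on the hidden vector I(t); the excitation sum depends on the observed history alone, which is what the likelihood recursion of Section 3 exploits.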

3 Direct Likelihood Evaluation

This section develops a recursive algorithm to evaluate the likelihood of the MRHawkes process model based on data observed over the interval (0, T], consisting of the event times τ1:n and the event types z1:n. It is natural to evaluate the likelihood by conditioning on the history of the process up to each event time, that is, on the previous event times and types. Using the chain rule, the likelihood is decomposed into the following product of conditional densities:

$$L(\tau_{1:n}, z_{1:n} \mid \theta) = p(\tau_1, z_1) \left\{\prod_{i=2}^{n} p(\tau_i, z_i \mid \tau_{1:i-1}, z_{1:i-1})\right\} \mathbb{P}(\tau_{n+1} > T \mid \tau_{1:n}, z_{1:n}), \tag{2}$$

where τ1:i is short for (τ1, . . . , τi) and z1:i for (z1, . . . , zi). The form of the conditional intensity in (1) requires conditioning on the vector of last immigrant indices. This conditioning gives the intensity, and hence the distribution of the waiting time between events, an easily computable expression, because given it we can evaluate µm(t − τIm(t)). Hence, by conditioning on the most recent immigrant arrivals of the components m = 1, . . . , M, denoted by j = (j1, . . . , jM) taking values in {0, 1, . . . , i − 1}^M, we obtain the decompositions

$$p(\tau_i, z_i \mid \tau_{1:i-1}, z_{1:i-1}) = \sum_{\mathbf j} d_i(\mathbf j)\, p_i(\mathbf j), \quad i = 1, 2, \ldots, n, \tag{3}$$

$$\mathbb{P}(\tau_{n+1} > T \mid \tau_{1:n}, z_{1:n}) = \sum_{\mathbf j} S_{n+1}(\mathbf j)\, p_{n+1}(\mathbf j), \tag{4}$$

where

$$d_i(\mathbf j) := p(\tau_i, z_i \mid \tau_{1:i-1}, z_{1:i-1}, I(\tau_i) = \mathbf j), \tag{5}$$

$$S_{n+1}(\mathbf j) := \mathbb{P}(\tau_{n+1} > T \mid \tau_{1:n}, z_{1:n}, I(\tau_{n+1}) = \mathbf j), \tag{6}$$

$$p_i(\mathbf j) := \mathbb{P}(I(\tau_i) = \mathbf j \mid \tau_{1:i-1}, z_{1:i-1}). \tag{7}$$

The last immigrant index vector j must satisfy some further constraints to be valid. By the definition of Im(t), a nonzero element jm must be the index of a type-m event, that is, $z_{j_m} = m$; in particular, the nonzero elements of j are distinct, since the last immigrants of different components cannot be the same event. An element jm equals zero when no type-m immigrant has arrived by time τi, and we adopt the convention τ0 = 0. In the following we introduce notation that closely follows Chen and Stindl (2017), adapted to the multivariate context. We denote the cumulative immigrant hazard function for type-m events by $U_m(t) = \int_0^t \mu_m(s)\,ds$, the offspring distribution function of a type-m individual whose parent is a type-n event by $H_{m,n}(t) = \int_0^t h_{m,n}(s)\,ds$, the offspring intensity of type-m events by $\phi_m(t) = \sum_{j:\tau_j<t} \eta_{m,z_j} h_{m,z_j}(t-\tau_j)$, and the cumulative offspring effect for type-m events by $\Phi_m(t) = \int_0^t \phi_m(s)\,ds = \sum_{j:\tau_j<t} \eta_{m,z_j} H_{m,z_j}(t-\tau_j)$, with $\Phi(t) = \sum_{m=1}^{M} \Phi_m(t)$. Then, for i = 2, . . . , n, the conditional densities and survival probabilities defined in (5) and (6) are directly computable and given by

$$d_i(\mathbf j) = \exp\Big\{-\sum_{m=1}^{M}\big[U_m(\tau_i-\tau_{j_m}) - U_m(\tau_{i-1}-\tau_{j_m})\big] - \big[\Phi(\tau_i)-\Phi(\tau_{i-1})\big]\Big\}\,\big\{\mu_{z_i}(\tau_i-\tau_{j_{z_i}}) + \phi_{z_i}(\tau_i)\big\}, \tag{8}$$

$$S_{n+1}(\mathbf j) = \exp\Big\{-\sum_{m=1}^{M}\big[U_m(T-\tau_{j_m}) - U_m(\tau_n-\tau_{j_m})\big] - \big[\Phi(T)-\Phi(\tau_n)\big]\Big\}. \tag{9}$$
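Equations (8) and (9) share the same exponent, the log-probability of no event in an interval given j, so a sketch can compute it once and reuse it. The following is a minimal Python illustration, assuming Weibull immigrant hazards (so Um(s) = (s/βm)^κm) and exponential offspring densities with means γmn; all function names are hypothetical, types are 0-based, events are 1-based, and index 0 uses the convention τ0 = 0.

```python
import math

def weibull_hazard(s, kappa, beta):
    # mu_m(s) = (kappa / beta) * (s / beta)^(kappa - 1)
    return (kappa / beta) * (s / beta) ** (kappa - 1)

def weibull_cumhaz(s, kappa, beta):
    # U_m(s) = (s / beta)^kappa
    return (s / beta) ** kappa

def phi(t, m, times, types, eta, gamma):
    # phi_m(t): offspring intensity of type m from events strictly before t (exponential h assumed)
    return sum(eta[m][z] * math.exp(-(t - s) / gamma[m][z]) / gamma[m][z]
               for s, z in zip(times, types) if s < t)

def Phi(t, times, types, eta, gamma):
    # Phi(t) = sum_m Phi_m(t), using the exponential offspring CDF 1 - exp(-s / gamma)
    return sum(eta[m][z] * (1.0 - math.exp(-(t - s) / gamma[m][z]))
               for m in range(len(eta)) for s, z in zip(times, types) if s < t)

def log_no_event(a, b, j, times, types, kappa, beta, eta, gamma):
    """Exponent shared by (8) and (9): log-probability of no event in (a, b] given j."""
    def tau(k):                      # convention tau_0 = 0 for "no immigrant yet"
        return 0.0 if k == 0 else times[k - 1]
    imm = sum(weibull_cumhaz(b - tau(jm), kappa[m], beta[m])
              - weibull_cumhaz(a - tau(jm), kappa[m], beta[m])
              for m, jm in enumerate(j))
    return -imm - (Phi(b, times, types, eta, gamma) - Phi(a, times, types, eta, gamma))

def log_d(i, j, times, types, kappa, beta, eta, gamma):
    """log d_i(j) as in (8); event i is 1-based, types are 0-based."""
    t_prev = 0.0 if i == 1 else times[i - 2]
    t_now, z = times[i - 1], types[i - 1]
    tau_last = 0.0 if j[z] == 0 else times[j[z] - 1]
    rate = (weibull_hazard(t_now - tau_last, kappa[z], beta[z])
            + phi(t_now, z, times, types, eta, gamma))
    return log_no_event(t_prev, t_now, j, times, types, kappa, beta, eta, gamma) + math.log(rate)

def log_S(T, j, times, types, kappa, beta, eta, gamma):
    """log S_{n+1}(j) as in (9), where times holds all n observed events."""
    return log_no_event(times[-1], T, j, times, types, kappa, beta, eta, gamma)
```

Working on the log scale avoids underflow of the exponential factor over long gaps. When every κm = 1 the hazards are constant and di(j) no longer depends on j, which is a convenient sanity check.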

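It remains to calculate the conditional probabilities pi(j) defined in (7). Conditioning on the vector of last immigrants I(τi−1) and applying Bayes' theorem yields the recursion

$$\begin{aligned}
p_i(\mathbf j) &= \sum_{\mathbf j'} \mathbb{P}\big(I(\tau_i)=\mathbf j \mid \tau_{1:i-1}, z_{1:i-1}, I(\tau_{i-1})=\mathbf j'\big)\, \mathbb{P}\big(I(\tau_{i-1})=\mathbf j' \mid \tau_{1:i-1}, z_{1:i-1}\big)\\
&= \sum_{\mathbf j'} \mathbb{P}\big(I(\tau_i)=\mathbf j \mid \tau_{1:i-1}, z_{1:i-1}, I(\tau_{i-1})=\mathbf j'\big) \times \frac{p\big(\tau_{i-1}, z_{i-1} \mid \tau_{1:i-2}, z_{1:i-2}, I(\tau_{i-1})=\mathbf j'\big)\, \mathbb{P}\big(I(\tau_{i-1})=\mathbf j' \mid \tau_{1:i-2}, z_{1:i-2}\big)}{p(\tau_{i-1}, z_{i-1} \mid \tau_{1:i-2}, z_{1:i-2})}\\
&= \sum_{\mathbf j'} \mathbb{P}\big(I(\tau_i)=\mathbf j \mid \tau_{1:i-1}, z_{1:i-1}, I(\tau_{i-1})=\mathbf j'\big) \times \frac{d_{i-1}(\mathbf j')\, p_{i-1}(\mathbf j')}{p(\tau_{i-1}, z_{i-1} \mid \tau_{1:i-2}, z_{1:i-2})},
\end{aligned} \tag{10}$$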

where the summation index j′ takes values in {0, . . . , i − 2}^M. An important observation is that at most one component of the last immigrant vector I(τi) can differ from I(τi−1). If event i − 1 is an offspring event (Mi−1 = 1), then I(τi) = I(τi−1); if it is an immigrant (Mi−1 = 0), then the zi−1th component of I(τi) equals i − 1 and the remaining components are unchanged. That is, the following Markov-type property holds between I(τi) and I(τi−1):

$$I(\tau_i) = \begin{cases} I(\tau_{i-1}) & \text{if } M_{i-1} = 1,\\ \delta_{z_{i-1}}\big(I(\tau_{i-1})\big) & \text{if } M_{i-1} = 0, \end{cases} \tag{11}$$

where $\delta_m(v) = v + \big((i-1) - e_m^{\top} v\big)\, e_m$ is the function that returns the input vector v with its mth component vm replaced by i − 1, and em is the mth unit vector. The function δm(·) thus updates the last immigrant vector to indicate that a type-m immigrant arrived at time τi−1. By the thinning property (Daley and Vere-Jones, 2003), given that an event occurs at a particular time, the probability that it comes from one of the independent sub-processes (in the MRHawkes case, a combination of event type and immigrant or offspring status) equals that sub-process's share of the total conditional intensity. On the arrival of an offspring event the last immigrant vector remains the same, so j = j′; on the arrival of a type-m immigrant, j = δm(j′). Hence, conditioning on the last immigrant vector at time τi−1,

$$\mathbb{P}\big(I(\tau_i)=\mathbf j \mid \tau_{1:i-1}, z_{1:i-1}, I(\tau_{i-1})=\mathbf j'\big) = \begin{cases} \mathbb{P}\big(M_{i-1}=1 \mid \tau_{1:i-1}, z_{1:i-1}, I(\tau_{i-1})=\mathbf j'\big) & \mathbf j = \mathbf j',\\ \mathbb{P}\big(M_{i-1}=0 \mid \tau_{1:i-1}, z_{1:i-1}, I(\tau_{i-1})=\mathbf j'\big) & \mathbf j = \delta_{z_{i-1}}(\mathbf j'),\\ 0 & \text{otherwise}, \end{cases}$$

where

$$\mathbb{P}\big(M_{i-1}=1 \mid \tau_{1:i-1}, z_{1:i-1}, I(\tau_{i-1})=\mathbf j'\big) = \frac{\phi_{z_{i-1}}(\tau_{i-1})}{\mu_{z_{i-1}}\big(\tau_{i-1}-\tau_{j'_{z_{i-1}}}\big) + \phi_{z_{i-1}}(\tau_{i-1})} \tag{12}$$

and

$$\mathbb{P}\big(M_{i-1}=0 \mid \tau_{1:i-1}, z_{1:i-1}, I(\tau_{i-1})=\mathbf j'\big) = \frac{\mu_{z_{i-1}}\big(\tau_{i-1}-\tau_{j'_{z_{i-1}}}\big)}{\mu_{z_{i-1}}\big(\tau_{i-1}-\tau_{j'_{z_{i-1}}}\big) + \phi_{z_{i-1}}(\tau_{i-1})}. \tag{13}$$
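The transition (11), weighted by the thinning probabilities (12) and (13), maps each previous state j′ to at most two successor states. A minimal Python sketch of this kernel follows; the name `transition` and its argument layout are hypothetical, the values of µ and φ are assumed to be precomputed by the caller, types are 0-based, and event indices are 1-based with 0 meaning that no immigrant has arrived.

```python
def transition(j_prev, i, z_prev, mu_val, phi_val):
    """Successor states of the last-immigrant vector under (11)-(13).

    j_prev  : tuple I(tau_{i-1}) of 1-based event indices (0 = no immigrant yet)
    i       : index of the current event, so event i-1 is the one being classified
    z_prev  : type of event i-1 (0-based here)
    mu_val  : mu_{z_{i-1}}(tau_{i-1} - tau_{j'_{z_{i-1}}})
    phi_val : phi_{z_{i-1}}(tau_{i-1})
    Returns {next_state: probability}.
    """
    total = mu_val + phi_val
    j_imm = list(j_prev)
    j_imm[z_prev] = i - 1                        # delta_{z_{i-1}}(j'): event i-1 was an immigrant
    return {tuple(j_prev): phi_val / total,      # M_{i-1} = 1: offspring, vector unchanged, (12)
            tuple(j_imm): mu_val / total}        # M_{i-1} = 0: immigrant, (13)
```

For example, `transition((1, 0), 3, 1, 0.25, 0.75)` returns `{(1, 0): 0.75, (1, 2): 0.25}`: event 2, of the second type, is an offspring with probability 0.75 and otherwise becomes the last immigrant of its type.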

Combining (10), (12) and (13), the calculation of the last immigrant probabilities in (7) reduces to the recursion

$$p_i(\mathbf j) = \begin{cases} \dfrac{\phi_{z_{i-1}}(\tau_{i-1})}{\mu_{z_{i-1}}\big(\tau_{i-1}-\tau_{j_{z_{i-1}}}\big) + \phi_{z_{i-1}}(\tau_{i-1})} \times \dfrac{d_{i-1}(\mathbf j)\, p_{i-1}(\mathbf j)}{p(\tau_{i-1}, z_{i-1} \mid \tau_{1:i-2}, z_{1:i-2})}, & \mathbf j \in \{0,\ldots,i-2\}^M,\\[3ex] \displaystyle\sum_{j'_{z_{i-1}}=0}^{i-2} \frac{\mu_{z_{i-1}}\big(\tau_{i-1}-\tau_{j'_{z_{i-1}}}\big)}{\mu_{z_{i-1}}\big(\tau_{i-1}-\tau_{j'_{z_{i-1}}}\big) + \phi_{z_{i-1}}(\tau_{i-1})} \times \frac{d_{i-1}(\mathbf j')\, p_{i-1}(\mathbf j')}{p(\tau_{i-1}, z_{i-1} \mid \tau_{1:i-2}, z_{1:i-2})}, & \mathbf j = \delta_{z_{i-1}}(\mathbf j'), \end{cases} \tag{14}$$

for i = 3, . . . , n + 1, where in the second case the sum runs over the vectors j′ that agree with j in every component except the zi−1th. To evaluate the likelihood function at given parameter values, the conditional densities p(τi, zi | τ1:i−1, z1:i−1) and the probabilities pi(j) are computed with the coupled recursions (3) and (14), using the expression for di(j) in (8). The initial condition is p2(e_{z1}) = 1 and p2(j) = 0 for every other j, where ek is the unit vector with 1 in position k and zeros elsewhere; this reflects the fact that the first event is necessarily an immigrant. The survival probability P(τn+1 > T | τ1:n, z1:n) is calculated using (4), (9) and pn+1(j). Once all these expressions have been evaluated, the relevant terms can be substituted into (2) to compute the likelihood. Given parametric forms for the immigrant hazard rate functions and offspring densities, the likelihood can be directly evaluated and maximized with general-purpose optimization routines to obtain the MLEs. However, computational difficulties may arise when directly maximizing the log-likelihood, since in the multivariate setting the likelihood surface is typically multimodal or extremely flat. We therefore suggest running the optimization from a number of different initial parameter values and selecting the optimizer with the largest log-likelihood value. Remark 3.1. The computational time required for likelihood evaluation of the MRHawkes process is polynomial in the number of events n, namely O(n^{M+1}). Storing the last immigrant probabilities requires an M-dimensional array with n^M entries. Therefore, computing the likelihood is practically infeasible in applications with large n. However, the probability that the last immigrant of a component occurred in the distant past becomes negligible, so to speed up the likelihood evaluation we can treat these probabilities as immaterial and truncate the last immigrant probabilities. The modified algorithm only considers the last B events as possible immigrants, so the storage is always reduced to an M-dimensional array with B^M entries.
With this truncation, the time required to compute the likelihood is also reduced to O(n^2) in general, or O(n) when the offspring distribution is exponential. The tuning parameter B controls the trade-off between computational time and approximation accuracy. In practice, it is advisable to try several values of B to make sure that the truncation does not have a material effect on the final parameter estimates.
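Putting the pieces together, the following is a minimal Python sketch of the exact (untruncated) likelihood recursion: it stores pi(j) in a dictionary keyed by the state j, computes (8) and (3) at each event, propagates the probabilities with (12)-(14), and finishes with (4) and (9). It is an illustration under stated assumptions, not the authors' implementation: Weibull immigrant hazards, exponential offspring densities with means γmn, types coded 0, ..., M − 1, and a hypothetical function name. Because the number of states can grow like n^M, it is only meant for small data sets.

```python
import math
from collections import defaultdict

def mrhawkes_loglik(times, types, T, kappa, beta, eta, gamma):
    """Exact log-likelihood (2) via the recursions (3), (4), (8), (9) and (14).

    Illustrative assumptions: Weibull immigrant hazards (shape kappa[m], scale
    beta[m]), exponential offspring densities with means gamma[m][n], branching
    ratios eta[m][n] (type-m children of a type-n event), types coded 0..M-1.
    States j are tuples of 1-based event indices, 0 meaning "no immigrant yet".
    """
    M, n = len(kappa), len(times)
    tau = [0.0] + list(times)                      # tau[0] = 0 convention

    def mu(m, s):                                  # Weibull hazard mu_m(s)
        return (kappa[m] / beta[m]) * (s / beta[m]) ** (kappa[m] - 1)

    def U(m, s):                                   # cumulative hazard U_m(s)
        return (s / beta[m]) ** kappa[m]

    def phi(m, t):                                 # offspring intensity phi_m(t)
        return sum(eta[m][z] * math.exp(-(t - s) / gamma[m][z]) / gamma[m][z]
                   for s, z in zip(times, types) if s < t)

    def Phi(t):                                    # Phi(t) = sum_m Phi_m(t)
        return sum(eta[m][z] * (1.0 - math.exp(-(t - s) / gamma[m][z]))
                   for m in range(M) for s, z in zip(times, types) if s < t)

    def log_no_event(a, b, j):                     # exponent in (8) and (9)
        imm = sum(U(m, b - tau[j[m]]) - U(m, a - tau[j[m]]) for m in range(M))
        return -imm - (Phi(b) - Phi(a))

    p = {(0,) * M: 1.0}                            # before event 1 no immigrant has arrived
    loglik = 0.0
    for i in range(1, n + 1):
        t_prev, t_now, z = tau[i - 1], tau[i], types[i - 1]
        ph = phi(z, t_now)
        mus = {j: mu(z, t_now - tau[j[z]]) for j in p}
        d = {j: math.exp(log_no_event(t_prev, t_now, j)) * (mus[j] + ph) for j in p}  # (8)
        dens = sum(d[j] * p[j] for j in p)                                             # (3)
        loglik += math.log(dens)
        new_p = defaultdict(float)                 # p_{i+1}(j) via (12)-(14)
        for j, pj in p.items():
            w = d[j] * pj / dens                   # posterior weight of state j after event i
            if ph > 0:
                new_p[j] += w * ph / (mus[j] + ph)             # event i is an offspring
            j_imm = list(j)
            j_imm[z] = i
            new_p[tuple(j_imm)] += w * mus[j] / (mus[j] + ph)  # event i is an immigrant
        p = dict(new_p)
    surv = sum(math.exp(log_no_event(tau[n], T, j)) * pj for j, pj in p.items())  # (4), (9)
    return loglik + math.log(surv)
```

A useful check on such an implementation: when all κm = 1 the immigrant hazards are constant, the state probabilities drop out, and the value should coincide with the classical multivariate Hawkes log-likelihood. The negated value can then be handed to a general-purpose optimizer from several starting points. A truncation in the spirit of Remark 3.1 would additionally discard states referring to events more than B steps back; that variant is not shown here.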

4 Evaluating Model Performance

A natural next step is to assess how well the model fits the data. Two aspects of the model need to be considered: the temporal pattern of the events and the distribution of the event types. For the former, we use the Rosenblatt (1952) residuals, similar to Chen and Stindl (2017). For the latter, we use the universal residuals introduced by Brockwell (2007), a generalization of the Rosenblatt residuals that accommodates distributions with discontinuities. When the model is correctly specified, the residuals form an i.i.d. sequence of uniform random variables on the unit interval. Therefore, to assess model fit, we can examine the residual sequence for uniformity and independence. Specifically, the Rosenblatt residuals for the event times are defined as Wi = Fi(τi | τ1:i−1, z1:i−1), where Fi(t | τ1:i−1, z1:i−1) is the conditional distribution function of τi given τ1:i−1 and z1:i−1. Analogous to Eq. 10 in Chen and Stindl (2017), the Wi are given by

$$W_i = F_i(\tau_i \mid \tau_{1:i-1}, z_{1:i-1}) = 1 - \sum_{\mathbf j} p_i(\mathbf j)\, S_i(\mathbf j), \tag{15}$$

where pi(j) are the last immigrant probabilities computed during likelihood evaluation via (14) and, similarly to (9),

$$S_i(\mathbf j) = \exp\Big\{-\sum_{m=1}^{M}\big[U_m(\tau_i-\tau_{j_m}) - U_m(\tau_{i-1}-\tau_{j_m})\big] - \big[\Phi(\tau_i)-\Phi(\tau_{i-1})\big]\Big\}, \quad \forall\, \mathbf j.$$
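Once the pairs (pi(j), Si(j)) are available from the likelihood recursion, each residual in (15) is a one-line computation. The Python sketch below uses hypothetical function names, with states j as dictionary keys, and adds one possible uniformity diagnostic: the Kolmogorov-Smirnov distance between the empirical distribution of the residuals and U(0, 1). The choice of the KS distance is ours, for illustration; with estimated parameters its usual null distribution is only approximate.

```python
def rosenblatt_residual(p_i, S_i):
    """W_i = 1 - sum_j p_i(j) S_i(j), as in (15); both arguments are dicts keyed by state j."""
    return 1.0 - sum(p_i[j] * S_i[j] for j in p_i)

def ks_distance_uniform(ws):
    """Kolmogorov-Smirnov distance between the empirical CDF of ws and the Uniform(0, 1) CDF."""
    xs = sorted(ws)
    n = len(xs)
    return max(max((k + 1) / n - x, x - k / n) for k, x in enumerate(xs))
```

Independence can be examined separately, for instance by plotting each residual against its predecessor.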

The universal residuals for the event types z1:n are defined as

$$V_i = (1-U_i)\, G_i(z_i- \mid \tau_{1:i}, z_{1:i-1}) + U_i\, G_i(z_i \mid \tau_{1:i}, z_{1:i-1}), \quad i = 1, \ldots, n,$$

where U1:n is an auxiliary sequence of i.i.d. uniform random variables on the unit interval, independent of N(T), z1:n and τ1:n; Gi(z | τ1:i, z1:i−1) is the (discrete) distribution function of the event type zi given the previous event types z1:i−1 and the previous and current event times τ1:i; and Gi(z− | τ1:i, z1:i−1) denotes the left limit of Gi(· | τ1:i, z1:i−1) at z. To compute the conditional distribution function Gi, we need the conditional probabilities of the event types. From the calculations in Section 3,

$$\mathbb{P}(z_i = m \mid \tau_{1:i}, z_{1:i-1}) = \frac{\sum_{\mathbf j} d(\mathbf j, m)\, p_i(\mathbf j)}{\sum_{z=1}^{M} \sum_{\mathbf j} d(\mathbf j, z)\, p_i(\mathbf j)},$$

where d(j, m) denotes the expression (8) for di(j) with zi replaced by m, so that d(j, zi) = di(j) as defined in (5).
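A minimal Python sketch of the type residuals follows. The first function forms the conditional type probabilities from pi(j) and the values d(j, m); the second computes the randomized residual Vi. Both names are hypothetical, types are coded 0, ..., M − 1, and the auxiliary draw u is assumed to come from an independent uniform generator such as `random.random()`.

```python
def type_probabilities(p_i, d_by_type):
    """P(z_i = m | tau_{1:i}, z_{1:i-1}) from the last-immigrant probabilities.

    p_i       : {j: p_i(j)}
    d_by_type : {j: [d(j, 0), ..., d(j, M-1)]}, i.e. (8) evaluated with z_i set to each type
    """
    M = len(next(iter(d_by_type.values())))
    w = [sum(d_by_type[j][m] * p_i[j] for j in p_i) for m in range(M)]
    total = sum(w)
    return [x / total for x in w]

def universal_residual(probs, z, u):
    """Universal residual V_i = (1 - u) G(z-) + u G(z) for a discrete type z.

    probs : conditional type probabilities, types coded 0..M-1 (assumption)
    u     : auxiliary Uniform(0, 1) draw, independent of the data
    """
    g_left = sum(probs[:z])          # G_i(z-)
    g = g_left + probs[z]            # G_i(z)
    return (1.0 - u) * g_left + u * g
```

The randomization by u spreads each atom of the discrete distribution over an interval of length P(zi = z | ...), which is what makes Vi uniform under a correctly specified model.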

5 Prediction

This section considers the problem of predicting the occurrence time and type of the next event after the censoring time T, based on the observations up to the censoring time. One solution to this problem is given by the plug-in predictive density. Using the conditional probabilities derived in the evaluation of the likelihood, the joint conditional predictive density of the next occurrence time and event type (τN(T)+1, zN(T)+1) with respect to the product measure L ⊗ C, where L and C are the Lebesgue and counting measures respectively, is given by

$$p(\tau, z \mid \tau_{1:n}, z_{1:n}, \tau > T) = \frac{\sum_{\mathbf j} p_{n+1}(\mathbf j)\, d_{n+1}(\mathbf j)}{\mathbb{P}(\tau_{n+1} > T \mid \tau_{1:n}, z_{1:n})}, \quad \tau > T,\ z \in \{1, \ldots, M\}, \tag{16}$$

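where pn+1(j) are calculated using (14) as before, the denominator is computed using (4), and, similarly to (8), dn+1(j), which depends on (τ, z), is given by

$$d_{n+1}(\mathbf j) = \exp\Big\{-\sum_{m=1}^{M}\big[U_m(\tau-\tau_{j_m}) - U_m(\tau_n-\tau_{j_m})\big] - \big[\Phi(\tau)-\Phi(\tau_n)\big]\Big\}\,\big\{\mu_z(\tau-\tau_{j_z}) + \phi_z(\tau)\big\}. \tag{17}$$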

The predictive density of the next event time is obtained by marginalizing the joint density in (16), that is, by summing over all possible event types z:

$$p(\tau \mid \tau_{1:n}, z_{1:n}, \tau > T) = \frac{\sum_{\mathbf j} \big\{p_{n+1}(\mathbf j) \sum_{z=1}^{M} d_{n+1}(\mathbf j)\big\}}{\mathbb{P}(\tau_{n+1} > T \mid \tau_{1:n}, z_{1:n})}, \quad \tau > T. \tag{18}$$

The predictive density depends on the unknown model parameters; to overcome this, we replace them by the MLEs obtained as described in Section 3, which gives the plug-in predictive density.
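A minimal Python sketch of the plug-in predictive density (18) is given below, under the same illustrative assumptions as before (Weibull immigrant hazards, exponential offspring densities with means γmn, 0-based types, 1-based event indices with τ0 = 0). The argument `p_next` is assumed to hold pn+1(j) from the recursion (14) evaluated at the MLEs; the function name is hypothetical. Since, for each j, the numerator integrates over (T, ∞) to Sn+1(j) pn+1(j), the density should integrate to one, which gives a simple numerical check.

```python
import math

def predictive_density(tau_new, p_next, times, types, T, kappa, beta, eta, gamma):
    """Plug-in predictive density (18) of the next event time, for tau_new > T.

    p_next : dict {j: p_{n+1}(j)} from the likelihood recursion (14)
    Illustrative assumptions: Weibull immigrant hazards and exponential offspring
    densities; types 0..M-1; events 1-indexed in j, with tau_0 = 0.
    """
    M = len(kappa)
    tau = [0.0] + list(times)

    def U(m, s):                                   # Weibull cumulative hazard
        return (s / beta[m]) ** kappa[m]

    def mu(m, s):                                  # Weibull hazard
        return (kappa[m] / beta[m]) * (s / beta[m]) ** (kappa[m] - 1)

    def Phi(t):                                    # total cumulative offspring effect
        return sum(eta[m][z] * (1 - math.exp(-(t - s) / gamma[m][z]))
                   for m in range(M) for s, z in zip(times, types) if s < t)

    def phi(m, t):                                 # offspring intensity of type m
        return sum(eta[m][z] * math.exp(-(t - s) / gamma[m][z]) / gamma[m][z]
                   for s, z in zip(times, types) if s < t)

    def log_no_event(a, b, j):
        return (-sum(U(m, b - tau[j[m]]) - U(m, a - tau[j[m]]) for m in range(M))
                - (Phi(b) - Phi(a)))

    tn = tau[-1]
    denom = sum(p * math.exp(log_no_event(tn, T, j)) for j, p in p_next.items())  # (4)
    num = 0.0
    for j, p in p_next.items():
        total_rate = sum(mu(m, tau_new - tau[j[m]]) + phi(m, tau_new) for m in range(M))
        num += p * math.exp(log_no_event(tn, tau_new, j)) * total_rate  # sum over z of (17)
    return num / denom
```

Evaluating the function on a grid of times beyond T gives the whole predictive curve; keeping the per-type terms instead of summing them gives the joint density (16).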

Another approach to future event prediction is to simulate the process from the censoring time T to a future time point T̃. Predictive simulations must take into account the actual observations up to time T by conditioning on N(T) = n and the values of τ1:n and z1:n. The algorithm works as follows. First, we simulate the M-dimensional index vector of the last immigrant arrivals before the censoring time according to its conditional distribution, which by Bayes' theorem is P(I(T) = j | τ1:n, z1:n, τn+1 > T) = pn+1(j) Sn+1(j) / P(τn+1 > T | τ1:n, z1:n), with the denominator given by (4). Second, for each component we simulate the next immigrant arrival from the corresponding inter-renewal distribution, conditional on the waiting time exceeding the time elapsed between the simulated last immigrant arrival and the censoring time. Third, we simulate the subsequent immigrant arrival times of the different components up to time T̃ according to the respective inter-renewal distributions. Fourth, we simulate the offspring processes up to time T̃ of each of the immigrants in the interval (T, T̃], using the algorithm described in Section 6.1 below. Last, we simulate the arrival times in (T, T̃] of the offspring descending from events prior to the censoring time T according to a non-stationary multivariate Hawkes process (NSMHP) with baseline intensity functions νj(·) = φj(T + ·), j = 1, . . . , M, and excitation functions gmn(·) = ηmn hmn(·), m, n = 1, . . . , M, again using the algorithm described in Section 6.1. This procedure allows the future to be simulated many times for given model parameters, which provides another approach to the first prediction problem: the next event time and event type can be extracted from each simulated future. In fact, any quantity of interest can be extracted from these simulations.
For example, we can construct prediction intervals for the number of events in a prediction window from the appropriate quantiles of the simulated numbers of events in that window. Note that this method does not account for the uncertainty in the parameter estimates and can therefore lead to overconfident predictions. We shall show through simulations that this effect is inconsequential when there are enough data to estimate the model parameters accurately.
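The first two simulation steps can be sketched in a few lines of Python. Conditioning on no event in (τn, T] reweights pn+1(j) by the survival probability Sn+1(j), so the last immigrant vector is drawn with weights proportional to pn+1(j) Sn+1(j). For a Weibull waiting time W with cumulative hazard U(w) = (w/β)^κ, conditioning on W > a gives P(W > w | W > a) = exp{−(U(w) − U(a))}, which can be inverted in closed form. The Weibull choice and the function names are illustrative assumptions.

```python
import math
import random

def sample_last_immigrants(p_next, S_next, rng=random):
    """Draw I(T) with probabilities proportional to p_{n+1}(j) * S_{n+1}(j)."""
    states = list(p_next)
    weights = [p_next[j] * S_next[j] for j in states]
    return rng.choices(states, weights=weights, k=1)[0]

def sample_weibull_beyond(elapsed, kappa, beta, rng=random):
    """Draw a Weibull(shape kappa, scale beta) waiting time W conditioned on W > elapsed.

    Inversion: P(W > w | W > a) = exp(-(U(w) - U(a))) with U(w) = (w / beta)**kappa,
    so W = beta * (U(a) - log V)**(1 / kappa) for V ~ Uniform(0, 1].
    """
    v = 1.0 - rng.random()                  # in (0, 1], so log(v) is finite
    return beta * ((elapsed / beta) ** kappa - math.log(v)) ** (1.0 / kappa)
```

The time of the next type-m immigrant is then the simulated last immigrant time plus the returned waiting time, with `elapsed` equal to T minus that last immigrant time (taking time 0 when no type-m immigrant has arrived). Passing a seeded `random.Random` instance as `rng` makes the predictive simulations reproducible.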

6 Simulations

This section conducts simulation studies to assess the numerical performance of the maximum likelihood estimator (MLE) of the MRHawkes process model developed in Section 3 and assess the predictive performance of the model using simulated data. This section also explains how to efficiently simulate the process up to a predetermined censoring time T by utilizing the linear nature of the intensity.

6.1 Simulation algorithm for the MRHawkes process

Simulation of the MRHawkes process model can be implemented efficiently using a cascading algorithm motivated by the cluster process representation of the MRHawkes process. To simulate the occurrence times and event types up to a predetermined censoring time T, we first simulate the immigrant arrival times up to time T for each event type as the cumulative sums of i.i.d. positive random variables with the appropriate hazard rate function. For each event type z ∈ {1, . . . , M}, denote the simulated immigrant arrival times by 0 < τ′z,1 < τ′z,2 < · · · < τ′z,nz ≤ T. We simulate the offspring of each of the immigrants i = 1, . . . , nz up to time T by simulating a non-stationary multivariate Hawkes process (NSMHP) with baseline intensity functions νj(t) = ηjz hjz(t), j = 1, . . . , M, and excitation functions gmn(t) = ηmn hmn(t), m, n ∈ {1, . . . , M}, on the interval (0, T − τ′z,i], and then translating the event times into the interval (τ′z,i, T].

The NSMHP on the interval (0, T] with baseline intensity functions νj(·), j = 1, . . . , M, and excitation functions gmn(·), m, n = 1, . . . , M, can be simulated with a cascading algorithm as follows. First, we simulate the generation 0 events of types j = 1, . . . , M on (0, T] according to independent Poisson processes with intensity functions νj(·), j = 1, . . . , M. Then we keep simulating generation i (i = 1, 2, . . . ) events as long as the number of generation i − 1 events is non-zero: for each generation i − 1 event, say of type n occurring at time t, we simulate its generation i children of types m = 1, . . . , M on (t, T] according to M independent Poisson processes with respective intensity functions gmn(· − t). When this recursion stops, we return the events of all generations, with their respective type labels, as the events of the NSMHP on the interval (0, T].
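The cascade above can be sketched in Python using only the standard library. Rather than building a separate NSMHP for each immigrant, this version treats every event, immigrant or child, as a parent and generates its children directly; by the cluster representation this yields the same process. For exponential offspring densities, a Poisson process with intensity ηmn hmn(· − t) on (t, ∞) can be drawn by sampling a Poisson(ηmn) number of children and placing each at t plus an exponential delay, keeping those at or before T. The Weibull immigrants, exponential offspring and all function names are illustrative assumptions.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means (branching ratios) used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_mrhawkes(T, kappa, beta, eta, gamma, rng=random):
    """Cascade (branching) simulation of an MRHawkes process on (0, T].

    Illustrative assumptions: Weibull(shape kappa[m], scale beta[m]) immigrant
    waiting times, exponential offspring delays with mean gamma[m][n], branching
    ratios eta[m][n] = expected number of type-m children of a type-n event.
    Returns a time-sorted list of (time, type, is_immigrant), types coded 0..M-1.
    """
    M = len(kappa)
    events, parents = [], []
    for m in range(M):                       # renewal immigrants of each type
        t = 0.0
        while True:
            t += rng.weibullvariate(beta[m], kappa[m])   # (scale, shape)
            if t > T:
                break
            events.append((t, m, True))
            parents.append((t, m))
    while parents:                           # next generation, until none is born
        children = []
        for t0, n in parents:
            for m in range(M):
                for _ in range(poisson(eta[m][n], rng)):
                    t = t0 + rng.expovariate(1.0 / gamma[m][n])
                    if t <= T:
                        children.append((t, m))
        events.extend((t, m, False) for t, m in children)
        parents = children
    return sorted(events)
```

Because the branching matrix has spectral radius below one, each family is finite with probability one and the generation loop terminates. Keeping the immigrant flag is convenient for checking the immigrant and offspring proportions in simulation studies.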

6.2 Simulation Studies

This section assesses the numerical performance of the statistical inference methods developed in Section 3. The simulations in this paper use Weibull inter-arrival times for the renewal immigrants, with shape parameter κm and scale parameter βm. The offspring densities are chosen to be exponential with scale parameters (means) γmn. We analyze the bivariate version of the MRHawkes process model, so M = 2. For the first Weibull renewal distribution, the shape and scale parameters are κ1 = 3 and β1 = 1.2, and for the second κ2 = 1/3 and β2 = 0.2. These two processes correspond, respectively, to evenly spaced immigrant arrivals and to high levels of burstiness and clustering. The scale parameters of the renewal immigrant distributions are selected so that the expected waiting time between immigrants of the same type is close to one. For the endogenous part of the process, the exponential offspring densities are chosen to have mean waiting times in the set γmn ∈ {0.5, 1, 3}. These values give offspring waiting times that are shorter than, equal to, and longer than the expected immigrant inter-arrival times. Further, we assume that offspring events of the same type share a common scale parameter, that is, γ11 = γ12 and γ22 = γ21. The branching ratios for the self-excitation effects are chosen to be either ηs = 0.3 or ηs = 0.7, corresponding to low and high levels of self-excitation respectively. The cross-excitation effects have branching ratios ηc = 0.1 or ηc = 0.2, which ensures that the branching matrix H has spectral radius less than one and hence that the process is stable. For each combination of the chosen parameters, the MRHawkes process was simulated 500 times, with the censoring times T indicated in the table chosen so that the average length of the realizations was roughly 1000.
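The design quantities above can be checked numerically. The Python sketch below (hypothetical helper names, standard library only) computes the Weibull mean βΓ(1 + 1/κ) and the spectral radius of a nonnegative branching matrix by power iteration, which assumes the matrix is primitive (for example, has all entries positive). For κ1 = 3, β1 = 1.2 the mean waiting time is about 1.07, and for κ2 = 1/3, β2 = 0.2 it is exactly 0.2 × Γ(4) = 1.2. For H = [[0.7, 0.2], [0.1, 0.3]] the spectral radius is (1 + √0.24)/2 ≈ 0.745, and for H = [[0.3, 0.1], [0.1, 0.3]] it is 0.4, in line with the rounded Spr(H) values reported in Table 1.

```python
import math

def weibull_mean(kappa, beta):
    # E[W] = beta * Gamma(1 + 1/kappa) for a Weibull(shape kappa, scale beta) waiting time
    return beta * math.gamma(1.0 + 1.0 / kappa)

def spectral_radius(H, iters=500):
    """Perron root of a nonnegative matrix via power iteration (assumes H is primitive)."""
    x = [1.0] * len(H)
    rho = 0.0
    for _ in range(iters):
        y = [sum(h * xi for h, xi in zip(row, x)) for row in H]
        rho = max(y)                 # x is normalised so that max(x) == 1
        x = [yi / rho for yi in y]
    return rho
```

For the small matrices used here a closed-form eigenvalue formula would do equally well; power iteration is shown because it extends unchanged to larger M.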
For each simulated data set, the MLE was computed by directly minimizing the negative log-likelihood function using the quasi-Newton BFGS (Broyden-Fletcher-Goldfarb-Shanno) method. The computations were implemented in the R language (R Core Team, 2016) with the aid of the optim function, and were run on Intel Xeon X5675 processors (12M cache, 3.06 GHz, 6.4 GT/s QPI). The results of the simulations are presented in Table 1, in which we report the mean of the parameter estimates (Est.), the empirical standard error of the parameter estimates (SE), the average of the standard errors obtained by inverting the approximate Hessian matrix (ŜE), the average length of the realizations (AL), the mean running time for the optimization and computation of the approximate Hessian matrix (RT) and the empirical coverage probability (CP) of the approximate 95% confidence intervals.

Top panel: T = 170, Spr(H) = 0.75, AL = 1022, RT = 20.2 hrs
        True   Est.    SE      ŜE      CP
κ1      3      3.314   1.113   0.803   0.944
β1      1.2    1.173   0.169   0.132   0.950
κ2      1/3    0.331   0.0288  0.0294  0.948
β2      0.2    0.232   0.0870  0.0751  0.972
γ1      1      1.028   0.248   0.206   0.950
γ2      1      1.097   0.814   0.435   0.986
η11     0.7    0.686   0.0568  0.0517  0.952
η12     0.2    0.212   0.0911  0.0838  0.948
η21     0.1    0.101   0.0257  0.0251  0.950
η22     0.3    0.291   0.0826  0.0789  0.964

Second panel: T = 170, Spr(H) = 0.75, AL = 1000, RT = 20.8 hrs
        True   Est.    SE      ŜE      CP
κ1      3      3.119   0.522   0.479   0.950
β1      1.2    1.200   0.0909  0.0836  0.948
κ2      1/3    0.332   0.0356  0.0334  0.952
β2      0.2    0.231   0.137   0.0899  0.984
γ1      1      1.151   0.592   0.465   0.952
γ2      1      1.030   0.193   0.177   0.950
η11     0.3    0.282   0.0700  0.0663  0.942
η12     0.1    0.107   0.0310  0.0293  0.940
η21     0.2    0.213   0.0827  0.0799  0.940
η22     0.7    0.683   0.0587  0.0576  0.950

Third panel: T = 360, Spr(H) = 0.40, AL = 1003, RT = 19.2 hrs
        True   Est.    SE      ŜE      CP
κ1      3      3.027   0.278   0.262   0.952
β1      1.2    1.197   0.0519  0.0484  0.928
κ2      1/3    0.326   0.0218  0.0195  0.950
β2      0.2    0.221   0.0641  0.0515  0.976
γ1      0.5    0.527   0.169   0.145   0.948
γ2      3      3.304   2.640   1.155   0.990
η11     0.3    0.294   0.0378  0.0357  0.948
η12     0.1    0.103   0.0330  0.0298  0.956
η21     0.1    0.103   0.0398  0.0360  0.948
η22     0.3    0.302   0.0608  0.0588  0.954

Bottom panel: T = 360, Spr(H) = 0.40, AL = 1016, RT = 20.8 hrs
        True   Est.    SE      ŜE      CP
κ1      3      3.021   0.311   0.294   0.952
β1      1.2    1.195   0.0597  0.0542  0.960
κ2      1/3    0.326   0.0216  0.0195  0.934
β2      0.2    0.218   0.0554  0.0497  0.962
γ1      3      3.932   3.509   1.955   0.970
γ2      0.5    0.506   0.101   0.0973  0.960
η11     0.3    0.287   0.0558  0.0507  0.952
η12     0.1    0.113   0.0624  0.0481  0.966
η21     0.1    0.102   0.0215  0.0223  0.954
η22     0.3    0.306   0.0490  0.0485  0.954

Table 1: Results of the Maximum Likelihood Estimation on simulated data.

The results in Table 1 show relatively little bias in the maximum likelihood estimates, in line with the consistency of the MLE. The Hessian-based standard errors (ŜE) are generally close to the empirical standard errors (SE), although they understate the variability for some parameters, notably κ1 in the top panel and several of the offspring mean parameters γm (for example γ2 in the top panel and γm = 3 in the lower two panels). When the spectral radius of the branching matrix H is larger, that is, when Spr(H) = 0.75, the immigration scale parameters tend to have a much larger empirical bias and standard error than when the spectral radius is lower. This is to be expected: with higher levels of excitation and realizations of roughly fixed length, the total number of immigrants is much smaller.
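The Monte Carlo summaries reported in Table 1 can be computed from the replicate fits roughly as in the following sketch. We assume that the empirical SE is the sample standard deviation of the estimates and that each approximate 95% confidence interval is the Wald interval estimate ± 1.96 × ŜE; the function name and input layout are hypothetical.

```python
import statistics


def summarize_replicates(true_value, estimates, std_errors, z=1.96):
    """Table 1-style summary for one parameter over Monte Carlo replicates.

    estimates:  MLEs from the replicate data sets
    std_errors: Hessian-based standard errors from the same fits
    """
    covered = sum(abs(est - true_value) <= z * se
                  for est, se in zip(estimates, std_errors))
    return {
        "Est.": statistics.fmean(estimates),     # mean of the estimates
        "SE": statistics.stdev(estimates),       # empirical standard error
        "SE_hat": statistics.fmean(std_errors),  # average Hessian-based SE
        "CP": covered / len(estimates),          # Wald-interval coverage
    }


# Toy input (not from the paper): only the fourth interval misses the true value 1.
print(summarize_replicates(1.0, [1.0, 1.2, 0.8, 3.0], [0.3, 0.3, 0.3, 0.3]))
```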


When the spectral radius is 0.75 (T = 170), the expected number of type-1 immigrants is 159 and that of type-2 immigrants is 142; when the spectral radius is only 0.40 (T = 360), the expected numbers are 336 and 300, respectively. When offspring events arrive more frequently relative to immigrants, the shape parameter κm tends to be overestimated and the scale parameter βm exhibits quite a large bias. The branching ratios ηmn are well estimated across the range of scenarios considered. In the top two panels of Table 1, the mean offspring waiting times γm are well estimated with relatively little bias, owing to the large total number of offspring events in the realizations. In the lower two panels one of the mean offspring waiting times is larger (γm = 3), and its estimate shows a much larger empirical bias. We suggest two reasons for this. First, the larger mean parameter produces a flatter likelihood surface, so that optimization of the likelihood function is challenging and the standard errors are very large. Second, a longer offspring waiting time relative to the immigrant inter-event waiting times makes it harder to disentangle offspring from immigrant event times, because of the long-tailed excitation effects. The shorter the mean waiting time (for example γm = 0.5), the weaker this effect, and indeed the empirical bias is greatly reduced in that case: the bias tends to shrink as γm decreases.
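The immigrant counts quoted above agree with the elementary renewal approximation T/E[W], with E[W] = βΓ(1 + 1/κ). The sketch below (our own check, with hypothetical function names) reproduces them and, ignoring edge effects at 0 and T, also gives the expected total number of events by solving (I − H)x = (expected immigrant counts), with H[m][n] = ηmn.

```python
import math


def expected_immigrants(T, shape, scale):
    """Elementary renewal approximation T / E[W] for Weibull inter-immigrant times."""
    return T / (scale * math.gamma(1.0 + 1.0 / shape))


def expected_totals_2x2(immigrants, H):
    """Solve (I - H) x = immigrants for a bivariate branching matrix H[m][n] = eta_mn."""
    a, b = 1.0 - H[0][0], -H[0][1]
    c, d = -H[1][0], 1.0 - H[1][1]
    det = a * d - b * c
    i1, i2 = immigrants
    return [(d * i1 - b * i2) / det, (a * i2 - c * i1) / det]


for T in (170, 360):
    imm = [expected_immigrants(T, 3, 1.2), expected_immigrants(T, 1 / 3, 0.2)]
    print(T, [round(x) for x in imm])

# Top panel of Table 1 (T = 170): expected total length, ignoring edge effects.
imm170 = [expected_immigrants(170, 3, 1.2), expected_immigrants(170, 1 / 3, 0.2)]
print(round(sum(expected_totals_2x2(imm170, [[0.7, 0.2], [0.1, 0.3]]))))
```

The last line gives roughly 1041 events, of the same order as the average length AL = 1022 in Table 1; for the bursty type-2 renewal process T/E[W] is only a first-order approximation to the renewal function.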

6.3 Modified Likelihood Evaluation Algorithm

This section reports a simulation study of the modified likelihood evaluation algorithm discussed in Remark 3.1. Table 2 presents the results for the first simulation model of Table 1 (top panel), with the estimation performed for values of B in the set {100, 200, 300, 400, ∞}, where B = ∞ corresponds to the exact MLE. For the component whose immigrants arrive fairly uniformly over time (κ1 = 3), a relatively small value of B suffices: looking back only 100 event times already gives accurate estimates, as the estimates of the first-component parameters κ1, β1, γ1, η11 and η12, together with their standard errors and coverage probabilities, are very close to those of the exact MLE. On the other hand, when immigrants arrive in bursts or cluster heavily, more distant events must be considered, roughly 400 event times in this simulation. The type-2 immigrant inter-event waiting times in this simulation model can be quite large compared with the offspring waiting times and the type-1 immigrant inter-event waiting times. The estimates of the second-component parameters κ2, β2, γ2, η21 and η22 move steadily closer to the exact MLEs as the tuning parameter B increases, with quite good agreement when B = 400.


B = ∞ (exact MLE), RT = 20.2 hrs
        True   Est.    SE      ŜE      CP
κ1      3      3.314   1.113   0.803   0.944
β1      1.2    1.173   0.169   0.132   0.950
κ2      1/3    0.331   0.0288  0.0294  0.948
β2      0.2    0.232   0.0870  0.0751  0.972
γ1      1      1.028   0.248   0.206   0.950
γ2      1      1.097   0.814   0.435   0.986
η11     0.7    0.686   0.0568  0.0517  0.952
η12     0.2    0.212   0.0911  0.0838  0.948
η21     0.1    0.101   0.0257  0.0251  0.950
η22     0.3    0.291   0.0826  0.0789  0.964

B = 100, RT = 0.407 hrs
        Est.    SE      ŜE      CP
κ1      3.315   1.113   0.804   0.944
β1      1.173   0.169   0.132   0.950
κ2      0.348   0.0277  0.0292  0.952
β2      0.279   0.128   0.0796  0.970
γ1      1.028   0.248   0.206   0.950
γ2      1.122   0.511   0.425   0.962
η11     0.686   0.0568  0.0517  0.952
η12     0.212   0.0912  0.0837  0.948
η21     0.0869  0.0264  0.0247  0.944
η22     0.282   0.0777  0.0776  0.956

B = 200, RT = 1.185 hrs
        Est.    SE      ŜE      CP
κ1      3.314   1.113   0.803   0.944
β1      1.173   0.169   0.132   0.950
κ2      0.335   0.0276  0.0294  0.946
β2      0.247   0.109   0.0778  0.972
γ1      1.028   0.248   0.206   0.950
γ2      1.120   1.077   0.444   0.992
η11     0.686   0.0568  0.0517  0.952
η12     0.212   0.0911  0.0838  0.948
η21     0.0986  0.0261  0.0250  0.950
η22     0.287   0.0853  0.0790  0.974

B = 300, RT = 2.63 hrs
        Est.    SE      ŜE      CP
κ1      3.314   1.113   0.803   0.944
β1      1.173   0.169   0.132   0.950
κ2      0.332   0.0281  0.0294  0.946
β2      0.238   0.0986  0.0766  0.972
γ1      1.028   0.248   0.206   0.950
γ2      1.100   0.827   0.437   0.988
η11     0.686   0.0568  0.0517  0.952
η12     0.212   0.0911  0.0838  0.948
η21     0.101   0.0257  0.0251  0.948
η22     0.290   0.0823  0.0789  0.966

B = 400, RT = 4.988 hrs
        Est.    SE      ŜE      CP
κ1      3.314   1.113   0.803   0.944
β1      1.173   0.169   0.132   0.950
κ2      0.331   0.0284  0.0294  0.942
β2      0.235   0.0941  0.0762  0.978
γ1      1.028   0.248   0.206   0.950
γ2      1.098   0.819   0.435   0.986
η11     0.686   0.0568  0.0517  0.952
η12     0.212   0.0911  0.0838  0.948
η21     0.101   0.0257  0.0251  0.946
η22     0.291   0.0825  0.0789  0.964

Table 2: Results of the Maximum Likelihood Estimation on simulated data with the likelihood truncated so that only the last B events are considered as possible immigrants. True values are as in the B = ∞ block.

The simulation study indicates that the choice of B depends on the immigrant inter-event waiting times relative to the offspring waiting times, together with the level of self- and cross-excitation. When the offspring waiting times are short or the immigrant waiting times are long, a larger number of events must be considered as potential last immigrant arrivals, and so a larger value of B is needed. However, if immigrants occur quite regularly, only recent events need to be considered as possible last immigrants, and B can be much smaller. The choice of the tuning parameter B therefore presents a trade-off between the accuracy of the parameter estimates relative to the exact MLE and the time required to compute them. One possible way to choose B is to compare the parameter estimates obtained for different values of B: if the differences are immaterial for the particular application, the smaller value of B is adequate.
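The informal rule at the end of the paragraph above could be automated as in the following sketch: step through a grid of B values and stop once the estimates change by less than a tolerance, measured in reference standard errors, when moving to the next B. The function, the tolerance and the use of SE-scaled shifts are our own illustrative assumptions. For demonstration we feed it the averaged type-2 estimates (κ2, β2, γ2) from Table 2, whereas in practice the estimates from a single data set would be used.

```python
def choose_B(estimates_by_B, reference_se, tol=0.1):
    """Return the first B at which the estimates have stabilised.

    estimates_by_B: {B: [theta_1, ..., theta_p]} for a grid of finite B values
    reference_se:   standard errors used to scale the change in each parameter
    The larger B of the first consecutive pair whose estimates differ by less
    than `tol` reference SEs is returned; None means no pair was close enough.
    """
    grid = sorted(estimates_by_B)
    for smaller, larger in zip(grid, grid[1:]):
        shift = max(abs(a - b) / se for a, b, se in
                    zip(estimates_by_B[smaller], estimates_by_B[larger], reference_se))
        if shift < tol:
            return larger
    return None


# (kappa2, beta2, gamma2) means from Table 2; SE-hat of the exact MLE as the scale.
table2 = {100: [0.348, 0.279, 1.122], 200: [0.335, 0.247, 1.120],
          300: [0.332, 0.238, 1.100], 400: [0.331, 0.235, 1.098]}
print(choose_B(table2, reference_se=[0.0294, 0.0751, 0.435]))
```

With the default tolerance this returns 400, in line with the discussion of Table 2; a looser tolerance returns a smaller B at the cost of a larger departure from the exact MLE.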

6.4 Assessing Predictive Performance

We now assess the predictive performance of the simulation-based prediction procedure discussed in Section 5, using the simulated data of Table 1. For the 500 simulated data sets from the top panel of Table 1, we aim to predict the number of events in the prediction window (170, 283], whose length is two thirds of that of the observation period. To assess performance, sample paths simulated forward using the maximum likelihood estimates of the model parameters are compared with the path simulated under the true parameters. For each simulated data set the future is simulated 500 times, and the resulting 95% prediction interval contains the true number of events in 86.52% of all cases. This is somewhat lower than the nominal 95%, partly because the randomness of the parameter estimates has not been taken into account. Although the observed sample paths contain on average 1000 events, the large number of parameters and their large standard errors can hinder predictive performance; when the realizations are long enough, this should not be overly detrimental to the accuracy of the predictions. To account for the randomness in the parameter estimates, the sampling distribution of the estimator of θ can be used: for each simulated path, a new parameter value θ̂j is drawn from the multivariate normal distribution with mean θ̂ and variance-covariance matrix given by the inverse of the approximate Hessian matrix from the numerical optimization. The simulation study supports this normal approximation, as the empirical coverage probabilities are consistently close to the 95% level. However, when the mean parameter of the offspring density is large, this assumption might be unreasonable. As the observed multivariate point process becomes longer, the parameter uncertainty decreases and has only a modest impact on the prediction interval.
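A minimal sketch of the parameter-uncertainty step described above: each future path uses its own parameter draw θ̂j ~ N(θ̂, Σ̂), obtained via a Cholesky factor of Σ̂, and the 95% prediction interval is taken from the 2.5% and 97.5% empirical quantiles of the simulated counts. The path simulator `simulate_count` is a hypothetical user-supplied function (for the MRHawkes model it would run the forward simulation of Section 5); the code is illustrative rather than the authors' implementation.

```python
import math
import random
import statistics


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def draw_parameters(theta_hat, cov, n_draws, rng):
    """Draw theta_j ~ N(theta_hat, cov) as theta_hat + L z with z standard normal."""
    L = cholesky(cov)
    p = len(theta_hat)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        draws.append([theta_hat[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                      for i in range(p)])
    return draws


def prediction_interval(simulate_count, theta_hat, cov, n_paths=500, seed=0):
    """95% interval for a future count, integrating over parameter uncertainty."""
    rng = random.Random(seed)
    counts = [simulate_count(theta, rng)
              for theta in draw_parameters(theta_hat, cov, n_paths, rng)]
    cuts = statistics.quantiles(counts, n=40, method="inclusive")
    return cuts[0], cuts[-1]  # 2.5% and 97.5% empirical quantiles


# Toy usage: a "simulator" with no process noise, so the interval reflects only the
# parameter uncertainty and should be close to 10 +/- 1.96 * 2.
lo, hi = prediction_interval(lambda theta, rng: theta[0], [10.0], [[4.0]],
                             n_paths=4000, seed=1)
```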

7 Applications

7.1 Analysis of Earthquakes in Fiji and Vanuatu

This section analyzes the arrival times of earthquakes in two Pacific island countries, Fiji and Vanuatu. The study considers earthquakes of magnitude 5.5 or greater, measured on the Richter scale, over the 25-year period from 01/01/1991 to 31/12/2015. The data were obtained from the earthquake archive of the United States Geological Survey (USGS) and consist of 1076 earthquake occurrences. Over this period 646 earthquakes occurred in the area of Fiji, which we call type-1 events, and 428 in the area of Vanuatu, which are type-2 events. Figure 2 plots the earthquake occurrences around Fiji and Vanuatu during this period, with Fiji on the right and Vanuatu on the left. We aim to understand the interactions between these two neighboring locations and their propensity for earthquakes. We model the earthquake data with a two-component MRHawkes process model, with Weibull renewal immigrant inter-event waiting time distributions and exponential offspring densities. The model incorporates an interaction between the two neighboring countries and allows a straightforward interpretation in terms of immigrant arrivals (main shocks), their descendants (aftershocks) and location. Some useful information would likely be lost if we attempted to capture this relationship in the univariate setting. The MLEs for the MRHawkes model are as follows: κ̂1 = 0.470,

βˆ1 = 19.97,

κ ˆ 2 = 0.342 ,

βˆ2 = 10.36,

γˆ2 = 566 ,

ηˆ11 = 0.428,

ηˆ12 = 0.367,

ηˆ21 = 0.382, ηˆ22 = −0.0375.

(0.0654) (269)

(3.47)

(0.152)

(0.0221) (0.245)

(2.51)

(0.111)

γˆ1 = 394 , (155)

(0.164)

where the standard errors are in brackets. For comparison, the multivariate Hawkes process with the same exponential offspring densities was also fitted to the data, resulting in the following parameter estimates (standard errors in brackets):

µ̂1 = 14.84 (0.61), µ̂2 = 27.26 (1.64), γ̂1 = 0.0600 (0.0198), γ̂2 = 0.320 (0.0930),
η̂11 = 0.0536 (0.0111), η̂12 = −0.00332 (0.00563), η̂21 = 0.00128 (0.00502), η̂22 = 0.221 (0.0288).

Figure 2: Large magnitude earthquakes arising in Fiji and Vanuatu during 1991 to 2015 (map titled "Big earthquakes around Fiji (right) and Vanuatu (left), Jan 1991 - Dec 2015"; latitude against longitude). Open circles are the locations of earthquakes in Vanuatu and closed circles those in Fiji.

The Rosenblatt residuals were calculated to assess the performance of the MRHawkes model. The uniform quantile plot and the autocorrelation function (ACF) plot of the residuals are presented in Figure 3. The residuals appear uniformly distributed, as the theoretical and empirical quantiles agree well; this is reinforced by a large P-value of 0.83 for the Kolmogorov-Smirnov (KS) test of uniformity. The residuals also exhibit little to no serial correlation up to lag 30, with a P-value of 0.47 for the Ljung-Box test of independence. For the multivariate Hawkes model, the uniformity of the residuals is also well satisfied, with a P-value of 0.72 for the KS test, but the Ljung-Box test fails with a P-value of only 0.01. The universal residuals of the event types were also computed: the P-value of the KS test is 0.8549 and the Ljung-Box test returns a P-value of 0.4067, suggesting that the model captures the distribution of location between the two countries. We used 100 series of auxiliary uniform random variables to compute the universal residuals and found that all but one passed the test at the 5% level, with relatively large P-values. For the combined series of residuals, uniformity and independence are well satisfied, with all the tests passing at the 5% level; in contrast, the Hawkes model fails the Ljung-Box test on 38 of the 100 residual series. The Akaike information criterion (AIC) is 7775.5 for the MRHawkes model and 7820.3 for the Hawkes model. The AIC together with the residual assessment suggests that the MRHawkes process outperforms the multivariate Hawkes process and provides a superior fit. Indeed, the Hawkes model fails to capture any interaction between the two locations, as its cross-exciting branching ratios are not significantly different from zero given their standard errors.

Figure 3: Uniform quantile plot (P-value of K-S test = 0.83) and ACF plot up to lag 30 (P-value of Ljung-Box test = 0.47) for the computed Rosenblatt transform residuals for the Fiji and Vanuatu earthquake data set for the MRHawkes model.

In the current context, immigrants are regarded as main shocks, while offspring events are interpreted as aftershocks caused by a main shock or another aftershock. The MRHawkes process model suggests that main shocks occur in Fiji on average every β̂1 Γ(1 + 1/κ̂1) = 45.08 days, while the mean main-shock inter-event waiting time for Vanuatu is β̂2 Γ(1 + 1/κ̂2) = 56.49 days. This interpretation differs markedly from that of the Hawkes model, under which immigrants arrive more frequently, on average every 14.84 and 27.26 days, respectively. Once a shock occurs in Fiji, it directly induces on average η̂11 = 0.428 aftershocks in Fiji and η̂21 = 0.382 aftershocks in Vanuatu. An earthquake in Vanuatu directly causes on average η̂12 = 0.367 earthquakes in Fiji, while its effect on the intensity of shocks in Vanuatu, η̂22 = −0.0375, is negative but not statistically different from zero. The offspring mean parameters γ̂m are interpreted as the expected waiting times for an aftershock in the corresponding component, with γ̂1 = 394 for Fiji and γ̂2 = 566 for Vanuatu. Future earthquake occurrences can be predicted using the fitted model and the prediction procedures developed in Section 5. The predictive simulations are assessed by comparing the predictions with the observed earthquake occurrences over the prediction interval under consideration. The simulation-based approach is used to predict sizeable earthquakes in the areas of Fiji and Vanuatu from 01/01/2016 until 30/06/2017, the eighteen months immediately following the censoring time of the fitted model. Using the fitted model with η22 set to zero, as it is not significantly different from zero and the predictive simulations require non-negative branching ratios, we simulate 10,000 realizations of earthquake occurrences conditional on the earthquake times and types up to the censoring time. The pointwise median and the lower and upper 2.5 percentiles of the simulated paths, together with the actual counts, are presented in Figure 4 for Fiji, for Vanuatu and for the combined count. The observed path for Fiji falls well within the prediction interval for the entire period, and the median tracks the observed path quite well.
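The mean main-shock inter-event times quoted above follow from the Weibull mean βΓ(1 + 1/κ). The short check below plugs in the rounded estimates, so its output may differ slightly from the reported 45.08 and 56.49 days, which were presumably computed from unrounded estimates.

```python
import math


def weibull_mean(shape, scale):
    """Mean of a Weibull(shape, scale) waiting time: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


# Rounded MLEs from the fitted MRHawkes model (time unit: days).
fiji = weibull_mean(shape=0.470, scale=19.97)
vanuatu = weibull_mean(shape=0.342, scale=10.36)
print(f"Fiji: {fiji:.2f} days, Vanuatu: {vanuatu:.2f} days")
```

Because the shape estimates are well below one, Γ(1 + 1/κ) is large, so the mean waiting time is more than twice the scale parameter in both regions.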
In Vanuatu an unusually large number of earthquakes occurred over this period; however, the observed count still falls within the prediction interval by the end of the prediction window. The total count is comfortably within the prediction interval for the entire eighteen months. The waiting time until the next earthquake can be analyzed using the predictive density function. The probabilities that an earthquake occurs within the next 20, 40, 60 and 80 days are 65.72%, 87.18%, 94.91% and 97.88%, respectively; the next earthquake actually occurred in Fiji only 17.77 days later. We compared the predictive density of the next event time with the simulations and found quite good agreement between the predictive density and the histogram of the simulated next event times, although there is some disparity because the simulations are truncated at eighteen months. Another quantity of interest is the location of the next earthquake. Extracting the location of the first earthquake from the 10,000 simulations reveals that it occurred in Fiji in 60.68% of all realizations, which suggests that the next earthquake is most likely to occur in Fiji and again agrees with the actual data.

Figure 4: Actual and predicted earthquake occurrences (panels: Fiji counts, Vanuatu counts, and observed and predicted total event counts; number of events against date, Jan 2013 to Jul 2017). The solid curve is the actual earthquake counts, and the dashed curves show the point predictions and 95% prediction intervals at different time points.
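Given the conditional intensities λm(T + s) of each event type after the censoring time T, evaluated under the assumption that no further events occur, the probability that the next event occurs within u time units is 1 − exp{−Λ(u)}, with Λ(u) = ∫₀ᵘ Σm λm(T + s) ds, and the probability that it is of type m and occurs within u is ∫₀ᵘ λm(T + s) exp{−Λ(s)} ds. The sketch below evaluates these by the trapezoidal rule. The intensity functions passed in are hypothetical placeholders; for the MRHawkes model they would be the fitted conditional intensities given the observed history, which this sketch does not compute.

```python
import math


def next_event_probabilities(intensities, horizon, steps=10_000):
    """Trapezoidal evaluation of next-event probabilities.

    intensities: one function per event type, u -> lambda_m(T + u), valid while
                 no event has occurred in (T, T + u]
    Returns (P(next event within horizon),
             [P(next event is of type m and within horizon) for each type m]).
    """
    h = horizon / steps
    big_lambda = 0.0                 # integrated total intensity over (T, T + u]
    lam_prev = [f(0.0) for f in intensities]
    dens_prev = lam_prev[:]          # survival probability is 1 at u = 0
    type_prob = [0.0] * len(intensities)
    for k in range(1, steps + 1):
        lam = [f(k * h) for f in intensities]
        big_lambda += 0.5 * h * (sum(lam_prev) + sum(lam))
        survival = math.exp(-big_lambda)
        dens = [rate * survival for rate in lam]
        for m in range(len(intensities)):
            type_prob[m] += 0.5 * h * (dens_prev[m] + dens[m])
        lam_prev, dens_prev = lam, dens
    return 1.0 - math.exp(-big_lambda), type_prob


# Toy check with constant intensities (events per day), total rate 0.05.
p_any, p_type = next_event_probabilities([lambda u: 0.03, lambda u: 0.02], horizon=20.0)
```

With constant rates the answer is known in closed form (1 − e⁻¹ overall, split 60:40 between the two types), which makes the toy case a convenient check of the quadrature.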

7.2 Modeling trade-throughs using bivariate RHawkes processes

Market participants typically attempt to hide or minimize their market impact by submitting orders based on the liquidity available in the order book. Instead of executing large orders and revealing their intentions to the market, traders typically split their orders and restrict their size to the quantity available at the best limit price. This ensures that the price does not move unfavorably against them and thereby controls, to some extent, the market impact of their orders. However, in some instances the benefit of fast execution outweighs the cost of market impact, and so large orders are submitted with quantities greater than what is available at the first limit. Such transactions are termed trade-throughs. A trade-through is a transaction that is executed at least at the second level of limit orders in an order book, and hence provides valuable information about price dynamics and market micro-structure. Empirical studies highlight that trade-throughs tend to occur in clusters, and so self-exciting processes are a natural choice to model this phenomenon. Pomponio and Abergel (2013) examine the clustering effect of trade-throughs for the stock BNP Paribas by comparing the waiting time until the next trade-through. A clustering effect is evident when the next trade-through arrives sooner after a trade-through than after any regular trade. To see this, they computed the empirical distribution of the waiting time until the next trade-through, conditioning on whether the current trade is a trade-through or any regular trade. This waiting time distribution had a higher peak at short waiting times when the current trade was a trade-through. Further, Muni Toke and Pomponio (2011) computed the mean waiting time between trade-throughs and found that, conditional on the current trade being a trade-through, the mean waiting time was only 36.9 seconds, compared with 51.8 seconds after any trade. This suggests that trade-throughs are generally more likely to be followed by another trade-through and occur closer together in time. They also show that there is no asymmetry with respect to the side of the book on which the trade-through occurs: irrespective of the sign of the trade, the mean waiting time is shorter after a trade-through than after a regular trade.
They also show that cross-excitation between the two sides of the book is weaker than same-side clustering, as the mean waiting time for a same-side trade-through is shorter than that for a trade-through on the other side of the limit order book. In previous attempts to model trade-throughs, Muni Toke and Pomponio (2011) analyze the Thomson-Reuters tick-by-tick data of the Euronext-traded limit order book for the stock BNP Paribas (BNPP.PA) for the 109 trading days from 1st June 2010 to 29th October 2010. The data contain the timestamps, volume and price of the trades, and the volume, price and side of the order book for the quotes. Euronext Paris is open from 9am to 5:30pm local time (07:00 to 15:30 GMT). For each trading day, they extract the series of trade-through timestamps (τiA)i≥1 and (τiB)i≥1 for the ask and bid sides of the limit order book. Because of the non-stationarity of trading throughout the day, Muni Toke and Pomponio (2011) only consider trades that occur between 9:30am and 11:30am local time, a period during which the number of trade-throughs remains relatively constant. They show that the bivariate Hawkes process with an exponentially decaying kernel fits the majority of the two-hour trading periods, and that the cross-influence of bid and ask trade-throughs is particularly weak. Motivated by their work, we model the same trade-through data for the stock BNP Paribas using the bivariate RHawkes process. Instead of analyzing only the two-hour window, we analyze the entire trading day. Opening trades typically exhibit drastically different features from the rest of the trading day, and so transactions that occur during the first half hour, from 9am to 9:30am, are removed from the analysis. The analysis therefore considers trade-throughs occurring from 9:30am to 5:30pm. Over the 109 trading days, the mean number of trade-throughs in this period was 756, with an average and standard deviation of 367 and 217 for the ask side and 389 and 204 for the bid side. The first and third quantiles are fairly comparable, at 126 and 859 for the ask side and 140 and 860 for the bid side. In Figure 5 we present a plot of the expected inter-event waiting time between trade-throughs conditional on the time of day at which the trade-through occurred.
The expectation is estimated using the cubic regression spline approach of Engle and Russell (1998), with knots at each hour of the trading day and an extra knot in the middle of the last hour to account for the quickly changing level of trading activity near market close. The figure displays a clear diurnal pattern: the market opening is quite active, with trade-throughs occurring roughly every 20 seconds; activity then decreases in the middle of the day, with trade-throughs occurring only every 60 to 70 seconds; and activity picks up again prior to the close, with trade-throughs again occurring roughly every 20 seconds. The non-stationary nature of the arrival times of trade-throughs over a trading day is clearly evident, and so, similarly to Engle and Russell (1998), we account for the level of trading activity by dividing each observed duration by a factor proportional to the corresponding expected duration, subject to the constraint that the sum of the adjusted durations in a day equals the sum of the original durations.
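The duration adjustment described above can be sketched as follows for a single trading day. The expected durations would come from the fitted cubic regression spline evaluated at the time of each trade-through; here they are simply passed in as a list, and the helper names are hypothetical.

```python
from itertools import accumulate


def diurnally_adjust(durations, expected):
    """Divide each duration by its expected value, then rescale so that the
    adjusted durations of the day sum to the same total as the originals."""
    ratios = [d / e for d, e in zip(durations, expected)]
    scale = sum(durations) / sum(ratios)
    return [scale * r for r in ratios]


def adjusted_times(start, durations, expected):
    """Rebuild event times on the adjusted clock, starting from `start` (e.g. 9:30am)."""
    return [start + s for s in accumulate(diurnally_adjust(durations, expected))]


# A short gap at a busy time (expected 20 s) is stretched relative to a gap at a
# quiet time (expected 60 s), while the day's total length is unchanged.
adj = diurnally_adjust([10.0, 60.0, 10.0], [20.0, 60.0, 20.0])
```

Here the adjusted durations are 20, 40 and 20, still summing to 80: the busy-period gaps, which were half their expected length, now carry the same weight as the quiet-period gap relative to its expectation.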

Figure 5: Nonparametric estimate of the daily pattern for trade-through durations (expected duration in seconds against hour of day, 9:30 to 5:30).

To model the adjusted point process, we use Weibull renewal processes for the immigrant arrivals and exponential offspring densities, with those for offspring events of the same type sharing a common mean parameter. The parameter estimates for the 109 trading days are found by directly minimizing the negative log-likelihood function, and the results are presented in Table 3, which contains the mean, median, 2.5% quantile Q0.025, 97.5% quantile Q0.975 and standard deviation of the estimates. The parameters governing the bid and ask sides of the process are fairly similar, so both sides of trade-throughs display very similar features. The immigration processes for both bid and ask trade-throughs tend to exhibit heavy clustering and over-dispersion relative to a Poisson process, with the estimated Weibull shape parameter κ̂ remaining below one on most trading days. If we take the median as an estimate of the true parameter, the mean waiting time between exogenous ask trade-throughs is 109.3 seconds, while for bid-side trade-throughs it is 106.4 seconds. The branching ratios for the self- and cross-excitation are fairly similar on either side of the book, with the median cross-exciting branching ratio being roughly a third of the self-exciting branching ratio.
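The per-day summaries used in the remainder of this section can be computed as in the sketch below: Table 3-style statistics of the daily MLEs, and the number of days on which a Wald test rejects a null value, that is, on which |(θ̂ − θ0)/SE| > 1.96. For the cross-exciting branching ratios θ0 = 0; for the Weibull shape parameters θ0 = 1, which is equivalent to the 95% confidence interval excluding one. We assume quantiles are obtained by linear interpolation between order statistics (`method="inclusive"`); the function names are hypothetical.

```python
import statistics


def summarize_days(values):
    """Mean, median, 2.5% and 97.5% quantiles and SD of per-day estimates."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return {"Mean": statistics.fmean(values), "Median": statistics.median(values),
            "Q0.025": cuts[0], "Q0.975": cuts[-1], "StdDev": statistics.stdev(values)}


def count_rejections(estimates, std_errors, null=0.0, z_crit=1.96):
    """Days on which |(estimate - null) / SE| exceeds z_crit, and their proportion."""
    k = sum(abs((e - null) / s) > z_crit for e, s in zip(estimates, std_errors))
    return k, k / len(estimates)


# Toy inputs (not the BNP Paribas estimates).
print(count_rejections([0.10, 0.01, 0.05], [0.02, 0.02, 0.02]))        # eta, null 0
print(count_rejections([0.80, 0.95], [0.05, 0.05], null=1.0))          # kappa, null 1
```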

        Mean     Median   Q0.025   Q0.975   StdDev
κA      0.801    0.816    0.320    1.006    0.148
βA      108.8    97.55    18.82    251.8    61.47
κB      0.797    0.806    0.474    0.990    0.130
βB      95.91    86.99    23.70    220.7    50.74
γA      22.75    0.0257   0.00919  294.4    126.7
γB      18.17    0.0228   0.00903  119.1    134.2
ηAA     0.120    0.113    0.0293   0.220    0.0674
ηAB     0.0543   0.0342   0.00544  0.329    0.0965
ηBA     0.0446   0.0269   0.00633  0.229    0.0711
ηBB     0.131    0.121    0.0688   0.247    0.0569

Table 3: Summary statistics of the maximum likelihood estimates for the diurnally adjusted BNP Paribas trade-through data.

To assess the quality of the fit, the Rosenblatt residuals were computed. At the 1% level, the bivariate RHawkes model passed the K-S test of uniformity on 80.65% of the trading days. For comparison, we also fitted the bivariate Hawkes process with exponential offspring densities to the same transformed data; at the 1% level, it passed the K-S test of uniformity on only 36.70% of the trading days. This suggests that the Weibull MRHawkes model provides a better fit to the data than the classical multivariate Hawkes model. We further assess the goodness-of-fit of the event-type distribution using the universal residuals (Brockwell, 2007). These residuals pass the test of uniformity on the interval [0, 1] at the 1% level on 74 (67.89%) of the trading days, which suggests that the model adequately captures the distribution of the side of the market on which a trade-through occurs.

To ascertain whether the bivariate renewal Hawkes process is needed, we examine two questions: whether cross-excitation effects are present, and whether the immigrant arrivals depart from Poisson immigration. For the first question, we compute the z-scores of the cross-excitation branching ratios ηAB and ηBA under the null hypothesis that the corresponding cross-excitation effect is zero. Assuming, as the simulation study suggests, that the parameter estimates are asymptotically normal, we compare each z-score with 1.96, the 97.5% quantile of the standard normal distribution. The branching ratios ηAB and ηBA are significantly different from zero on 92 (84.40%) and 87 (79.82%) of the 109 trading days, respectively. Figure 6 plots these z-scores across the 109 trading days: the top panel shows the influence of bid trade-throughs on ask trade-throughs (ηAB) and the bottom panel the reverse effect (ηBA). Although cross-excitation is present, the cross-influence between the two sides of the market tends, as in Muni Toke and Pomponio (2011), to be relatively small compared with
the self-exciting effects, although this is not always the case.

Figure 6: Time series plots of the z-scores for the cross-excitation branching ratios across the 109 trading days for the stock BNP Paribas. Top panel: the influence of bid trade-throughs on the ask side of the market. Bottom panel: the influence of ask trade-throughs on the bid side of the market.

To answer the second question, we examine the shape parameters of the Weibull renewal distributions for the two sides of the market. Figure 7 presents time series plots of the estimated shape parameters κ̂A and κ̂B together with shaded point-wise 95% confidence intervals. The values of κA and κB are mostly different from one: the 95% confidence intervals exclude one on 76.15% and 80.73% of the trading days, respectively. Since a Weibull renewal process with shape parameter one is a homogeneous Poisson process, this suggests that the arrivals of bid and ask immigrant trade-throughs depart from Poisson immigration.

Figure 7: Time series plots of the MLEs of the shape parameters κA and κB of the two Weibull immigrant renewal distributions over the 109 trading days from 1 June to 29 October 2010. Solid curve: MLE; shaded region: point-wise 95% confidence intervals.

Supplementary Materials

The R code implementing the proposed methods (simulating MRHawkes processes, evaluating the likelihood, computing the Rosenblatt residuals and making predictions) is contained in the R package “MRHawkes”. The R script file “fivaqks.R” contains the code to analyze the earthquake data, while the trade-throughs script contains the code to analyze the trade-through data. See the text file “readme.txt” for a description of the other files. (mrhmleSupp.tar.gz)
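The day-by-day goodness-of-fit summaries reported for the BNP Paribas data (a K-S test of uniformity applied to each trading day's residuals at the 1% level, then the proportion of days that pass) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the code in the “MRHawkes” package: the residuals are taken as already computed, the p-value uses the asymptotic Kolmogorov distribution with Stephens' finite-sample correction rather than an exact calculation, and `type_residual` assumes that the universal residual for a discrete event type takes the randomized probability-integral-transform form. All function names are hypothetical.

```python
import math


def ks_statistic_uniform(u):
    """Two-sided Kolmogorov-Smirnov statistic D_n of a sample against Uniform(0, 1)."""
    x = sorted(u)
    n = len(x)
    # D+ = max_i (i/n - x_(i)) and D- = max_i (x_(i) - (i-1)/n), with 1-based i
    d_plus = max((i + 1) / n - xi for i, xi in enumerate(x))
    d_minus = max(xi - i / n for i, xi in enumerate(x))
    return max(d_plus, d_minus)


def ks_pvalue(d, n, terms=100):
    """Approximate p-value: asymptotic Kolmogorov tail with Stephens' correction."""
    en = math.sqrt(n)
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The tail probability is numerically 1 here and the series converges slowly.
        return 1.0
    s = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(max(s, 0.0), 1.0)


def pass_rate(daily_residuals, alpha=0.01):
    """Fraction of trading days whose residuals are not rejected at level alpha."""
    passed = sum(
        ks_pvalue(ks_statistic_uniform(u), len(u)) > alpha for u in daily_residuals
    )
    return passed / len(daily_residuals)


def type_residual(probs, k, v):
    """Randomized probability integral transform for an observed event type k (0-based).

    probs: model probabilities of each event type given the past (assumed known);
    v: an independent Uniform(0, 1) draw. Under a correct model the result is
    Uniform(0, 1), so it can be fed to the same K-S check.
    """
    lower = sum(probs[:k])
    return lower + v * probs[k]
```

In use, each element of `daily_residuals` would hold one trading day's Rosenblatt (or event-type) residuals. If SciPy is available, `scipy.stats.kstest(u, "uniform")` is an off-the-shelf replacement for the two K-S helpers.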
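The two significance checks used for the BNP Paribas analysis, the z-score of a cross-excitation branching ratio compared with 1.96 and the Wald 95% confidence interval for a Weibull shape parameter checked against one, can be sketched as below. This is a minimal sketch under stated assumptions: point estimates and standard errors (for instance from the inverse observed information matrix at the MLE) are treated as given inputs, and the Weibull parameterization with shape `kappa` and a hypothetical scale argument `scale` is chosen for illustration only. The hazard helper shows why κ = 1 is the reference value: only then is the immigrant hazard constant, i.e. immigration is Poisson, while κ < 1 gives a decreasing hazard. Function names are hypothetical.

```python
from statistics import NormalDist

# 97.5% standard normal quantile, about 1.96, the critical value used in the text
Z975 = NormalDist().inv_cdf(0.975)


def z_score(estimate, std_error, null_value=0.0):
    """Wald z-score of an estimate against a hypothesized null value."""
    return (estimate - null_value) / std_error


def wald_ci(estimate, std_error, level=0.95):
    """Symmetric Wald confidence interval: estimate +/- z * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def excludes(interval, value):
    """True if value lies outside the closed interval."""
    lo, hi = interval
    return value < lo or value > hi


def weibull_hazard(t, kappa, scale):
    """Weibull hazard (kappa/scale) * (t/scale)**(kappa - 1); constant iff kappa == 1."""
    return (kappa / scale) * (t / scale) ** (kappa - 1)


def fraction_of_days(flags):
    """Proportion of trading days on which a per-day test flag is True."""
    return sum(flags) / len(flags)
```

For example, with hypothetical per-day estimates `eta` of ηAB and standard errors `se`, the daily significance flags would be `[z_score(e, s) > Z975 for e, s in zip(eta, se)]`, and `fraction_of_days` of those flags gives the reported proportion. Because branching ratios are non-negative, comparing z or |z| with 1.96 gives the same decision.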

References

Bacry, E., S. Delattre, M. Hoffmann, and J. Muzy (2013). Some limit theorems for Hawkes processes and application to financial statistics. Stochastic Processes and their Applications 123 (7), 2475–2499. A Special Issue on the Occasion of the 2013 International Year of Statistics.
Bowsher, C. G. (2007). Modelling security market events in continuous time: Intensity based, multivariate point process models. Journal of Econometrics 141 (2), 876–912.
Brockwell, A. (2007). Universal residuals: A multivariate transformation. Statistics & Probability Letters 77 (14), 1473–1478.
Chen, F. and P. Hall (2013, December). Inference for a nonstationary self-exciting point process with an application in ultra-high frequency financial data modeling. Journal of Applied Probability 50 (4), 1006–1024.
Chen, F. and T. Stindl (2017). Direct likelihood evaluation for the renewal Hawkes process. Journal of Computational and Graphical Statistics (just accepted).
Daley, D. J. and D. Vere-Jones (2003). An Introduction to the Theory of Point Processes, Volume I: Elementary Theory and Methods (2nd ed.). New York: Springer-Verlag.
Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39 (1), 1–38.
Embrechts, P., T. Liniger, and L. Lin (2011). Multivariate Hawkes processes: An application to financial data. Journal of Applied Probability 48 (A), 367–378.
Engle, R. F. and J. R. Russell (1998). Autoregressive conditional duration: A new model for irregularly spaced transaction data. Econometrica 66 (5), 1127–1162.
Godoy, B. I., V. Solo, J. Min, and S. A. Pasha (2016, March). Local likelihood estimation of time-variant Hawkes models. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4199–4203.
Hawkes, A. G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika 58 (1), 83–90.
Hawkes, A. G. and D. Oakes (1974). A cluster process representation of a self-exciting process. Journal of Applied Probability 11 (3), 493–503.
Mohler, G. O., M. B. Short, P. J. Brantingham, F. P. Schoenberg, and G. E. Tita (2011). Self-exciting point process modeling of crime. Journal of the American Statistical Association 106 (493), 100–108.
Muni Toke, I. and F. Pomponio (2011). Modelling trades-through in a limited order book using Hawkes processes. Economics: The Open-Access, Open-Assessment E-Journal 6, 1–23.
Pomponio, F. and F. Abergel (2013). Multiple-limit trades: Empirical facts and application to lead-lag measures. Quantitative Finance 13 (5), 783–793.
R Core Team (2016). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Rosenblatt, M. (1952). Remarks on a multivariate transformation. The Annals of Mathematical Statistics 23 (3), 470–472.
Roueff, F., R. von Sachs, and L. Sansonnet (2016, June). Locally stationary Hawkes processes. Stochastic Processes and their Applications 126 (6), 1710–1743.
Wheatley, S., V. Filimonov, and D. Sornette (2016). The Hawkes process with renewal immigration & its estimation with an EM algorithm. Computational Statistics & Data Analysis 94 (C), 120–135.