On Asymptotic Inference in Stochastic Differential Equations with Time-Varying Covariates

Trisha Maitra and Sourabh Bhattacharya∗

Abstract

arXiv:1605.03330v2 [math.ST] 13 Oct 2017

In this article, we introduce a system of stochastic differential equations (SDEs) involving time-dependent covariates, and consider both fixed and random effects set-ups. We also allow the functional part associated with the drift function to depend upon unknown parameters. In this general SDE set-up we establish consistency and asymptotic normality of the MLE through verification of the regularity conditions required by existing relevant theorems. Besides, we consider the Bayesian approach to learning about the population parameters, and prove consistency and asymptotic normality of the corresponding posterior distribution. We supplement our theoretical investigation with simulated and real data analyses, obtaining encouraging results in each case.

Keywords: Asymptotic normality; Functional data; Gibbs sampling; Maximum likelihood estimator; Posterior consistency; Random effects.

1 Introduction

Systems of stochastic differential equations (SDEs) are appropriate for modeling situations where "within"-subject variability is caused by some random component varying continuously in time; hence, SDE systems are also appropriate for modeling functional data (see, for example, Zhu et al. (2011) and Ramsay and Silverman (2005) for some connections between SDEs and functional data analysis). When suitable time-varying covariates are available, it is appropriate to incorporate such information in the SDE system. Some examples of statistical applications of SDE-based models with time-dependent covariates are Oravecz et al. (2011), Overgaard et al. (2005) and Leander et al. (2015). However, systems of SDE-based models with time-varying covariates seem to be rare in the statistical literature, in spite of their importance, and their asymptotic properties are hitherto unexplored. Indeed, although asymptotic inference in single, fixed effects SDE models without covariates has been considered in the literature as time tends to infinity (see, for example, Bishwal (2008)), asymptotic theory for systems of SDE models is rare; so far only random effects SDE systems without covariates have been considered, as n, the number of subjects (equivalently, the number of SDEs in the system), tends to infinity (Delattre et al. (2013), Maitra and Bhattacharya (2016c), Maitra and Bhattacharya (2015)). Such models are of the following form:
$$dX_i(t) = b(X_i(t), \phi_i)\,dt + \sigma(X_i(t))\,dW_i(t), \quad \text{with } X_i(0) = x_i,\ i = 1, \ldots, n, \qquad (1.1)$$

where, for $i = 1, \ldots, n$, $X_i(0) = x_i$ is the initial value of the stochastic process $X_i(t)$, which is assumed to be continuously observed on the time interval $[0, T_i]$, with $T_i > 0$ assumed known. The drift function $b(x, \phi)$ is a known, real-valued function on $\mathbb{R} \times \mathbb{R}^d$ (where $\mathbb{R}$ is the real line and d is the dimension of $\phi$), and $\sigma : \mathbb{R} \to \mathbb{R}$ is the known diffusion coefficient. The SDEs given by (1.1) are driven by independent standard Wiener processes $\{W_i(\cdot);\ i = 1, \ldots, n\}$. The random effect parameters $\{\phi_i;\ i = 1, \ldots, n\}$, associated with the n individuals, are assumed by Delattre et al. (2013) to be independent of the Brownian motions and to be independently and identically distributed (iid) with some common distribution. For convenience, Delattre et al. (2013) (see also Maitra and Bhattacharya (2016c) and Maitra and Bhattacharya (2015)) assume $b(x, \phi_i) = \phi_i b(x)$, so that the random effect is a multiplicative factor of the drift function; moreover, the function $b(x)$ is assumed to be free of parameters. In this article, we generalize the multiplicative factor to include time-dependent covariates, and we also allow $b(x)$ to depend upon unknown parameters.
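To fix ideas, the system (1.1) is easy to simulate by Euler–Maruyama discretization. The sketch below uses illustrative choices — $b(x) = x$, $\sigma(x) \equiv 1$, Gaussian random effects $\phi_i$, and a common initial value — none of which are prescribed above:

```python
import numpy as np

rng = np.random.default_rng(42)

def euler_maruyama(phi, b, sigma, x0, T, dt):
    """Simulate one path of dX = phi * b(X) dt + sigma(X) dW on [0, T]."""
    n_steps = int(T / dt)
    x = np.empty(n_steps + 1)
    x[0] = x0
    dw = rng.normal(0.0, np.sqrt(dt), n_steps)  # Brownian increments
    for k in range(n_steps):
        x[k + 1] = x[k] + phi * b(x[k]) * dt + sigma(x[k]) * dw[k]
    return x

# Illustrative choices (not from the paper): b(x) = x, sigma(x) = 1,
# random effects phi_i ~ N(-1, 0.1^2), common x_i = 0.5, T_i = 1.
n = 5
phis = rng.normal(-1.0, 0.1, n)
paths = [euler_maruyama(phi, b=lambda x: x, sigma=lambda x: 1.0,
                        x0=0.5, T=1.0, dt=0.001) for phi in phis]
print([round(p[-1], 3) for p in paths])
```

The step size `dt` controls the discretization error; the paper's theory assumes the paths are continuously observed, so a fine grid stands in for that here.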

Trisha Maitra is a PhD student and Sourabh Bhattacharya is an Associate Professor in Interdisciplinary Statistical Research Unit, Indian Statistical Institute, 203, B. T. Road, Kolkata 700108. Corresponding e-mail: [email protected].


Notably, such a model extension has already been provided in Maitra and Bhattacharya (2016a) and Maitra and Bhattacharya (2016b), but their goal was to develop asymptotic theory of Bayes factors for comparing systems of SDEs, with or without time-varying covariates, emphasizing, when time-varying covariates are present, simultaneous asymptotic selection of the covariates and of the part of the drift function free of covariates, using Bayes factors.

In this work, we deal with parametric asymptotic inference, both frequentist and Bayesian, in the context of our extended system of SDEs. We consider, separately, fixed effects as well as random effects. The fixed effects set-up ensues when the coefficients associated with the covariates are the same for all subjects; in the random effects set-up, the subject-wise coefficients are assumed to be a random sample from some distribution with unknown parameters.

It is also important to distinguish between the iid situation and the independent but non-identical case (we refer to the latter as non-iid) that we consider. The iid set-up is concerned with the case where the initial values $x_i$ and time limits $T_i$ are the same for all i, and the coefficients associated with the covariates are zero, that is, there are no covariates suitable for the SDE-based system. This set-up, however, does not reduce to the iid set-up considered in Delattre et al. (2013), Maitra and Bhattacharya (2016c) and Maitra and Bhattacharya (2015), because in those works $b(x)$ was assumed to be free of parameters, while here we allow this function to depend on unknown parameters. The non-iid set-up assumes either or both of the following: presence of appropriate covariates, and $x_i$ and $T_i$ not being the same for all subjects.
In the classical paradigm, we investigate consistency and asymptotic normality of the maximum likelihood estimator (MLE) of the unknown parameters, which we denote by θ; in the Bayesian framework we study consistency and asymptotic normality of the posterior distribution of θ. In other words, we consider prior distributions π(θ) and study the properties of the corresponding posterior
$$\pi_n(\theta | X_1, \ldots, X_n) = \frac{\pi(\theta) \prod_{i=1}^n f_i(X_i|\theta)}{\int_{\psi \in \Theta} \pi(\psi) \prod_{i=1}^n f_i(X_i|\psi)\, d\psi} \qquad (1.2)$$
as the sample size n tends to infinity. Here $f_i(\cdot|\theta)$ is the density corresponding to the i-th individual and Θ is the parameter space.

In what follows, after introducing our model and the associated likelihood in Section 2, we investigate asymptotic properties of the MLE in the iid and non-iid contexts in Sections 3 and 4, respectively. In Sections 5 and 6 we investigate asymptotic properties of the posterior in the iid and non-iid cases, respectively. In Section 7 we consider the random effects set-up and provide the necessary discussion pointing towards validity of the corresponding asymptotic results. We demonstrate the applicability of our developments to practical, finite-sample contexts using simulated and real data analyses in Sections 8 and 9, respectively. We summarize our contribution and provide further discussion in Section 10.

Notationally, "$\xrightarrow{a.s.}$", "$\xrightarrow{P}$" and "$\xrightarrow{L}$" denote convergence "almost surely", "in probability" and "in distribution", respectively.
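The posterior (1.2) is simply the prior times the product likelihood, renormalized over Θ. As a minimal illustration of this mechanism — with a one-dimensional θ and iid Gaussian stand-ins for the densities $f_i$, not the SDE likelihoods developed later — one can compute it on a grid and compare against the conjugate closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for (1.2): theta scalar, f_i(x|theta) = N(theta, 1) density,
# prior pi(theta) = N(0, 1). All choices are illustrative.
theta0 = 1.5
x = rng.normal(theta0, 1.0, size=50)

grid = np.linspace(-2.0, 5.0, 2001)          # discretized parameter space
dg = grid[1] - grid[0]
log_prior = -0.5 * grid**2                   # N(0, 1) prior, up to a constant
log_lik = np.array([-0.5 * np.sum((x - t)**2) for t in grid])
log_post = log_prior + log_lik
post = np.exp(log_post - log_post.max())
post /= post.sum() * dg                      # normalize, as in (1.2)

post_mean = (grid * post).sum() * dg
# Conjugate closed form for this toy model: posterior mean = n*xbar/(n+1).
exact = len(x) * x.mean() / (len(x) + 1)
print(round(post_mean, 4), round(exact, 4))
```

The grid approach only scales to very low dimensions; for the (p+q+1)-dimensional θ of this paper one would use MCMC instead, as the Gibbs-sampling keyword of the abstract suggests.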

2 The SDE set-up

We consider the following system of SDE models: for $i = 1, 2, \ldots, n$,
$$dX_i(t) = \phi_{i,\xi}(t)\, b_\beta(X_i(t))\, dt + \sigma(X_i(t))\, dW_i(t), \qquad (2.1)$$
where $X_i(0) = x_i$ is the initial value of the stochastic process $X_i(t)$, which is assumed to be continuously observed on the time interval $[0, T_i]$, with $T_i > 0$ known for all i. Here $\phi_{i,\xi}$ is a parametric function involving the covariates and the unknown coefficients ξ associated with them, and $b_\beta$ is a parametric function known up to the parameters β.


2.1 Incorporation of time-varying covariates

We assume that $\phi_{i,\xi}(t)$ has the following form:
$$\phi_{i,\xi}(t) = \phi_{i,\xi}(z_i(t)) = \xi_0 + \xi_1 g_1(z_{i1}(t)) + \xi_2 g_2(z_{i2}(t)) + \cdots + \xi_p g_p(z_{ip}(t)), \qquad (2.2)$$

where $\xi = (\xi_0, \xi_1, \ldots, \xi_p)$ is a set of real constants and $z_i(t) = (z_{i1}(t), z_{i2}(t), \ldots, z_{ip}(t))$ is the set of available covariate information corresponding to the i-th individual, depending upon time t. We assume that $z_i(t)$ is continuous in t, that $z_{il}(t) \in \mathbf{Z}_l$ where $\mathbf{Z}_l$ is compact, and that $g_l : \mathbf{Z}_l \to \mathbb{R}$ is continuous, for $l = 1, \ldots, p$. Let $\mathbf{Z} = \mathbf{Z}_1 \times \cdots \times \mathbf{Z}_p$, and let $\mathcal{Z} = \{z : [0, \infty) \to \mathbf{Z} \text{ such that } z(t) \text{ is continuous in } t\}$; hence $z_i \in \mathcal{Z}$ for all i. The function $b_\beta$ is the multiplicative part of the drift function free of the covariates. Note that ξ consists of p + 1 parameters; assuming $\beta \in \mathbb{R}^q$ with $q \ge 1$, our parameter set $\theta = (\beta, \xi)$ belongs to the (p + q + 1)-dimensional real space $\mathbb{R}^{p+q+1}$. The true parameter set is denoted by $\theta_0$.
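A direct translation of (2.2) into code, with illustrative covariate paths $z_l$ and link functions $g_l$ chosen only to respect the compactness and continuity assumptions above (they are not from the paper):

```python
import numpy as np

def phi(t, xi, covariates, links):
    """phi_{i,xi}(t) = xi_0 + sum_l xi_l * g_l(z_{il}(t)) -- equation (2.2).

    covariates: list of callables z_l(t); links: list of callables g_l.
    """
    val = xi[0]
    for xi_l, z_l, g_l in zip(xi[1:], covariates, links):
        val += xi_l * g_l(z_l(t))
    return val

# Example with p = 2 covariates whose ranges are compact, as assumed above.
covs = [lambda t: np.sin(t), lambda t: np.cos(2 * t)]
links = [np.tanh, lambda z: z**2]
xi = np.array([0.5, 1.0, -2.0])        # (xi_0, xi_1, xi_2)
print(round(phi(0.0, xi, covs, links), 4))
```

At t = 0 this evaluates to 0.5 + tanh(0) − 2·cos(0)², illustrating how the intercept ξ₀ and the covariate terms combine.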

2.2 Likelihood

We first define the following quantities:
$$U_{i,\theta} = \int_0^{T_i} \frac{\phi_{i,\xi}(s)\, b_\beta(X_i(s))}{\sigma^2(X_i(s))}\, dX_i(s), \qquad V_{i,\theta} = \int_0^{T_i} \frac{\phi^2_{i,\xi}(s)\, b^2_\beta(X_i(s))}{\sigma^2(X_i(s))}\, ds, \qquad (2.3)$$
for $i = 1, \ldots, n$. Let $C_{T_i}$ denote the space of real continuous functions $(x(t), t \in [0, T_i])$ defined on $[0, T_i]$, endowed with the σ-field $\mathcal{C}_{T_i}$ associated with the topology of uniform convergence on $[0, T_i]$. We consider the distribution $P^{x_i, T_i, z_i}$ on $(C_{T_i}, \mathcal{C}_{T_i})$ of $(X_i(t), t \in [0, T_i])$ given by (2.1). We choose the dominating measure $P_i$ to be the distribution of (2.1) with null drift, so that
$$f_i(X_i|\theta) = \frac{dP^{x_i, T_i, z_i}}{dP_i} = \exp\left(U_{i,\theta} - \frac{V_{i,\theta}}{2}\right). \qquad (2.4)$$
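In practice a path is recorded on a fine grid, and $U_{i,\theta}$ and $V_{i,\theta}$ in (2.3) are approximated by Itô and Riemann sums; the Girsanov log-density in (2.4) is then $U - V/2$. A minimal sketch, where the constant φ and the choices $b(x) = x$, $\sigma \equiv 1$ are illustrative only:

```python
import numpy as np

def U_V(x_path, phi_vals, b, sigma, dt):
    """Ito-sum and Riemann-sum approximations of U_{i,theta} and V_{i,theta}
    in (2.3) from a discretely recorded path (exact values need the full path)."""
    x = x_path[:-1]                        # left endpoints (Ito convention)
    dx = np.diff(x_path)
    w = phi_vals * b(x) / sigma(x)**2      # integrand weights on the grid
    U = np.sum(w * dx)                                    # stochastic integral
    V = np.sum(phi_vals**2 * b(x)**2 / sigma(x)**2) * dt  # time integral
    return U, V

def log_density(U, V):
    """log f_i(X_i | theta) = U - V/2, from the Girsanov density (2.4)."""
    return U - V / 2

# Illustrative path from the null-drift (dominating) model dX = dW:
rng = np.random.default_rng(1)
dt, n_steps = 0.001, 1000
x_path = np.concatenate([[0.0], np.cumsum(rng.normal(0, np.sqrt(dt), n_steps))])
phi_vals = np.full(n_steps, -1.0)          # constant phi for the sketch
U, V = U_V(x_path, phi_vals, b=lambda x: x,
           sigma=lambda x: np.ones_like(x), dt=dt)
print(round(log_density(U, V), 4))
```

The Itô convention (left-endpoint evaluation in the stochastic sum) matters here; a midpoint rule would approximate a Stratonovich integral instead.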

3 Consistency and asymptotic normality of MLE in the iid set-up

In the iid set-up we have $x_i = x$ and $T_i = T$ for all $i = 1, \ldots, n$. Moreover, the covariates are absent, that is, $\xi_l = 0$ for $l = 1, \ldots, p$. Hence, the resulting parameter set in this case is $\theta = (\beta, \xi_0)$.

3.1 Strong consistency of MLE

Consistency of the MLE under the iid set-up can be verified by validating the regularity conditions of the following theorem (Theorems 7.49 and 7.54 of Schervish (1995)); for our purpose we present the version for a compact parameter space.

Theorem 1 (Schervish (1995)) Let $\{X_n\}_{n=1}^\infty$ be conditionally iid given θ with density $f_1(x|\theta)$ with respect to a measure ν on a space $(\mathcal{X}^1, \mathcal{B}^1)$. Fix $\theta_0 \in \Theta$, and define, for each $M \subseteq \Theta$ and $x \in \mathcal{X}^1$,
$$Z(M, x) = \inf_{\psi \in M} \log \frac{f_1(x|\theta_0)}{f_1(x|\psi)}.$$
Assume that for each $\theta \neq \theta_0$, there is an open set $N_\theta$ such that $\theta \in N_\theta$ and $E_{\theta_0} Z(N_\theta, X_i) > -\infty$. Also assume that $f_1(x|\cdot)$ is continuous at θ for every θ, a.s. $[P_{\theta_0}]$. Then, if $\hat{\theta}_n$ is the MLE of θ corresponding to n observations, it holds that $\lim_{n\to\infty} \hat{\theta}_n = \theta_0$, a.s. $[P_{\theta_0}]$.


3.1.1 Assumptions

We assume the following conditions:

(H1) The parameter space $\Theta = \mathbf{B} \times \Gamma$, where $\mathbf{B}$ and Γ are compact.

(H2) $b_\beta(\cdot)$ and $\sigma(\cdot)$ are $C^1$ (differentiable with continuous first derivative) on $\mathbb{R}$ and satisfy $b^2_\beta(x) \le K_1(1 + x^2 + \|\beta\|^2)$ and $\sigma^2(x) \le K_2(1 + x^2)$ for all $x \in \mathbb{R}$, for some $K_1, K_2 > 0$. Due to (H1), these bounds reduce to $b^2_\beta(x) \le K(1 + x^2)$ and $\sigma^2(x) \le K(1 + x^2)$ for all $x \in \mathbb{R}$, for some $K > 0$.

We further assume:

(H3) For every x, $b_\beta$ is continuous in $\beta = (\beta_1, \ldots, \beta_q)$ and, moreover, for $j = 1, \ldots, q$,
$$\sup_{\beta \in \mathbf{B}} \frac{\left|\frac{\partial b_\beta(x)}{\partial \beta_j}\right|}{\sigma^2(x)} \le c\left(1 + |x|^\gamma\right),$$
for some $c > 0$ and $\gamma \ge 0$.

(H4)
$$\frac{b^2_\beta(x)}{\sigma^2(x)} \le K_\beta\left(1 + x^2 + \|\beta\|^2\right), \qquad (3.1)$$
where $K_\beta$ is continuous in β.

3.1.2 Verification of strong consistency of MLE in our SDE set-up

To verify the conditions of Theorem 1 in our case, note that assumptions (H1)–(H4) clearly imply continuity of the density $f_1(x|\theta)$, in the same way as the proof of Proposition 2 of Delattre et al. (2013). It follows that $U_\theta$ and $V_\theta$ are continuous in θ, a property we use below. Now consider
$$
Z(N_\theta, X) = \inf_{\theta_1 \in N_\theta} \log \frac{f_1(X|\theta_0)}{f_1(X|\theta_1)}
= \left(U_{\theta_0} - \frac{V_{\theta_0}}{2}\right) - \sup_{\theta_1 \in N_\theta}\left(U_{\theta_1} - \frac{V_{\theta_1}}{2}\right)
\ge \left(U_{\theta_0} - \frac{V_{\theta_0}}{2}\right) - \sup_{\theta_1 \in \bar{N}_\theta}\left(U_{\theta_1} - \frac{V_{\theta_1}}{2}\right)
= \left(U_{\theta_0} - \frac{V_{\theta_0}}{2}\right) - \left(U_{\theta_1^*(X)} - \frac{V_{\theta_1^*(X)}}{2}\right), \qquad (3.2)
$$
where $N_\theta$ is an appropriate open subset of the relevant compact parameter space and $\bar{N}_\theta$ is its closure. The supremum of $U_{\theta_1} - \frac{V_{\theta_1}}{2}$ is attained at $\theta_1^* = \theta_1^*(X) \in \bar{N}_\theta$ due to continuity of $U_\theta$ and $V_\theta$ in θ.

Let $E_{\theta_0}(V_{\theta_1}) = \breve{V}_{\theta_1}$ and $E_{\theta_0}(U_{\theta_1}) = \breve{U}_{\theta_1}$. From Theorem 5 of Maitra and Bhattacharya (2016c) it follows that these expectations are continuous in $\theta_1$. Using this we obtain
$$
E_{\theta_0}\left(U_{\theta_1^*(X)} - \frac{V_{\theta_1^*(X)}}{2}\right)
= E_{\theta_1^*(X)|\theta_0}\left[E_{X|\theta_1^*(X)=\varphi_1,\,\theta_0}\left(U_{\varphi_1} - \frac{V_{\varphi_1}}{2}\right)\right]
= E_{\theta_1^*(X)|\theta_0}\left(\breve{U}_{\varphi_1} - \frac{\breve{V}_{\varphi_1}}{2}\right)
\le E_{\theta_1^*(X)|\theta_0}\left[\sup_{\varphi_1 \in \bar{N}_\theta}\left(\breve{U}_{\varphi_1} - \frac{\breve{V}_{\varphi_1}}{2}\right)\right]
= \breve{U}_{\varphi_1^*} - \frac{\breve{V}_{\varphi_1^*}}{2}, \qquad (3.3)
$$
where $\varphi_1^* \in \bar{N}_\theta$ is where the supremum of $\breve{U}_{\varphi_1} - \frac{\breve{V}_{\varphi_1}}{2}$ is achieved. Since $\varphi_1^*$ is independent of X, the last step of (3.3) follows.

Noting that $E_{\theta_0}\left(U_{\theta_0} - \frac{V_{\theta_0}}{2}\right)$ and $\breve{U}_{\varphi_1^*} - \frac{\breve{V}_{\varphi_1^*}}{2}$ are finite due to Lemma 1 of Maitra and Bhattacharya (2016a), it follows that $E_{\theta_0} Z(N_\theta, X) > -\infty$. Hence $\hat{\theta}_n \xrightarrow{a.s.} \theta_0\ [P_{\theta_0}]$ as $n \to \infty$. We summarize this result in the form of the following theorem:

Theorem 2 Assume the iid set-up and conditions (H1)–(H4). Then the MLE is strongly consistent in the sense that, as $n \to \infty$, $\hat{\theta}_n \xrightarrow{a.s.} \theta_0\ [P_{\theta_0}]$.

3.2 Asymptotic normality of MLE

To verify asymptotic normality of the MLE we invoke the following theorem of Schervish (1995) (Theorem 7.63):

Theorem 3 (Schervish (1995)) Let Θ be a subset of $\mathbb{R}^{p+q+1}$, and let $\{X_n\}_{n=1}^\infty$ be conditionally iid given θ, each with density $f_1(\cdot|\theta)$. Let $\hat{\theta}_n$ be an MLE. Assume that $\hat{\theta}_n \xrightarrow{P} \theta$ under $P_\theta$ for all θ. Assume that $f_1(x|\theta)$ has continuous second partial derivatives with respect to θ and that differentiation can be passed under the integral sign. Assume that there exists $H_r(x, \theta)$ such that, for each $\theta_0 \in \operatorname{int}(\Theta)$ and each k, j,
$$\sup_{\|\theta - \theta_0\| \le r} \left| \frac{\partial^2}{\partial\theta_k \partial\theta_j} \log f_1(x|\theta_0) - \frac{\partial^2}{\partial\theta_k \partial\theta_j} \log f_1(x|\theta) \right| \le H_r(x, \theta_0), \qquad (3.4)$$
with
$$\lim_{r \to 0} E_{\theta_0} H_r(X, \theta_0) = 0. \qquad (3.5)$$
Assume that the Fisher information matrix $I(\theta)$ is finite and non-singular. Then, under $P_{\theta_0}$,
$$\sqrt{n}\left(\hat{\theta}_n - \theta_0\right) \xrightarrow{L} N\left(0, I^{-1}(\theta_0)\right). \qquad (3.6)$$

3.2.1 Assumptions

Along with assumptions (H1)–(H4), we further assume the following:

(H5) The true value $\theta_0 \in \operatorname{int}(\Theta)$.

(H6) The Fisher information matrix $I(\theta)$ is finite and non-singular for all $\theta \in \Theta$.

(H7) Let $\left[b'_\beta(x)\right]_k = \frac{\partial}{\partial\beta_k} b_\beta(x)$ for $k = 1, \ldots, q$; $\left[b''_\beta(x)\right]_{kl} = \frac{\partial^2}{\partial\beta_k \partial\beta_l} b_\beta(x)$ for $k, l = 1, \ldots, q$; and $\left[b'''_\beta(x)\right]_{klm} = \frac{\partial^3}{\partial\beta_k \partial\beta_l \partial\beta_m} b_\beta(x)$ for $k, l, m = 1, \ldots, q$. There exist constants $0 < c < \infty$ and $0 < \gamma_1, \gamma_2, \gamma_3, \gamma_4 \le 1$ such that, for each combination of $k, l, m = 1, \ldots, q$, for any $\beta_1, \beta_2 \in \mathbb{R}^q$ and all $x \in \mathbb{R}$,
$$\left|b_{\beta_1}(x) - b_{\beta_2}(x)\right| \le c\,\|\beta_1 - \beta_2\|^{\gamma_1};$$
$$\left|\left[b'_{\beta_1}(x)\right]_k - \left[b'_{\beta_2}(x)\right]_k\right| \le c\,\|\beta_1 - \beta_2\|^{\gamma_2};$$
$$\left|\left[b''_{\beta_1}(x)\right]_{kl} - \left[b''_{\beta_2}(x)\right]_{kl}\right| \le c\,\|\beta_1 - \beta_2\|^{\gamma_3};$$
$$\left|\left[b'''_{\beta_1}(x)\right]_{klm} - \left[b'''_{\beta_2}(x)\right]_{klm}\right| \le c\,\|\beta_1 - \beta_2\|^{\gamma_4}.$$

3.2.2 Verification of the regularity conditions for asymptotic normality in our SDE set-up

In Section 3.1.2, almost sure consistency of the MLE $\hat{\theta}_n$ has been established; hence $\hat{\theta}_n \xrightarrow{P} \theta$ under $P_\theta$ for all θ. With assumptions (H1)–(H4), (H7), Theorem B.4 of Rao (2013) and the dominated convergence theorem, interchangeability of differentiation and integration (for stochastic and ordinary integrals, respectively) can be assured, from which it follows that differentiation can be passed under the integral sign, as required by Theorem 3. By the same arguments, $\frac{\partial^2}{\partial\theta_k \partial\theta_j} \log f_1(x|\theta)$ is differentiable in $\theta = (\beta, \xi_0)$, and the derivative has finite expectation due to compactness of the parameter space and (H7). Hence (3.4) and (3.5) clearly hold. In other words, asymptotic normality of the MLE, of the form (3.6), holds in our case. Formally:

Theorem 4 Assume the iid set-up and conditions (H1)–(H7). Then, as $n \to \infty$, the MLE is asymptotically normally distributed as in (3.6).
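The normality claim (3.6) can also be checked by simulation in the same toy instance $dX = \theta X\,dt + dW$ used earlier (an illustrative special case, not the general model): standardizing $\hat{\theta}_n - \theta_0$ by the square root of the observed information $\sum_i \int X_i^2\,dt$ should yield approximately standard normal values across replications:

```python
import numpy as np

rng = np.random.default_rng(11)
theta0, T, dt, n = -1.0, 2.0, 0.01, 40
n_steps = int(T / dt)

def standardized_mle():
    """One replication: theta_hat for dX = theta*X dt + dW across n subjects,
    standardized by the observed information sum_i int X_i^2 dt."""
    x = np.full(n, 1.0)
    num = den = 0.0
    for _ in range(n_steps):
        dx = theta0 * x * dt + rng.normal(0, np.sqrt(dt), n)
        num += np.sum(x * dx)
        den += np.sum(x * x) * dt
        x = x + dx
    theta_hat = num / den
    return (theta_hat - theta0) * np.sqrt(den)

z = np.array([standardized_mle() for _ in range(300)])
print(round(z.mean(), 3), round(z.std(), 3))
```

Across 300 replications the standardized values have mean near 0 and standard deviation near 1, consistent with the limiting $N(0, I^{-1}(\theta_0))$ behaviour after rescaling.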

4 Consistency and asymptotic normality of MLE in the non-iid set-up

We now consider the case where the processes $X_i(\cdot)$, $i = 1, \ldots, n$, are independently, but not identically, distributed. In this case, $\xi = (\xi_0, \xi_1, \ldots, \xi_p)$ where at least one of the coefficients $\xi_1, \ldots, \xi_p$ is non-zero, guaranteeing the presence of at least one time-varying covariate. Hence, in this set-up $\theta = (\beta, \xi)$. Moreover, following Maitra and Bhattacharya (2016c) and Maitra and Bhattacharya (2015), we allow the initial values $x_i$ and the time limits $T_i$ to differ across $i = 1, \ldots, n$, but assume that the sequences $\{T_1, T_2, \ldots\}$ and $\{x_1, x_2, \ldots\}$ are entirely contained in compact sets $\mathbf{T}$ and $\mathbf{X}$, respectively. Compactness ensures the existence of convergent subsequences with limits in $\mathbf{T}$ and $\mathbf{X}$; for notational convenience, we continue to denote these convergent subsequences by $\{T_1, T_2, \ldots\}$ and $\{x_1, x_2, \ldots\}$, with limits $T^\infty \in \mathbf{T}$ and $x^\infty \in \mathbf{X}$. Henceforth we denote the process associated with initial value x and time point t by $X(t, x)$, and for $x \in \mathbf{X}$ and $T \in \mathbf{T}$ we let
$$U_\theta(x, T, z) = \int_0^T \frac{\phi_\xi(z(s))\, b_\beta(X(s, x))}{\sigma^2(X(s, x))}\, dX(s, x); \qquad (4.1)$$
$$V_\theta(x, T, z) = \int_0^T \frac{\phi^2_\xi(z(s))\, b^2_\beta(X(s, x))}{\sigma^2(X(s, x))}\, ds. \qquad (4.2)$$
Clearly, $U_\theta(x_i, T_i, z_i) = U_{i,\theta}$ and $V_\theta(x_i, T_i, z_i) = V_{i,\theta}$, where $U_{i,\theta}$ and $V_{i,\theta}$ are given by (2.3). In this non-iid set-up we assume, following Maitra and Bhattacharya (2016a), the following condition:


(H8) For $l = 1, \ldots, p$ and $t \in [0, T_i]$,
$$\frac{1}{n}\sum_{i=1}^n g_l(z_{il}(t)) \to c_l(t), \qquad (4.3)$$
and, for $l, m = 1, \ldots, p$ and $t \in [0, T_i]$,
$$\frac{1}{n}\sum_{i=1}^n g_l(z_{il}(t))\, g_m(z_{im}(t)) \to c_l(t)\, c_m(t), \qquad (4.4)$$
as $n \to \infty$, where the $c_l(t)$ are real constants.

For $x = x_k$, $T = T_k$ and $z = z_k$, we denote the Kullback–Leibler divergence and the Fisher information by $K_k(\theta_0, \theta)$ (respectively $K_k(\theta, \theta_0)$) and $I_k(\theta)$. Then the following results hold in the same way as Lemma 11 of Maitra and Bhattacharya (2016a).

Lemma 5 Assume the non-iid set-up, (H1)–(H4) and (H8). Then for any $\theta \in \Theta$,
$$\lim_{n\to\infty} \frac{\sum_{k=1}^n K_k(\theta_0, \theta)}{n} = K(\theta_0, \theta); \qquad (4.5)$$
$$\lim_{n\to\infty} \frac{\sum_{k=1}^n K_k(\theta, \theta_0)}{n} = K(\theta, \theta_0); \qquad (4.6)$$
$$\lim_{n\to\infty} \frac{\sum_{k=1}^n I_k(\theta)}{n} = I(\theta), \qquad (4.7)$$
where the limits $K(\theta_0, \theta)$, $K(\theta, \theta_0)$ and $I(\theta)$ are well-defined Kullback–Leibler divergences and Fisher information, respectively.

Lemma 5 will be useful in our asymptotic investigation in the non-iid set-up. In this set-up, we first investigate consistency and asymptotic normality of the MLE using the results of Hoadley (1971).

4.1 Consistency and asymptotic normality of MLE in the non-iid set-up

Following Hoadley (1971) we define
$$R_i(\theta) = \log \frac{f_i(X_i|\theta)}{f_i(X_i|\theta_0)} \ \text{ if } f_i(X_i|\theta_0) > 0, \quad \text{and } R_i(\theta) = 0 \text{ otherwise}; \qquad (4.8)$$
$$R_i(\theta, \rho) = \sup\left\{R_i(\psi) : \|\psi - \theta\| \le \rho\right\}; \qquad (4.9)$$
$$V_i(r) = \sup\left\{R_i(\theta) : \|\theta\| > r\right\}. \qquad (4.10)$$
Following Hoadley (1971), we denote by $r_i(\theta)$, $r_i(\theta, \rho)$ and $v_i(r)$ the expectations of $R_i(\theta)$, $R_i(\theta, \rho)$ and $V_i(r)$ under $\theta_0$; for any sequence $\{a_i;\ i = 1, 2, \ldots\}$ we denote $\sum_{i=1}^n a_i / n$ by $\bar{a}_n$.

Hoadley (1971) proved that the MLE satisfies $\hat{\theta}_n \xrightarrow{P} \theta_0$ if the following regularity conditions hold:

(1) Θ is a closed subset of $\mathbb{R}^{p+q+1}$.

(2) $f_i(X_i|\theta)$ is an upper semicontinuous function of θ, uniformly in i, a.s. $[P_{\theta_0}]$.

(3) There exist $\rho^* = \rho^*(\theta) > 0$, $r > 0$ and $0 < K^* < \infty$ for which
  (i) $E_{\theta_0}[R_i(\theta, \rho)]^2 \le K^*$ for $0 \le \rho \le \rho^*$;
  (ii) $E_{\theta_0}[V_i(r)]^2 \le K^*$.

(4) (i) $\lim_{n\to\infty} \bar{r}_n(\theta) < 0$ for $\theta \ne \theta_0$;
  (ii) $\lim_{n\to\infty} \bar{v}_n(r) < 0$.

(5) $R_i(\theta, \rho)$ and $V_i(r)$ are measurable functions of $X_i$.

Conditions (3) and (4) can be weakened, but the above forms are more easily applicable (see Hoadley (1971) for details).

4.1.1 Verification of the regularity conditions

Since Θ is compact in our case, the first regularity condition clearly holds. For the second regularity condition, note that given $X_i$, $f_i(X_i|\theta)$ is continuous by our assumptions (H1)–(H4), as already noted in Section 3.1.2; in fact it is uniformly continuous in θ, since Θ is compact. Hence, for any given $\epsilon > 0$ there exists $\delta_i(\epsilon) > 0$, independent of θ, such that $\|\theta_1 - \theta_2\| < \delta_i(\epsilon)$ implies $|f(X_i|\theta_1) - f(X_i|\theta_2)| < \epsilon$. Now consider a strictly positive function $\delta_{x,T}(\epsilon)$, continuous in $x \in \mathbf{X}$ and $T \in \mathbf{T}$, such that $\delta_{x_i, T_i}(\epsilon) = \delta_i(\epsilon)$, and let $\delta(\epsilon) = \inf_{x \in \mathbf{X}, T \in \mathbf{T}} \delta_{x,T}(\epsilon)$. Since $\mathbf{X}$ and $\mathbf{T}$ are compact, it follows that $\delta(\epsilon) > 0$, and $\|\theta_1 - \theta_2\| < \delta(\epsilon)$ implies $|f(X_i|\theta_1) - f(X_i|\theta_2)| < \epsilon$ for all i. Hence the second regularity condition is satisfied.

Let us now focus attention on condition (3)(i). Observe that
$$R_i(\theta) = U_{i,\theta} - \frac{V_{i,\theta}}{2} - U_{i,\theta_0} + \frac{V_{i,\theta_0}}{2} \le \left|U_{i,\theta}\right| + \frac{V_{i,\theta}}{2} + \left|U_{i,\theta_0}\right| + \frac{V_{i,\theta_0}}{2}. \qquad (4.11)$$
Let us denote $\left\{\psi \in \mathbb{R}^{p+q+1} : \|\psi - \theta\| \le \rho\right\}$ by $S(\rho, \theta)$. Here $0 < \rho < \rho^*(\theta)$, and $\rho^*(\theta)$ is so small that $S(\rho, \theta) \subset \Theta$ for all $\rho \in (0, \rho^*(\theta))$. It then follows from (4.11) that
$$\sup_{\psi \in S(\rho, \theta)} R_i(\psi) \le \sup_{\theta \in S(\rho, \theta)} \left( \left|U_{i,\theta}\right| + \frac{V_{i,\theta}}{2} \right) + \left|U_{i,\theta_0}\right| + \frac{V_{i,\theta_0}}{2}. \qquad (4.12)$$
The suprema in (4.12) are finite due to compactness of $S(\rho, \theta)$; let the supremum be attained at some $\theta^* = \theta^*(X_i)$. The expectation of the square of the upper bound can then be calculated in the same way as (3.3), noting that $\bar{N}_\theta$ in this case is $S(\rho, \theta)$. Since, under $P_{\theta_0}$, finiteness of moments of all orders of each term in the upper bound is ensured by Lemma 10 of Maitra and Bhattacharya (2016a), it follows that
$$E_{\theta_0}[R_i(\theta, \rho)]^2 \le K_i(\theta), \qquad (4.13)$$
where $K_i(\theta) = K(x_i, T_i, z_i, \theta)$, with $K(x, T, z, \theta)$ a continuous function of $(x, T, z, \theta)$, continuity again being a consequence of Lemma 10 of Maitra and Bhattacharya (2016a). Since, by compactness of $\mathbf{X}$, $\mathbf{T}$ and Θ,
$$K_i(\theta) \le \sup_{x \in \mathbf{X}, T \in \mathbf{T}, z \in \mathcal{Z}, \theta \in \Theta} K(x, T, z, \theta) < \infty,$$
regularity condition (3)(i) follows.

To verify condition (3)(ii), first note that we can choose $r > 0$ such that $\|\theta_0\| < r$ and $\{\theta \in \Theta : \|\theta\| > r\} \neq \emptyset$. It then follows that $\sup_{\{\theta \in \Theta : \|\theta\| > r\}} R_i(\theta) \le \sup_{\theta \in \Theta} R_i(\theta)$ for every $i \ge 1$. The right-hand side is bounded by the same expression as the right-hand side of (4.12), with $S(\rho, \theta)$ replaced by Θ. The rest of the verification follows in the same way as that of (3)(i).

To verify condition (4)(i), note that by (4.5),
$$\lim_{n\to\infty} \bar{r}_n = -\lim_{n\to\infty} \frac{\sum_{i=1}^n K_i(\theta_0, \theta)}{n} = -K(\theta_0, \theta) < 0 \quad \text{for } \theta \ne \theta_0. \qquad (4.14)$$
In other words, (4)(i) is satisfied. Verification of (4)(ii) follows in a similar way as in Maitra and Bhattacharya (2016c), except that the relevant moment-existence result follows from Lemma 10 of Maitra and Bhattacharya (2016a). Regularity condition (5) holds by the same arguments as in Maitra and Bhattacharya (2016c). In other words, in the non-iid SDE framework the following theorem holds:

Theorem 6 Assume the non-iid SDE set-up and conditions (H1)–(H4) and (H8). Then $\hat{\theta}_n \xrightarrow{P} \theta_0$ as $n \to \infty$.
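A toy non-iid illustration of Theorem 6: give each subject a distinct bounded covariate, and take the drift multiplier linear in ξ with $b_\beta(x) = x$ known (an illustrative simplification, not the paper's general model). The log-likelihood $\sum_i (U_i - V_i/2)$ is then quadratic in ξ, so the MLE solves a 2×2 linear system of normal equations:

```python
import numpy as np

rng = np.random.default_rng(3)
xi0 = np.array([-0.5, -0.8])      # true (xi_0, xi_1)
n, T, dt = 200, 3.0, 0.01
n_steps = int(T / dt)
t_grid = np.arange(n_steps) * dt

# Subject-specific bounded covariates (illustrative): g_i(t) = cos(omega_i t)
# with distinct frequencies omega_i, making the subjects non-identical.
omega = rng.uniform(1.0, 3.0, n)
g = np.cos(np.outer(omega, t_grid))            # shape (n, n_steps)

# Simulate dX_i = (xi_0 + xi_1 g_i(t)) * X_i dt + dW_i and accumulate the
# normal equations A xi_hat = c implied by the quadratic log-likelihood.
x = np.full(n, 1.0)
A = np.zeros((2, 2))
c = np.zeros(2)
for k in range(n_steps):
    h = np.vstack([np.ones(n), g[:, k]])       # basis (1, g_i(t_k))
    drift_mult = xi0 @ h                       # phi_{i,xi}(t_k) per subject
    dx = drift_mult * x * dt + rng.normal(0, np.sqrt(dt), n)
    hx = h * x                                 # h_{ij}(t) * X_i(t)
    c += hx @ dx                               # int h_j X dX terms
    A += (hx * x) @ h.T * dt                   # int h_j h_k X^2 dt terms
    x = x + dx

xi_hat = np.linalg.solve(A, c)
print(np.round(xi_hat, 3))
```

Despite the subjects being non-identically distributed, the pooled estimator recovers both the intercept and the covariate coefficient, in line with the averaging conditions (H8).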

4.2 Asymptotic normality of MLE in the non-iid set-up

Let $\zeta_i(x, \theta) = \log f_i(x|\theta)$; also, let $\zeta'_i(x, \theta)$ be the $(p+q+1) \times 1$ vector with k-th component $\zeta'_{i,k}(x, \theta) = \frac{\partial}{\partial\theta_k} \zeta_i(x, \theta)$, and let $\zeta''_i(x, \theta)$ be the $(p+q+1) \times (p+q+1)$ matrix with (k, l)-th element $\zeta''_{i,kl}(x, \theta) = \frac{\partial^2}{\partial\theta_k \partial\theta_l} \zeta_i(x, \theta)$.

For proving asymptotic normality in the non-iid framework, Hoadley (1971) assumed the following regularity conditions:

(1) Θ is an open subset of $\mathbb{R}^{p+q+1}$.

(2) $\hat{\theta}_n \xrightarrow{P} \theta_0$.

(3) $\zeta'_i(X_i, \theta)$ and $\zeta''_i(X_i, \theta)$ exist a.s. $[P_{\theta_0}]$.

(4) $\zeta''_i(X_i, \theta)$ is a continuous function of θ, uniformly in i, a.s. $[P_{\theta_0}]$, and is a measurable function of $X_i$.

(5) $E_\theta\left[\zeta'_i(X_i, \theta)\right] = 0$ for $i = 1, 2, \ldots$.

(6) $I_i(\theta) = E_\theta\left[\zeta'_i(X_i, \theta)\, \zeta'_i(X_i, \theta)^T\right] = -E_\theta\left[\zeta''_i(X_i, \theta)\right]$, where for any vector y, $y^T$ denotes its transpose.

(7) $\bar{I}_n(\theta) \to \bar{I}(\theta)$ as $n \to \infty$, and $\bar{I}(\theta)$ is positive definite.

(8) $E_{\theta_0}\left|\zeta'_{i,k}(X_i, \theta_0)\right|^3 \le K_2$ for some $0 < K_2 < \infty$.

(9) There exist $\epsilon > 0$ and random variables $B_{i,kl}(X_i)$ such that
  (i) $\sup\left\{\left|\zeta''_{i,kl}(X_i, \psi)\right| : \|\psi - \theta_0\| \le \epsilon\right\} \le B_{i,kl}(X_i)$;
  (ii) $E_{\theta_0}|B_{i,kl}(X_i)|^{1+\delta} \le K_2$ for some $\delta > 0$.

Condition (8) can be weakened but is relatively easy to handle. Under the above regularity conditions, Hoadley (1971) proved that
$$\sqrt{n}\left(\hat{\theta}_n - \theta_0\right) \xrightarrow{L} N\left(0, \bar{I}^{-1}(\theta_0)\right). \qquad (4.15)$$


4.2.1 Validation of asymptotic normality of MLE in the non-iid SDE set-up

Condition (1) holds also for compact Θ; see Maitra and Bhattacharya (2016c). Condition (2) is a simple consequence of Theorem 6. Conditions (3), (5) and (6) are valid in our case because of interchangeability of differentiation and integration, which follows from (H1)–(H4), (H7) and Theorem B.4 of Rao (2013). Condition (4) can be verified in exactly the same way as condition (2) of Section 4.1; measurability of $\zeta''_i(X_i, \theta)$ follows from its continuity with respect to $X_i$. Condition (7) follows from (4.7). Compactness, continuity and the finiteness of moments guaranteed by Lemma 10 of Maitra and Bhattacharya (2016a) imply conditions (8), (9)(i) and (9)(ii). In other words, in our non-iid SDE case we have the following theorem on asymptotic normality.

Theorem 7 Assume the non-iid SDE set-up and conditions (H1)–(H8). Then (4.15) holds as $n \to \infty$.

5 Consistency and asymptotic normality of the Bayesian posterior in the iid set-up

5.1 Consistency of the Bayesian posterior distribution

As in Maitra and Bhattacharya (2015), here we exploit Theorem 7.80 of Schervish (1995), stated below, to show posterior consistency.

Theorem 8 (Schervish (1995)) Let $\{X_n\}_{n=1}^\infty$ be conditionally iid given θ with density $f_1(x|\theta)$ with respect to a measure ν on a space $(\mathcal{X}^1, \mathcal{B}^1)$. Fix $\theta_0 \in \Theta$, and define, for each $M \subseteq \Theta$ and $x \in \mathcal{X}^1$,
$$Z(M, x) = \inf_{\psi \in M} \log \frac{f_1(x|\theta_0)}{f_1(x|\psi)}.$$
Assume that for each $\theta \ne \theta_0$, there is an open set $N_\theta$ such that $\theta \in N_\theta$ and $E_{\theta_0} Z(N_\theta, X_i) > -\infty$. Also assume that $f_1(x|\cdot)$ is continuous at θ for every θ, a.s. $[P_{\theta_0}]$. For $\epsilon > 0$, define $C_\epsilon = \{\theta : K_1(\theta_0, \theta) < \epsilon\}$, where
$$K_1(\theta_0, \theta) = E_{\theta_0}\left(\log \frac{f_1(X_1|\theta_0)}{f_1(X_1|\theta)}\right) \qquad (5.1)$$
is the Kullback–Leibler divergence measure associated with observation $X_1$. Let π be a prior distribution such that $\pi(C_\epsilon) > 0$ for every $\epsilon > 0$. Then, for every $\epsilon > 0$ and every open set $N_0$ containing $C_\epsilon$, the posterior satisfies
$$\lim_{n\to\infty} \pi_n(N_0 | X_1, \ldots, X_n) = 1, \quad \text{a.s. } [P_{\theta_0}]. \qquad (5.2)$$

5.1.1 Verification of posterior consistency

The condition $E_{\theta_0} Z(N_\theta, X_i) > -\infty$ of the above theorem was verified in the context of Theorem 1 in Section 3.1.2. Continuity of the Kullback–Leibler divergence follows easily from Lemma 10 of Maitra and Bhattacharya (2016a). The rest of the verification is the same as that of Maitra and Bhattacharya (2015). Hence, (5.2) holds in our case for any prior with positive, continuous density with respect to the Lebesgue measure. We summarize this result in the form of a theorem, stated below.

Theorem 9 Assume the iid set-up and conditions (H1)–(H4). Let the prior distribution π of the parameter θ satisfy $\frac{d\pi}{d\nu} = g$ almost everywhere on Θ, where $g(\theta)$ is any positive, continuous density on Θ with respect to the Lebesgue measure ν. Then the posterior (1.2) is consistent in the sense that for every $\epsilon > 0$ and every open set $N_0$ containing $C_\epsilon$, the posterior satisfies
$$\lim_{n\to\infty} \pi_n(N_0 | X_1, \ldots, X_n) = 1, \quad \text{a.s. } [P_{\theta_0}]. \qquad (5.3)$$

5.2 Asymptotic normality of the Bayesian posterior distribution

As in Maitra and Bhattacharya (2015), we make use of Theorem 7.102 in conjunction with Theorem 7.89 of Schervish (1995). These theorems involve seven regularity conditions, of which only the first four, stated below, are required for the iid set-up.

5.2.1 Regularity conditions – iid case

(1) The parameter space is $\Theta \subseteq \mathbb{R}^{q+1}$.

(2) $\theta_0$ is a point interior to Θ.

(3) The prior distribution of θ has a density with respect to Lebesgue measure that is positive and continuous at $\theta_0$.

(4) There exists a neighborhood $N_0 \subseteq \Theta$ of $\theta_0$ on which $\ell_n(\theta) = \log f(X_1, \ldots, X_n|\theta)$ is twice continuously differentiable with respect to all coordinates of θ, a.s. $[P_{\theta_0}]$.

Before proceeding to justify asymptotic normality of our posterior, we furnish the relevant theorem below (Theorem 7.102 of Schervish (1995)).

Theorem 10 (Schervish (1995)) Let $\{X_n\}_{n=1}^\infty$ be conditionally iid given θ. Assume the above four regularity conditions; also assume that there exists $H_r(x, \theta)$ such that, for each $\theta_0 \in \operatorname{int}(\Theta)$ and each k, j,
$$\sup_{\|\theta - \theta_0\| \le r} \left| \frac{\partial^2}{\partial\theta_k \partial\theta_j} \log f_1(x|\theta_0) - \frac{\partial^2}{\partial\theta_k \partial\theta_j} \log f_1(x|\theta) \right| \le H_r(x, \theta_0), \qquad (5.4)$$
with
$$\lim_{r \to 0} E_{\theta_0} H_r(X, \theta_0) = 0. \qquad (5.5)$$
Further suppose that the conditions of Theorem 8 hold, and that the Fisher information matrix $I(\theta_0)$ is positive definite. Now, denoting by $\hat{\theta}_n$ the MLE associated with n observations, let
$$\Sigma_n = \begin{cases} \left[-\ell''_n(\hat{\theta}_n)\right]^{-1} & \text{if the inverse and } \hat{\theta}_n \text{ exist}, \\ I_{q+1} & \text{if not}, \end{cases} \qquad (5.6)$$
where, for any t,
$$\ell''_n(t) = \left( \frac{\partial^2}{\partial\theta_i \partial\theta_j} \ell_n(\theta) \right)\Bigg|_{\theta = t}, \qquad (5.7)$$
and $I_{q+1}$ is the identity matrix of order q + 1. Thus, $\Sigma_n^{-1}$ is the observed Fisher information matrix. Letting $\Psi_n = \Sigma_n^{-1/2}\left(\theta - \hat{\theta}_n\right)$, it follows that for each compact subset B of $\mathbb{R}^{q+1}$ and each $\epsilon > 0$, it holds that
$$\lim_{n\to\infty} P_{\theta_0}\left( \sup_{\Psi_n \in B} \left| \pi_n(\Psi_n | X_1, \ldots, X_n) - \tilde{\phi}(\Psi_n) \right| > \epsilon \right) = 0, \qquad (5.8)$$

where $\tilde{\phi}(\cdot)$ denotes the density of the standard normal distribution.

5.2.2 Verification of posterior normality

Observe that the four regularity conditions of Section 5.2.1 trivially hold. The remaining conditions of Theorem 10 are verified in the context of Theorem 3 in Section 3.2.2. We summarize this result in the form of the following theorem.
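Computationally, the normal approximation asserted by Theorem 10 is driven by $\Sigma_n$ in (5.6)–(5.7), which can be obtained by numerically differentiating the log-likelihood at the MLE. A sketch on a toy quadratic log-likelihood, where the summary values u, v are made up purely for illustration:

```python
import numpy as np

# Toy log-likelihood ell(theta) = theta*u - theta^2/2 * v, the form U - V/2
# of Section 2.2 when the drift multiplier is theta itself. The values u, v
# are hypothetical summary statistics, not real data.
u, v = -3.2, 4.7

def ell(theta):
    return theta * u - 0.5 * theta**2 * v

theta_hat = u / v                  # maximizer of ell
step = 1e-4                        # finite-difference step
hess = (ell(theta_hat + step) - 2 * ell(theta_hat)
        + ell(theta_hat - step)) / step**2
Sigma_n = -1.0 / hess              # Sigma_n = [-ell''(theta_hat)]^{-1}, cf. (5.6)

# The normal approximation of the posterior is then N(theta_hat, Sigma_n);
# since ell is exactly quadratic here, Sigma_n should equal 1/v.
print(round(theta_hat, 4), round(Sigma_n, 4))
```

For the multivariate θ of the paper one would build the full Hessian in (5.7) the same way, component by component, and invert it.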


Theorem 11 Assume the iid set-up and conditions (H1)–(H7). Let the prior distribution π of the parameter θ satisfy $\frac{d\pi}{d\nu} = g$ almost everywhere on Θ, where $g(\theta)$ is any density with respect to the Lebesgue measure ν that is positive and continuous at $\theta_0$. Then, letting $\Psi_n = \Sigma_n^{-1/2}\left(\theta - \hat{\theta}_n\right)$, it follows that for each compact subset B of $\mathbb{R}^{q+1}$ and each $\epsilon > 0$,
$$\lim_{n\to\infty} P_{\theta_0}\left( \sup_{\Psi_n \in B} \left| \pi_n(\Psi_n | X_1, \ldots, X_n) - \tilde{\phi}(\Psi_n) \right| > \epsilon \right) = 0. \qquad (5.9)$$

6 Consistency and asymptotic normality of the Bayesian posterior in the non-iid set-up

For consistency and asymptotic normality in the non-iid Bayesian framework we utilize the results of Choi and Schervish (2007) and Theorem 7.89 of Schervish (1995), respectively.

6.1 Posterior consistency in the non-iid set-up

We consider the following extra assumption for our purpose.

(H9) There exist strictly positive functions $\alpha_1^*(x, T, z, \theta)$ and $\alpha_2^*(x, T, z, \theta)$, continuous in $(x, T, z, \theta)$, such that for any $(x, T, z, \theta)$,
$$E_\theta\left[\exp\left\{\alpha_1^*(x, T, z, \theta)\, U_\theta(x, T, z)\right\}\right] < \infty \quad \text{and} \quad E_\theta\left[\exp\left\{\alpha_2^*(x, T, z, \theta)\, V_\theta(x, T, z)\right\}\right] < \infty.$$
Now let
$$\alpha_{1,\min}^* = \inf_{x \in \mathbf{X}, T \in \mathbf{T}, z \in \mathcal{Z}, \theta \in \Theta} \alpha_1^*(x, T, z, \theta), \qquad (6.1)$$
$$\alpha_{2,\min}^* = \inf_{x \in \mathbf{X}, T \in \mathbf{T}, z \in \mathcal{Z}, \theta \in \Theta} \alpha_2^*(x, T, z, \theta), \qquad (6.2)$$
and
$$\alpha = \min\left\{\alpha_{1,\min}^*, \alpha_{2,\min}^*, c^*\right\}, \qquad (6.3)$$
where $0 < c^* < 1/16$. Compactness ensures that $\alpha_{1,\min}^*, \alpha_{2,\min}^* > 0$, so that $0 < \alpha < 1/16$. It also holds, due to compactness, that for $\theta \in \Theta$,
$$\sup_{x \in \mathbf{X}, T \in \mathbf{T}, z \in \mathcal{Z}, \theta \in \Theta} E_\theta\left[\exp\left\{\alpha U_\theta(x, T, z)\right\}\right] < \infty \qquad (6.4)$$
and
$$\sup_{x \in \mathbf{X}, T \in \mathbf{T}, z \in \mathcal{Z}, \theta \in \Theta} E_\theta\left[\exp\left\{\alpha V_\theta(x, T, z)\right\}\right] < \infty. \qquad (6.5)$$

This choice of α, ensuring (6.4) and (6.5), will be useful in verifying the conditions of Theorem 12, which we state next.

Theorem 12 (Choi and Schervish (2007)) Let $\{X_i\}_{i=1}^\infty$ be independently distributed with densities $\{f_i(\cdot|\theta)\}_{i=1}^\infty$ with respect to a common σ-finite measure, where $\theta \in \Theta$, a measurable space. The densities $f_i(\cdot|\theta)$ are assumed to be jointly measurable. Let $\theta_0 \in \Theta$ and let $P_{\theta_0}$ be the joint distribution of $\{X_i\}_{i=1}^\infty$ when $\theta_0$ is the true value of θ. Let $\{\Theta_n\}_{n=1}^\infty$ be a sequence of subsets of Θ. Let θ have prior π on Θ. Define the following:
$$\Lambda_i(\theta_0, \theta) = \log \frac{f_i(X_i|\theta_0)}{f_i(X_i|\theta)}; \quad K_i(\theta_0, \theta) = E_{\theta_0}\left(\Lambda_i(\theta_0, \theta)\right); \quad \varrho_i(\theta_0, \theta) = \operatorname{Var}_{\theta_0}\left(\Lambda_i(\theta_0, \theta)\right).$$
Make the following assumptions:

(1) Suppose that there exists a set B with $\pi(B) > 0$ such that
  (i) $\sum_{i=1}^\infty \frac{\varrho_i(\theta_0, \theta)}{i^2} < \infty$ for all $\theta \in B$;
  (ii) for all $\epsilon > 0$, $\pi\left(B \cap \{\theta : K_i(\theta_0, \theta) < \epsilon\ \forall\, i\}\right) > 0$.

(2) Suppose that there exist test functions $\{\Phi_n\}_{n=1}^\infty$, sets $\{\Omega_n\}_{n=1}^\infty$ and constants $C_1, C_2, c_1, c_2 > 0$ such that
  (i) $\sum_{n=1}^\infty E_{\theta_0} \Phi_n < \infty$;
  (ii) $\sup_{\theta \in \Theta_n^c \cap \Omega_n} E_\theta(1 - \Phi_n) \le C_1 e^{-c_1 n}$;
  (iii) $\pi(\Omega_n^c) \le C_2 e^{-c_2 n}$.

Then
$$\pi_n(\theta \in \Theta_n^c | X_1, \ldots, X_n) \to 0 \quad \text{a.s. } [P_{\theta_0}]. \qquad (6.6)$$

6.1.1 Validation of posterior consistency

First note that $f_i(X_i|\theta)$ is given by (2.4). From the proof of Theorem 6, using finiteness of moments of all orders associated with $U_{i,\theta}$ and $V_{i,\theta}$, it follows that $\log\left[f_i(X_i|\theta_0)/f_i(X_i|\theta)\right]$ has an upper bound with finite first and second order moments under $\theta_0$, uniformly for all $\theta\in B$, where $B$ is any compact subset of $\Theta$. Hence, for each $i$, $\varrho_i(\theta_0,\theta)$ is finite. Using compactness, Lemma 10 of Maitra and Bhattacharya (2016a) and arguments similar to those of Section 3.1.1 of Maitra and Bhattacharya (2015), it easily follows that $\varrho_i(\theta_0,\theta)<\kappa$ for some $0<\kappa<\infty$, uniformly in $i$. Hence, choosing a prior that gives positive probability to the set $B$, it follows that for all $\theta\in B$,
$$\sum_{i=1}^{\infty}\frac{\varrho_i(\theta_0,\theta)}{i^2}<\kappa\sum_{i=1}^{\infty}\frac{1}{i^2}<\infty,$$
so that condition (1)(i) of Theorem 12 holds. Verification of the remaining conditions of Theorem 12 leads to the following result.

Theorem 13 For any $\delta>0$, let $\Theta_{\delta}=\{(\beta,\xi):K(\theta,\theta_0)<\delta\}$, where $K(\theta,\theta_0)$, defined as in (4.6), is the proper Kullback–Leibler divergence. Let the prior distribution $\pi$ of the parameter $\theta$ satisfy $\frac{d\pi}{d\nu}=h$ almost everywhere on $\Theta$, where $h(\theta)$ is any positive, continuous density on $\Theta$ with respect to the Lebesgue measure $\nu$. Then, as $n\rightarrow\infty$,
$$\pi_n\left(\theta\in\Theta_{\delta}^c|X_1,\ldots,X_n\right)\rightarrow 0\quad a.s.\ [P_{\theta_0}].\qquad (6.8)$$

6.2 Asymptotic normality of the posterior distribution in the non-iid set-up

Below we present the three regularity conditions that are needed in the non-iid set-up, in addition to the four conditions already stated in Section 5.2.1, for asymptotic normality given by (5.8).

6.2.1 Extra regularity conditions in the non-iid set-up

(5) The largest eigenvalue of $\Sigma_n$ goes to zero in probability.

(6) For $\delta>0$, define $N_0(\delta)$ to be the open ball of radius $\delta$ around $\theta_0$. Let $\rho_n$ be the smallest eigenvalue of $\Sigma_n$. If $N_0(\delta)\subseteq\Theta$, there exists $K(\delta)>0$ such that
$$\lim_{n\rightarrow\infty}P_{\theta_0}\left(\sup_{\theta\in\Theta\setminus N_0(\delta)}\rho_n\left[\ell_n(\theta)-\ell_n(\theta_0)\right]<-K(\delta)\right)=1.\qquad (6.9)$$

(7) For each $\epsilon>0$, there exists $\delta(\epsilon)>0$ such that
$$\lim_{n\rightarrow\infty}P_{\theta_0}\left(\sup_{\theta\in N_0(\delta(\epsilon)),\|\gamma\|=1}\left|1+\gamma^T\Sigma_n^{1/2}\ell''_n(\theta)\Sigma_n^{1/2}\gamma\right|<\epsilon\right)=1.\qquad (6.10)$$

Although intuitive explanations of all the seven conditions are provided in Schervish (1995), here we briefly touch upon condition (7), which is seemingly somewhat unwieldy. First note that condition (6) ensures consistency of the MLE $\hat{\theta}_n$, so that $\hat{\theta}_n\in N_0(\delta)$ as $n\rightarrow\infty$. Thus, in (7), for sufficiently large $n$ and sufficiently small $\delta(\epsilon)$, $\ell''_n(\theta)\approx\ell''_n(\hat{\theta}_n)$ for all $\theta\in N_0(\delta(\epsilon))$. Hence, from the definition of $\Sigma_n^{-1}$ given by (5.6), it follows that $\Sigma_n^{1/2}\ell''_n(\theta)\Sigma_n^{1/2}\approx -I_{q+1}$, so that, since $\|\gamma\|=1$, $\left|1+\gamma^T\Sigma_n^{1/2}\ell''_n(\theta)\Sigma_n^{1/2}\gamma\right|\approx\left|1-\|\gamma\|^2\right|=0<\epsilon$, for all $\theta\in N_0(\delta(\epsilon))$ and for large enough $n$. The role of condition (7) is simply to formalize this heuristic argument.

6.2.2 Verification of the regularity conditions

Assumptions (H1) – (H9), along with Kolmogorov's strong law of large numbers, are sufficient for the regularity conditions to hold; the arguments remain similar to those in Section 3.2.2 of Maitra and Bhattacharya (2015). We provide our result in the form of the following theorem.

Theorem 14 Assume the non-iid set-up and conditions (H1) – (H9). Let the prior distribution $\pi$ of the parameter $\theta$ satisfy $\frac{d\pi}{d\nu}=h$ almost everywhere on $\Theta$, where $h(\theta)$ is any density with respect to the Lebesgue measure $\nu$ which is positive and continuous at $\theta_0$. Then, letting $\Psi_n=\Sigma_n^{-1/2}\left(\theta-\hat{\theta}_n\right)$, for each compact subset $B$ of $\mathbb{R}^{p+q+1}$ and each $\epsilon>0$, the following holds:
$$\lim_{n\rightarrow\infty}P_{\theta_0}\left(\sup_{\Psi_n\in B}\left|\pi_n(\Psi_n|X_1,\ldots,X_n)-\tilde{\phi}(\Psi_n)\right|>\epsilon\right)=0.\qquad (6.11)$$

7 Random effects SDE model

We now consider the following system of SDE models: for $i=1,2,\ldots,n$,
$$dX_i(t)=\phi_{\xi_i}(t)b_{\beta}(X_i(t))dt+\sigma(X_i(t))dW_i(t).\qquad (7.1)$$
Note that this model is the same as that described in Section 2, except that the parameters $\xi_i$ now depend upon $i$. Indeed, $\phi_{\xi_i}(t)$ is now given by
$$\phi_{\xi_i}(t)=\phi_{\xi_i}(z_i(t))=\xi_{0i}+\xi_{1i}g_1(z_{i1}(t))+\xi_{2i}g_2(z_{i2}(t))+\cdots+\xi_{pi}g_p(z_{ip}(t)),\qquad (7.2)$$
where $\xi_i=(\xi_{0i},\xi_{1i},\ldots,\xi_{pi})^T$ is the random effect corresponding to the $i$-th individual, for $i=1,\ldots,n$, and $z_i(t)$ is the same as in Section 2.1. We let $b_{\beta}(X_i(t),\phi_{\xi_i})=\phi_{\xi_i}(t)b_{\beta}(X_i(t))$.

Note that our likelihood is the product, over $i=1,\ldots,n$, of the individual densities
$$f_{i,\xi_i,\beta}(X_i)=\exp\left(U_{i,\xi_i,\beta}-\frac{V_{i,\xi_i,\beta}}{2}\right),$$
where
$$U_{i,\xi_i,\beta}=\int_0^{T_i}\frac{\phi_{\xi_i}(s)b_{\beta}(X_i(s))}{\sigma^2(X_i(s))}dX_i(s)
\quad\mbox{and}\quad
V_{i,\xi_i,\beta}=\int_0^{T_i}\frac{\phi^2_{\xi_i}(s)b^2_{\beta}(X_i(s))}{\sigma^2(X_i(s))}ds.$$

Now, let $m^{\beta}(z(t),x(t))=\left(m_0^{\beta},m_1^{\beta}(z_1(t),x(t)),\ldots,m_p^{\beta}(z_p(t),x(t))\right)^T$ be a function from $\mathcal{Z}\times\mathbb{R}$ to $\mathbb{R}^{p+1}$, where $m_0^{\beta}\equiv 1$ and $m_k^{\beta}(z(t),x(t))=g_k(z_k(t))b_{\beta}(x(t))$, for $k=1,\ldots,p$. With this notation, the likelihood can be re-written as the product, over $i=1,\ldots,n$, of
$$f_{i,\xi_i,\beta}(X_i)=\exp\left((\xi_i)^TA_i^{\beta}-\frac{1}{2}(\xi_i)^TB_i^{\beta}\xi_i\right),\qquad (7.3)$$
where
$$A_i^{\beta}=\int_0^{T_i}\frac{m^{\beta}(z(s),X_i(s))}{\sigma^2(X_i(s))}dX_i(s)\qquad (7.4)$$
and
$$B_i^{\beta}=\int_0^{T_i}\frac{m^{\beta}(z(s),X_i(s))\left(m^{\beta}(z(s),X_i(s))\right)^T}{\sigma^2(X_i(s))}ds\qquad (7.5)$$
are $(p+1)\times 1$ random vectors and positive definite $(p+1)\times(p+1)$ random matrices, respectively.

We assume that the $\xi_i$ are iid Gaussian vectors, with expectation vector $\mu$ and covariance matrix $\Sigma\in S_{p+1}(\mathbb{R})$, where $S_{p+1}(\mathbb{R})$ is the set of real positive definite symmetric matrices of order $p+1$. The parameter set is denoted by $\theta=(\mu,\Sigma,\beta)\in\Theta\subset\mathbb{R}^{p+1}\times S_{p+1}(\mathbb{R})\times\mathbb{R}^q$. To obtain the likelihood involving $\theta$ we refer to the multidimensional random effects set-up of Delattre et al. (2013). Following Lemma 2 of Delattre et al. (2013), it then follows in our case that, for each $i\geq 1$ and for all $\theta$, $B_i^{\beta}+\Sigma$, $I_{p+1}+B_i^{\beta}\Sigma$ and $I_{p+1}+\Sigma B_i^{\beta}$ are invertible.

Setting $R_i^{\beta}=\left(I_{p+1}+B_i^{\beta}\Sigma\right)^{-1}B_i^{\beta}$, we obtain
$$f_i(X_i|\theta)=\frac{1}{\sqrt{\det\left(I_{p+1}+B_i^{\beta}\Sigma\right)}}
\exp\left(-\frac{1}{2}\left(\mu-\left(B_i^{\beta}\right)^{-1}A_i^{\beta}\right)^TR_i^{\beta}\left(\mu-\left(B_i^{\beta}\right)^{-1}A_i^{\beta}\right)\right)
\times\exp\left(\frac{1}{2}\left(A_i^{\beta}\right)^T\left(B_i^{\beta}\right)^{-1}A_i^{\beta}\right)\qquad (7.6)$$
as our desired likelihood, after integrating (7.3) with respect to the distribution of $\xi_i$. With reference to Delattre et al. (2013), in our case
$$\gamma_i(\theta)=\left(I_{p+1}+\Sigma B_i^{\beta}\right)^{-1}\left(A_i^{\beta}-B_i^{\beta}\mu\right)
\quad\mbox{and}\quad
I_i(\Sigma)=\left(I_{p+1}+\Sigma B_i^{\beta}\right)^{-1}B_i^{\beta}.$$
Hence, Proposition 10(i) of Delattre et al. (2013) can be seen to hold here in a similar way, by replacing $U_i$ and $V_i$ with $A_i^{\beta}$ and $B_i^{\beta}$, respectively.

Asymptotic results on consistency and asymptotic normality of the MLE, as well as Bayesian posterior consistency and asymptotic posterior normality, in both the iid and non-iid set-ups, can be established as in the one-dimensional cases of Maitra and Bhattacharya (2016c) and Maitra and Bhattacharya (2015), with proper multivariate modifications obtained by replacing $U_i$ and $V_i$ with $A_i^{\beta}$ and $B_i^{\beta}$, respectively, and exploiting assumptions (H1) – (H9).
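The closed form (7.6) can be checked numerically. The sketch below is purely illustrative (it is not part of the estimation methodology): it takes the scalar case $p=0$, in which $A_i^{\beta}$, $B_i^{\beta}$, $\mu$ and $\Sigma$ reduce to scalars $a$, $b$, $\mu$ and $\omega^2$, and compares (7.6) against the Gaussian integral $E\left[\exp\left(a\xi-\tfrac{1}{2}b\xi^2\right)\right]$, $\xi\sim N(\mu,\omega^2)$, computed directly by completing the square.

```python
import math

def likelihood_closed_form(a, b, mu, w2):
    # Scalar version of (7.6): A = a, B = b, Sigma = w2, R = b / (1 + b*w2).
    R = b / (1.0 + b * w2)
    pref = 1.0 / math.sqrt(1.0 + b * w2)
    return pref * math.exp(-0.5 * (mu - a / b) ** 2 * R + 0.5 * a ** 2 / b)

def likelihood_direct(a, b, mu, w2):
    # E[exp(a*xi - 0.5*b*xi^2)] for xi ~ N(mu, w2), by completing the square:
    # combine exponents into -0.5*M*xi^2 + v*xi with M = b + 1/w2, v = a + mu/w2.
    M = b + 1.0 / w2
    v = a + mu / w2
    return math.exp(0.5 * v ** 2 / M - 0.5 * mu ** 2 / w2) / math.sqrt(w2 * M)

# The two expressions agree for any a, any b > 0 and any w2 > 0.
for (a, b, mu, w2) in [(0.7, 2.0, 1.3, 0.5), (-1.2, 0.8, 0.0, 2.0)]:
    assert abs(likelihood_closed_form(a, b, mu, w2) -
               likelihood_direct(a, b, mu, w2)) < 1e-12
```

The same cancellation of the normalizing constants, $\det(\Sigma M)=\det(I_{p+1}+\Sigma B_i^{\beta})$, drives the multivariate case.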

8 Simulation studies

We now supplement our asymptotic theory with simulation studies in which the data are generated from a specific system of SDEs with one covariate, with given values of the parameters. Specifically, in the classical case we obtain the distribution of the MLEs using the parametric bootstrap, along with the 95% confidence intervals of the parameters, and demonstrate in particular that the true parameter values are well captured by the respective 95% confidence intervals. In the Bayesian counterpart, we obtain the posterior distributions of the parameters along with the respective 95% credible intervals, and show that the true values fall well within the respective 95% credible intervals.

8.1 Distribution of the MLE when n = 20

To demonstrate the finite-sample analogue of the asymptotic distribution of the MLE as $n\rightarrow\infty$, we consider $n=20$ individuals, where the $i$-th one is modeled by
$$dX_i(t)=(\theta_1+\theta_2z_{i1}(t))(\theta_3+\theta_4X_i(t))dt+\sigma dW_i(t),\qquad (8.1)$$
for $i=1,\ldots,20$. We fix the diffusion coefficient at $\sigma=1$, consider the initial value $X(0)=0$ and the time interval $[0,T]$ with $T=1$, and choose the true values $\theta_1=1$, $\theta_2=-1$, $\theta_3=2$, $\theta_4=-2$. We assume that the time-dependent covariates $z_{i1}(t)$ satisfy the following SDE:
$$dz_{i1}(t)=\xi_{i1}z_{i1}(t)dt+dW_i(t),\qquad (8.2)$$
for $i=1,\ldots,20$, where the coefficients $\xi_{i1}\stackrel{iid}{\sim}N(7,1)$. After simulating the covariates using the system of SDEs (8.2), we generate the data using the system of SDEs (8.1). In both cases we discretize the time interval $[0,1]$ into 100 equispaced time points.

The distributions of the MLEs of the four parameters are obtained through the parametric bootstrap method. In this method we simulate the data 1000 times, by simulating as many paths of the Brownian motion, where each data set consists of 20 individuals. For each data set we perform the "block-relaxation" method (see, for example, Lange (2010) and the references therein) to obtain the MLE.
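The data-generation step above can be sketched with the Euler–Maruyama scheme on the 100-point grid. The snippet below is an illustration only: for numerical tameness of this toy path we draw the covariate coefficient from $N(0,1)$ rather than $N(7,1)$ and assume $z(0)=0$ (the paper does not specify the covariate's initial value); these choices are ours.

```python
import math
import random

def euler_maruyama(drift, x0, T=1.0, n_steps=100, sigma=1.0):
    """One Euler-Maruyama path of dX = drift(k, X) dt + sigma dW on [0, T];
    drift receives the index k of the current grid point t_k = k * T / n_steps."""
    dt = T / n_steps
    path = [x0]
    for k in range(n_steps):
        dW = random.gauss(0.0, math.sqrt(dt))
        path.append(path[-1] + drift(k, path[-1]) * dt + sigma * dW)
    return path

random.seed(42)
theta = (1.0, -1.0, 2.0, -2.0)      # true values of (theta1, ..., theta4)

# Covariate path from (8.2): dz = xi * z dt + dW.  Illustrative assumptions:
# xi drawn from N(0, 1) instead of N(7, 1), and z(0) = 0.
xi = random.gauss(0.0, 1.0)
z = euler_maruyama(lambda k, v: xi * v, x0=0.0)

# Data path from (8.1) on the same grid, with sigma = 1 and X(0) = 0.
x = euler_maruyama(
    lambda k, v: (theta[0] + theta[1] * z[k]) * (theta[2] + theta[3] * v),
    x0=0.0)
```

Repeating the two calls 20 times (one pair per individual) produces one simulated data set of the kind used in the bootstrap.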

In a nutshell, starting with a sensible initial value in the parameter space, the block-relaxation method iteratively maximizes the optimizing function (here, the log-likelihood) with respect to one parameter at a time, fixing the others at their current values, until convergence with respect to the iterations is attained. Details follow.

For the initial values of the $\theta_j$, $j=1,\ldots,4$, required to begin the block-relaxation method, we simulate four $N(0,1)$ variates independently and set them as the initial values $\theta_j^{(0)}$, $j=1,\ldots,4$. Denoting by $\theta_j^{(i)}$ the value of $\theta_j$ at the $i$-th iteration, for $i\geq 1$, and letting $L$ be the likelihood, the block-relaxation method consists of the following steps:

Algorithm 1 Block-relaxation for the MLE in the SDE system with covariates

(1) At the $i$-th iteration, for $j=1,2,3,4$, obtain $\theta_j^{(i)}$ by solving the equation $\frac{\partial\log L}{\partial\theta_j}=0$, conditionally on $\theta_1=\theta_1^{(i)},\ldots,\theta_{j-1}=\theta_{j-1}^{(i)},\theta_{j+1}=\theta_{j+1}^{(i-1)},\ldots,\theta_4=\theta_4^{(i-1)}$. Let $\theta^{(i)}=\left(\theta_1^{(i)},\ldots,\theta_4^{(i)}\right)$.

(2) Letting $\|\cdot\|$ denote the Euclidean norm, if $\left\|\theta^{(i)}-\theta^{(i-1)}\right\|\leq 10^{-5}$, set $\hat{\theta}=\theta^{(i)}$, where $\hat{\theta}$ stands for the maximum likelihood estimator of $\theta=(\theta_1,\ldots,\theta_4)$.

(3) If, on the other hand, $\left\|\theta^{(i)}-\theta^{(i-1)}\right\|>10^{-5}$, increase $i$ to $i+1$ and repeat steps (1) and (2).

Once we obtain the MLE by the above block-relaxation algorithm, we plug $\theta=\hat{\theta}$ into (8.1), generate 1000 data sets from the resulting system of SDEs, and apply the block-relaxation algorithm to each such data set to obtain the associated MLEs. Thus, we obtain the distribution of the MLE by the parametric bootstrap method.

The distributions of the components of $\hat{\theta}$ (denoted by $\hat{\theta}_j$, $j=1,\ldots,4$) are shown in Figure 8.1, where the associated 95% confidence intervals are shown in bold lines. As the figures exhibit, the 95% confidence intervals clearly contain the true values of the respective components of $\theta$.
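The structure of Algorithm 1 — cyclically solving $\partial\log L/\partial\theta_j=0$ with the remaining parameters held fixed — can be illustrated on a toy strictly concave objective that admits closed-form coordinate updates. The sketch below uses a quadratic "log-likelihood" $\ell(\theta)=-\tfrac{1}{2}(\theta-m)^TA(\theta-m)$ of our own choosing, not the SDE log-likelihood of the paper; its maximizer $m$ plays the role of the MLE.

```python
# Toy concave objective: l(theta) = -0.5 (theta - m)' A (theta - m),
# with A symmetric positive definite, so the maximizer is exactly m.
A = [[4.0, 1.0, 0.0, 0.5],
     [1.0, 3.0, 0.5, 0.0],
     [0.0, 0.5, 2.0, 0.2],
     [0.5, 0.0, 0.2, 1.5]]
m = [1.0, -1.0, 2.0, -2.0]

def block_relaxation(theta, tol=1e-5, max_iter=1000):
    """Cyclic coordinate ascent in the spirit of Algorithm 1: each inner step
    solves d l / d theta_j = 0 with the other coordinates held fixed, and the
    loop stops when the Euclidean change between sweeps is at most tol."""
    for _ in range(max_iter):
        old = theta[:]
        for j in range(len(theta)):
            # d l / d theta_j = -sum_k A[j][k] (theta_k - m_k) = 0, solved for theta_j:
            s = sum(A[j][k] * (theta[k] - m[k])
                    for k in range(len(theta)) if k != j)
            theta[j] = m[j] - s / A[j][j]
        if sum((a - b) ** 2 for a, b in zip(theta, old)) ** 0.5 <= tol:
            break
    return theta

est = block_relaxation([0.0, 0.0, 0.0, 0.0])   # converges to m
```

For a quadratic objective this scheme is exactly Gauss–Seidel, which converges for any symmetric positive definite $A$; for the SDE likelihood each coordinate update is obtained by solving the corresponding score equation instead.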

8.2 Posterior distribution of the parameters when n = 20

We now consider the simulation study for the Bayesian counterpart, using the same data set simulated from the system of SDEs (8.1), with the same covariates simulated from (8.2). We consider an empirical Bayes prior based on the MLE: for $j=1,\ldots,4$, $\theta_j\sim N\left(\hat{\theta}_j,\hat{\sigma}_j^2\right)$ independently, where $\hat{\theta}_j$ is the MLE of $\theta_j$ and $\hat{\sigma}_j$ is chosen such that the length of the 95% prior credible interval $[\hat{\theta}_j-1.96\hat{\sigma}_j,\hat{\theta}_j+1.96\hat{\sigma}_j]$ equals the length of the 95% confidence interval associated with the distribution of the MLE $\hat{\theta}_j$ after adding one unit to both its lower and upper ends. In other words, we select $\hat{\sigma}_j$ such that the length of the corresponding 95% prior credible interval is the same as that of the enlarged 95% confidence interval associated with the distribution of the corresponding MLE.

To simulate from the posterior distribution of $\theta$, we perform approximate Bayesian computation (ABC) (Tavaré et al. (1997), Beaumont et al. (2002), Marjoram et al. (2003)), since standard Markov chain Monte Carlo (MCMC) simulation techniques, such as Gibbs sampling and Metropolis–Hastings algorithms (see, for example, Robert and Casella (2004), Brooks et al. (2011)), failed to ensure good mixing behaviour of the underlying Markov chains. Denoting the true data set by $X_{true}$, our ABC method is described by the following steps.

Figure 8.2 shows the posterior distributions of the parameters $\theta_j$, $j=1,\ldots,4$, where the 95% posterior credible intervals are shown in bold lines. Observe that all the true values of $\theta_j$, $j=1,\ldots,4$, fall comfortably within the respective 95% posterior credible intervals.
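The choice of $\hat{\sigma}_j$ described at the start of this subsection reduces to equating $2\times 1.96\,\hat{\sigma}_j$ to the enlarged confidence-interval length, which is a one-line computation; the interval endpoints below are hypothetical.

```python
def prior_sd(ci_lower, ci_upper):
    """Empirical-Bayes prior spread: sigma_j chosen so that the 95% prior
    interval [theta_hat - 1.96 sigma_j, theta_hat + 1.96 sigma_j] has the same
    length as the bootstrap 95% CI enlarged by one unit at each end."""
    enlarged_length = (ci_upper + 1.0) - (ci_lower - 1.0)
    return enlarged_length / (2.0 * 1.96)

sigma1 = prior_sd(0.99, 1.01)   # hypothetical bootstrap CI for theta1_hat
```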


Figure 8.1: Distributions of the MLEs.

Algorithm 2 ABC for the SDE system with covariates

(1) For $j=1,\ldots,4$, simulate the parameters $\theta_j$ from their respective prior distributions.

(2) With the obtained parameter values, simulate a new data set, denoted by $X_{new}$, using the system of SDEs (8.1).

(3) Calculate the average Euclidean distance between $X_{new}$ and $X_{true}$, denoted by $d_x$.

(4) Repeat steps (1)–(3) until $d_x<0.1$.

(5) Once $d_x<0.1$, set the corresponding $\theta$ as a realization from the posterior of $\theta$, with approximation error 0.1.

(6) Obtain 10000 posterior realizations of $\theta$ by repeating steps (1)–(5).
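Algorithm 2 is a standard rejection-ABC scheme. The sketch below applies it to a toy normal-mean model instead of the SDE system (8.1); the prior, the data size and the tolerance are illustrative assumptions of ours, chosen so that the acceptance step terminates quickly.

```python
import random

random.seed(0)
theta_true = 1.5
x_true = [random.gauss(theta_true, 1.0) for _ in range(50)]   # the "true" data

def distance(a, b):
    # Average Euclidean distance between data sets, as in step (3).
    return (sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)) ** 0.5

def abc_posterior(n_draws, eps=1.6):
    """Rejection ABC in the spirit of Algorithm 2: draw from the prior,
    simulate data, and keep the draw only if the simulated data fall within
    distance eps of the observed data."""
    draws = []
    while len(draws) < n_draws:
        theta = random.gauss(0.0, 5.0)                        # step (1): prior draw
        x_new = [random.gauss(theta, 1.0) for _ in range(50)] # step (2): simulate
        if distance(x_new, x_true) < eps:                     # steps (3)-(5): accept
            draws.append(theta)
    return draws

posterior = abc_posterior(100)   # step (6): repeat to collect realizations
```

Accepted draws concentrate near `theta_true`; shrinking `eps` sharpens the approximation at the cost of a lower acceptance rate, which is why the paper reports a fixed approximation error of 0.1 for its setting.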

Figure 8.2: Posterior distributions of the parameters.

9 Application to real data

We now apply our SDE system with covariates to real stock market data (467 observations from August 5, 2013, to June 30, 2015) for 15 companies. The data are available at www.nseindia.com. Each company-wise data set is first modeled by the available standard financial SDE models using the "fitsde" package in R. The minimum value of BIC (Bayesian Information Criterion) is attained by the CKLS (Chan, Karolyi, Longstaff and Sanders; see Chan et al. (1992)) model. Denoting the data by $X(t)$, the CKLS model is described by
$$dX(t)=(\theta_1+\theta_2X(t))dt+\theta_3X(t)^{\theta_4}dW(t).$$
In our application we treat the diffusion coefficient as a fixed quantity: for each company we fix the values of $\theta_3$ and $\theta_4$ at those obtained by the "fitsde" function, denoting them by $A_i$ and $B_i$ for the $i$-th company. We consider the "close price" of each company as our data $X_i(t)$. The IIP general index, the bank interest rate and the US dollar exchange rate are considered as time-dependent covariates, which we incorporate in the CKLS model; the three covariates are denoted by $c_1$, $c_2$, $c_3$, respectively. Our system of SDE models for the national stock exchange data associated with the 15 companies is then the following:

$$dX_i(t)=(\theta_1+\theta_2c_1(t)+\theta_3c_2(t)+\theta_4c_3(t))(\theta_5+\theta_6X_i(t))dt+A_iX_i(t)^{B_i}dW_i(t),\qquad (9.1)$$
for $i=1,\ldots,15$.

9.1 Distribution of the MLE

We first obtain the MLEs of the 6 parameters $\theta_j$, $j=1,\ldots,6$, by the block-relaxation algorithm described by Algorithm 1 in Section 8.1. In this real-data set-up the process starts with the initial values $\theta_j=1$, $j=1,\ldots,6$ (our experiments with several other choices demonstrated their practicability as well), and the convergence threshold of Algorithm 1 is taken as 0.1 instead of $10^{-5}$. Then, taking the MLEs as the values of the parameters $\theta_j$, $j=1,\ldots,6$, we perform the parametric bootstrap method, generating the data 1000 times and, for each data set, obtaining the MLEs of the six parameters by the block-relaxation method as already described. Figure 9.1 shows the distributions of the MLEs (denoted by $\hat{\theta}_j$, $j=1,\ldots,6$), where the respective 95% confidence intervals are shown in bold lines. Among the covariates, $c_3$, that is, the US dollar exchange rate, seems to be less significant than the others, since the distribution of the MLE of the associated coefficient, $\theta_3$, has its highest density around zero, with small variability, compared to the other coefficients. Also note that the distribution of $\hat{\theta}_6$ is highly concentrated around zero, signifying that the $X_i(t)$ term in the drift function of (9.1) is probably redundant.

9.2 Posterior distribution of the parameters

In the Bayesian approach, the set-up regarding the real data is exactly the same as in Section 9: each data set is driven by the SDE system (9.1), with the covariates $c_j$, $j=1,2,3$, as described in that section. In this case, we consider the priors for the 6 parameters to be independent normal with mean zero and variance 100. Since in real-data situations the parameters are associated with greater uncertainty than in simulation studies, such a relatively vague prior, in contrast to the empirical Bayes prior of the simulation study, makes sense. The greater uncertainty in the parameters in this real-data scenario also makes room for more movement, and hence better mixing, of MCMC samplers such as Gibbs sampling, in contrast with the simulation studies. Indeed, our Gibbs sampler, where the full conditionals are normal distributions with appropriate means and variances, yielded excellent mixing. Although we chose the initial values $\theta_j=0.1$, $j=1,\ldots,6$, other choices also turned out to be very much viable. We perform 100000 Gibbs sampling iterations to obtain the required posterior distributions of the 6 parameters.

Figure 9.1: Distributions of the MLEs for the real data.

Figure 9.2: Trace plots of the parameters.

Figure 9.2 shows the trace plots of the 6 parameters, based on 10000 thinned samples obtained by plotting the output of every 10-th iteration. We emphasize that although we display only 10000 Gibbs sampling realizations, to reduce the file size, our inference is based on all 100000 realizations. From the trace plots, convergence of the posterior distributions of the parameters is clearly observed. Figure 9.3 displays the posterior densities of the 6 parameters, where the 95% credible intervals are indicated by bold lines. The posterior distribution of $\theta_3$ is seen to include zero in its highest density region; however, unlike the distribution of the MLE $\hat{\theta}_3$, the posterior of $\theta_3$ has a long left tail, so that insignificance of $c_3$ is not very evident. The posterior of $\theta_6$ is highly concentrated around zero, agreeing with the MLE of $\theta_6$ that the term $X_i(t)$ in the drift function is perhaps redundant. Note that the posterior of $\theta_5$ also includes zero in its high-density region; however, it has a long left tail, so that the significance of $\theta_5$, and hence of the overall drift function, is not ruled out.
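A Gibbs sampler with normal full conditionals, as used above, can be sketched on a toy bivariate normal target; this is an illustration only, not the actual posterior of $\theta_1,\ldots,\theta_6$, whose full-conditional means and variances depend on the data.

```python
import random

# Gibbs sampling for a bivariate N(0, [[1, rho], [rho, 1]]) target, for which
# the full conditionals are normal:
#   theta1 | theta2 ~ N(rho * theta2, 1 - rho^2), and symmetrically for theta2.
random.seed(1)
rho = 0.5
sd = (1 - rho ** 2) ** 0.5
t1, t2 = 0.1, 0.1                    # initial values, echoing the paper's 0.1
samples = []
for it in range(20000):
    t1 = random.gauss(rho * t2, sd)  # update theta1 given current theta2
    t2 = random.gauss(rho * t1, sd)  # update theta2 given new theta1
    if it % 10 == 0:                 # thin every 10-th draw, as in Section 9.2
        samples.append((t1, t2))
```

Because each full conditional is sampled exactly, the chain mixes rapidly; the thinned `samples` behave like (correlated) draws from the bivariate normal target, mirroring how the trace plots in Figure 9.2 are produced.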

10 Summary and conclusion

Figure 9.3: Posterior distributions of the parameters for the real data.

In the SDE-based random effects model framework, Delattre et al. (2013) considered the linearity assumption $b(x,\phi_i)=\phi_ib(x)$ in the drift function, assuming the $\phi_i$ to be Gaussian random variables with mean $\mu$ and variance $\omega^2$, and obtained a closed-form expression for the likelihood of these parameters. Under the iid set-up, they proved convergence in probability and asymptotic normality of the maximum likelihood estimator of the parameters. Maitra and Bhattacharya (2016a) and Maitra and Bhattacharya (2016b) extended their model by incorporating time-varying covariates in $\phi_i$ and allowing $b(x)$ to depend upon unknown parameters, but rather than inference regarding the parameters, they developed asymptotic model selection theory based on Bayes factors for their purposes.

In this paper, we developed asymptotic theories of parametric inference in both the classical and Bayesian paradigms under the fixed effects set-up, and provided a relevant discussion of asymptotic inference on the parameters in the random effects set-up. As in our previous investigations (Maitra and Bhattacharya (2016c), Maitra and Bhattacharya (2015), for instance), in this work too we distinguished the non-iid set-up from the iid case, the latter corresponding to the system of SDEs with the same initial values and time domain, but with no covariates. However, as already noted, this still generalizes the iid set-up of Delattre et al. (2013), through the generalization of $b(x)$ to $b_{\beta}(x)$, $\beta$ being a set of unknown parameters. Under suitable assumptions we obtained strong consistency and asymptotic normality of the MLE in the iid set-up, and weak consistency and asymptotic normality in the non-iid situation. Besides, we extended our classical asymptotic theory to the Bayesian framework for both iid and non-iid situations, proving posterior consistency and asymptotic posterior normality in both set-ups. To our knowledge, ours is the first effort at asymptotic inference, either classical or Bayesian, in systems of SDEs in the presence of time-varying covariates. Our simulation studies and real data applications, in both the classical and Bayesian paradigms, have revealed very encouraging results, demonstrating the relevance of our developments even in practical, finite-sample situations.

Acknowledgment

We are sincerely grateful to the EIC, the AE and the referee, whose constructive comments have led to significant improvement of the quality and presentation of our manuscript. The first author also gratefully acknowledges her CSIR Fellowship, Govt. of India.

References

Beaumont, M. A., Zhang, W., and Balding, D. J. (2002). Approximate Bayesian Computation in Population Genetics. Genetics, 162, 2025–2035.

Bishwal, J. P. N. (2008). Parameter Estimation in Stochastic Differential Equations. Lecture Notes in Mathematics, 1923, Springer-Verlag.

Brooks, S., Gelman, A., Jones, G. L., and Meng, X.-L. (2011). Handbook of Markov Chain Monte Carlo. Chapman and Hall, London.

Chan, K. C., Karolyi, G. A., Longstaff, F. A., and Sanders, A. B. (1992). An Empirical Comparison of Alternative Models of the Short-Term Interest Rate. The Journal of Finance, 47, 1209–1227.

Choi, T. and Schervish, M. J. (2007). On Posterior Consistency in Nonparametric Regression Problems. Journal of Multivariate Analysis, 98, 1969–1987.

Delattre, M., Genon-Catalot, V., and Samson, A. (2013). Maximum Likelihood Estimation for Stochastic Differential Equations with Random Effects. Scandinavian Journal of Statistics, 40, 322–343.

Hoadley, B. (1971). Asymptotic Properties of Maximum Likelihood Estimators for the Independent not Identically Distributed Case. The Annals of Mathematical Statistics, 42, 1977–1991.

Lange, K. (2010). Numerical Analysis for Statisticians. Springer, New York.

Leander, J., Almquist, J., Ahlström, C., Gabrielsson, J., and Jirstrand, M. (2015). Mixed Effects Modeling Using Stochastic Differential Equations: Illustrated by Pharmacokinetic Data of Nicotinic Acid in Obese Zucker Rats. The AAPS Journal, 17, 586–596.

Maitra, T. and Bhattacharya, S. (2015). On Bayesian Asymptotics in Stochastic Differential Equations with Random Effects. Statistics and Probability Letters, 103, 148–159. Also available at "http://arxiv.org/abs/1407.3971".

Maitra, T. and Bhattacharya, S. (2016a). Asymptotic Theory of Bayes Factor in Stochastic Differential Equations: Part I. Available at "https://arxiv.org/abs/1503.09011".

Maitra, T. and Bhattacharya, S. (2016b). Asymptotic Theory of Bayes Factor in Stochastic Differential Equations: Part II. Available at "https://arxiv.org/abs/1504.00002".

Maitra, T. and Bhattacharya, S. (2016c). On Asymptotics Related to Classical Inference in Stochastic Differential Equations with Random Effects. Statistics and Probability Letters, 110, 278–288. Also available at "http://arxiv.org/abs/1407.3968".

Marjoram, P., Molitor, J., Plagnol, V., and Tavaré, S. (2003). Markov Chain Monte Carlo Without Likelihoods. Proceedings of the National Academy of Sciences, 100, 15324–15328.

Oravecz, Z., Tuerlinckx, F., and Vandekerckhove, J. (2011). A Hierarchical Latent Stochastic Differential Equation Model for Affective Dynamics. Psychological Methods, 16, 468–490.

Overgaard, R. V., Jonsson, N., Tornøe, C. W., and Madsen, H. (2005). Non-Linear Mixed-Effects Models with Stochastic Differential Equations: Implementation of an Estimation Algorithm. Journal of Pharmacokinetics and Pharmacodynamics, 32, 85–107.

Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. Springer, New York.

Rao, B. (2013). Semimartingales and their Statistical Inference. Chapman and Hall/CRC, Boca Raton.

Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer-Verlag, New York.

Schervish, M. J. (1995). Theory of Statistics. Springer-Verlag, New York.

Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. John Wiley & Sons, Inc., New York.

Tavaré, S., Balding, D. J., Griffiths, R. C., and Donnelly, P. (1997). Inferring Coalescence Times from DNA Sequence Data. Genetics, 145, 505–518.

Zhu, B., Song, P. X.-K., and Taylor, J. M. G. (2011). Stochastic Functional Data Analysis: A Diffusion Model-Based Approach. Biometrics, 67, 1295–1304.
