Linear and nonlinear filtering in mathematical finance: an ... - wseas.us

20 downloads 76 Views 392KB Size Report
Table 1 below shows the parameter values of single factor stochastic volatility models for two stocks. Table 1. Parameter values. IBM Citigroup φ 0.93. 0.89.
Proceedings of the 9th WSEAS International Conference on SYSTEMS THEORY AND SCIENTIFIC COMPUTATION (ISTASC '09)

Linear and nonlinear filtering in mathematical finance: an overview Paresh Date Department of Mathematical Sciences, Brunel University Uxbridge, Middlesex UB8 3PH, United Kingdom [email protected] Abstract: This paper presents a brief overview of Kalman filtering and its applications in mathematical finance. Results of recent empirical studies with market data are presented for stochastic volatility modelling and yield curve modelling. The paper also outlines rigorous as well as heuristic approaches to filtering of nonlinear time series. Keywords: Kalman filtering, time series models, maximum likelihood

1

Introduction

E(²n ²Tn ) = Q > 0 and E(wn wnT ) = R > 0 are constants or are known functions of time. Only the real valued variable yn is measured or is observable; the variable xn is of interest and needs to be estimated. We denote the estimate of xn based on information up to time tn−i as x ˆn|n−i for i ≥ 0 and we assume that the initial estimate x ˆ0|0 is known. The conditional variance of the estimate is correspondingly denoted by Pn|n−i and P0|0 > 0 is assumed to be known. With this notation, the following set of recursive equations is conventionally referred to as a Kalman filter:

The problem of estimating unobserved latent variables from observed market data arises frequently in mathematical finance. Kalman filter [21] and its generalizations have been the main tools for estimating the unobserved variables from the observed ones in econometrics and in engineering for several decades and its use is now becoming common in mathematical finance. Kalman filter is used in calibration of time series models, forecasting of variables and smoothing applications. This paper gives a brief overview of Kalman filtering theory and presents recent empirical results on two applications in finance. Some developments in approximate nonlinear filtering are also reviewed. The rest of the paper is organized as follows. In section 2.1, basic linear Gaussian filtering methodology is described. In section 2.2, the use of Kalman filter in calibration of time series models is outlined. Sections 3.1 and 3.2 present two case studies for applications of Kalman filtering in mathematical finance. Section 4 presents an overview of some approaches to approximate filtering in nonlinear systems. Finally, section 5 summarises the contributions and outlines directions of future research.

(3) (4) (5)

Pn+1|n = APn|n−1 AT + Q T − APn|n−1 C T Σ−1 n CPn|n−1 A .

(6)

Here, vn in (2) represent information which could not have been derived from data up to time tn−1 and are called innovations. Σn represents the covariance matrix of innovations. x ˆn+1|n is in fact a sample of conditional expectation of xn+1 based on information up to time tn , determined by the realized value of yn . This set of equations can be derived from the following relationship between conditional moments of jointly Gaussian variables (see, e.g. [14]): if x, y are jointly Gaussian,

Consider a discrete time, linear state space system xn+1 = Axn + B + ²n+1 , (1)

E(x|y) = E(x) + Σxy Σ−1 yy (y − E(y)), (7)

where, ²n , wn are zero mean, Gaussian and uncorrelated random variables at each time tn . A, B, C, D, ISSN: 1790-2769

Σn = CPn|n−1 C T + R,

x ˆn+1|n = Aˆ xn|n + B,

The linear Kalman filter

yn = Cxn + D + wn ,

(2)

x ˆn|n = x ˆn|n−1 + Pn|n−1 C T Σ−1 n vn ,

2 Basic linear filtering theory 2.1

vn = yn − (C x ˆn|n−1 + D),

E(x − E(x|y))2 = Σxx − Σxy Σ−1 yy Σyx ,

25

ISBN: 978-960-474-109-0

(8)

Proceedings of the 9th WSEAS International Conference on SYSTEMS THEORY AND SCIENTIFIC COMPUTATION (ISTASC '09)

where Σxy etc. are variance terms. It is instructive to compare equations (7) and (8) with equations (4) and (6) respectively. The reader is referred to specialist textbooks such as [11] and [15] for more details. Given yn , x ˆn+1|n and (C x ˆn+1|n + D) serve as forecasts of xn+1 and yn+1 respectively.

2.2

where µ is the instantaneous drift, B(t) is the standard Wiener process and σ 2 (t) is the spot volatility or instantaneous volatility. µ is assumed to be constant while σ 2 (t) is assumed to be driven by a Wiener process uncorrelated with B(t). σ 2 (t) is obviously unobservable. An integral of spot volatility over a given period ∆ is called the actual volatility, σn2 : Z

Time series calibration using maximum likelihood

σn2

Let Fn denote all the measurements available until and including time tn . Then the probability density p(yn+1 |Fn ) for (1) is Gaussian with E(yn+1 |Fn ) = C x ˆn+1|n + D

is a noisy estimate of σn2 , where un is a zero mean i.i.d. sequence. This is a consistent estimate of σn2 as M → ∞ if c = 0 and is an unbiased estimate if c = µ = 0. µ is usually small for a small period ∆, e.g. a one day stock price return is dominated by volatility rather than by drift. c is a scaling factor to account for overnight returns being much larger than the intraday returns, see, e.g. [22]. Euler discretisation of a constant coefficient, linear Gaussian mean reverting stochastic process

p(yi |Fi−1 ).

log p(yi |Fi−1 )

i=1 N

=−

¢ 1 X¡ log |Σi | + vi> Σ−1 i vi , 2

dx(t) = a(b − x(t))dt + βdW (t),

i=1

2 σn+1 = φσn2 + Γ + Q²n+1 ,

zn

= σn2 + Cηn .

(10)

√ Here, φ = 1 − a∆, Γ = ab∆, Q = β ∆, a, b, β are constants and ²n , ηn are standard normal i.i.d. sequences. Similar models are used in earlier, seminal work on realized volatility in [2]. From this point onwards, we can use the calibration and forecasting procedures mentioned in the earlier sections, along with the realised volatility zn computed from intra-day asset prices, to calibrate the model and to forecast future volatility. Extensive numerical experiments for calibration and forecasting with stochastic volatility models were reported in [16] and [17]. We provide a selective summary of this work here. For numerical experiments, we used 5 minutes intra-day data for IBM and Citigroup stocks from

3 Applications in finance Stochastic volatility models

The first application of filtering in finance reviewed here is that of modelling volatility of stock prices using intra-day data. The return equation for the logstock price S l is given by dS l (t) = µdt + σ(t)dB(t), ISSN: 1790-2769

(9)

with x(t) = σ 2 (t) leads to a linear discrete time state space system similar to the one in (1):

when the constant terms are ignored. Given time series data y1 , y2 , . . . , yN , the quantities vi and Σi are found through Kalman filter recursions outlined in the previous section. The above function can then be maximized to find the parameter vectors B, D and matrices A, C, Q and R. The readers are referred to ([11], chapter 7) for more details on maximum likelihood-based calibration in Kalman filtering framework.

3.1

σ(u)2 du.

0

= σn2 + un ,

It is usually simpler to work with the logarithm of likelihood, which is given by log L(Y ) =

0

zn = (1 + c)

i=2

N X

(n−1)∆

σ(u) du −

¾ ½ M · X ∆j S l (n − 1)∆ + M j=1 ¾¸ ½ ∆(j − 1) 2 l , −S (n − 1)∆ + M

and

We can write the likelihood function (i.e., the joint probability function) for the set of observations Y = {y1 , y2 , . . . , yN } as N Y

=

2

σn2 is unobservable as well. The realised volatility zn , given by

V ar(yn+1 |Fn ) = Σn+1 .

L(Y ) = p(y1 )

Z

n∆

26

ISBN: 978-960-474-109-0

Proceedings of the 9th WSEAS International Conference on SYSTEMS THEORY AND SCIENTIFIC COMPUTATION (ISTASC '09)

August 1997 to January 2005 from commercial data provider PriceData. ∆ in both cases was one day and M was 78 (daily volatility is estimated based on 78 intra-day stock price readings). Days which contained no data were ignored, as is a common practice within econometrics. Any intra-day missing data was dealt with using standard methods for stochastic interpolation. Log-price data was further scaled by a factor of 100 so that the returns are expressed as percentage. We used maximum likelihood along with the output of the Kalman filter for estimation of parameters, as outlined in section 2.2. For initialization, we take steady Q2 Γ and P0 = 1−φ state values σ0 = 1−φ 2 . In-built routines from MATLAB were used for non-convex optimisation in calibration. Single factor as well as two factor state space models were calibrated using Kalman filter. The two factor model has Γ in (10) driven by a Gaussian stochastic process. The forecasting performance of these models was compared against GARCH type models calibrated from same data. GARCH (Generalized Auto Regressive Conditional Heteroskedasticity) type models (first introduced in [4]) are used commonly in the finance industry for short term volatility forecasting. Table 1 below shows the parameter values of single factor stochastic volatility models for two stocks.

Table 2 below illustrates the variance of errors obtained by a single factor stochastic volatility (SV) model and the GJR GARCH-t model (11) using this approach for forecasting, with k = 10. It can be seen that Kalman filter based stochastic volatility model outperforms the modified GARCH model. The experiments reported in [17] indicate that the latent state based volatility models usually outperform GARCH type models and provide a simple alternative to the established practice of using GARCH models in short term forecasting.

Table 2. Variance of 10-step ahead forecasting errors for IBM and Citigroup stock.

SV model GARCH-t

IBM 0.93 0.31 1.38 3.68

3.2

=

βσn2

+%+

αζn2

Citigroup 0.89 0.71 3.32 5.7

+

λζn2 max(0, ζn ),

rt = µ −

n X

Xi,t ,

i=1

dXi,t = −αi Xi,t +

(11)

n X

σij dzj,t ,

(12)

j=1

where ζn is the daily log-return which is assumed to have a Student-t distribution with ν degrees of freedom and β, α, %, λ are constants with α + β < 1. ISSN: 1790-2769

Linear Gaussian interest rate models: calibration and forecasting

Exponential affine term structure models is one of the oldest and the most widely studied class of dynamic interest rate models. The main advantage of these models is the fact that the yields can be expressed as affine functions of the short rate (i. e. instantaneously compounded interest rate). One factor linear Gaussian model was introduced in [25], which was later extended to multi-factor models in [1]. Linear Kalman filtering for interest rate models has been discussed in [1] and [9], among others. An extensive treatment of these models can be found in advanced graduate textbooks such as [18]. In the work presented here, we consider an nfactor, linear Gaussian short rate model of the form

The comparison of forecasting performance was based on the variance of k-step ahead forecasting er2 rors, i.e. on the sample mean of (ˆ σn+k|n −zn+k )2 . For 2 a given k > 1, the forecast σ ˆn+k|n can be constructed in various ways. One ad-hoc, but industry-standard 2 approach is scaling σ ˆn+1|n by k. As a benchmark for comparison, we use GJR GARCH model as proposed in [13], but with Student-t distributed returns as proposed in [3]. The volatility under this model can be expressed as 2 σn+1

Citigroup 2101.75 3402.81

The reader is referred to [16] for statistical properties of data, calibration of multi-factor models and detailed numerical results on forecasting.

Table 1. Parameter values φ Γ Q C

IBM 724.51 1142.25

where rt is the (unobservable) short rate, αi , σij , µ etc are constants and zj,t are independent standard Wiener processes. The price of a zero coupon bond at time t 27

ISBN: 978-960-474-109-0

Proceedings of the 9th WSEAS International Conference on SYSTEMS THEORY AND SCIENTIFIC COMPUTATION (ISTASC '09)

with time to maturity τk is given by · µ Z t+τ ¶¸ k P (t, τk ) = E exp − rs ds ,

Here λi are prices of risk which relate the drifts implied by the time series data and those implied by the cross sectional spot rate data at a particular point in time. The intended use of the model after calibration was short and medium term forecasting. Hence we consider the sample mean of the relative absolute error (MRAE) as our measure of error for each time to maturity τk :

t

where the expectation is assumed to be taken under appropriate risk neutral measure. The equivalent continuously compounded interest rate for the same time to maturity τk is called spot rate and is denoted by rt (τk |θ). For the chosen dynamics, theoretical spot rates are affine functions of the instantaneous interest rate rt : n X rt (τk |θ) = A0 (τk |θ) + Ai (τk |θ)Xi,t , (13)

N 1 X |observed yield-predicted yield| M RAE = . N observed yield i=1

This was computed over the relevant set of N observations (either in-sample or out-of sample), for one step ahead prediction of yields. It was found that the maximum out-of-sample error (over 7 yields) is only 2.31%, over a period of almost two years from calibration. The results are summarised in table 4, where the first column indicates the time to maturity for which the MRAE is computed. Similar, if less spectacular, results were obtained for the second data set consisting of 7 US T-bill yields for a more turbulent economic period of 1999-2001. The maximum out-ofsample error for a two factor model calibrated with US data was 3.5%. The reader is referred to [8] for full numerical results.

i=1

where Ai (·|·) are known functions and θ is the vector of parameters; see [1] or [8] for the exact form of Ai . From a commercial data provider such as Datastream in the UK, one may obtain the actual spot rates for different times to maturities, each day. We assume that our model is imperfect and the actual spot rates equal the corresponding theoretical spot rates plus a zero mean noise term: Rt,τk = rt (τk , θ) + ²t,k ,

(14)

with E(²t,τ ) = 0, E(²t,τ ²Tt,τ ) = h2k I and hk are conk k k stants. Discretised version of (12) along with (13)(14) form a linear state space system. Given observed Rt,τk for various maturities at each t, we can use Kalman filtering to calibrate this model and forecast spot rates. We provide a selective summary of numerical experiments reported in [8], where two different data sets were used. The first data set consisted of weekly data on 7 UK gilt yields i.e. government bond spot rates from January 2001 to June 2005 from Datastream. Out of 232 observations, 180 used for calibration and 42 were used for validation. One, two and three factor linear models were calibrated using Kalman filter and maximum likelihood method described earlier. The parameter values after calibration of a two factor model are reported in table 3.

Table 4. Mean relative absolute errors (UK data) τk 3m 6m 1y 2y 4y 8y 10y

4

0.7036 0.0080 0.4249 0.2768 0.0007 0.0017 0.0007 −0.0003

ISSN: 1790-2769

α2 σ22 λ2 h1 h3 h5 h7

out-of-sample 0.0118 0.0081 0.0216 0.0229 0.0152 0.0231 0.0213

Approximate nonlinear filtering

Several useful dynamic models are nonlinear and are of the form:

Table 3. Parameter values for two factor model α1 σ11 λ1 µ h2 h4 h6 σ12

in-sample 0.0111 0.0178 0.0331 0.0337 0.0262 0.0209 0.0161

0.2807 0.1806 -0.0414 0.0006 0.0014 0.0011 0.0010

xn+1 = g(xn ) + ²n+1 , yn = h(xn ) + wn ,

(15)

for static nonlinear functions g(·), h(·) and zero mean noise variables ²n , wn as before. We consider three different approaches for filtering xn in systems of this type. 28

ISBN: 978-960-474-109-0

Proceedings of the 9th WSEAS International Conference on SYSTEMS THEORY AND SCIENTIFIC COMPUTATION (ISTASC '09)

• The first and obvious approach to filter xn is to use extended Kalman filters. One can expand the dynamics in Taylor series about x ˆn|n−1 as

or sigma point filters. Methods of these type have been developed independently in engineering (see [20] for example) and in weather sciences, where they are referred to as ensemble Kalman filters (see [23] for example). The basic methodology behind all these filtering methods can be explained as follows. Scalar systems are considered here for notational simplicity.

∂g (xn − x ˆn|n−1 ), ∂x ∂h h(xn ) ≈ h(ˆ xn|n−1 ) + (xn − x ˆn|n−1 ), ∂x where the partial derivatives are evaluated at x ˆn|n−1 . This gives a linear approximation to the original nonlinear state space system, which, in turn allows us to use the techniques from sections 2.1 and 2.2 for calibration and forecasting. This method works reasonably well in systems with smooth nonlinearities. In reality, the above linearized system results from linearization of a continuous time system, followed by a discretization. This filter is called the local linearization filter in [19] and has been used in [5] in parameter estimation for forward rate models. g(xn ) ≈ g(ˆ xn|n−1 ) +

1. Given Pn|n−1 , x ˆn|n−1 , generate sigma (i)

2. Then use x ˆn+1|n =

M X

(i) p(i) n g(xn )+

i=1

• In most of the filtering applications, the aim is to compute Z µ = fx (xn )p(xn |yn−j )dx,

Ã

Σxy Σ−1 yy

yn −

M X

! (i) p(i) n h(xn )

,

i=1

with covariance matrices Σxy , Σyy defined (i) using the probability weights pn . Pn+1|n is similarly computed using (8).

where µ may be the conditional mean or the conditional variance of xn and j = 0 or j = 1. Apart from the linear Gaussian case, a closed-form expression for µ is not available in general and one has to resort to an approximation of the form m X (i) µ= fx (x(i) n )pn ,

It can be seen that the methods are based on constructing covariance matrices from the samples of distribution with correct first two moments, and then using the closed-form formulae (7)-(8) for the state and covariance update. The number of samples used tends to be significantly smaller than in particle filters. Ensemble filters use random sampling from a given distribution while sigma point filters use a deterministic algorithm to generate a discrete distribution matching specified moments. A nonlinear filtering heuristic which combines the desirable features of both the ensemble filters and sigma point filters has been reported in [6], which in turn is based on a sigma point generation algorithm first proposed in [7].

i=1 (i)

where the support points xn and the correspond(i) ing discrete probability weights pn approximate p(xn |yn−j ) in an appropriate sense. In sequential Monte Carlo filters (also called particle fil(i) (i) ters), the knowledge of pn , xn , yn and the dy(i) namics of the system are used to derive pn+1 and (i)

xn+1 sequentially. Methods of this type have been used in a variety of areas including speech recognition and image processing; see [10] for a review. Applications of particle filters in financial time series have been reported in [12] and [24], among others.

5

Summary

A brief review of filtering applications in financial time series has been presented in this paper, including some results of numerical studies in the author’s research group. Filtering in nonlinear financial time series is a very active area of research worldwide. A small sample of the growing body of literature on this

• Particle filters are computationally expensive to implement and are difficult to calibrate, especially for large state space dimensions. A computationally cheaper and increasing popular alternative to particle filters is using unscented ISSN: 1790-2769

(i)

points xn and probability weights pn via a deterministic algorithm or random samP (i) (i) pling, such that M ˆn|n−1 i=1 pn xn = x PM (i) (i) 2 and i=1 pn (xn − x ˆn|n−1 ) = Pn|n−1 .

29

ISBN: 978-960-474-109-0

Proceedings of the 9th WSEAS International Conference on SYSTEMS THEORY AND SCIENTIFIC COMPUTATION (ISTASC '09)

topic is mentioned here due to space constraints. Several problems of theoretical and practical interest remain to be explored; including designing computationally efficient filtering and calibration methods for jump-diffusion models. It is hoped that this review will stimulate more interest in research on filtering problems in financial time series.

[11] J. Durbin and S.J. Koopman, Time series analysis by state space methods, Oxford University Press, 2001. [12] P. Fearnhead, Using random quasi-Monte-Carlo within particle filters, with application to financial time series, J. Comput. Graph. Stat. 14, 2005, pp. 751–769. [13] L.R. Glosten, R. Jaganathan and D.E. Runkle, On the relation between the expected value and the volatility of the nominal excess return on stocks, J Finan. 48, 1993, pp. 1779–1801. [14] G. Grimmett and D. Stirzaker, Probability and random processes, Oxford University Press, 2004. [15] A.C. Harvey, Forecasting structural time series models and the Kalman filter, Cambridge University Press, 1989. [16] R. Hawkes, Linear state models for volatility estimation and prediction, 2007, PhD thesis, Brunel University, UK. [17] R. Hawkes and P. Date, Medium term horizon volatility forecasting: A comparative study, Appl. Stoch. Model. Bus. Ind. 23, 2007, pp. 465– 481. [18] J. James and N. Webber, Interest Rate Modelling, John Wiley, 2004. [19] J.C. Jimenez and T. Ozaki, Local linearization filters for nonlinear continuous-discrete state space models with multiplicative noise, Int J. Control 76, 2003, pp. 1159–1170. [20] S.J. Julier and J.K. Uhlmann, Unscented filtering and nonlinear estimation, Proc. IEEE 92, pp. 401–422, 2004. [21] R.E. Kalman, A new approach to linear filtering and prediction problems, J. Basic Eng. 82, 1960, pp. 35-45. [22] M. Martens, Measuring and forecasting S&P 500 index-futures volatility using high frequency data, J. Futures Markets 22, 2002, pp. 497–518. [23] H. Mitchell and P. Hotekamer, Data assimilation using an ensemble Kalman filter technique, Mon. Weather. Rev. 126, 1998, pp. 796–811. [24] M.K. Pitt and N. Shepherd, Filtering via simulation: auxiliary particle filters, J. Am. Stat. Assoc. 94, 1999, pp. 590–599. [25] O. Vasicek, An equilibrium characterisation of the term structure, J. Financ. Econ. 5, 1977, pp. 177–188.

References: [1] S. H. Babbs and K. B. Nowman, Kalman filtering of generalized Vasicek term structure models, J. Financ. Quant. Anal. 34, 1999, pp. 115– 130. [2] O.E. Barndorff-Nielsen and N. Shephard, Econometric analysis of realized volatility and its use in estimating stochastic volatility models, J. Roy. Stat. Soc. B. Stat. Meth. 64, 2002, pp. 253–280. [3] T. Bollerslev, A conditionally heteroskedastic time series model for speculative prices and rates of return, Rev. Econ. Statist., 69, 1987, pp. 542– 547. [4] T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, J. Econometrics 31, 1986, pp. 307–327. [5] C. Chiarella, H. Hung and Tˆo Thuy-Duong, The volatility structure of the fixed income market under the HJM framework: A nonlinear filtering approach, Comput. Stat. Data An. 53, 2009, pp. 2075–2088. [6] P. Date, L. Jalen and R. Mamon, A new algorithm for latent state estimation in nonlinear time series models, Appl. Math. Comput. 203, 2008, pp. 224–232. [7] P. Date, R. Mamon and L. Jalen, A new moment matching algorithm for sampling from partially specified symmetric distributions, Oper. Res. Lett. 36, 2008, pp. 669–672. [8] P. Date and I.C. Wang, Linear Gaussian affine term structure models with unobservable factors: calibration and yield forecasting, Eur. J. Oper. Res. 195, 2009, pp. 156–166. [9] G. De Rossi, Kalman filtering of consistent forward rate curves: a tool to estimate and model dynamically the term structure, J. Empir. Finance, 11, 2004, pp. 277–308. [10] A. Doucet and A. de Frietas and N. Gordon (editors), Sequential Monte Carlo Methods in Practice, Springer 2001. ISSN: 1790-2769

30

ISBN: 978-960-474-109-0