Adaptive Control Variates for Pricing Multi-Dimensional American Options

Samuel M. T. Ehrlichman

Shane G. Henderson

June 12, 2006

Abstract

We explore a class of control variates for the American option pricing problem. We construct the control variates by using multivariate adaptive linear regression splines to approximate the option's value function at each time step; the resulting approximate value functions are then combined to construct a martingale that approximates a "perfect" control variate. We demonstrate that significant variance reduction is possible even in a high-dimensional setting. Moreover, the technique is applicable to a wide range of both option payoff structures and assumptions about the underlying risk-neutral market dynamics. The only restriction is that one must be able to compute certain one-step conditional expectations of the individual underlying random variables.

1 Introduction

Efficient pricing of American options remains a thorny issue in finance. This is true despite the fact that numerical techniques for solving this problem have been studied for decades – certainly at least since the binomial tree method of Cox et al. (1979). Both tree-based methods and PDE methods are very fast in low dimensions but do not extend well to higher-dimensional problems, arbitrary stochastic processes, or arbitrary payoff structures. In the last decade or so, attention has turned to simulation techniques for solving such problems.

What makes American options much more difficult to price than their European counterparts, of course, is the embedded optimal stopping problem. Many of the early papers on using simulation to price American options therefore focus on this aspect of the computation. Longstaff and Schwartz (2001) and Tsitsiklis and Van Roy (2001) use linear regression on a fixed set of basis functions to approximate the value of continuing (i.e., not exercising) at every time step, proceeding backwards in time from expiry. This in turn produces a stopping rule: exercise only if the (known) value of exercise exceeds the (approximate) value of continuing. The continuation value approximations obtained using these methods are not perfect, but they do yield feasible stopping policies. These policies therefore yield lower bounds on the true option price.

Recently, Haugh and Kogan (2004), Rogers (2002), and Andersen and Broadie (2004) showed how to compute upper bounds on the option price via a martingale duality. Bolia and Juneja (2005) observed that this same martingale, if computed by function approximation instead of simulation-within-simulation, can serve as a simulation control variate and thereby provide variance reduction. This approach can be viewed as a special case of a class of martingale control variate methods introduced by Henderson and Glynn (2002). The method introduced by Bolia and Juneja (2005) relies on finding a particular set of basis functions. To avoid internal simulations it is necessary that the basis functions be such that one can easily compute

certain one-step conditional expectations. In related work, Rasmussen (2005) computes a control variate for the option price by using a carefully chosen European option (or several such options), evaluated at the exercise time of the American option being priced. Laprise et al. (2006) construct upper and lower piecewise linear approximations of the value function and compute the American option price using a sequence of portfolios of European options. However, their method works only in one dimension.

Our contribution is to identify a technique for computing the control variate that possesses the desired tractability property in a quite general setting. Moreover, construction of the control variate is more or less automatic; once it has been done for one pricing problem it can be extended to other problems without much effort. We demonstrate these extensions in detail for various basket options, barrier options, and Asian options in both a Black-Scholes and a stochastic volatility (Heston 1993) model. Rogers (2002) commented that the selection of the dual martingale may be "more art than science." We contend that our approach takes a bit of the art out of this process and injects, if not science, at least some degree of automation into the procedure.

The remainder of this paper is organized as follows. Section 2 gives some mathematical preliminaries, recalls the pricing algorithm of Longstaff and Schwartz (2001), defines the martingales that we work with, and clarifies their linkage with the pricing problem. Section 3 discusses multivariate adaptive regression splines (Friedman 1991), or MARS, and discriminant analysis, which are the techniques we adapt to construct martingales. Section 4 describes the resulting pricing algorithm. Section 5 gives a number of examples, and we offer some conclusions in Section 6.

2 Mathematical Framework

As in most papers that discuss simulation applied to American option pricing, we actually consider the problem of pricing a Bermudan option, which differs from its American counterpart in that it may be exercised only at a finite set of points in time. To simplify notation, we assume that these times are the evenly spaced steps t = 0, . . . , T.

Let (X_t : t = 0, . . . , T) be an R^d-valued process on a filtered probability space (Ω, F, P), where F = (F_t : t = 0, . . . , T) is the natural filtration of (X_t). We assume (X_t) to be Markov, enlarging the state space if necessary to ensure this. We treat X_0 as deterministic, so F_0 is taken to be trivial. Let r be the riskless interest rate, which we assume to be constant and, to simplify notation, normalized so that if s < t, the time-s dollar value of $1 to be delivered at time t is e^{-r(t-s)}. We assume that the market is arbitrage-free and work exclusively with a risk-neutral (pricing) measure Q with the same null sets as P. See, e.g., Duffie (2001) or Glasserman (2004) for details on risk-neutral pricing.

Let the known function g : {0, . . . , T} × R^d → R satisfy g(t, ·) ≥ 0 and E g^2(t, X_t) < ∞ for all t = 0, . . . , T. We interpret g(t, X_t) as the value of exercising the option at time t in state X_t. Let T(t) be the set of all F-stopping times valued in {t, . . . , T}. Then the Bermudan option pricing problem is to compute Q_0, where

    Q_t = \sup_{\tau \in \mathcal{T}(t)} E_t\left[ e^{-r(\tau - t)} g(\tau, X_\tau) \right],

for t = 0, . . . , T. We recall some theory about American and Bermudan options; again see, e.g., Duffie (2001) for details. The above optimal stopping problem admits a solution τ_t^* ∈ T(t), so that

    Q_t = E_t\left[ e^{-r(\tau_t^* - t)} g(\tau_t^*, X_{\tau_t^*}) \right]

for each t = 0, . . . , T. Moreover, the Q_t's satisfy the backward recursion

    Q_T = g(T, X_T),
    Q_t = \max\left\{ g(t, X_t),\; e^{-r} E_t Q_{t+1} \right\},

for t = 0, . . . , T − 1. Therefore, the optimal stopping times τ_0^*, . . . , τ_T^* satisfy τ_T^* ≡ T and

    \tau_t^* = \begin{cases} t & \text{if } g(t, X_t) \ge e^{-r} E_t Q_{t+1}, \\ \tau_{t+1}^* & \text{otherwise}, \end{cases}

for t = 0, . . . , T − 1. An easy consequence of this is

    s < t \le \tau_s^* \implies \tau_s^* = \tau_{s+1}^* = \cdots = \tau_t^*.    (1)
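For concreteness, the backward recursion can be applied directly on any lattice approximation of (X_t). The following minimal sketch (in Python; the binomial lattice and all parameters are purely illustrative and not part of our experiments) prices a one-dimensional Bermudan put this way.

```python
import math

# Illustrative parameters only: a 6-step binomial lattice for one asset.
T, r, K, S0 = 6, 0.005, 100.0, 100.0           # steps, per-step rate, strike, spot
u = math.exp(0.3 * math.sqrt(1.0 / 12.0))      # up factor (sigma = .3, monthly steps)
d = 1.0 / u
p = (math.exp(r) - d) / (u - d)                # risk-neutral probability of an up move

def g(t, s):                                   # exercise value of the put
    return max(K - s, 0.0)

# Q_T = g(T, X_T), then Q_t = max{g(t, X_t), e^{-r} E_t[Q_{t+1}]} backwards in t.
Q = [g(T, S0 * u**j * d**(T - j)) for j in range(T + 1)]
for t in range(T - 1, -1, -1):
    Q = [max(g(t, S0 * u**j * d**(t - j)),
             math.exp(-r) * (p * Q[j + 1] + (1 - p) * Q[j]))
         for j in range(t + 1)]
print("Q_0 =", Q[0])
```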

2.1 The Longstaff-Schwartz Method

The least-squares Monte Carlo (LSM) method of Longstaff and Schwartz (2001) provides an approximation to the optimal stopping times (τ_t^*) and hence to the option price process (Q_t). Since the resulting stopping times (τ_t) are suboptimal for the original problem, the value of following such a stopping strategy provides a lower bound on the true price process. Following the notation of Andersen and Broadie (2004), we denote the lower bound process by

    L_t = E_t\left[ e^{-r(\tau_t - t)} g(\tau_t, X_{\tau_t}) \right].

We now recall the procedure by which LSM computes the stopping times (τ_t) and hence (L_t). Let φ_0, φ_1, . . . be a collection of functions from R^d to R such that φ_0 ≡ 1 and {φ_i(X_t) : i = 0, 1, . . .} form a basis for L^2(Ω, σ(X_t), Q) for all t = 1, . . . , T. The algorithm proceeds as follows. Denote φ = (φ_0, . . . , φ_k), for some fixed k. Generate a set of N paths {X_t(n) : t = 0, . . . , T; n = 1, . . . , N}. Set τ_T(n) = T and L_T(n) = g(T, X_T(n)) for n = 1, . . . , N. Then recursively estimate

    \alpha_t = \arg\min_\alpha \sum_{n=1}^{N} 1_{[g(t, X_t(n)) > 0]} \left( \alpha' \phi(X_t(n)) - e^{-r} L_{t+1}(n) \right)^2,    (2)

and set

    \tau_t(n) = \begin{cases} t & \text{if } g(t, X_t(n)) > [\alpha_t' \phi(X_t(n))]_+, \\ \tau_{t+1}(n) & \text{otherwise}, \end{cases}
    \qquad
    L_t(n) = \begin{cases} g(t, X_t(n)) & \text{if } \tau_t(n) = t, \\ e^{-r} L_{t+1}(n) & \text{otherwise}, \end{cases}

for t = T − 1, . . . , 0. The regression in (2) is performed only on those paths which have positive exercise value at time t, thus (we hope) producing a better fit on the paths that actually matter than we would obtain if we performed the regression on the complete set of paths. We shall comment on this point in Section 4.1, where we describe our version of the algorithm with the control variate.

The idea behind the LSM algorithm is that if τ_t is close to the true optimal stopping time τ_t^*, then the lower-bounding value process L_t is close to Q_t. It is shown in Clément et al. (2002) both that the approximations τ_t converge to τ_t^* and that the approximations L_t converge to Q_t as the number of basis functions k → ∞.
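A compact sketch of the LSM recursion just described, for a Bermudan put under one-dimensional Black-Scholes dynamics, might look as follows (Python/NumPy; the polynomial basis, the parameters, and all names are ours and purely illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 10_000, 6
r, sigma, K, S0 = 0.005, 0.3 / np.sqrt(12.0), 100.0, 100.0   # per-step rate and vol

# simulate paths X_t(n), t = 0, ..., T
Z = rng.standard_normal((N, T))
X = S0 * np.exp(np.cumsum(r - 0.5 * sigma**2 + sigma * Z, axis=1))
X = np.hstack([np.full((N, 1), S0), X])

g = lambda s: np.maximum(K - s, 0.0)          # exercise value
phi = lambda s: np.vander(s / S0, 4)          # basis functions 1, x, x^2, x^3

L = g(X[:, T])                                 # L_T(n) = g(T, X_T(n))
for t in range(T - 1, 0, -1):
    itm = g(X[:, t]) > 0                       # regress only where g(t, X_t(n)) > 0
    alpha = np.linalg.lstsq(phi(X[itm, t]),
                            np.exp(-r) * L[itm], rcond=None)[0]
    cont = np.clip(phi(X[:, t]) @ alpha, 0.0, None)    # [alpha' phi(X_t)]_+
    L = np.where(itm & (g(X[:, t]) > cont), g(X[:, t]), np.exp(-r) * L)
print("lower bound L_0 =", np.exp(-r) * L.mean())
```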

2.2 Martingales and Variance Reduction

As we have already noted, the stopping times (τ_t) obtained in the LSM method are suboptimal, and so the option prices (L_t) implied by the algorithm are lower bounds on the true option prices (Q_t). To obtain an upper bound we employ a martingale duality result developed independently by Haugh and Kogan (2004) and Rogers (2002). Let π = (π_t : t = 0, . . . , T) denote a martingale with respect to F. By the optional sampling theorem, for any t ≥ 0,

    Q_t = e^{rt} \sup_{\tau \in \mathcal{T}(t)} E_t\left[ e^{-r\tau} g(\tau, X_\tau) - \pi_\tau + \pi_\tau \right]
        = e^{rt} \sup_{\tau \in \mathcal{T}(t)} E_t\left[ e^{-r\tau} g(\tau, X_\tau) - \pi_\tau \right] + e^{rt} \pi_t
        \le e^{rt} E_t \max_{s = t, \dots, T} \left[ e^{-rs} g(s, X_s) - \pi_s \right] + e^{rt} \pi_t    (3)
        =: U_t.

The martingale π here is arbitrary, and any such choice yields an upper bound. We next give a class of martingales from which to choose. Let h_t : R^d → R be such that E|h_t(X_t)| < ∞ for each t = 0, 1, . . . , T. Define π_0 = 0, and for t = 1, . . . , T, set

    \pi_t = \sum_{s=1}^{t} e^{-rs} \left( h_s(X_s) - E_{s-1} h_s(X_s) \right).    (4)

Evidently (π_t) is a martingale, and can be used to obtain an upper bound on the option price as in (3). Such martingales can also be used to great effect as control variates in estimating the lower bound process. Recall that L_t = E_t[e^{-r(τ_t - t)} g(τ_t, X_{τ_t})] for each t, and so, conditional on F_t, we can compute L_t by averaging conditionally independent replicates of e^{-r(τ_t - t)} g(τ_t, X_{τ_t}). Proposition 2.1 shows that if we choose the function h_t so that h_t(X_t) = L_t for each t, then the martingale difference π_{τ_t} − π_t is a perfect control variate, in the sense that it is perfectly correlated with e^{-r(τ_t - t)} g(τ_t, X_{τ_t}), conditional on F_t. This generalizes a comment in Bolia and Juneja (2005), who show that the proposition holds in the case t = 0.

Proposition 2.1. Suppose that h_t is chosen so that h_t(X_t) = L_t for each t = 1, . . . , T. Then the martingale π = (π_t : t = 0, . . . , T) defined in (4) satisfies

    \pi_{\tau_t} - \pi_t = e^{-r\tau_t} g(\tau_t, X_{\tau_t}) - e^{-rt} L_t

for each t = 0, 1, . . . , T.

Proof. First observe that on the event [τ_t = t], both sides of the equality we are trying to prove are zero. Hence, it suffices to prove that the result holds in the continuation region, i.e.,

    1_{[\tau_t > t]} \left( \pi_{\tau_t} - \pi_t \right) = 1_{[\tau_t > t]} \left( e^{-r\tau_t} g(\tau_t, X_{\tau_t}) - e^{-rt} L_t \right).    (5)

Now, if s ∈ {t + 1, . . . , T}, then

    1_{[\tau_t \ge s]} E_{s-1} L_s = 1_{[\tau_t \ge s]} E_{s-1} E_s\left[ e^{-r(\tau_s - s)} g(\tau_s, X_{\tau_s}) \right]
     = E_{s-1}\left[ 1_{[\tau_t \ge s]} e^{-r(\tau_s - s)} g(\tau_s, X_{\tau_s}) \right]
     = e^{r} E_{s-1}\left[ 1_{[\tau_t \ge s]} e^{-r(\tau_{s-1} - (s-1))} g(\tau_{s-1}, X_{\tau_{s-1}}) \right]
     = e^{r} 1_{[\tau_t \ge s]} L_{s-1},

where the penultimate equality uses the fact that

    s < t \le \tau_s \implies \tau_s = \tau_{s+1} = \cdots = \tau_t,

analogous to (1). Therefore,

    \pi_{\tau_t} - \pi_t = \sum_{s=t+1}^{T} 1_{[\tau_t \ge s]} e^{-rs} \left( L_s - E_{s-1} L_s \right)
     = \sum_{s=t+1}^{T} 1_{[\tau_t \ge s]} e^{-rs} \left( L_s - e^{r} L_{s-1} \right)
     = \sum_{s=t+1}^{T} 1_{[\tau_t \ge s]} e^{-rs} L_s - \sum_{s=t}^{T-1} 1_{[\tau_t \ge s+1]} e^{-rs} L_s
     = 1_{[\tau_t = T]} e^{-rT} L_T + \sum_{s=t+1}^{T-1} e^{-rs} L_s \left( 1_{[\tau_t \ge s]} - 1_{[\tau_t \ge s+1]} \right) - 1_{[\tau_t > t]} e^{-rt} L_t
     = \sum_{s=t+1}^{T} e^{-rs} L_s 1_{[\tau_t = s]} - 1_{[\tau_t > t]} e^{-rt} L_t.

So

    1_{[\tau_t > t]} \left( \pi_{\tau_t} - \pi_t \right) = 1_{[\tau_t > t]} \sum_{s=t+1}^{T} 1_{[s = \tau_t]} e^{-rs} L_s - 1_{[\tau_t > t]} e^{-rt} L_t
     = 1_{[\tau_t > t]} \left( e^{-r\tau_t} L_{\tau_t} - e^{-rt} L_t \right),

proving (5), since L_{\tau_t} = g(\tau_t, X_{\tau_t}) (on [\tau_t = s] we have \tau_s = s, and hence L_s = g(s, X_s)).

Proposition 2.1 shows that, conditional on F_t, we can estimate L_t with zero (conditional) variance by e^{-r(τ_t - t)} g(τ_t, X_{τ_t}) − e^{rt}(π_{τ_t} − π_t). Since F_0 is the trivial sigma field, by taking t = 0 we get a zero-variance estimator of L_0, the lower bound on the option price at time 0.

Of course, we cannot set h_t(X_t) ≡ L_t, since we are trying to compute L_t in the first place. But this observation motivates us to search for a set of functions {ĥ_t} such that

    \hat h_t(X_t) \approx L_t

for each t. (In this paper the approximation is in the mean-square sense.) Let us write L̂_t for ĥ_t(X_t), and let the induced martingale be π̂ = (π̂_t : t = 0, 1, . . . , T), where π̂_0 = 0 and

    \hat\pi_t = \sum_{s=1}^{t} e^{-rs} \left( \hat L_s - E_{s-1} \hat L_s \right).
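Given the fitted functions ĥ_t and a routine for the one-step conditional expectations, π̂ is a simple running sum along each simulated path. A schematic helper (ours; `hhat` and `cond_exp` are assumed to be supplied by the fitting step described later):

```python
import math

def pi_hat_path(X, hhat, cond_exp, r):
    """Return [pi_hat_0, ..., pi_hat_T] along one path X = [X_0, ..., X_T].

    hhat[s](x)          : the fitted function h_hat_s evaluated at x
    cond_exp[s](x_prev) : E[h_hat_s(X_s) | X_{s-1} = x_prev], assumed computable
    """
    pi = [0.0]
    for s in range(1, len(X)):
        pi.append(pi[-1] + math.exp(-r * s) * (hhat[s](X[s]) - cond_exp[s](X[s - 1])))
    return pi
```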

We use the approximately optimal martingale π̂ evaluated at time τ_0 as a control variate in estimating L_0. We also use π̂ to compute an upper bound on the true option price, as in (3). Andersen and Broadie (2004) show that the martingale π is the optimal one to use in computing the upper bound, and indeed that the inequality in (3) would actually be an equality if we had τ_t = τ_t^* almost surely, for t = 0, . . . , T. This motivates the use of the approximate martingale π̂ to compute the upper bound. We note that the same observation is made in Bolia and Juneja (2005). To compute the martingale π̂, we need to be able to compute the conditional expectation E_{s-1} h_s(X_s) efficiently. We restrict the class of functions {h_t} considered so that these conditional expectations can be

evaluated without the need to resort to further simulation, in the same spirit as Bolia and Juneja (2005) and Rasmussen (2005). Bolia and Juneja (2005) use a particular parametric form for h_t which is easily fit by least squares, but is tightly coupled with the specific stochastic process considered. Rasmussen (2005) chooses h_t to be the value of a European option, or a combination of several European options, that are highly correlated with the American option being priced. Indeed, in many examples Rasmussen (2005) simply chooses h to be given by h_t(X_t) = E_t g(T, X_T), so that π_t = e^{-rt} E_t g(T, X_T) − E g(T, X_T). The success of their method, therefore, depends on the ability to find particular European options which can be easily priced and which correlate well with the American option in question. Our method also involves the pricing of European options in a sense, although not necessarily options on traded assets. However, these options, which are chosen using MARS, are automatically selected and in general will be priced easily. We now explore MARS.

3 MARS and Extensions

Multivariate adaptive regression splines (Friedman 1991), or MARS, is a nonparametric regression technique that has enjoyed widespread use in a variety of applications since its introduction. For example, Chen et al. (1999) use MARS to approximate value functions of a stochastic dynamic programming problem, although for a different purpose than we do here. Given observed responses y(1), . . . , y(N) ∈ R and predictors x(1), . . . , x(N) ∈ R^d, MARS fits a model of the form

    y \approx \hat f(x) = \alpha_0 + \sum_{m=1}^{M_1} \alpha_{1,m} f_{1,m}(x) + \sum_{m=1}^{M_2} \alpha_{2,m} f_{2,m}(x) + \cdots + \sum_{m=1}^{M_p} \alpha_{p,m} f_{p,m}(x).

Each function f_{1,m} takes one of two forms,

    f_{1,m}(x) \in \left\{ \left[ x^{(i)} - x^{(i)}(n) \right]_+,\; \left[ x^{(i)}(n) - x^{(i)} \right]_+ \right\},

for some i = 1, . . . , d and some n = 1, . . . , N. Here, x^{(i)} denotes the i'th coordinate of x. Each function f_{j,m} for j > 1 is a product of functions used in previous sums, so that the total degree is j. In our setting, we take p = 1, so the fitted model can be written

    y \approx \hat f(x) = \alpha_0 + \sum_{i=1}^{d} \sum_{j=1}^{J_i} \alpha_{i,j} \left[ q_{i,j} \left( x^{(i)} - k_{i,j} \right) \right]_+,    (6)

where q_{i,j} ∈ {−1, +1} and the knots k_{i,j} are chosen from the data: k_{i,j} ∈ {x^{(i)}(n) : n = 1, . . . , N}, for each i = 1, . . . , d and each j = 1, . . . , J_i. A function of the form (6) may be called an additive linear spline.

We present a simplified version of the MARS fitting algorithm here, as we are only concerned with the p = 1 case. Full details are given in Friedman (1991), and a summary can be found in Hastie et al. (2001). MARS produces a fitted model by proceeding in a stepwise manner. At each step, the algorithm attempts to add each possible pair of basis functions¹

    \left[ x^{(i)} - x^{(i)}(n) \right]_+, \qquad \left[ x^{(i)}(n) - x^{(i)} \right]_+,

in turn for n = 1, . . . , N and i = 1, . . . , d. It adds a basis function if the improvement in fit from adding that function exceeds a given threshold, up to a specified number of basis functions M_max. Upon completion of this procedure, the algorithm prunes some of the basis functions it has selected if doing so will improve the weighted mean-square error criterion

    \frac{ \frac{1}{N} \sum_{n=1}^{N} \left( y_n - \hat f(x_n) \right)^2 }{ \left( 1 - C (M_1 + 1)/N \right)^2 },

where C is a specified penalty parameter. Friedman (1991) argues that the computation time of the fitting algorithm has an upper bound proportional to d N M_max^4. Our implementation of MARS takes M_max = 21 ∨ (2d + 1), so for d < 10 the computational time is simply proportional to dN; for higher dimensions, it is proportional to d^5 N. However, in our experiments, we have found that the threshold criterion is often met before M_max basis functions are even considered, so even though the upper bounds discussed above are valid, they may be quite pessimistic.

¹In fact, the algorithm sorts the x^{(i)}(n)'s and skips a small number of observations between each knot it considers. This helps to prevent over-fitting and offers some computational benefits as well.
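For intuition, a toy version of the forward pass in the additive (p = 1) case can be written in a few lines: scan candidate knots in each coordinate and greedily add the hinge pair that most reduces the residual sum of squares. This sketch (ours) restricts knots to sample quantiles and omits MARS's knot-skipping and pruning steps.

```python
import numpy as np

def fit_additive_hinges(x, y, n_pairs=5):
    """Toy forward selection of hinge pairs [x_i - k]_+ and [k - x_i]_+ (p = 1)."""
    N, d = x.shape
    B = [np.ones(N)]                                  # basis columns; intercept first
    for _ in range(n_pairs):
        best = None
        for i in range(d):
            for k in np.quantile(x[:, i], np.linspace(0.05, 0.95, 19)):
                A = np.column_stack(B + [np.maximum(x[:, i] - k, 0.0),
                                         np.maximum(k - x[:, i], 0.0)])
                resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
                if best is None or resid @ resid < best[0]:
                    best = (resid @ resid, i, k)
        _, i, k = best
        B += [np.maximum(x[:, i] - k, 0.0), np.maximum(k - x[:, i], 0.0)]
    A = np.column_stack(B)
    return A @ np.linalg.lstsq(A, y, rcond=None)[0]   # fitted values on the data
```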

3.1 Computing the Approximate Martingale

Suppose we have used MARS to fit

    \hat h_t(x) = \alpha_0 + \sum_{i=1}^{d} \sum_{j=1}^{J_i} \alpha_{i,j} \left[ q_{i,j} \left( x^{(i)} - k_{i,j} \right) \right]_+

for each time step t = T, . . . , 1. Then for each t = 1, . . . , T, the t'th increment of the resulting approximate optimal martingale (π̂_t) is given by

    \hat\pi_t - \hat\pi_{t-1} = e^{-rt} \sum_{i=1}^{d} \sum_{j=1}^{J_i} \alpha_{i,j} \left( \left[ q_{i,j} \left( X_t^{(i)} - k_{i,j} \right) \right]_+ - E_{t-1} \left[ q_{i,j} \left( X_t^{(i)} - k_{i,j} \right) \right]_+ \right),    (7)

where we have suppressed the dependence of the fitted parameters on the time step t in the notation. Having simulated, say, X_s(1), . . . , X_s(N′), s = 1, . . . , T, it is evident how to compute the first term inside the sum in (7). The second term can be computed explicitly as long as we can compute expressions of the form

    E_{t-1} \left[ \left( X_t^{(i)} - k \right)_+ \right].    (8)

But this is nothing but the expected value of a vanilla European call option on a single underlying random variable. Such conditional expectations can often be computed very easily. Even if the underlying random variables have complex dynamics, as arise in a stochastic volatility model, we may be able to simplify the problem enough by selecting our discretization scheme carefully so that an answer is within reach. Typically, this involves replacing the state variable X_t with some transformation of log X_t. See Section 5 for specific examples of how we compute the conditional expectation.
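For instance, if X_t^{(i)} is conditionally lognormal given F_{t−1}, say log X_t^{(i)} ~ N(μ, s²), as in the examples of Section 5, then (8) has a Black-Scholes-style closed form; a minimal sketch (ours):

```python
from math import exp, log
from statistics import NormalDist

Phi = NormalDist().cdf

def call_expectation_lognormal(mu, s, k):
    """E[(X - k)_+] for X = exp(Z), Z ~ N(mu, s^2), with k > 0."""
    return (exp(mu + 0.5 * s * s) * Phi((mu + s * s - log(k)) / s)
            - k * Phi((mu - log(k)) / s))
```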

3.2 An Extension of MARS

The function approximation (6) is separable in {x^{(i)} : i = 1, . . . , d}. Of course, the function L_t = L_t(X_t) we are trying to approximate will not be separable in general. Indeed, even if the payoff function g is separable, we cannot expect that L_t(X_t) will be separable except in the case t = T. For example, consider the case t = T − 1. Here,

    Q_{T-1} = Q_{T-1}(X_{T-1}) = \max\left\{ g(T-1, X_{T-1}),\; e^{-r} E_{T-1}[g(T, X_T)] \right\}.

So even if g is separable, and even if E_{T-1} g(T, X_T) is separable (which it may not be if there is dependence in the components of X_T), Q_{T-1} will typically not be separable, as the maximum of two separable functions need not be separable. Intuitively, separability of g is equivalent to the European version of the option being decomposable into options on the individual components of X. Separability of Q_t for t < T, on the other hand, would mean that the decision of whether to exercise early could be made separately for these options, which is not the case. Since L_t can be made arbitrarily close to Q_t by employing sufficiently many basis functions, it follows that L_t will not be separable either. Thus, the best we can ever hope for with the approximation (6) is to obtain an approximation to the projection of L_t = L_t(·) on the space of separable functions. In particular, E(L̂_t − L_t)^2 may be large no matter how much effort is spent on computing L̂_t.

In order to at least partially address this issue, we consider a more general form of the approximating multivariate linear spline,

    y \approx \hat f(x) = \alpha_0 + \sum_{j=1}^{J} \alpha_j \left[ a_j' x - k_j \right]_+,    (9)

where we have additional parameters a_j ∈ R^d, j = 1, . . . , J, to estimate. This is quite similar to the form (6), except that now we consider linear combinations of the x's as predictors. One can think of the a_j vectors as giving a reparameterization of the state variables. If we chose the a_j's a priori, then the problem would essentially reduce to the previous one. But this would require user intervention. We prefer an automated procedure, although one can certainly reparameterize manually before invoking our approach. The following proposition indicates that it is possible to achieve good function approximations with expressions of the form (9).

Proposition 3.1. Suppose X is an R^d-valued random variable, and f : R^d → R satisfies E f^2(X) < ∞. Then for any ε > 0 there is a function f̂ of the form (9) such that E(f(X) − f̂(X))^2 < ε.

Proof. Jones (1987) shows that there exists a sequence of vectors (a_m ∈ R^d : m = 1, 2, . . .) such that

    E\left( f(X) - \sum_{j=1}^{m} g_j(a_j' X) \right)^2 \longrightarrow 0

as m → ∞. Here, the functions (g_m : R → R : m = 1, 2, . . .) are given recursively by

    g_m(z) = E\left[ f(X) - \sum_{j=1}^{m-1} g_j(a_j' X) \,\middle|\, a_m' X = z \right].

Accordingly, choose m sufficiently great that E( f(X) − \sum_{j=1}^{m} g_j(a_j' X) )^2 < ε/2. By induction, E g_j^2(a_j' X) < ∞ for j = 1, . . . , m. Since continuous functions are dense in L^2 (Rudin 1987, Theorem 3.14), we conclude from the Stone-Weierstrass Theorem that there exist linear splines ĝ_j : R → R such that

    E\left( g_j(a_j' X) - \hat g_j(a_j' X) \right)^2 < \varepsilon / 2^{j+1},

for each j = 1, . . . , m. The result now follows from the triangle inequality and the observation that, in one dimension, any linear spline can be written in the form (9).


The expression (9) can be thought of as a specific example of a projection pursuit regression fit (Friedman and Stuetzle 1981; Friedman et al. 1983) using truncated linear splines as its univariate basis functions. Projection pursuit regression methods typically estimate the linear directions a_1, . . . , a_J and the remaining parameters simultaneously. This requires numerical optimization and can be slow. We instead adopt the simpler approach of Zhang et al. (2003), who first identify candidate a_j's and then run the MARS fitting algorithm with x = (x^{(1)}, . . . , x^{(d)}) replaced by (a_1' x, . . . , a_J' x). Zhang et al. (2003) provide two methods for selecting the a_j's. We consider the method that uses linear discriminant analysis, or LDA (Fisher 1936). Given responses y(1), . . . , y(N) ∈ R and predictors x(1), . . . , x(N) ∈ R^d, choose some ñ ∈ {1, . . . , N} and define the corresponding LDA direction to be

    S_x^{-1} \left( \frac{1}{|\{n : y(n) < y(\tilde n)\}|} \sum_{n : y(n) < y(\tilde n)} x(n) - \frac{1}{N - |\{n : y(n) < y(\tilde n)\}|} \sum_{n : y(n) \ge y(\tilde n)} x(n) \right),    (10)

where S_x denotes the sample covariance matrix of the x(n)'s.

4 The Pricing Algorithm

4.1 Using the Control Variate in the Stopping-Time Regression

In our version of the LSM regression (2), we include the approximate martingale as a control variate. Writing Z_t = 1_{[g(t, X_t) > 0]}, we estimate

    \alpha_t = \arg\min_\alpha \sum_{n=1}^{N} Z_t(n) \left( \alpha' \phi(X_t(n)) - e^{-r} e^{r(t+1)} \left[ e^{-r\tau_{t+1}(n)} g(\tau_{t+1}(n), X_{\tau_{t+1}(n)}) - \left( \hat\pi_{\tau_{t+1}(n)} - \hat\pi_{t+1}(n) \right) \right] \right)^2,    (11)

which is the regression carried out in line 7 of Algorithm 1 below, where the bracketed quantity is accumulated in the variables Y(n) and cv(n). We now provide evidence that this actually improves (or at least, does not worsen) our stopping time estimates. For the time being, let us ignore the term Z_t. Evidently,

    E_t\left[ e^{-r\tau_{t+1}} g(\tau_{t+1}, X_{\tau_{t+1}}) \right] = E_t\left[ e^{-r\tau_{t+1}} g(\tau_{t+1}, X_{\tau_{t+1}}) - \left( \hat\pi_{\tau_{t+1}} - \hat\pi_t \right) \right],    (12)

since π̂ is a martingale. Moreover, by Proposition 2.1,

    \pi_{\tau_{t+1}} - \pi_t = (\pi_{\tau_{t+1}} - \pi_{t+1}) + (\pi_{t+1} - \pi_t) = e^{-r\tau_{t+1}} g(\tau_{t+1}, X_{\tau_{t+1}}) - e^{-r(t+1)} E_t L_{t+1},

so

    \mathrm{Var}_t\left[ e^{-r\tau_{t+1}} g(\tau_{t+1}, X_{\tau_{t+1}}) - \left( \pi_{\tau_{t+1}} - \pi_t \right) \right] = \mathrm{Var}_t\left[ E_t L_{t+1} \right] = 0,

where Var_t[·] denotes the conditional variance E_t(·)^2 − E_t^2(·). Therefore,

    \mathrm{Var}_t\left[ e^{-r\tau_{t+1}} g(\tau_{t+1}, X_{\tau_{t+1}}) - \left( \hat\pi_{\tau_{t+1}} - \hat\pi_t \right) \right] = \mathrm{Var}_t\left[ \left( \pi_{\tau_{t+1}} - \pi_t \right) - \left( \hat\pi_{\tau_{t+1}} - \hat\pi_t \right) \right].

Taking expectations and invoking the variance decomposition formula gives

    E\, \mathrm{Var}_t\left[ e^{-r\tau_{t+1}} g(\tau_{t+1}, X_{\tau_{t+1}}) - \left( \hat\pi_{\tau_{t+1}} - \hat\pi_t \right) \right]
     = \mathrm{Var}\left[ \left( \pi_{\tau_{t+1}} - \pi_t \right) - \left( \hat\pi_{\tau_{t+1}} - \hat\pi_t \right) \right] - \mathrm{Var}\left[ E_t\left( \left( \pi_{\tau_{t+1}} - \pi_t \right) - \left( \hat\pi_{\tau_{t+1}} - \hat\pi_t \right) \right) \right]
     = \mathrm{Var}\left[ \left( \pi_{\tau_{t+1}} - \pi_t \right) - \left( \hat\pi_{\tau_{t+1}} - \hat\pi_t \right) \right].

On the other hand,

    E\, \mathrm{Var}_t\left[ e^{-r\tau_{t+1}} g(\tau_{t+1}, X_{\tau_{t+1}}) \right] = E\, \mathrm{Var}_t\left[ \left( \pi_{\tau_{t+1}} - \pi_t \right) + e^{-r(t+1)} E_t L_{t+1} \right]
     = E\, \mathrm{Var}_t\left[ \pi_{\tau_{t+1}} - \pi_t \right]
     = \mathrm{Var}\left[ \pi_{\tau_{t+1}} - \pi_t \right].

But π̂ is a projection of π, and so

    E\, \mathrm{Var}_t\left[ e^{-r\tau_{t+1}} g(\tau_{t+1}, X_{\tau_{t+1}}) - \left( \hat\pi_{\tau_{t+1}} - \hat\pi_t \right) \right] \le E\, \mathrm{Var}_t\left[ e^{-r\tau_{t+1}} g(\tau_{t+1}, X_{\tau_{t+1}}) \right].    (13)

Equation (12) says that the regressor has the same conditional bias regardless of the presence of the control variate; equation (13) says that on average, the regressor with the control variate has lower conditional variance than the one without. Therefore, the control variate should improve the quality of the approximate stopping times.

When the term Z_t is reintroduced, it is not clear that these properties are maintained. Although the F_t-measurability of Z_t implies

    E_t\left[ Z_t\, e^{-r\tau_{t+1}} g(\tau_{t+1}, X_{\tau_{t+1}}) \right] = E_t\left[ Z_t \left( e^{-r\tau_{t+1}} g(\tau_{t+1}, X_{\tau_{t+1}}) - \left( \hat\pi_{\tau_{t+1}} - \hat\pi_t \right) \right) \right],

so that the conditional bias is still unchanged, the inequality corresponding to (13) may not hold, and so we may not actually reduce variance by including the control variate. The point is that even though π̂ is a projection of π, we cannot conclude that variance reduction occurs when we are restricted to the subset [g(t, X_t) > 0]. Nevertheless, it is reasonable to assume that there is variance reduction except perhaps when the option is deep out of the money, so that the event [g(t, X_t) > 0] occurs with very low probability. We have found in our numerical experiments that there is a (modest) improvement in using the control variate to estimate the stopping time; we continue to perform the regression using (11), retaining the indicator function as a heuristic.
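In code, the LDA direction displayed in Section 3.2 is a single linear solve; a sketch (ours, taking S_x to be the sample covariance matrix of the predictors):

```python
import numpy as np

def lda_direction(x, y, y_split):
    """S_x^{-1}(mean of x's with y < y_split  minus  mean of the remaining x's)."""
    low = y < y_split                      # the group {n : y(n) < y(n~)}
    Sx = np.cov(x, rowvar=False)           # sample covariance as S_x (our assumption)
    a = np.linalg.solve(Sx, x[low].mean(axis=0) - x[~low].mean(axis=0))
    return a / np.linalg.norm(a)
```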

4.2 The Algorithm

We establish the notation we use in the description of the algorithm. Let φ = (φ_0 = 1, φ_1, . . . , φ_k) be a fixed set of basis functions which will be used in estimating the stopping time, as in (2). The fitted coefficients of this regression will be denoted α_t ∈ R^{k+1}, for t = 0, . . . , T − 1. We will denote by θ_t the complete set of parameters specifying the fitted extended MARS model (9) for the approximate value function ĥ_t(·) = ĥ_t(·; θ_t), for t = 1, . . . , T. The variables Y(n), n = 1, . . . , N_1, keep track of the cash flow along each path; the variables cv(n) are the corresponding values of the control variate.

Algorithm 1 Phase One: Fit Stopping Times and Control Variate
 1: simulate (X_0(n), . . . , X_T(n) : n = 1, . . . , N_1)
 2: Y(n) ← g(T, X_T(n)), for n = 1, . . . , N_1
 3: cv(n) ← 0
 4: for t = T − 1, . . . , 0 do
 5:   θ_t ← argmin_θ Σ_{n=1}^{N_1} (Y(n) − ĥ(X_{t+1}(n); θ))²
 6:   cv(n) ← cv(n) + ĥ(X_{t+1}(n); θ_t) − E[ĥ(X_{t+1}; θ_t) | X_t(n)], for n = 1, . . . , N_1
 7:   α_t ← argmin_α Σ_{n=1}^{N_1} 1[g(t, X_t(n)) > 0] (α′φ(X_t(n)) − e^{−r}(Y(n) − cv(n)))²
 8:   for n = 1, . . . , N_1 do
 9:     if g(t, X_t(n)) > [α_t′φ(X_t(n))]_+ then
10:       Y(n) ← g(t, X_t(n)); cv(n) ← 0
11:     else
12:       Y(n) ← e^{−r} Y(n); cv(n) ← e^{−r} cv(n)
13:     end if
14:   end for
15: end for

Algorithm 1 describes the first phase of the pricing method where the approximate stopping times and the control variate parameters are fit. Note that the “minimization” in line 5 of Algorithm 1 is not a true minimization, due to the adaptive nature of the MARS fitting procedure. Also note that the fit in line 7 is trivial when t = 0 since we have assumed that X0 is constant. Algorithm 2 describes the second phase wherein the lower and upper bounds on the option price are computed.


Algorithm 2 Phase Two: Compute Option Price Lower and Upper Bounds
1: simulate new paths (X_0(n), . . . , X_T(n) : n = 1, . . . , N_2)
2: π̂_t(n) ← Σ_{s=1}^{t} e^{−rs} (ĥ(X_s(n); θ_s) − E[ĥ(X_s; θ_s) | X_{s−1}(n)]), for n = 1, . . . , N_2, t = 1, . . . , T
3: τ_0(n) ← T ∧ min{t = 0, . . . , T − 1 : g(t, X_t(n)) > α_t′φ(X_t(n))}, for n = 1, . . . , N_2
4: L_0 ← (1/N_2) Σ_{n=1}^{N_2} [e^{−rτ_0(n)} g(τ_0(n), X_{τ_0(n)}(n)) − π̂_{τ_0(n)}(n)]
5: U_0 ← (1/N_2) Σ_{n=1}^{N_2} max_{t=0,...,T} (e^{−rt} g(t, X_t(n)) − π̂_t(n))

5 Numerical Examples

In this section, we describe how we have applied this algorithm to several multidimensional American option pricing problems, and we provide numerical results. In particular, we show how the conditional expectations (8) are computed. All computations were performed using the R language (R Development Core Team 2005). The MARS fitting algorithm was originally developed in the S language by Hastie and Tibshirani; it was ported to R by Leisch et al. (2005).

R is an interpreted language and thus can be fairly slow. Additionally, raw computation times may reflect details of implementation (e.g., R's garbage collection routines) and mask information that would be relevant in evaluating our algorithm. For this reason, we report the ratio of the computation time of the naïve estimator with that of our estimator for a fixed degree of accuracy, which we now explain. In all experiments we fix the run lengths for Phase 1 and Phase 2 to 10,000 and 20,000 respectively, using common random numbers across experiments. We record the following quantities.

r_1: The time required in Phase 1 for both the LSM method and for MARS to fit the L̂_t functions.
r_2: The time required in Phase 2 to compute the MARS-based estimators of the lower and upper bounds.
r̃_1: The time required in Phase 1 for the LSM method alone.
r̃_2: The time required in Phase 2 to compute the naïve estimator of the lower bound.
s²: An estimate of the variance of the MARS-based estimator of the lower bound.
s̃²: An estimate of the variance of the naïve estimator of the lower bound.
L̂_0: The MARS-based estimate of the lower bound.

We then compute the Phase 2 run lengths (ñ and n for the naïve and MARS-based estimators, respectively) required to achieve a confidence interval half-width for the lower bound that is approximately 0.1% of the lower bound estimate. Hence

    \tilde n = \frac{1.96^2\, \tilde s^2}{0.001^2\, \hat L_0^2} \qquad \text{and} \qquad n = \frac{1.96^2\, s^2}{0.001^2\, \hat L_0^2}.

We then compute approximations for the computational time corresponding to these run lengths, viz.

    \tilde R = \tilde r_1 + \frac{\tilde n}{20{,}000}\, \tilde r_2 \qquad \text{and} \qquad R = r_1 + \frac{n}{20{,}000}\, r_2.

Finally, we report TR = R̃/R as an estimate of the speed-up factor (or time reduction) of the MARS-based estimator over the naïve estimator. We also report VR = s̃²/s² as the variance reduction factor. The former measure represents the true improvement in efficiency of the MARS-based estimator over the naïve estimator, while the latter measure indicates the variance reduction without adjustment for computation time. In all examples, VR and TR are reported to two significant figures.
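The TR and VR computations are elementary; for concreteness (our sketch, with the measured quantities passed in as arguments):

```python
def efficiency_measures(r1, r2, r1t, r2t, s2, s2t, L0, phase2_len=20_000):
    """Return (TR, VR) for a target half-width of 0.1% of L0 at 95% confidence."""
    n_tilde = 1.96**2 * s2t / (0.001**2 * L0**2)   # naive run length
    n = 1.96**2 * s2 / (0.001**2 * L0**2)          # MARS-based run length
    R_tilde = r1t + n_tilde / phase2_len * r2t
    R = r1 + n / phase2_len * r2
    return R_tilde / R, s2t / s2
```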

5.1 Asian Options

We begin by pricing Bermudan-Asian put options, under both the Black-Scholes and Heston (1993) models. In the Black-Scholes case, we have (X_t : t = 0, . . . , T) = ((S_t, A_t) : t = 0, . . . , T), where S_0 is given and S_1, . . . , S_T are generated according to

    S_t = S_{t-1} \exp\left( r - \tfrac{1}{2}\sigma^2 + \sigma W_t \right),

for independent standard normal variates W_1, . . . , W_T. The average process (A_t : t = 0, . . . , T) is given by A_0 = 0 and, for t ≥ 1,

    A_t = \frac{1}{t} \sum_{s=1}^{t} S_s = \frac{1}{t} S_t + \frac{t-1}{t} A_{t-1} = \frac{1}{t} S_{t-1} \exp\left( r - \tfrac{1}{2}\sigma^2 + \sigma W_t \right) + \frac{t-1}{t} A_{t-1}.    (14)

The averaging dates are assumed to coincide with the possible exercise dates, excluding the date t = 0. In continuous time, the Heston (1993) model is given by

    dS_t = \mu S_t\, dt + \sqrt{V_t}\, dW_t^{(1)},
    dV_t = \kappa(\theta - V_t)\, dt + \sqrt{V_t}\, \sigma \left( \rho\, dW_t^{(1)} + \sqrt{1 - \rho^2}\, dW_t^{(2)} \right),    (15)

where W^{(1)} and W^{(2)} are independent Brownian motions. We approximate (15) in discrete time by applying the first-order Euler discretization to the logarithms of S_t and V_t. See Glasserman (2004, pp. 339–376) for details. This gives (X_t : t = 0, . . . , T) = ((S_t, V_t, A_t) : t = 0, . . . , T), where S_0 and V_0 are given, and

    V_t = V_{t-1} \exp\left( \frac{\kappa\theta}{V_{t-1}} - \kappa - \frac{\sigma^2}{2 V_{t-1}} + \frac{\sigma \left( \rho W_t^{(1)} + \sqrt{1 - \rho^2}\, W_t^{(2)} \right)}{\sqrt{V_{t-1}}} \right),
    S_t = S_{t-1} \exp\left( r - \tfrac{1}{2} V_{t-1} + \sqrt{V_{t-1}}\, W_t^{(1)} \right),    (16)

where W_1^{(1)}, . . . , W_T^{(1)}, W_1^{(2)}, . . . , W_T^{(2)} are independent standard normal variates. (The process (A_t : t = 0, . . . , T) is still given by (14).) The scheme (16) is not an exact discretization of (15); we ignore the discretization error and henceforth consider (16) to be the true dynamics of the underlying.

The payoff function of the Bermudan-Asian put is given by g(0, ·) ≡ 0 and

    g(t, X_t) = (K - A_t)_+

for t ≥ 1. Let us consider the Heston case, as the Black-Scholes case is an easy specialization thereof. As mentioned in Section 3.1, we apply the MARS algorithm not to X_t but to a transformation of X_t which replaces S_t and V_t by their logarithms and A_t by the logarithm of the geometric average

    \tilde A_t = \exp\left( \frac{1}{t} \sum_{s=1}^{t} \log S_s \right).
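One step of the discretized Heston dynamics (16), together with the arithmetic average update (14), can be simulated as follows (our sketch; all arguments are per unit time step):

```python
import math, random

def heston_asian_step(S, V, A, t, r, kappa, theta, sigma, rho):
    """Advance (S_{t-1}, V_{t-1}, A_{t-1}) to (S_t, V_t, A_t) per (16) and (14)."""
    w1, w2 = random.gauss(0, 1), random.gauss(0, 1)
    V_new = V * math.exp(kappa * theta / V - kappa - sigma**2 / (2 * V)
                         + sigma * (rho * w1 + math.sqrt(1 - rho**2) * w2)
                           / math.sqrt(V))
    S_new = S * math.exp(r - 0.5 * V + math.sqrt(V) * w1)
    A_new = S_new / t + (t - 1) / t * A          # arithmetic average, per (14)
    return S_new, V_new, A_new
```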

We do not include the LDA directions in the Asian case. This yields an approximation

    \hat L_t = \sum_{j=1}^{J_S} \alpha_{S,j} \left[ q_{S,j} \left( \log S_t - k_{S,j} \right) \right]_+ + \sum_{j=1}^{J_V} \alpha_{V,j} \left[ q_{V,j} \left( \log V_t - k_{V,j} \right) \right]_+ + \sum_{j=1}^{J_A} \alpha_{A,j} \left[ q_{A,j} \left( \log \tilde A_t - k_{A,j} \right) \right]_+.

The marginal conditional distributions of log S_t, log V_t, and log Ã_t given F_{t−1} are Gaussian, with mean and variance given by

    E_{t-1} \log \begin{pmatrix} S_t \\ V_t \\ \tilde A_t \end{pmatrix} = \begin{pmatrix} \log S_{t-1} + r - \tfrac{1}{2} V_{t-1} \\ \log V_{t-1} + \kappa\theta/V_{t-1} - \kappa - \tfrac{1}{2}\sigma^2/V_{t-1} \\ \tfrac{1}{t}\left( (t-1)\log \tilde A_{t-1} + \log S_{t-1} + r - \tfrac{1}{2} V_{t-1} \right) \end{pmatrix},
    \qquad
    \mathrm{Var}_{t-1} \log \begin{pmatrix} S_t \\ V_t \\ \tilde A_t \end{pmatrix} = \begin{pmatrix} V_{t-1} \\ \sigma^2 / V_{t-1} \\ V_{t-1}/t^2 \end{pmatrix}.    (17)

(The full covariance matrix is irrelevant for our purpose.) This allows us to compute the conditional expectations E_{t−1} L̂_t easily.

Table 1 shows our computational results for Bermudan-Asian options. For all examples, we considered an option maturing in 6 months with monthly exercise/averaging dates; the annualized risk-free rate was 12r = .06; the initial asset price was S_0 = 100. For the Heston examples, the (annualized) model parameters were κ = 1.5, σ = .2, θ = .36, ρ = −.75, V_0 = .4. The stopping times were fit using the polynomials of degree up to 4 in S_t, A_t, and (for the Heston model) V_t, for t = 1, . . . , T − 1.

Table 1: Asian Option Results

Model         K     Naïve L0       MARS L0        MARS U0        VR    TR
BS (σ = .3)   95    2.77 (.07)     2.73 (.00)     2.78 (.00)     210   85
BS (σ = .3)   115   15.92 (.14)    15.86 (.01)    15.95 (.01)    230   44
BS (σ = .6)   95    7.88 (.15)     7.80 (.01)     7.94 (.01)     190   71
BS (σ = .6)   115   20.57 (.23)    20.48 (.02)    20.65 (.01)    230   56
Heston        95    5.04 (.11)     4.96 (.01)     5.06 (.01)     150   61
Heston        115   17.73 (.11)    17.66 (.01)    17.78 (.01)    200   50

Parenthesized values are 95% confidence interval half-widths.

In these examples, the reduction in variance is dramatic, ranging from about 150-fold to 230-fold. Similarly, for an approximate 95% confidence interval with (relative) width .001, one needs to do between about 44 and 85 times more work with the naïve estimator than with the one using the MARS-based control variate. Finally, observe that the closeness of the (MARS) estimates of L_0 and U_0 suggests that the stopping time found by the LSM algorithm is quite good.


5.2 Basket Options

Next, we consider options on baskets of d assets whose prices are given by (X_t : t = 0, . . . , T) = (S_t(i) : t = 0, . . . , T; i = 1, . . . , d). Specifically, we test put options on the maximum and on the average of the assets, which have respective payoff functions

    g_{\max}(t, x) = \left( K - \vee_{i=1}^{d} x^{(i)} \right)_+, \qquad g_{\mathrm{avg}}(t, x) = \left( K - \frac{1}{d} \sum_{i=1}^{d} x^{(i)} \right)_+.

The underlying assets are assumed to follow the multidimensional Black-Scholes model, which is discretized as

    S_t(i) = S_{t-1}(i) \exp\left( r - \tfrac{1}{2}\sigma_i^2 + \sigma_i W_t(i) \right),    (18)

for i = 1, . . . , d, where W_t = (W_t(1), . . . , W_t(d)) is a sequence of independent (in time) multivariate normal random variates with mean zero, unit variance, and a specified correlation matrix. To simplify things, we assume that there is a single constant ρ ∈ (−1, 1) such that Cor(W_t(i), W_t(j)) = ρ for all 1 ≤ i < j ≤ d, and we take σ_1 = ⋯ = σ_d = σ. We again take the annualized risk-free rate 12r to be .06 and the initial asset prices to be S_0(i) = 100, i = 1, . . . , d. The dimension d of the problem takes the values d = 2, 5, 10, 20.

For the payoff function g_avg, we take the basis functions φ for fitting the stopping time τ to be the polynomials of degree up to two in the d asset prices. For the function g_max, we take the basis functions to be the polynomials of degree up to two in the order statistics of the asset prices, which is similar to the choice of basis functions for such options in Longstaff and Schwartz (2001). We test both the control variate based on MARS and the control variate based on LDA-MARS as in Section 3.2. For the LDA-MARS tests, we partition the sample paths at each time step t into three groups of approximately equal size corresponding to low, medium, and high values of g(τ_t, X_{τ_t}), and take the eigenvectors corresponding to the first two eigenvalues of the matrix (10), resulting in a total of nine LDA directions. (These are included in addition to, not instead of, the canonical directions.)

We apply the MARS and LDA-MARS fitting algorithms to the logarithm of S_t. The conditional distribution of log S_t given F_{t−1} is multivariate Gaussian with mean log S_{t−1} + r − σ²/2, variance σ², and correlation matrix given by Cor_{t−1}(log S_t(i), log S_t(j)) = ρ for 1 ≤ i < j ≤ d. Therefore, for a direction a ∈ R^d, ‖a‖ = 1, the conditional distribution of a′ log S_t given F_{t−1} is Gaussian with mean and variance

    E_{t-1}\left[ a' \log S_t \right] = \sum_{i=1}^{d} a(i) \left( \log S_{t-1}(i) + r - \tfrac{1}{2}\sigma^2 \right),
    \qquad
    \mathrm{Var}_{t-1}\left[ a' \log S_t \right] = \sigma^2\, a' C a = \sigma^2 \left( \rho \left( \sum_{i=1}^{d} a(i) \right)^2 + (1 - \rho) \sum_{i=1}^{d} a^2(i) \right),

where C is the d × d matrix with 1 on the diagonal and ρ off the diagonal. This allows us to compute the conditional expectations (8).

For the put on the average, the variance reduction using LDA-MARS is quite good, resulting in a speed-up factor of between about 5 and 50 for the uncorrelated case and between about 25 and 60 for the correlated case. There is some degradation of performance as the dimension increases from 2 to 20. It is natural to expect that LDA-MARS should perform significantly better than MARS for a put on the average of stocks, as there is one linear direction (namely, a = (1, . . . , 1)) that is likely to capture much of the variation in the value function. It is also plausible that the effect of the control variate is stronger when the assets are correlated, and that the degradation with dimension is smaller in that case as well, since under this correlation structure much of the variance of the assets' returns is driven by a single factor.

Table 2: Basket Option Results: Put on Average

Uncorrelated asset prices (ρ = 0).

d    σ    Naïve L0       MARS L0        LMARS L0       LMARS U0       MVR    MTR    LMVR   LMTR
2    .3   6.23 (.11)     6.25 (.05)     6.24 (.01)     6.38 (.01)     4.1    3.6    79.0   51.0
2    .6   14.68 (.22)    14.67 (.10)    14.63 (.03)    15.13 (.03)    4.7    4.1    50.0   32.0
5    .3   3.38 (.06)     3.38 (.04)     3.41 (.01)     3.66 (.01)     2.7    2.2    25.0   19.0
5    .6   8.84 (.14)     8.80 (.08)     8.88 (.04)     9.85 (.03)     3.2    2.6    13.0   9.5
10   .3   2.05 (.04)     2.04 (.03)     2.04 (.01)     2.25 (.01)     2.4    1.8    18.0   13.0
10   .6   5.83 (.10)     5.81 (.06)     5.80 (.03)     6.80 (.03)     2.7    2.1    8.2    5.9
20   .3   1.07 (.02)     1.12 (.02)     1.15 (.01)     1.31 (.00)     2.0    1.6    14.0   11.0
20   .6   3.50 (.06)     3.59 (.04)     3.63 (.03)     4.49 (.02)     2.3    1.9    6.2    4.9

Correlated asset prices (ρ = .45).

d    σ    Naïve L0       MARS L0        LMARS L0       LMARS U0       MVR    MTR    LMVR   LMTR
2    .3   7.75 (.13)     7.80 (.04)     7.80 (.01)     7.89 (.01)     10.0   8.7    110    64
2    .6   17.50 (.25)    17.55 (.08)    17.53 (.03)    17.83 (.02)    10.0   8.9    86     49
5    .3   6.58 (.11)     6.65 (.04)     6.65 (.01)     6.75 (.01)     8.4    6.8    96     58
5    .6   15.12 (.23)    15.19 (.08)    15.19 (.03)    15.59 (.03)    8.5    7.0    53     34
10   .3   6.13 (.10)     6.22 (.03)     6.23 (.01)     6.33 (.01)     9.1    7.4    85     50
10   .6   14.15 (.21)    14.25 (.07)    14.27 (.03)    14.63 (.03)    9.4    7.5    57     33
20   .3   5.69 (.10)     5.96 (.03)     6.00 (.01)     6.11 (.01)     11.0   8.7    46     34
20   .6   13.42 (.20)    13.74 (.06)    13.80 (.03)    14.13 (.03)    11.0   8.3    43     27

Parenthesized values are 95% confidence interval half-widths. MVR/LMVR = MARS/LMARS variance reduction. MTR/LMTR = MARS/LMARS time reduction.

Table 3: Basket Option Results: Put on Max

Uncorrelated asset prices (ρ = 0).

d    σ    Naïve L0       MARS L0        LMARS L0       LMARS U0       MVR    MTR     LMVR   LMTR
2    .3   3.83 (.08)     3.83 (.06)     3.86 (.03)     4.68 (.02)     2.2    1.7     8.1    5.7
2    .6   9.83 (.19)     9.83 (.11)     9.84 (.05)     10.99 (.04)    2.6    2.0     13.0   8.8
5    .3   0.39 (.02)     0.39 (.02)     0.39 (.02)     0.81 (.02)     1.2    0.87    1.6    1.0
5    .6   1.54 (.06)     1.54 (.05)     1.54 (.05)     2.94 (.04)     1.3    0.94    1.8    1.2

Correlated asset prices (ρ = .45).

d    σ    Naïve L0       MARS L0        LMARS L0       LMARS U0       MVR    MTR    LMVR   LMTR
2    .3   5.32 (.11)     5.38 (.06)     5.40 (.03)     6.17 (.02)     3.5    2.8    13.0   9.0
2    .6   12.75 (.22)    12.84 (.11)    12.83 (.05)    13.98 (.04)    4.1    3.3    19.0   12.0
5    .3   2.29 (.06)     2.30 (.04)     2.30 (.03)     3.08 (.02)     2.2    1.7    4.5    2.9
5    .6   6.07 (.15)     6.17 (.09)     6.16 (.06)     7.85 (.05)     2.4    2.0    5.0    3.4
10   .3   1.05 (.04)     1.10 (.03)     1.10 (.02)     1.70 (.02)     1.8    1.4    3.0    2.0
10   .6   3.13 (.10)     3.23 (.07)     3.22 (.06)     4.67 (.04)     2.0    1.6    3.3    2.4
20   .3   0.44 (.03)     0.45 (.02)     0.45 (.02)     0.94 (.02)     1.6    1.3    2.0    1.6
20   .6   1.46 (.07)     1.50 (.05)     1.52 (.04)     2.71 (.04)     1.8    1.5    2.3    1.8

Parenthesized values are 95% confidence interval half-widths. MVR/LMVR = MARS/LMARS variance reduction. MTR/LMTR = MARS/LMARS time reduction. For uncorrelated assets, d = 10 and d = 20 resulted in worthless options, so those results are not reported.

The results are much less dramatic for the case of the put option on the maximum. Indeed, MARS (without LDA directions) performs quite poorly in the uncorrelated case. This is most likely due to the fact that the payoff function g_max is highly non-separable, so the fitted functions L̂ are poor approximations for the true value functions L. In fact, not only is g_max non-separable, but it cannot even be represented exactly in the form (9). Thus, even when LDA directions are used, and even in the correlated-assets case, the performance degrades quickly to a variance reduction factor of only around 2 or 3 as the dimension increases. Still, the method seems to be able to provide about a 50% decrease in computation time even in this case.

5.3 Barrier Options

Finally we test our method on a variety of barrier options: the up-and-out call, the up-and-out put, and the down-and-out put, all on a single asset. Unlike a vanilla Bermudan call, a Bermudan up-and-out call on an asset that does not pay dividends may have an optimal exercise policy other than the trivial one τ_0 = T. Again, we test both the Black-Scholes and the Heston models.

Although it would be possible to accommodate the path dependence of barrier options by expanding the state space, we adopt a different approach. Let B ⊂ R^d denote the region in which the option is knocked out. Assume the payoff function g satisfies g(·, x) ≡ 0 for all x ∈ B. For each t = 0, . . . , T, let ν_t be the first hitting time of B between t and T, or T + 1 if there is no such hitting time, i.e.,

    \nu_t = \inf\left\{ s = t, \dots, T + 1 : (s, X_s) \in \left( \{t, \dots, T\} \times B \right) \cup \left( \{T + 1\} \times \mathbb{R}^d \right) \right\}.

We now redefine our value function to be

    Q_t = \sup_{\tau \in \mathcal{T}(t)} E_t\left[ e^{-r(\tau \wedge \nu_t - t)} g(\tau \wedge \nu_t, X_{\tau \wedge \nu_t}) \right].    (19)

The stopping times τ_t^* solving (19) satisfy τ_T^* ∧ ν_T = T and

    \tau_t^* \wedge \nu_t = \begin{cases} t & \text{if } \nu_t = t \text{ or if } g(t, X_t) \ge e^{-r} E_t Q_{t+1}, \\ \tau_{t+1}^* & \text{otherwise}, \end{cases}

for t = 0, . . . , T − 1. The suboptimal stopping times (τ_t : t = 0, . . . , T) are defined analogously to those in Section 2.1. In this setting the analogous martingale π satisfies

    \pi_{\tau_t \wedge \nu_t} - \pi_t = e^{-r(\tau_t \wedge \nu_t)} g(\tau_t \wedge \nu_t, X_{\tau_t \wedge \nu_t}) - e^{-rt} L_t,

similar to Proposition 2.1, and we have

    Q_0 \le E \max_{t = 0, \dots, T} \left[ e^{-r(t \wedge \nu_0)} g(t \wedge \nu_0, X_{t \wedge \nu_0}) - \pi_{t \wedge \nu_0} \right].

In other words, the martingale π is evaluated only as far as the hitting time of the knock-out region, both for computing the control variate and the upper bound. This leads to Algorithms 3 and 4, which are modifications of Algorithms 1 and 2, respectively. The only difference between Algorithms 1 and 3 occurs on line 9, which in the barrier option case says to zero out the cash flow and the control variate upon exercise or knockout. Algorithm 4 differs from Algorithm 2 in that the exercise time τ_0 is replaced with the minimum of the exercise time and the knock-out time, τ_0 ∧ ν_0.

Algorithm 3 Phase One (Barrier Option): Fit Stopping Times and Control Variate
 1: simulate (X_0(n), . . . , X_T(n) : n = 1, . . . , N_1)
 2: Y(n) ← g(T, X_T(n)), for n = 1, . . . , N_1
 3: cv(n) ← 0
 4: for t = T − 1, . . . , 0 do
 5:   θ_t ← argmin_θ Σ_{n=1}^{N_1} (Y(n) − ĥ(X_{t+1}(n); θ))²
 6:   cv(n) ← cv(n) + ĥ(X_{t+1}(n); θ_t) − E[ĥ(X_{t+1}; θ_t) | X_t(n)], for n = 1, . . . , N_1
 7:   α_t ← argmin_α Σ_{n=1}^{N_1} 1[g(t, X_t(n)) > 0] (α′φ(X_t(n)) − e^{−r}(Y(n) − cv(n)))²
 8:   for n = 1, . . . , N_1 do
 9:     if g(t, X_t(n)) > [α_t′φ(X_t(n))]_+ or X_t(n) ∈ B then
10:       Y(n) ← g(t, X_t(n)); cv(n) ← 0
11:     else
12:       Y(n) ← e^{−r} Y(n); cv(n) ← e^{−r} cv(n)
13:     end if
14:   end for
15: end for

Algorithm 4 Phase Two (Barrier Option): Compute Option Price Lower and Upper Bounds
1: simulate new paths (X_0(n), . . . , X_T(n) : n = 1, . . . , N_2)
2: π̂_t(n) ← Σ_{s=1}^{t} e^{−rs} (ĥ(X_s(n); θ_s) − E[ĥ(X_s; θ_s) | X_{s−1}(n)]), for n = 1, . . . , N_2, t = 1, . . . , T
3: τ_0(n) ← T ∧ min{t = 0, . . . , T − 1 : g(t, X_t(n)) > α_t′φ(X_t(n))}, for n = 1, . . . , N_2
4: ν_0(n) ← (T + 1) ∧ min{t = 0, . . . , T : X_t(n) ∈ B}
5: L_0 ← (1/N_2) Σ_{n=1}^{N_2} [e^{−r(τ_0(n)∧ν_0(n))} g(τ_0(n)∧ν_0(n), X_{τ_0(n)∧ν_0(n)}(n)) − π̂_{τ_0(n)∧ν_0(n)}(n)]
6: U_0 ← (1/N_2) Σ_{n=1}^{N_2} max_{t=0,...,T∧ν_0(n)} (e^{−rt} g(t, X_t(n)) − π̂_t(n))
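The stopping logic of Algorithm 4 amounts to capping the exercise time at the first hit of B; schematically (our sketch, with `in_B` and `continuation` assumed supplied by the Phase One fit):

```python
def capped_stopping_time(X, g, continuation, in_B, T):
    """Return tau_0 ^ nu_0 for one path X = [X_0, ..., X_T], as in Algorithm 4."""
    nu = next((t for t in range(T + 1) if in_B(X[t])), T + 1)               # nu_0
    tau = next((t for t in range(T) if g(t, X[t]) > continuation(t, X[t])), T)
    return min(tau, nu)
```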


In our numerical experiments on Bermudan barrier options, the underlying dynamics of (X_t : t = 0, . . . , T) = (S_t : t = 0, . . . , T) or (X_t : t = 0, . . . , T) = ((S_t, V_t) : t = 0, . . . , T) follow (18) or (16) accordingly, and the appropriate parameters of the one-step conditional distributions are still given by (17) (without the A_t term). The model parameters are as described in Section 5.1.

Tables 4 and 5 report the computational results for barrier options in the Black-Scholes and Heston models, respectively. The performance of the control variate for the barrier option examples is highly variable, with variance reduction factors ranging from 8.5 to 350 just within the Black-Scholes cases. Why is there such a discrepancy in the quality of the algorithm among these examples? The cases for which the control variate is very successful are the "up-and-out" puts, which knock out when the option is deep out-of-the-money. The less successful cases are the "up-and-out" call and the "down-and-out" put, which knock out in-the-money and as such have discontinuous payoff functions. Indeed, this discontinuity is notorious for causing headaches among traders, especially in currency markets, who must hedge these options. Our problem here is that MARS has difficulty fitting these functions as well as it fits the smoother "up-and-out" put value functions. We believe that with a little manual tweaking we could get the performance for the discontinuous cases to improve significantly. However, in the spirit of having a fully automated procedure, we have not pursued this line of inquiry. An interesting future research project would be to develop a version of MARS that is more robust to discontinuities in the target function.

Table 4: Barrier Option Results (Black-Scholes, σ = .3)

Stgy.   K     B     Naïve L0       MARS L0        MARS U0        VR      TR
Call    95    130   13.38 (.16)    13.36 (.04)    14.11 (.04)    18.0    15.0
Call    115   130   3.19 (.06)     3.19 (.02)     3.44 (.02)     8.5     7.3
Put     95    70    6.61 (.11)     6.70 (.03)     7.25 (.03)     14.0    13.0
Put     115   70    18.34 (.19)    18.45 (.02)    19.30 (.04)    79.0    53.0
Put     95    130   6.98 (.14)     7.09 (.01)     7.13 (.01)     180.0   140.0
Put     115   130   17.72 (.20)    17.86 (.01)    17.92 (.01)    350.0   140.0

Parenthesized values are 95% confidence interval half-widths.

Table 5: Barrier Option Results (Heston)

Stgy.   K     B     Naïve L0       MARS L0        MARS U0        VR      TR
Call    95    130   14.06 (.15)    13.98 (.05)    14.82 (.05)    9.9     8.5
Call    115   130   3.36 (.06)     3.33 (.03)     3.66 (.02)     5.0     4.3
Put     95    70    8.95 (.12)     8.99 (.05)     9.82 (.04)     5.2     4.5
Put     115   70    22.12 (.20)    22.20 (.06)    23.72 (.06)    9.6     7.9
Put     95    130   12.93 (.24)    13.10 (.02)    13.16 (.02)    170.0   120.0
Put     115   130   22.04 (.33)    22.34 (.03)    22.51 (.02)    170.0   110.0

Parenthesized values are 95% confidence interval half-widths.

6 Conclusion

We have presented a new, automated procedure for finding control variates for the American option pricing problem. The key advantages of our method are its degree of applicability to many option types and stochastic processes, without requiring much additional implementation overhead, and its use of off-the-shelf software. Our method works extremely well for problems of moderate dimension (up to about 5), and for problems where much of the variability of the underlying processes can be explained with a moderate number

of parameters. Moreover, the method can “discover” such structure automatically as a result of using an adaptive fitting procedure. A possible area of future research would be to apply this technique in conjunction with quasi-Monte Carlo methodology. This would likely result in even greater variance reductions, although that remains to be seen. The good news is that the overall procedure would not change in any substantive way. Finally, this paper suggests that there is promise in applying techniques from the (vast) statistical data mining literature to the American option pricing problem. This is a direction we hope to continue to pursue.

Acknowledgments

Samuel Ehrlichman has been supported by an NDSEG Fellowship from the U.S. Department of Defense and the ASEE. This work was supported in part by NSF Grant DMI-0400287.

References

Andersen, L. and M. Broadie (2004). Primal-dual simulation algorithm for pricing multidimensional American options. Management Sci. 50(9), 1222–1234.

Bolia, N. and S. Juneja (2005). Function-approximation-based perfect control variates for pricing American options. In M. E. Kuhl, N. M. Steiger, F. B. Armstrong, and J. A. Jones (Eds.), Proceedings of the Winter Simulation Conference, Piscataway, New Jersey. Institute of Electrical and Electronics Engineers, Inc.

Chen, V. C. P., D. Ruppert, and C. A. Shoemaker (1999). Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming. Oper. Res. 47(1), 38–53.

Clément, E., D. Lamberton, and P. Protter (2002). An analysis of a least squares regression method for American option pricing. Finance Stoch. 6(4), 449–471.

Cox, J. C., S. A. Ross, and M. Rubinstein (1979). Option pricing: A simplified approach. Journal of Financial Economics 7, 229–263.

Duffie, D. (2001). Dynamic Asset Pricing Theory (Third ed.). Princeton, New Jersey: Princeton University Press.

Fisher, R. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188.

Friedman, J. H. (1991). Multivariate adaptive regression splines. Ann. Statist. 19(1), 1–141. With discussion and a rejoinder by the author.

Friedman, J. H., E. Grosse, and W. Stuetzle (1983). Multidimensional additive spline approximation. SIAM J. Sci. Statist. Comput. 4(2), 291–301.

Friedman, J. H. and W. Stuetzle (1981). Projection pursuit regression. J. Amer. Statist. Assoc. 76(376), 817–823.

Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering, Volume 53 of Applications of Mathematics. New York: Springer-Verlag.

Hastie, T., R. Tibshirani, and J. H. Friedman (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer-Verlag.

Haugh, M. B. and L. Kogan (2004). Pricing American options: a duality approach. Oper. Res. 52(2), 258–270.

Henderson, S. G. and P. W. Glynn (2002). Approximating martingales for variance reduction in Markov process simulation. Math. Oper. Res. 27(2), 253–271.

Heston, S. L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies 6(2), 327–343. Available at http://ideas.repec.org/a/oup/rfinst/v6y1993i2p327-43.html.

Jones, L. K. (1987). On a conjecture of Huber concerning the convergence of projection pursuit regression. Ann. Statist. 15(2), 880–882.

Laprise, S. B., M. C. Fu, S. I. Marcus, A. E. B. Lim, and H. Zhang (2006, January). Pricing American-style derivatives with European call options. Management Science 52(1), 95–110.

Leisch, F., K. Hornik, and B. D. Ripley (2005). mda: Mixture and flexible discriminant analysis. R Foundation for Statistical Computing. R package version 0.3-1.

Longstaff, F. A. and E. S. Schwartz (2001). Valuing American options by simulation: A simple least-squares approach. Review of Financial Studies 14(1), 113–147.

R Development Core Team (2005). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.

Rasmussen, N. S. (2005). Control variates for Monte Carlo valuation of American options. J. Comp. Finance 9(1), 84–102.

Rogers, L. C. G. (2002). Monte Carlo valuation of American options. Math. Finance 12(3), 271–286.

Rudin, W. (1987). Real and Complex Analysis (Third ed.). McGraw-Hill Series in Higher Mathematics. Boston, MA: McGraw-Hill.

Tsitsiklis, J. N. and B. Van Roy (2001, July). Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks 12(4), 694–703.

Zhang, H., C.-Y. Yu, H. Zhu, and J. Shi (2003). Identification of linear directions in multivariate adaptive spline models. J. Amer. Statist. Assoc. 98(462), 369–376.
