PARALLEL SOLUTION OF HIGH ORDER FINITE DIFFERENCE SCHEMES FOR PRICING MULTI-DIMENSIONAL AMERICAN OPTIONS

By Matthew F Dixon

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF PARALLEL AND SCIENTIFIC COMPUTATION AT READING UNIVERSITY WHITEKNIGHTS, READING, BERKSHIRE, UNITED KINGDOM MARCH 2002

© Copyright by Matthew F Dixon, 2002

READING UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE The undersigned hereby certify that they have read and recommend to the Faculty of Science Graduate Studies for acceptance a thesis entitled “Parallel Solution of High Order Finite Difference Schemes for Pricing Multidimensional American Options” by Matthew F Dixon in partial fulfillment of the requirements for the degree of Parallel and Scientific Computation.

Dated: March 2002

Supervisor: Vassil Alexandrov

Readers:


Contents

List of Tables
List of Figures
Abstract
Acknowledgements

1 Introduction
  1.1 Research in American Multi-dimensional Options
  1.2 Free Boundary Problems
  1.3 High Order Finite Difference Methods
  1.4 Iterative Methods
  1.5 Preconditioners
  1.6 Distributed Computing
  1.7 Overview

2 Scalable Finite Difference Methods
  2.1 Pricing American Options
  2.2 Numerical Properties of the Black-Scholes Equation
  2.3 Finite Difference Schemes
    2.3.1 Discretisation
    2.3.2 Implicit Schemes
    2.3.3 Overview of Enhanced Approximations
  2.4 Finite Difference Scheme Analysis
    2.4.1 Consistency
    2.4.2 Stability
    2.4.3 Local Truncation Error
  2.5 Boundary Conditions
    2.5.1 Dirichlet Conditions
    2.5.2 Neumann Conditions: BC1
    2.5.3 Linearity Conditions: BC2
  2.6 Solving in Multiple Dimensions
    2.6.1 Linear Transformations
    2.6.2 Domain Mapping
  2.7 High Order Schemes
    2.7.1 Operator Extension: HO1
    2.7.2 High Order Approximation of the Truncation Error: HO2
    2.7.3 High Order Compact Schemes: HOC
    2.7.4 Scalability of High Order Schemes
  2.8 Summary

3 Scalable Solution to the Implicit Finite Difference Schemes
  3.1 GMRES
    3.1.1 Convergence Analysis
  3.2 Parallelisation
    3.2.1 Sparse Matrix-Vector Multiplication
    3.2.2 Inner Vector Products
    3.2.3 Complexity Analysis
  3.3 Parallelisation of GMRES(m)
    3.3.1 Alternative Parallelisation
  3.4 Preconditioners
    3.4.1 Sparse Approximate Inverses
  3.5 Summary

4 Implementation of a Parallel High Order Finite Difference Model
  4.1 High Order Finite Difference Algorithms
    4.1.1 Multiple Dimensions
    4.1.2 Finite Difference Operators
    4.1.3 Boundary Node Shifting
    4.1.4 High Order Compact Methods
  4.2 Parallel Optimisations
    4.2.1 Matrix Compression
    4.2.2 Communication-Computation Overlap
    4.2.3 Vector Localisation
  4.3 SPAI
    4.3.1 Block SPAI
  4.4 Summary

5 Numerical Experiments
  5.1 Benchmark Performance
  5.2 High Order Schemes
  5.3 Boundary Conditions
  5.4 Preconditioning
    5.4.1 SPAI Parameters
  5.5 Parallel Scalability
  5.6 Summary

6 Conclusion
  6.1 Further Research

Appendices

Additional Numerical Results
  A.1 Matrix Profiles
  A.2 Boundary Conditions
  A.3 Preconditioning
  A.4 Sparse Approximate Matrix Profile
  A.5 Block SPAI
  A.6 Orthonormalisation

Derivations
  B.1 Determination of the Characteristics of the Black-Scholes Equation
  B.2 Derivation of the Fourth Order Truncation Error Term
  B.3 Derivation of the HOC Stencil

Source Code
  C.1 High Order Bounded Finite Difference Matrix Generator
    C.1.1 Distributed Matrix Assembly
    C.1.2 Finite Difference Matrix Generator Base Class
    C.1.3 Stencil Generator Base Class
    C.1.4 High Order Compact Stencil Generator Class
  C.2 GMRES(m) Algorithm
    C.2.1 Modified Gram-Schmidt Orthogonalisation
    C.2.2 Unmodified Gram-Schmidt Orthogonalisation
  C.3 SPAI Load Distribution Algorithm
  C.4 Experiment Specification Files
    C.4.1 Experiment Definition File
    C.4.2 Diagnosis Definition File

Bibliography

List of Tables

2.1 Typical Black-Scholes parameter ranges for pricing option contracts on assets.
2.2 Definition of the central finite difference operators.
2.3 Definition of the fourth order HO1 central finite difference operators.
2.4 Expressions for the terms of the HOC stencil at time t + 1.
3.1 Complexity analysis of principal GMRES operations per iteration.
3.2 Parallel scalability of a single iteration of the modified Gram-Schmidt method over m recurrences for each high order scheme.
4.1 Array representation of a three dimensional HOC stencil showing the index and plane.
4.2 An example of the three CRS arrays.
5.1 Specification of experimental apparatus.
5.2 Default parameters used throughout the experiments.
5.3 Benchmark latency and bandwidth measurements.
5.4 Scalability analysis of matrix operations for CD, HO1 and HOC schemes where σ = 0.05.
5.5 Scalability analysis of matrix operations for CD, HO1 and HOC schemes where σ = 0.2.
5.6 Performance comparison of a (200 × 200)² CD scheme with (20 × 20)² HO1 and HOC schemes.
5.7 Scalability analysis of operations on A_HOC with different boundary conditions where σ = 0.05.
5.8 Performance analysis of preconditioners applied to A_HOC with second order boundary conditions where n = 10 and σ = 0.175.
5.9 Top: processor scalability analysis of the SPAI preconditioned A_HOC with Dirichlet boundary conditions; Middle: with Neumann conditions; Bottom: with second order conditions; where n = 10, σ = 0.5.
A.1 The effect of boundary conditions on convergence properties of each scheme for extreme volatility values using a (200 × 200)² coefficient matrix.
A.2 Processor scalability applied to a 5D SPAI HOC coefficient matrix with second order boundary conditions, using unit SPAI blocks where σ = 0.5.
A.3 Processor scalability applied to a 5D SPAI HOC coefficient matrix with second order boundary conditions, using eight unit SPAI blocks where σ = 0.5.
A.4 Distributed communication and complexity of GMRES(m) with modified Gram-Schmidt orthonormalisation applied to a SPAI preconditioned A_HOC with second order boundary conditions, using eight unit SPAI blocks where σ = 0.5.
A.5 Distributed communication and complexity of GMRES(m) with unmodified Gram-Schmidt orthonormalisation applied to a SPAI preconditioned A_HOC with second order boundary conditions, using eight unit SPAI blocks where σ = 0.5.

List of Figures

2.1 Top: variation of the maximum condition number with the number of dimensions; Bottom: comparison of the sparsity of A for different schemes with a modified pricing equation at boundary conditions (n = 10).
3.1 Data dependency analysis of sparse matrix-vector multiplication, shown here for two rows each with three non-zero elements where Z is the axis of lexical ordering.
3.2 Parallelisation of the matrix-vector multiplication by partitioning the matrix into submatrices.
3.3 Data dependency diagram of the modified Gram-Schmidt method.
3.4 Theoretical effect of bandwidth on ε_HO1/ε_HOC.
4.1 Mapping of a boundary spatial operator into the matrix band specifier (shown for a HO1 finite difference operator).
4.2 Position of the HOC stencil nodes in a three dimensional hyperplane.
4.3 Left: comparison of the number of offsets n(w_i^o) required by a 3D HOC and HO1 submatrix at any point in the local domain i; Right: comparison of the number of submatrices spanned P by a 3D HO1 and HOC stencil at any point in the submatrix.
4.4 A simple example of SPAI left preconditioning, shown using two rows of M.
4.5 Simple example of 2 × 2 block SPAI left preconditioning.
5.1 Top: comparison of peer to peer communication performance between two processors on the same workstation (internal) and across two processors residing on different workstations (external); Bottom: benchmark performance of double precision global reductions for different message sizes on a single eight processor machine.
5.2 Comparison of approximated eigenvalues (Ritz values) for each scheme where σ = 0.05 and σ = 0.2 respectively.
5.3 Estimated two-norm condition numbers at each Jacobi preconditioned GMRES iteration for each scheme where, from left to right, σ = 0.05 and σ = 0.5.
5.4 Estimated two-norm condition numbers at each GMRES iteration applying second order boundary conditions where σ = 0.05 and no preconditioner is used.
5.5 Effect of SPAI on the approximate eigenvalues (Ritz values) of a HOC matrix with second order boundary conditions.
5.6 Sensitivity of a SPAI preconditioned 5D A_HOC to volatility under different boundary conditions.
5.7 Performance comparison of a 5D SPAI preconditioned HOC scheme with second order boundary conditions approximated using modified and unmodified Gram-Schmidt orthogonalisation.
5.8 Isoefficiency curves of the SPAI preconditioned A_HOC with second order boundary conditions.
A.1 Structure of A_CD with Dirichlet boundary conditions.
A.2 Structure of A_HO1 with Neumann boundary conditions.
A.3 Structure of A_HOC with second order boundary conditions.
A.4 Effect of volatility on the element sizes of a 3D A_HO1 with second order boundary conditions.
A.5 Effect of volatility on the element sizes of a 3D A_HOC with second order boundary conditions.
A.6 Comparison of approximate eigenvalues of the HOC scheme with different boundary conditions where σ = 0.05.
A.7 Effect of the preconditioner side on convergence of GMRES(m) applied to a 2D HOC matrix with second order boundary conditions.
A.8 A comparison of two-norm residuals at each recurrence of a GMRES(m) method on a 2D matrix under second order boundary conditions (m = 100), for a HO1 (left) and HOC scheme (ε = 0.2, nb = 50).
A.9 A comparison of the effect of SPAI on the orthonormality of the basis of the 3D A_HOC Krylov subspace under second order boundary conditions (m = 100, ε = 0.6).
A.10 A comparison of the effect of SPAI on the two-norm residuals at each recurrence of a GMRES(m) method on a 3D A_HOC under second order boundary conditions (m = 100, ε = 0.6).
A.11 Effect of SPAI on the singular value distribution width at each GMRES iterative solution of a HOC matrix with second order boundary conditions (ε = 0.175).
A.12 The structure of M for a 3D HOC scheme with second order boundary conditions where, from top left to bottom right, ε = 0.1, 0.2, 0.6 and 0.8.
A.13 Comparison of the element values of the SPAI preconditioning matrix M with those of A_HOC where ε = 0.6.

Abstract

This thesis investigates how multi-dimensional American options can be priced with bounded high order finite difference schemes, in combination with preconditioned Krylov subspace methods, using distributed computers. Standard finite difference methods have been reported to scale to a maximum of three dimensions. We explain how high order methods can be applied to a D dimensional domain with $\hat{n}$ uniform grid points along each axis such that, in the case of the High Order Compact (HOC) scheme, the algorithmic complexity of an $O(\hat{n}^y)$ solver is reduced by a factor of $\hat{n}^{yD/2}$ without loss of accuracy or stability. We show experimentally that the HOC scheme can be solved for up to five dimensions using GMRES(m) on 16 distributed processors, but only if an optimised Sparse Approximate Inverse (SPAI) preconditioner is used and the volatility σ² > 0.25, under second order boundary conditions and negligible asset correlation. We also show that the efficiency of the HOC scheme increases with the number of dimensions. We conclude that the HOC scheme is a scalable approach that can be significantly improved by alternative methods of generating an orthonormal basis for the Krylov subspace.

Acknowledgements

I thank my supervisor, Dr Vassil Alexandrov, for his direction and support throughout the project. In particular, his suggestion to use SPAI and his support in interpreting the experiments were instrumental in achieving a scalable result. I would also like to thank Dr Kenneth Tan of the High Performance Computing Group for his advice on parallel communication and numerical software libraries. I thank Professor Mike Baines of the Mathematics Department at Reading University for expressing an interest in my work and introducing me to other postgraduates interested in numerical analysis. The opportunity to discuss my work with them at this early stage was critical to the direction of this thesis. I also extend my gratitude to Lehman Brothers for providing financial support and practical advice throughout my MSc placement with them.


1 Introduction

Parallel computing has traditionally been applied to pricing financial derivatives, which are characterised by stochastic differential equations. A simple and effective approach is to simulate the motion of the underlying asset using Monte Carlo methods. The motion is described as a Wiener process, a Markov process with zero mean (µ = 0) and unit variance (σ² = 1). The solution can be approximated by treating each stochastic variable independently, without the cost of increased convergence time as the number of variables increases. For this reason Monte Carlo methods belong to a class of methods described as embarrassingly parallel. Such methods offer a simple, effective approach to pricing high dimensional derivatives in parallel when a closed form solution exists, i.e. when analytical solutions can be found for the pricing process.

In general, numerical modelling poses many computational challenges when closed form solutions do not exist. This is often the case when the state of the solution variable at time t is contingent on a random event at time t + n. This is the case with American options: a contract between two parties which gives the purchaser or seller the right to buy or sell at a pre-determined price K at any time t over the period [0, T].¹ The American option is one of many increasingly used exotic financial derivatives, referred to as early exercise options, which are computationally complex to price. They require the use of more advanced computational methods to account for the degree of freedom imposed by early exercise. They are a generalisation of European options, which can be solved analytically on account of a fixed exercise point at time T, the maturity of the derivative. Sensitivity analysis is performed to price and hedge² the derivatives. This requires multiple cycles of the pricing process in the period available between market close and the next day's start of trading, which imposes near real-time computational constraints on the model. A central challenge addressed by this thesis is to exploit widely available distributed computing environments for multi-asset American option pricing.

¹ A precise definition of the American option contract is given in Section 2.1 (on page 10).
² A method of offsetting a fall in the asset's price.
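To make the embarrassingly parallel structure concrete, the sketch below prices a European call by Monte Carlo under geometric Brownian motion. It is a minimal illustration of the technique rather than anything from this thesis; the function name and parameter values are hypothetical. Because every trial is independent, the trials can be split across processors with a single global reduction at the end.

```python
import numpy as np

def mc_european_call(s0, k, r, sigma, t, n_trials, seed=0):
    """Monte Carlo price of a European call under geometric Brownian motion.

    Trials are independent, so n_trials can be partitioned across
    processors and the partial means combined in one reduction.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_trials)
    # Terminal asset price under the risk-neutral lognormal model.
    s_t = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
    payoff = np.maximum(s_t - k, 0.0)
    # Discounted expectation; the standard error decays only as O(1/sqrt(n)).
    return np.exp(-r * t) * payoff.mean()

print(mc_european_call(s0=100.0, k=100.0, r=0.07, sigma=0.3, t=0.5,
                       n_trials=1_000_000))
```

The early exercise feature of the American option breaks exactly this independence: the value at each time step depends on a backward-in-time decision, which is why the dynamic programming methods discussed below are needed.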

1.1 Research in American Multi-dimensional Options

An imminent challenge for the financial computation community is the problem of pricing and hedging multi-dimensional options with early exercise features. The price of a multi-dimensional option is contingent on the value of the underlying asset(s), characterised by one or more random processes. Examples of multi-dimensional options include basket options, out-performance options, and interest rate, foreign exchange and credit risk based contingent instruments. For a given asset, advanced models not only represent the price as a stochastic process, but also the volatility and other properties such as, for example, credit default. For simplicity, we restrict our attention to multi-asset derivatives such as basket options, in which two or more correlated assets are each characterised by a single random process. We assume a constant volatility for each asset price. An important use of basket options is to diversify the risk associated with exposure to each member of the basket. Baskets can contain hundreds of assets, but are typically between ten and a hundred strong. Numerous methods have been considered for pricing this class of problem, including lattices, partial differential equation methods, variational inequalities and integral equations. Early exercise requires the formulation of backward time dynamic programming operators, whereby the optimal state S(t) at time t is contingent on a decision being made at time t + n where n ∈ [0, T − t].

Although, in general, these methods can tractably approximate the problem in this form, they do not scale to multiple dimensions without significant degradation of the approximation.

Conversely, Monte Carlo methods are the most effective method for pricing high dimensional problems with forward time trajectories, but are intractable when applied to backward time dynamic programs. Monte Carlo approximations are also sensitive to the random seed, producing large variations in the delta values, the sensitivity of the option value to the underlying. Consequently, it is difficult to hedge the delta values accurately. From a practical perspective this is a fundamental limitation, especially when the basket is highly liquid and liable to be hedged more frequently. Another fundamental limitation is that the error ε of the approximation decreases with the number of trials n only as $O(1/\sqrt{n})$; it is thus not computationally feasible to price to high accuracy. Important enhancements to Monte Carlo methods include the use of low discrepancy sequences in place of pseudo random number sequences [24]. Whilst these have been shown to converge faster than standard Monte Carlo simulations, they do not solve the inherent difficulty of pricing with early exercise features.

To provide an overview, numerous methods for solving the backward time dynamic programs have been investigated. A simple solution is to form a discrete approximation of the dynamic program. Using this approach, Cox, Ross and Rubinstein [18] proposed the widely used CRR binomial model. Although simple and effective in delivering a coarse approximation, the method does not scale to higher dimensions, nor can it easily be used to calculate the hedging parameters referred to as the greeks.

A promising area of research in multi-dimensional American option pricing is the simulation of the dynamic program defined as an optimal stopping problem. In this form, the decision to buy or sell is based on comparing the random variable S(t) with K at discrete time intervals t, where S(t) is an initial unknown, t ∈ [0, T] and K is the strike or contract price of the option. Notable areas of active research are the stochastic mesh methods, which offer good performance subject to the quality of variance reduction [7]. The application of these methods to parallel computing has been investigated by Avramidis [1]. An alternative approach suggests reducing the problem to a free boundary single dimensional form [4], which considers the net payoff in all dimensions when the asset is exercised. The approximation can be computed quickly using standard numerical methods, but with loss of accuracy due to the inconsistency of the similarity transformation with the pricing process. Although an interesting avenue of research, we will draw upon more established theory.

Numerical methods can be developed to scale to higher dimensions and applied to the stochastic differential pricing equation only if the asset movement is risk neutral and its gradient is continuous. This enables the stochastic differential equation to be represented in a deterministic form. Such methods include finite difference, finite element and boundary integral methods. Of these approaches, the flexibility of finite difference methods has invited the most attention from the financial computation community, and for this reason the finite difference method is the focus of this thesis. Finite difference methods have been extensively developed in the field of computational fluid dynamics, which has motivated research into the scaling of finite differences on parallel computers. Consequently, software libraries have already been developed for solving finite difference methods in distributed computing environments. Our challenge will be to integrate American option pricing models into an existing finite difference solution framework. Recent separate advances in more scalable numerical methods and matrix solvers prompt us to apply, for the first time, a new combination of methods to the efficient computation of multi-dimensional American options on distributed computers.

1.2 Free Boundary Problems

In the context of partial differential equations we refer to a moving boundary as a free boundary. The resolution of the free boundary imposed by the early exercise facility has been researched extensively for numerical methods applied to industrial problems, particularly those in fluid dynamics. Two fundamental approaches have been developed by the research community for applying numerical methods to the pricing of single dimensional American options. Formulating the finite difference approximation as a linear complementarity problem, Dempster [10] pioneered the application of linear programming methods such as the Simplex method to option pricing. Similarly, Coleman [33] applied a Newton interior point method to the linear complementarity problem and illustrated that the convergence properties were not dependent on the number of discretisation points. The second approach is to express the early exercise condition in a fixed form that can be used by existing methods for European option calculation. Nengiju [23] showed that the exercise boundary can be formulated as piecewise exponential functions and used in binomial models to great computational effect. Wu [38] reduced the free exercise boundary problem to an initial value problem by applying front-fixing methods to the finite difference method. We prefer the first approach for its simplicity and popularity.

1.3 High Order Finite Difference Methods

Ever since the first paper applying finite difference methods to option pricing in 1977 [6], there has been extensive interest in this approach. Finite difference methods are a generalisation of tree methods that provide the flexibility to optimise the solution domain to the underlying financial model. The essential difference between tree and grid methods is that the former only prices the option along the path of possible states, whereas grid methods price the option over the range of the underlying asset(s). The main advantage of the finite difference method is that it can easily be improved by modifying the scheme, transforming the coordinates and adapting the grid subject to the behaviour of the instrument being priced. It also enables more complex boundary conditions to be implemented, leading to more accurate schemes. The success of the approach rests not only on the numerical integrity but also on the stability and algorithmic complexity.

An established numerical approach for pricing options is the implicit finite difference method [32][37]. Although stable, a solution can only be found by inverting the coefficient matrix A using a direct or iterative solver. This operation is limited by computational cost; the solver is thus the rate determining step in computing finite differences, and iterative solution becomes intractable when solving higher dimensional implicit finite difference schemes. A means of reducing the algorithmic complexity of the problem is to approximate the partial derivatives with higher terms in the Taylor series approximation of the derivatives. Such methods are referred to as high order because the order of the truncation error is increased. Under certain conditions, they enable the use of a lower grid resolution, thus reducing algorithmic complexity, without reduction in the accuracy of the approximation. However, this can worsen the conditioning of A for each scheme. High order compact methods have only recently appeared in published research papers [31]. They offer the same advantages as high order methods but can lead to better conditioned coefficient matrices.
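As a concrete illustration of the idea (standard Taylor series results, not the specific schemes developed in Chapter 2), compare the second order and fourth order central approximations of the second derivative:

$$\frac{\partial^2 V}{\partial S^2} = \frac{V_{i+1} - 2V_i + V_{i-1}}{\Delta S^2} + O(\Delta S^2), \qquad
\frac{\partial^2 V}{\partial S^2} = \frac{-V_{i+2} + 16V_{i+1} - 30V_i + 16V_{i-1} - V_{i-2}}{12\,\Delta S^2} + O(\Delta S^4).$$

For a fixed error tolerance the fourth order formula permits a much coarser grid, at the cost of a wider five-point stencil per dimension; this trade-off is the source of the complexity reduction pursued in Section 2.7.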


1.4 Iterative Methods

The solution vector x of an implicit finite difference scheme is found by inverting A, where b is a right hand side vector:

$$Ax = b, \qquad x, b \in \mathbb{R}^n, \quad A \in \mathbb{R}^{n \times n}.$$

It has been shown that standard direct methods for solving linear equations do not scale to large sparse systems, such as those encountered in multi-dimensional or high resolution grids using implicit finite difference schemes [32]. Iterative methods can be classed as stationary or non-stationary: stationary methods do not use information that changes between iterations. The partial differential equations that we will ultimately consider are second-order self-adjoint elliptic equations with initial-boundary conditions. A is a square nonsingular nonsymmetric positive definite matrix where, by definition, det A > 0. The boundary conditions degrade the regularity of the sparsity pattern of the non-zero elements of A. The dependency on diagonal dominance, symmetry and regularity renders stationary methods such as Jacobi, Gauss-Seidel and successive over-relaxation generally unsuitable for large matrices of this class [20]. An effective non-stationary iterative solver is the multigrid method, whose convergence rate is independent of the matrix size [9]. However, multigrid methods are difficult to calibrate and more complicated to formulate when applied to high dimensional problems.

Krylov subspace methods form an important class of iterative solvers for parallel computing because they involve only matrix-vector multiplication, inner product and vector update operations [19]. They converge to a solution iff the matrix is nonsingular. Three different classes of Krylov subspace projection methods exist, namely the Ritz-Galerkin approach, the minimum residual approach and the Petrov-Galerkin approach [12]. The Conjugate Gradient method of the first class and the MINRES method of the second are simple approaches for inverting symmetric positive definite (SPD) matrices only. The Conjugate Gradient method satisfies the Galerkin condition $r_k \perp \mathcal{K}_k$, where $\mathcal{K}_k$ is a Krylov subspace of k basis vectors [9]. The MINRES method minimises the two-norm residual $\|r_k\|_2$ projected across an orthonormal basis of $\mathcal{K}_k$. The orthonormal basis is defined as the set of vectors $\{w_i\}$ in an inner product space, denoted hereon by $\langle \cdot\,, \cdot \rangle$, such that $\langle w_i, w_j \rangle = 1$ if $i = j$ and 0 otherwise.

Bi-Conjugate Gradient (BiCG) and generalised minimal residual (GMRES) methods are variants of the respective methods above that do not stipulate the restrictive SPD condition. Both methods are parallelisable and scale to larger problem sets. GMRES is one of the most robust approaches, but is also computationally expensive per iteration. The GMRES iterates are expressed as Arnoldi vectors and the BiCG iterates as nonsymmetric Lanczos vectors. The BiCG algorithm has the disadvantages that the nonsymmetric Lanczos process is not robust and that two matrix-vector operations must be performed for each iteration, doubling the complexity of the algorithm and the number of interprocessor communications. On distributed computers this can inhibit performance, as network latency may be too significant to hide with computation. The first deficiency can be solved by using the more robust Bi-Conjugate Gradient Stabilised (BiCGSTAB) algorithm. Gilli [21] showed that BiCGSTAB methods (without preconditioning) are effective on a distributed memory architecture for pricing options with three underlying assets, assuming no correlation between the assets. Their performance, in comparison with stationary iterative methods, improves significantly with smaller grid spacing. In a similar manner we will investigate the performance of GMRES(m), a variant of GMRES which is restarted after m iterations. GMRES promises greater stability and lower complexity than BiCGSTAB [20].
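For reference, a minimal restarted GMRES in the spirit of GMRES(m) is sketched below in NumPy. It is an illustrative implementation of the textbook algorithm (the Arnoldi recurrence with modified Gram-Schmidt orthogonalisation and a small least-squares solve), not the thesis code described in Chapter 4; the tolerance and restart parameters are arbitrary.

```python
import numpy as np

def gmres_m(A, b, m=30, tol=1e-8, max_restarts=50):
    """Restarted GMRES(m): minimise ||b - Ax|| over the Krylov subspace
    K_m(A, r0), restarting from the current iterate every m steps."""
    n = b.size
    x = np.zeros(n)
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        # Arnoldi process with modified Gram-Schmidt orthogonalisation.
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = r / beta
        for j in range(m):
            w = A @ V[:, j]
            for i in range(j + 1):
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:          # happy breakdown
                m_eff = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        else:
            m_eff = m
        # Small (m_eff+1) x m_eff least-squares problem for the update.
        e1 = np.zeros(m_eff + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:m_eff + 1, :m_eff], e1, rcond=None)
        x += V[:, :m_eff] @ y
    return x

A = np.diag(np.arange(1.0, 101.0)) \
    + 0.1 * np.random.default_rng(1).standard_normal((100, 100))
b = np.ones(100)
x = gmres_m(A, b)
print(np.linalg.norm(b - A @ x))
```

The least-squares problem is tiny; the parallel cost lies in the matrix-vector products and the inner products of the orthogonalisation, which is exactly the set of operations analysed in Chapter 3.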

1.5 Preconditioners

The performance of iterative solvers can be improved by preconditioning, particularly when A is ill-conditioned. In fact, subject to the problem class, the choice of preconditioner can be more influential than the choice of iterative solver alone [20][36]. Preconditioners are designed to ease the computation of A⁻¹. The choice of preconditioner is largely determined by the matrix structure, the iterative solver and the computer architecture. In general, an ideal scalable preconditioner is one that can be constructed at a cost which is linearly proportional to the problem size and inversely proportional to the number of processors used. It is also essential that its effect on convergence does not diminish with increasing problem size, the number of processors or variation in the parameters of the linear system. For use in a distributed environment, it is important that the work and communication can be balanced over the processors and that the preconditioner is tolerant to network latency. Memory and per-processor requirements are an additional practical consideration. For preconditioning A it is important that the preconditioner is effective on less regular nonsymmetric self-adjoint square matrices with general complex spectra (negative and positive complex eigenvalues). For this reason polynomial preconditioners, an established scalable approach, cannot be used, since they cannot be applied to matrices with general complex spectra [5]. We instead consider a more recent approach, designed for the solution of general complex sparse linear systems on parallel computers, in which a sparse approximate inverse (SPAI) matrix M is computed and applied at each stage of the iterative solver [16].
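The construction behind SPAI can be summarised in a few lines: each column $m_j$ of the approximate inverse M minimises $\|A m_j - e_j\|_2$ over a prescribed sparsity pattern, and the columns are independent, which is what makes the construction naturally parallel. The sketch below is a simplified static-pattern version for illustration only (the pattern of M is fixed to that of A, with no adaptive augmentation); the names are assumptions and this is not the optimised implementation discussed later in the thesis.

```python
import numpy as np
import scipy.sparse as sp

def spai_static(A):
    """Sparse approximate inverse M with the sparsity pattern of A:
    each column m_j minimises ||A m_j - e_j||_2 over its allowed
    non-zeros. The n small least-squares problems are independent,
    so they can be distributed over processors."""
    A = sp.csc_matrix(A)
    n = A.shape[0]
    M = sp.lil_matrix((n, n))
    for j in range(n):
        idx = A[:, j].nonzero()[0]            # allowed non-zero rows of m_j
        sub = A[:, idx].toarray()             # columns of A entering the fit
        rows = np.unique(sub.nonzero()[0])    # rows actually involved
        e_j = (rows == j).astype(float)       # restriction of the unit vector
        m_j, *_ = np.linalg.lstsq(sub[rows, :], e_j, rcond=None)
        for r, v in zip(idx, m_j):
            M[r, j] = v
    return sp.csc_matrix(M)

# M approximates A^{-1} and is applied as a left preconditioner, M @ r.
A = sp.random(200, 200, density=0.02, random_state=1) + 10.0 * sp.eye(200)
M = spai_static(A)
print(np.linalg.norm((M @ A - sp.eye(200)).toarray()))  # small => good M
```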

1.6 Distributed Computing

Distributed computing is a subset of parallel computing in which the processors do not share the same main memory. It is most applicable to Multiple Instruction Multiple Data (MIMD) parallel architectures, whereby each processor follows an independent instruction set without lock-step synchronisation with all other processors at each instruction. Research on the suitability of numerical and iterative solution methods for distributed computing has been propelled by the development of the Message Passing Interface (MPI). This has made low cost distributed computer architectures a popular alternative to expensive multi-processor parallel computers. Parallel performance is comparable when the algorithms are coarse-grained and network latency is low, that is, when the cost of sending or receiving zero bytes from a remote memory location is not high.

Krylov subspace methods for the solution of the linear system are not embarrassingly parallel. To parallelise them, the data locality and regularity of operations are analysed and the methods modified for distributed computer architectures [19]. Parallelisation of Krylov subspace solved finite difference methods can be classified at three levels [20]. At the highest level, referred to as the system level, the domain can be decomposed into sub-problems and solved in parallel; we refer to this as domain decomposition. At the algorithmic level, we can analyse the data dependencies in the iterative solver and attempt to extract parallelism. At the submatrix level, we can identify parallelism in the Basic Linear Algebra Subroutines (BLAS) of the solver. On distributed computers we are interested not just in the data dependencies of the operations at each of these levels, but also in the data distribution and the volume of interprocessor communication required. Various methods exist for restructuring the solver to eliminate or hide the communication delay by overlapping communication with computation.

1.7 Overview

Finite difference methods have proven to be an effective method for pricing American options on up to three assets only. The main aim of this thesis is to improve finite difference methods such that their solution by SPAI preconditioned GMRES(m) scales to multi-dimensional bounded problems on distributed computers, in which the number of variables exceeds three. The problem is approached in two parts, beginning with (i) the design and analysis of enhanced finite difference schemes and (ii) the efficient solution of the multi-dimensional scheme in parallel. High order finite difference methods have not previously been applied to multi-dimensional American option pricing problems. We present, for the first time, experimental results on the stability of high order schemes using different boundary conditions applied to multi-dimensional American options in Chapter 5 (on page 55).

We begin by formulating the pricing process as a linear complementarity problem in Chapter 2 (on page 10). We define the input parameter range for which the solution must be accurate and stable in Section 2.2 (on page 11). We then compare its properties with its analogous form in computational fluid dynamics, a research field in which finite difference methods have been extensively used and enhanced. We specify the properties required for the finite difference scheme to be well-posed in Section 2.4 (on page 15). This enables us to design our own methods for implying boundary conditions without increased truncation error, as described in Section 2.5 (on page 18). Consistent stability and accuracy over the parameter input range will thus provide a measure of success for each scheme. High dimensional problems, where the spatial dimensionality D > 3, are rarely encountered in computational fluid dynamics, and these problems are non-trivial. In Section 2.6 (on page 20) we transform a multi-dimensional Cartesian co-ordinate system into two dimensional matrix form. We also show how to eliminate the cross-derivative terms (asset correlations), which would otherwise complicate the finite difference scheme. We desire simple yet flexible transformations which result in well structured matrices. Direct application of standard finite differences to higher dimensional problems would render them intractable. We proceed to develop several high order finite difference methods and determine their algorithmic complexity in Section 2.7 (on page 22). Algorithmic complexity is not the only determinant of solution time: properties of A have a significant effect on its solution, so we further determine the theoretical stability and density of A for high dimensional problems in that Section. In particular, we develop a high order compact scheme, a recent and not yet established approach which promises good numerical stability and scalability.

The aim of the second half of the thesis is to identify efficient techniques for solving the high order schemes specified in Section 2.7, and to present numerical results describing their performance on distributed computers. Current research has not explicitly identified the properties of high order finite difference schemes with different boundary conditions when choosing the most effective parallel solver. In Chapter 3 (on page 30) we will describe GMRES(m) and identify methods of improving its performance on distributed computers. In Section 3.4 (on page 40) we will identify the requirements for a scalable preconditioner and show how SPAI meets them. Consistently low convergence times, tolerance to the geometric properties of A and parallel scalability are the measures of success of this approach. We will describe the implementation of the high order finite difference schemes in Chapter 4 (on page 43). Whilst libraries have been developed for solving large-scale linear systems, software has not been developed to generate high order, high-dimensional bounded finite difference matrices to price the Black-Scholes equation. We will describe our matrix generator in Section 4.1 (on page 43) and show how to implement boundary conditions and high order finite difference operators. We will then describe the Portable, Extensible Toolkit for Scientific Computation (PETSc), an MPI based parallel toolkit. We determine the effectiveness of parallel optimisations implemented in PETSc on two different high order finite difference schemes. We measure our success by the efficiency, scalability and ability to validate our implementation.

The aim of Chapter 5 is to confirm the theoretical findings of Chapters 2 and 3, and to establish the conditions under which they are valid. We begin by investigating the relative performance of GMRES(m) applied to each high order finite difference scheme. We will show how the finite difference scheme affects the recurrence generation, projection and convergence time of GMRES(m) on serial and distributed computers. We will then present results on the effect of boundary conditions on GMRES(m) for different finite difference schemes. We will proceed to present results on the performance of SPAI in comparison with the Jacobi preconditioner. We will measure performance by consistent reduction in convergence time (excluding preconditioner construction time) on a serial computer and by its efficiency in parallel. We will then identify the conditions under which SPAI preconditioned GMRES(m) performs best and investigate its scalability. We will finally conclude on the scalability of high order compact methods with SPAI preconditioned GMRES(m) and identify areas of further research.


2 Scalable Finite Difference Methods

This Chapter demonstrates how the numerical performance of finite difference schemes can be measured and used to develop methods which scale to higher dimensional problems. In general, a scalable parallel finite difference scheme can be formed by (i) appropriately defining the problem and solution domain, (ii) discretising the bounded domain, (iii) partitioning the linear system into smaller independent subsystems for solution in parallel, (iv) solving the linear system at each time interval and (v) propagating over all time intervals. Having specified a pricing process and solution domain, the choice of discretisation method has a significant influence on the integrity and scalability of the numerical solution. This method must be able to exploit parallel computers with either shared or distributed memory. Scalability is largely dependent on the granularity, data dependencies and memory requirements of a scheme. We define granularity as the ratio of floating point operations to communication operations per processor. Existing distributed software libraries will be used for managing interprocessor communication, memory management and the mapping of the functional domain to each processor. Emphasis is placed on designing a scheme that satisfies these criteria and can be implemented as a MIMD programming model with message passing communication between processors.

2.1 Pricing American Options

We begin by defining the problem in the simplest form that can be solved by numerical methods for pricing a single asset American option. The stochastic differential equation used to describe the asset movement by Ito's lemma can be converted into a partial differential equation using the Feynman-Kac condition. This yields the fundamental Black-Scholes partial differential equation 2.1.1, where V(S, t) is the value of an equity option contingent on an underlying asset S exhibiting a volatility σ², interest rate r and continuous payment of a dividend $d_c$ over the duration of the option contract. For convenience, we will use L, an operator on V, to refer to this partial differential equation:

$$L = \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + (r - d_c)S\frac{\partial V}{\partial S} - rV = 0. \tag{2.1.1}$$

This is a linear second order partial differential equation based on the assumption that S follows a lognormal random process. From inspection of the characteristic directions of equation 2.1.1 we note that they are real and identical, and the equation is thus parabolic [2]. Equation 2.1.1 is simply a diffusion, convection and absorption equation, where the three effects are described by the terms $\frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2}$, $(r - d_c)S\frac{\partial V}{\partial S}$ and $rV$ respectively [37].


In addition to the definition of the pricing process, payoff conditions must be defined. We consider the pricing of a single asset American put option, which allows the contract holder to sell S at any time over the life of the option for K price units. Formulated as a linear complementarity problem with payoff function p(S):

Definition 2.1.1.
$$LV \le 0, \qquad V \ge p(S), \qquad (V - p(S))\,LV = 0,$$
where
$$P_{\mathrm{call}} = \max(S - K, 0), \qquad P_{\mathrm{put}} = \max(K - S, 0).$$

The call option (which allows the contract holder to buy rather than sell the asset) is less straightforward, since exercise is also determined by the dividend paid by the asset. It is never optimal to exercise early if a dividend does not exist. If the asset pays discrete dividends then optimal exercise can only occur at such a point, so as to receive the dividend. The formulation of the early exercise condition in this form also ensures that the gradient of the option value with respect to S is continuous and thus possible to hedge.
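A common way to apply this complementarity condition in a time-stepping scheme is to project each linear-system solution onto the payoff. The fragment below illustrates that projection step only; it is a hypothetical sketch, and `solve_linear_step` stands in for whichever solver inverts A at each time level.

```python
import numpy as np

def american_put_payoff(s, k):
    """Intrinsic value p(S) of the American put."""
    return np.maximum(k - s, 0.0)

def step_with_early_exercise(solve_linear_step, v, payoff):
    """One backward time step of the linear complementarity formulation:
    solve the unconstrained (European-style) step, then enforce
    V >= p(S) by projection onto the payoff."""
    v_unconstrained = solve_linear_step(v)
    return np.maximum(v_unconstrained, payoff)
```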

2.2 Numerical Properties of the Black-Scholes Equation

We estimate a range of parameters used in the Black-Scholes equation 2.1.1 from published American option pricing experiments in [4][32][37]. From this we can identify conditions in the D spatial dimensional fixed discrete domain Ω for which a stable numerical scheme is difficult to find, on account of the instability of the physical system (Table 2.1 below). From the ranges defined in Table 2.1, we can deduce that the Black-Scholes equation is typically highly diffusion dominant. The convection to diffusion ratio is referred to in fluid dynamics as the Peclet number Pe. Most fluid behaviour is described by Pe > 1; a condition under which numerical schemes do not perform well is when the flow becomes highly convection dominant, i.e. Pe ≫ 1, making it difficult to find stable numerical solutions. However, the simplicity of the lognormal asset motion model has greater repercussions for the diffusion dominant pricing process, because convection does not sufficiently mediate these effects. Typically, asset dynamics are best described not by Brownian motion but by jump diffusion (analogous to shock waves in computational fluid dynamics), primarily due to cash flows from the asset.

Param   Value   Range    Description
σ²      0.30    ±0.25    Asset Volatility
S       100     ±50      Asset Value
K       100     ±50      Strike Price
r       0.07    ±0.05    Interest rate
d_c     0.02    ±0.02    Continuous dividend
T       180     ±150     Option Maturity (days)
ρ_xy    0.5     ±0.5     Correlation between S_x and S_y

Table 2.1: Typical Black-Scholes parameter ranges for pricing option contracts on assets.

From a quantitative perspective, the effect of discretisation on the variance of asset motion is a fundamental consideration. Courtadon [8] addressed this question and showed how to neutrally discretise the pricing equation. Jump diffusion processes have a significant influence on numerical schemes but their consideration is beyond the scope of this thesis.

2.3 Finite Difference Schemes

In this Section we will define the finite difference scheme and methods of measuring its performance. The principle of finite difference methods is to approximate a set of derivatives at finite intervals over the range of the dependent variables. This requires definition of (i) a grid over which a solution is required, (ii) a method of advancing local approximations over this grid and (iii) a method for local approximation of the solution. Stated more formally, the grid is defined as:

Definition 2.3.1. We approximate the required solution $g(u)$ over the fixed bounded open region $\Omega \times [0, n_t]$ in D dimensions with Cartesian coordinates, where $\Omega := [0, S_1] \times [0, S_2] \times \ldots \times [0, S_D]$ and $n_t$ is the number of time steps. The regular, fixed finite difference scheme is applied to a set of points $U_j^k$, where $j \in J_\Omega$ is the D dimensional index and $k \in [0, \Delta t, 2\Delta t, \ldots, n_t \Delta t]$ is the time level.

A scheme defines the propagation of the approximations over the domain. Finite difference schemes are generally classed as either implicit or explicit depending on the computational structure, hereon referred to as the stencil. In general, research on explicit methods has focused on the implications of their instability, whereas research on implicit methods has investigated the computational complexity of solvers for the matrix inversion problem.


2.3.1 Discretisation

Discretisation is the approximation of the continuous partial differential equation at fixed intervals ∆S in finite space and ∆t in time. More specifically, the finite difference method approximates the derivatives of the partial differential equation at discrete points (i, j) in the fixed bounded open region Ω in D spatial dimensions. Time dependent equations such as the Black-Scholes require spatial and temporal discretisation operators.

Definition 2.3.2. A finite difference stencil $\mathcal{S}$ at node $(i, j)$ on the grid is defined as

$$V'(i\Delta s_d, j\Delta t) = \sum_{d=1}^{D} \sum_{m,n \in \mathcal{S}} V'((i + m)\Delta s_d, (j + n)\Delta t),$$

where $V'(i\Delta s_d, j\Delta t)$ is an approximation of the Black-Scholes equation at the $j$th time step ∆t and the $i$th spatial step $\Delta s_d$ in dimension $d \in [1, D]$. The range of spatial and temporal offsets $m$ and $n$ is specific to each stencil $\mathcal{S}$.

We consider the temporal and spatial derivatives separately. The simplest approach is the Euler method, which uses a first order approximation of the temporal derivative, where $n \in [0, \pm 1]$ for the forward and backward variants respectively. A spatial derivative is defined as a forward or backward derivative if $m \in [0, \pm 1]$. A central difference scheme (CD) approximates the first order derivative to second order accuracy, where $m \in [-1, 1]$, $m \neq 0$.

To simplify construction of a finite difference scheme we apply a logarithmic transformation over Ω, by assuming that (i) the partial differential equation is a stable representation of the pricing process, (ii) σ² and r are deterministic over $\Omega \times [0, n_t]$, (iii) the discretisation is uniform across the grid and (iv) the number of grid points in each dimension satisfies $\hat{n}_d = \hat{n}$ for all $d \in [1, D]$.

It can be observed from historical data that σ²(t) is best represented as a stochastic variable. For simplicity we will assume that σ² is a constant in each dimension. We can then isolate the effect of volatility as the number of dimensions D increases, which provides a clearer understanding of the significance of volatility to the scalability of the scheme. Under these conditions, a logarithmic transformation of the spatial variables can be applied to eliminate S from the coefficients. The single asset Black-Scholes equation based on the transformed spatial variable $\hat{S} = \log(S)$ is

$$\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2\frac{\partial^2 V}{\partial \hat{S}^2} + \left(r - \frac{1}{2}\sigma^2\right)\frac{\partial V}{\partial \hat{S}} - rV = 0. \tag{2.3.1}$$
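The constant coefficients follow from the chain rule; as a one-line check (standard calculus, stated here with the dividend term omitted, as in equation 2.3.1):

$$\hat{S} = \log S \;\Rightarrow\; S\frac{\partial V}{\partial S} = \frac{\partial V}{\partial \hat{S}}, \qquad S^2\frac{\partial^2 V}{\partial S^2} = \frac{\partial^2 V}{\partial \hat{S}^2} - \frac{\partial V}{\partial \hat{S}},$$

so substituting into equation 2.1.1 removes every power of S from the coefficients and shifts the convection coefficient from $rS$ to $r - \frac{1}{2}\sigma^2$.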

The transformation is intuitive, since the asset movement is defined as a lognormal process. It does not affect the classification of the partial differential equation, and it improves the stability of the finite difference scheme by reducing the spectrum of eigenvalues of A.

14

Operator

Definition

δS δS2

φ+ −φ− 2∆S φ+ −2φ+φ− ∆S 2

Table 2.2: Definition of the central finite difference operators

[37]. Another disadvantage is that the transformation reduces the transparency of the financial model. We continue by considering an implicit solution to equation 2.3.1 over an infinite solution ˆ domain. To avoid the convolution of equations we will hereon use the notation S to represent S.

2.3.2

Implicit Schemes

Definition 2.3.3. Let k represent the time step of a single asset backward time discretised scheme where k ∈ [T, 0]. The implicit approximation of the option value at time step k + 1 and spatial

k+1 where m ∈ [−1, 1]. These are calculated from Vik , where Vi0 is the payoff at steps i + m is Vi+m

the maturity of the contract T based on the asset price i∆S. Otherwise for k 6= T the implicit ˆ approximation of the American option V k+1 = max(V k+1 , V k ), ∀ˆi, ∀kˆ ∈ [0, k]. i

i

ˆi

Equation 2.1.1 can be expressed in the implicit form V t+1 = A−1 V t

V ∈ Rn .

(2.3.2)

The central difference representation of this form δt V k + ak+1 δS2 V k+1 + bik+1 δS V k+1 + cik+1 Vik+1 = 0, i

(2.3.3)

where the finite difference operators are defined in Table 2.2. Without considering the boundary conditions, the resulting A from applying this scheme over Ω is shown in matrix notation 

    A=    

Ak+1 1

1 + B1k+1

C1k+1

0

...

0 . ..

A2k+1 .. .

1 + B2k+1 .. .

C2k+1 .. .

0 .. .

0

Ak+1 n−2

k+1 1 + Bn−2

k+1 Cn−2

0

k+1 An−1

0

1+

k+1 Bn−1

0 k+1 Cn−1



        

Considering the single asset with constant σ², A consists of a tri-band of non-zero elements where $A_{i,j} = A_{i+1,j+1}$ for all $i, j \in [1, n)$. The matrix A would be well conditioned were it not for the effects of boundary conditions and dimensionality, which we investigate in Section 2.5 (on page 18) and Section 2.6 (on page 20) respectively.
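Reading the scheme concretely: the sketch below assembles the single asset implicit CD system of equation 2.3.3 on the log grid and steps backwards in time with the early exercise projection of Definition 2.3.3. It is an illustrative reading under the constant coefficient assumptions above; the grid limits, boundary handling and parameter values are simplifying assumptions of this sketch, not the matrix generator of Chapter 4.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def implicit_american_put(k_strike=100.0, r=0.07, sigma2=0.09, t_mat=0.5,
                          n_s=400, n_t=200, s_max=300.0):
    """Backward-time implicit CD scheme on the log grid Ŝ = log(S),
    with the early exercise constraint applied after each solve."""
    x = np.linspace(np.log(k_strike) - 4.0, np.log(s_max), n_s)
    dx, dt = x[1] - x[0], t_mat / n_t
    s = np.exp(x)
    payoff = np.maximum(k_strike - s, 0.0)

    # Constant coefficients of eq. (2.3.1): diffusion a, convection b.
    a = 0.5 * sigma2 / dx**2
    b = (r - 0.5 * sigma2) / (2.0 * dx)
    lower = -(a - b) * dt * np.ones(n_s - 1)
    diag = (1.0 + (2.0 * a + r) * dt) * np.ones(n_s)
    upper = -(a + b) * dt * np.ones(n_s - 1)
    A = sp.diags([lower, diag, upper], [-1, 0, 1]).tolil()
    A[0, :] = 0.0                          # Dirichlet boundary rows
    A[-1, :] = 0.0
    A[0, 0], A[-1, -1] = 1.0, 1.0
    solve = spla.factorized(A.tocsc())     # factorise once, reuse each step

    v = payoff.copy()
    for _ in range(n_t):
        rhs = v.copy()
        rhs[0], rhs[-1] = payoff[0], 0.0   # deep in/out of the money values
        v = solve(rhs)
        v = np.maximum(v, payoff)          # early exercise projection
    return s, v

s, v = implicit_american_put()
print(v[np.argmin(np.abs(s - 100.0))])     # option value near S = K
```

As discussed in the next Section, this solve-then-project step is only first order accurate in the time step.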


2.3.3 Overview of Enhanced Approximations

Research on variants of the finite difference operators discussed above has been extensive and can generally be classified into methods which improve the stencil in the time dimension and those which improve it in the spatial dimensions. By improving the potential accuracy of the approximation, the number of computations can be reduced accordingly without a reduction in the default accuracy.

The accuracy of the time component of the error can be improved by forming the stencil over multiple time levels. The simplest example is referred to as a θ method, where a weighting θ is assigned to nodes at time t and t + 1. This is a weighted combination of the explicit and implicit methods which improves the error term by O(∆t).¹ The improved accuracy can be exploited in two ways. We can approximate the time component to high accuracy, which may be necessary in close proximity to critical points to ensure an accurate solution. Alternatively, we can reduce the complexity by a factor of the discretisation frequency, to $\sqrt{n_t}$ intervals. It has been found that its use with the complementarity approach results in oscillatory delta factors about the early exercise curve [33]. Application of a linear solver followed by the early exercise constraint will only result in a first order accurate solution with respect to the time step, because the solution is not self-consistent. This can be overcome by adjusting the solution with the Successive Over Relaxation (SOR) solver. However, this approach is less scalable in comparison with high order, high dimensional schemes.

Preferably, we can consider reducing the error associated with the spatial operators. This has a greater potential to reduce algorithmic complexity than temporal optimisation, since the upper bound on complexity is $O(n_t n_d^{3D})$, where A is assumed to be dense and the solution of the linear system is found in $n_d^D$ iterations. This approach will be considered in more detail in Section 2.7 (on page 22).

¹ The Crank-Nicolson method is a special case when θ = 1/2.
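Written out for the spatially discretised operator (a standard statement of the θ family, included here for reference, with $L_h$ denoting the spatial part of the discretised operator L):

$$\frac{V^{k+1} - V^k}{\Delta t} + \theta L_h V^{k+1} + (1 - \theta) L_h V^k = 0, \qquad \theta \in [0, 1],$$

where θ = 0 gives the explicit scheme, θ = 1 the fully implicit scheme of Section 2.3.2, and θ = 1/2 the Crank-Nicolson scheme with temporal error $O(\Delta t^2)$.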

2.4 Finite Difference Scheme Analysis

This Section identifies methods which determine the conditions under which a well-posed scheme exists.

2.4.1 Consistency

Consistency provides a measure of the capability of the finite difference approximation to solve the partial differential equation. For the definition of consistency, we state the most fundamental theorem in finite differences, the Lax Equivalence Theorem.


Theorem 2.4.1. Stability is the necessary and sufficient condition for convergence of a consistent finite difference scheme for a partial differential equation with a well-posed initial value problem. In other words, provided the finite difference scheme is consistent with the well-posed initial value problem, the scheme will converge if and only if it is stable.

2.4.2 Stability

The purpose of stability analysis is to define bounds on the properties of the scheme such that errors in the initial value components do not grow exponentially. In contrast to dynamical stability, numerical stability refers to the behaviour of the solution over a finite number of intervals as the interval width is decreased, rather than the growth of errors over infinite time. There are numerous methods for stability analysis which represent the problem in different systems. These include Fourier series harmonics for the grid errors, energy models and eigenvector analysis.

The temporal and spatial structure of the scheme have a significant effect on its stability. From a more physical perspective, a scheme is unstable if it exhibits numerical oscillations not associated with the pricing process. Oscillation can reduce the convergence rates of implicit scheme solvers and stability over time. Oscillation arises from numerous sources; the use of too coarse a grid structure for the differential equation being represented results in instability. In fluid dynamics, this state is referred to as violation of the cell Peclet number condition.

We will first consider a simple and widely used method, known as von Neumann analysis, for an implicit central difference scheme based on a two dimensional Black-Scholes equation with constant coefficients. This method provides a necessary condition for convergence, assuming that the finite difference scheme approximates a pure initial value problem with periodic initial data and that the equation is linear with constant coefficients. Whilst these restrictions are impractical, the method serves as a preliminary means for stability analysis. The error term is the sum of all error harmonics across the grid domain.

Theorem 2.4.2 (von Neumann Analysis).
$$e(S) = \sum_{k} A_k e^{i\beta_k S} e^{\alpha t}.$$

Substituting this into the central difference approximation of the two dimensional Black-Scholes equation 2.1.1 and representing the exponential terms as their trigonometric equivalents,
$$\frac{1}{\lambda} = -\left( \frac{\Delta t}{\Delta S^2}\, 2a(\cos\beta S - 1) - r\Delta t + i\, b\, \frac{\Delta t}{\Delta S}\sin\beta S \right), \tag{2.4.1}$$
where the necessary condition for stability is that $\left| e^{\lambda t} \right| \le 1$. That is, the error component will not grow with time provided the above expression is satisfied. It can be shown that equation 2.4.1 is unconditionally stable, since the interest rate and volatility terms cannot be negative. Separating the real and complex parts of equation 2.4.1, and considering the real part,
$$\frac{\sigma^2}{\Delta S^2 (\cos\beta S - 1)} \le r. \tag{2.4.2}$$

A limitation of von Neumann analysis, aside from its exclusion of boundary conditions, is that it does not easily scale to the analysis of multiple dimensions. We shall instead consider an alternative representation of the problem in eigenvector space, using a matrix method to analyse the stability of the linear system
$$V^{k+1} = A V^{k} + r, \qquad AV = \lambda V, \tag{2.4.3}$$
where $V$ in the second relation is the right eigenvector.

A necessary and sufficient condition for convergence is that $\rho(A) \le 1$; since, by definition, the spectral radius satisfies $\rho(A) \le \|A\|$, a sufficient condition for convergence is that $\|A\| \le 1$. The use of this result requires that all eigenvalues be known. This is feasible for a large scheme with constant coefficients but impractical otherwise. Gerschgorin's Theorem has two main advantages over the von Neumann method, specific to this application: the theorem does not require computation of the eigenvalues, and it exhibits a general feature of matrix methods in that boundary conditions can be incorporated.

Theorem 2.4.3 (The Gerschgorin Theorem).
$$G_i := \left\{ \lambda : |\lambda - a_{i,i}| \le \sum_{j=1,\, j \ne i}^{n} |a_{i,j}| \right\}.$$
The linear system defined in equation 2.4.3 will converge iff all the eigenvalues $\{\lambda_k\}$ of $A = \{a_{i,j}\}$ lie in the union of the Gerschgorin circles in the complex plane defined by $G_i$.

This provides a theoretical upper bound on the eigenvalue distribution. Applying this theorem to an implicit central difference approximation of the two dimensional Black-Scholes equation 2.1.1, with constant coefficients,
$$\left| \lambda - \left( \frac{2a\Delta t}{\Delta S^2} + 1 \right) \right| \le \left| \frac{2a\Delta t}{\Delta S^2} \right|, \tag{2.4.4}$$
which reduces to
$$\frac{4a\Delta t}{\Delta S^2} + 1 \ge \lambda \ge 1,$$
where the condition for stability is
$$\left| \frac{1}{\lambda} \right| \le 1. \tag{2.4.5}$$
From equation 2.4.5 we deduce that the system is unconditionally stable since $|\lambda| \ge 1$, as concluded from the von Neumann method. The theoretical bounds on the eigenvalues of $A$ define a maximum condition number $k = \frac{\lambda_{\max}}{\lambda_{\min}}$.


A simple extension to this method is to consider a maximum of $D$ spatial dimensions for the same problem. Given that each dimension contributes the same components to each term in $A$, equation 2.4.5 can be generalised to
$$1 \le \lambda \le 1 + 4\Delta t \sum_{d=1}^{D} \frac{a_d}{\Delta s_d^2}. \tag{2.4.6}$$
Thus the maximum condition number of $A^{CD}$ increases with dimensionality by a factor of $1 + \frac{a_{d+1}}{a_d}$. In general, the rate of convergence would be expected to decrease proportionally. However, other factors affect the convergence rate of the iterative solver. These will be identified in Section 2.7.4 (on page 27).
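The Gerschgorin bound is straightforward to check numerically. The sketch below, a minimal illustration assuming the constant coefficient tridiagonal operator of Section 2.3.2, compares the Gerschgorin estimate of the extreme eigenvalues, and hence of the condition number $k = \lambda_{\max}/\lambda_{\min}$, against the exact spectrum.

```python
import numpy as np

def gerschgorin_bounds(A):
    """Return (lo, hi): the extremes of the union of Gerschgorin circles of A."""
    d = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(d)
    return (d - radii).min(), (d + radii).max()

# Implicit 1D operator with constant coefficients: diagonal 1 + 2*a*dt/dS^2
n, a, dt, dS = 50, 0.02, 0.01, 0.05
c = a * dt / dS**2
A = np.diag(np.full(n, 1 + 2 * c)) \
  + np.diag(np.full(n - 1, -c), 1) \
  + np.diag(np.full(n - 1, -c), -1)

lo, hi = gerschgorin_bounds(A)           # matches [1, 1 + 4*a*dt/dS^2]
lam = np.linalg.eigvalsh(A)              # exact spectrum (A is symmetric here)
print(f"Gerschgorin: [{lo:.4f}, {hi:.4f}], k <= {hi / lo:.4f}")
print(f"Exact:       [{lam.min():.4f}, {lam.max():.4f}], k = {lam.max() / lam.min():.4f}")
```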

2.4.3 Local Truncation Error

Local truncation error $T_e$ is a measure of the extent to which the exact solution of the partial differential equation fails to satisfy the difference equation at a point $U_j \in \Omega$ [30]. It arises from the neglect of higher order terms in the Taylor series expansion at this point when defining a finite difference operator. Thus the truncation error can be determined by replacing each term of the finite difference operator with its Taylor expansion and subtracting the terms in the finite difference operators, as shown below.

Definition 2.4.4. Consider a Taylor series expansion of a continuous function $f(s_1, s_2, \ldots, s_D)$ at the grid point $(IS) \equiv (i_1\Delta s_1, i_2\Delta s_2, \ldots, i_D\Delta s_D)$, where $s_i$ is independent of $s_j$, $\forall j \in [1, D]$, $j \ne i$. The central difference operator $\delta$ is used to approximate the $D$ dimensional diagonalised log transformed Black-Scholes equation $\hat{L}$ with fourth order truncation error $T_e^4$.

Lemma 2.4.5.
$$T_e^4 = \frac{1}{12} \sum_{d=1}^{D} \Delta s_d^2 \left( A_d \frac{\partial^4 f}{\partial s_d^4} + 2 B_d \frac{\partial^3 f}{\partial s_d^3} \right).$$
(A proof is provided in Section B.2 (on page 100).)

The truncation error imposes a lower bound restriction on the number of grid points required to ensure a given level of accuracy. It will be shown in Section 2.7 (on page 22) that the complexity of an implicit scheme increases exponentially with the number of grid points. Since our aim is to scale the scheme to higher dimensions, rather than achieve higher accuracy, we will show how a $\Lambda$th order truncation error can be eliminated, reducing the number of required grid points and hence the complexity compared with a central difference scheme by $O(\Delta S^{\Lambda/2})$.

2.5 Boundary Conditions

Geometric Brownian motion is used to represent the dynamics of the underlying state variable in the Black-Scholes equation. This process is diffusion dominant, attenuating errors emanating from all grid points $U_j \in \partial\Omega \subset \Omega$. However, poorly imposed boundary conditions can still significantly reduce the quality of the finite difference scheme. Three approaches for treating boundary conditions are considered. We can (i) apply a simple linearity condition, (ii) apply a modified pricing equation, and (iii) move the boundary conditions away from critical points, such as the strike price.

The requirements for numerically suitable boundary conditions are that they do not compromise $T_e$ and that they maintain the structure of $A$. The first requirement leads to the application of the pricing equation, with an additional constraint placed on one or more of the derivatives. Additionally, the finite difference operators are shifted to ensure that the stencil remains in $\Omega$, without reducing the order of the scheme. This approach has the disadvantage that it introduces outliers at the boundary conditions. The effect of the boundary conditions on convergence properties is presented in the experimental Section. For multi-dimensional models, each boundary condition can be applied normal to each boundary, such that each state variable is only affected by the boundary condition pertaining to it. This isolates the effect of a boundary condition to its own dimension.

2.5.1 Dirichlet Conditions

The Dirichlet condition simply treats the American option as a down and out put, where premature expiry occurs if $S$ becomes worthless:
$$V(S,t) = \begin{cases} (K - S)e^{-r(T-t)} & 0 < S < K,\; S \in \Omega \\ Ke^{-r(T-t)} & S = 0,\; S \in \partial\Omega \\ 0 & S \ge K,\; S \in \partial\Omega \end{cases}$$
This imposes a monotonicity constraint on the underlying price process. In other words, payoff is guaranteed when the underlying state variable reaches zero in value. Imposition of this condition for a single asset reduces $A$ to an $(n-2) \times (n-2)$ matrix, where the boundary conditions are expressed as a residual vector $r \in \mathbb{R}^n$ with $r_i = 0$, $\forall i \in (0, n-1)$.

2.5.2 Neumann Conditions: BC1

The representation of a boundary condition as a first order derivative is called a Neumann condition. In physical terms, this approach neglects convection at a boundary value of $S$. In other words, it can be interpreted in the context of financial modelling as full uncertainty about the future price of the underlying asset.

There are two approaches to representing derivative boundary conditions. The first implements the condition directly as $\frac{\partial V}{\partial S} = 0$, using a first order up-wind scheme with $T_e = O(\Delta S)$, where $S \in \partial\Omega$. We could increase the order of $T_e$ by increasing the number of terms used to approximate the derivative. However, the main limitation of this approach is that it is limited to one spatial dimension, and the absence of a time derivative makes stability dependent on the ratio of the time to space grid intervals. Applying Gerschgorin's Theorem, we find that stability is contingent on
$$\frac{\Delta S}{\Delta t} \le \frac{1}{r - D_c}.$$

Instead we consider the second approach of applying the boundary condition through the pricing equation. We form the second order finite difference operator from three shifted terms such that all points lie in $\Omega$. $T_e$ is preserved at $O(\Delta s^2)$ because the first order error components in the Taylor series are zero. The stability condition $\sigma^2 \ge 0$ is unconditionally achieved. The shift operation has no effect on the stability of the scheme by Gerschgorin's analysis.

The coefficient matrix resulting from this approach (shown for $A^{HO1}$ in Figure A.2 (on page 78)) is similar to the unbounded matrix. The exception is that the stencil is shifted at the boundary to ensure that all stencil nodes lie on the grid, and the first order derivatives are set to zero. This approach is particularly effective when the boundary is close to a critical price, such as $K$. The use of high accuracy boundary conditions enables a reduction of the distance between the grid boundary and the critical points, since error propagation is less pronounced. This can lead to a reduction in computational cost, especially for high dimensional problems, because fewer grid nodes are required in each dimension.
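As an illustration of the shifted stencil idea, the sketch below builds a one sided, second order accurate approximation of $\partial^2 V/\partial S^2$ at the left boundary using interior nodes only. The particular weights are the standard forward difference ones and are an assumption, not the exact shifted operators used in this thesis.

```python
def d2_forward(V, dS):
    """Second order accurate one-sided approximation of V'' at the left
    boundary, using only nodes inside the grid (a 'shifted' 4-point stencil)."""
    return (2 * V[0] - 5 * V[1] + 4 * V[2] - V[3]) / dS**2
```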

2.5.3 Linearity Conditions: BC2

This condition states that the payoff from the put option is at most linear in $S$ as it tends to infinity or zero. It is arguably a more realistic representation of the behaviour of financial derivatives than the other approaches [32]. The condition is expressed through a modified pricing equation as $\frac{\partial^2 V}{\partial S^2} = 0$, where $S \in \partial\Omega$. This reduces the stencil at the boundary to a three point up-wind convection term with $T_e = O(\Delta S)$. Similar to the previous condition, a restrictive stability condition exists, where
$$\frac{\Delta t}{\Delta S^2} \le \frac{1}{2\sigma^2}.$$
The expression of this condition through the pricing equation is simpler than for the Neumann condition because the operators do not require shifting. The first order upwind convection operator is such that $T_e = O(\Delta S^2)$, since the second order error component in the Taylor series is zero. Again, the stability condition $\sigma^2 \ge 0$ is unconditionally achieved.

2.6 Solving in Multiple Dimensions

Undesirable components of the multi-asset Black-Scholes equation are the spatial cross-derivatives $\frac{\partial^2 f}{\partial x \partial y}$ representing the correlation between the two assets $x$ and $y$. Not only does this complicate the formulation of finite difference schemes, especially those which are high order, but their values determine the classification of the following multi-dimensional partial differential equation (shown for two spatial dimensions):
$$a\frac{\partial^2 f}{\partial x^2} + b\frac{\partial^2 f}{\partial x \partial y} + c\frac{\partial^2 f}{\partial y^2} + d\frac{\partial f}{\partial x} + e\frac{\partial f}{\partial y} - rf = -\frac{\partial f}{\partial t}. \tag{2.6.1}$$

Definition 2.6.1. The second order coefficients of the two asset log-transformed partial differential equation 2.6.1 are $a = \frac{\sigma_x^2}{2}$, $b = \rho\sigma_x\sigma_y$ and $c = \frac{\sigma_y^2}{2}$.

Lemma 2.6.2. Equation 2.6.1 is either hyperbolic, parabolic or elliptic depending on the respective conditions $\rho > \sigma_x\sigma_y$, $\rho = \sigma_x\sigma_y$ or $\rho < \sigma_x\sigma_y$. (A proof is provided in Section B.1 (on page 100).)

2.6.1 Linear Transformations

A transformation of the characteristics of equation 2.6.1 eliminates the spatial cross-derivatives in the Hessian matrix $H$ of second order spatial derivatives. For the two dimensional problem, the rotation by $\alpha$ below eliminates the cross derivatives. Since the volatility terms $a$ and $c$ cannot be negative, equation 2.6.1 is by definition unconditionally elliptic, since $a'c' > 0$ and $b' = 0$, where $'$ denotes the transformed characteristics. The characteristic directions become complex. A general transformation can be applied to the two dimensional case:
$$x^T H x = (R_\theta x')^T H (R_\theta x'). \tag{2.6.2}$$
Assuming $H$ is a non-singular matrix, it can be shown that the determinant of $H'$ is equal to that of $H$ [25]. This property arises from the fact that each matrix has the same characteristic equation. In other words, they represent the same linear system but in different local coordinates. Such matrices are defined as similar and have the same eigenvalues. The transformation described above can be represented as a similarity transformation [11] such that
$$|H'| = |H|, \qquad H' = P^{-1} H P. \tag{2.6.3}$$
More specifically, $H'$ is defined as the diagonal matrix to which $H$ can be reduced. This is possible when all eigenvalues of $H$ are distinct, that is, each is unique. $P$ is then defined such that it has columns of linearly independent eigenvectors of $H$, corresponding to the eigenvalues. From the basic definition of an independent linear system $Hv = \lambda v$,
$$(a_{1,1} - \lambda_i)v_1 + (a_{2,2} - \lambda_i)v_2 + \ldots + (a_{D,D} - \lambda_i)v_D = 0, \quad H \in \mathbb{R}^{D \times D},\; v \in \mathbb{R}^D. \tag{2.6.4}$$
This provides $D$ equations which can be solved by row reduction formulae to yield the eigenvectors $v$. Each eigenvector can then be substituted into $P$, its inverse calculated and, by equation 2.6.3,


the transformed coefficients evaluated. The method can be simplified by using the property that $H$ is symmetric. $H'$ can then be determined as an orthogonal diagonalisation of $H$ with $Q$, an orthogonal matrix whose columns consist of orthonormal eigenvectors, where
$$H' = Q^T H Q, \qquad Q^T = Q^{-1}. \tag{2.6.5}$$
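A minimal numerical illustration of this diagonalisation, assuming the two asset coefficients of Definition 2.6.1: because $H$ is symmetric, numpy's symmetric eigensolver returns the orthonormal $Q$ directly, and $Q^T H Q$ recovers a diagonal matrix with the transformed diffusion coefficients $a'$ and $c'$ on its diagonal.

```python
import numpy as np

sigma_x, sigma_y, rho = 0.3, 0.2, 0.5

# Symmetric coefficient matrix of the quadratic form a f_xx + b f_xy + c f_yy:
# the off-diagonal entries carry b/2 = rho*sigma_x*sigma_y/2
H = np.array([[sigma_x**2 / 2,              rho * sigma_x * sigma_y / 2],
              [rho * sigma_x * sigma_y / 2, sigma_y**2 / 2]])

lam, Q = np.linalg.eigh(H)       # Q is orthogonal: Q.T == inv(Q)
H_prime = Q.T @ H @ Q            # similarity transform, eq. 2.6.5

assert np.allclose(H_prime, np.diag(lam), atol=1e-12)
print("transformed coefficients a', c' =", lam)
```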

2.6.2 Domain Mapping

We must specify a mapping of Cartesian coordinates in $\Omega$ to $A$ before we can attempt to solve a multi-dimensional linear system.

Definition 2.6.3. Let the mapping $\mathcal{M}$ define the multi-index $j$ in $\Omega$ in terms of the row in $A \in \mathbb{R}^{n \times n}$, where $j \in [1, J_\Omega]$ and $j_d \subset J_\Omega$ pertaining to $d$:
$$\mathcal{M}: j \to \left( \sum_{i=1}^{D-1} \left( \prod_{d=i}^{D-1} \hat{n}_d \right) j_i \right) + j_D. \tag{2.6.6}$$
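A small sketch of this row-major style flattening, assuming a uniform grid with $\hat{n}_d$ points per dimension; the helper name and the zero-based indexing are illustrative choices rather than the thesis' own conventions.

```python
import numpy as np

def flatten_index(j, n_hat):
    """Map a multi-index j = (j_1, ..., j_D) on the grid to a row of A,
    following the nested-product structure of the mapping M (eq. 2.6.6)."""
    row = 0
    for d in range(len(j)):
        row = row * n_hat[d] + j[d]
    return row

# Example: a 3D grid with 4 points per axis
n_hat = [4, 4, 4]
print(flatten_index((1, 2, 3), n_hat))        # 1*16 + 2*4 + 3 = 27
assert flatten_index((1, 2, 3), n_hat) == np.ravel_multi_index((1, 2, 3), n_hat)
```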

Application of $\mathcal{M}$ produces the following unbounded $A$ at time $k+1$, shown here, without loss of generality, in dimensions $x$ and $y$, where $A \in \mathbb{R}^{\hat{n}_x \hat{n}_y \times \hat{n}_x \hat{n}_y}$. Each row of $A$ contains the $x$ direction tri-band $\left( A_i^{x,k+1},\; 1+B_i^{x,k+1},\; C_i^{x,k+1} \right)$ about the main diagonal, together with the corresponding $y$ direction tri-band $\left( A_j^{y,k+1},\; 1+B_j^{y,k+1},\; C_j^{y,k+1} \right)$ on bands offset by $\pm\hat{n}_x$ columns (the diagonal contributions of the two tri-bands coinciding on the main diagonal). A generic interior row $\mathcal{M}(j)$ therefore has the non-zero pattern
$$\big(\,\cdots\; A_j^{y,k+1}\; \cdots\; A_i^{x,k+1}\;\; 1+B_i^{x,k+1}+B_j^{y,k+1}\;\; C_i^{x,k+1}\; \cdots\; C_j^{y,k+1}\; \cdots\,\big).$$

To clarify, the position of each offset $\phi_d$ along the $d$th axis of the stencil can be expressed in terms of $j$, where $\phi \in \mathbb{R}^{n \times n}$:
$$\phi_d\left( \mathcal{M}(j),\; \pm \prod_{d}^{D-1} \hat{n}_d \right). \tag{2.6.7}$$

If we assume that the grid is uniform, then equation 2.6.7 can be expressed in full as
$$\phi_d\left( \sum_{i=1}^{D-1} \hat{n}^{(D-1-i)},\; \pm\hat{n}_d^{(D-1-d)} \right) = \phi_d(\mathcal{M}(j), \pm m_d). \tag{2.6.8}$$

A final step, which improves convergence rates and reduces storage, is to compress the band of $A$ such that the index of any node with offset in dimension $d$ is defined as
$$\phi_d(\mathcal{M}(j), \pm m_d) = \phi_d(\mathcal{M}(j), \pm(D + 1 - d)). \tag{2.6.9}$$

2.7 High Order Schemes

The advantage of reducing the truncation error is that a coarser spatial discretisation can be used to achieve the same level of accuracy, thus reducing the algorithmic complexity of the solver. We consider the complexity for a single time step.

Definition 2.7.1. Let $\hat{n}_d$ and $\hat{n}_d^{HO}$ be the number of grid points along the axis in $d$ for the CD and the $O(\Delta S^\Lambda)$ scheme respectively, where $\Lambda > 2$. Let $\hat{n}_d = \hat{n}$ $\forall d \in [1, D]$ on $\Omega$. Let $n^y$ be the order of algorithmic complexity of the solver for inversion of $A \in \mathbb{R}^{n \times n}$, where in the case of GMRES, $1 \le y \le 3$.

Lemma 2.7.2. The order of algorithmic complexity reduction of an $O(\Delta S^\Lambda)$ accurate scheme compared with the standard second order accurate scheme at a single time step is $O(n^{2y/\Lambda})$.

Proof. We choose $\hat{n}_d^{HO}$ such that $O(T_e^{HO}) = O(T_e^{CD})$:
$$O\big((\hat{n}_d^{HO})^\Lambda\big) = O(\hat{n}_d^2). \tag{2.7.1}$$
From equation 2.6.8 in Section 2.6.2 (on page 22), the length $n$ of the corresponding square 2D $A$ is
$$n = \prod_{d=1}^{D} \hat{n}_d = \hat{n}^D. \tag{2.7.2}$$
Substituting the $\hat{n}_d$ terms in equation 2.7.2 for the $\hat{n}_d^{HO}$ terms in equation 2.7.1, we obtain an expression for $n$, the size of a matrix whose order of algorithmic complexity reduction when inverted is $O(n^{2y/\Lambda})$.

Thus the factor of algorithmic complexity reduction when solving a fourth order accurate scheme in place of the CD scheme is $O(n^{y/2})$. This is a substantial improvement over higher order time step methods, especially when we consider high dimensional problems. The disadvantage of this approach is that a coarser grid cannot accurately represent discontinuities without using adaptive techniques or a non-linear grid. There is, however, no further distortion of dispersion and dissipation as a result of coarser discretisation using high order methods. Fourier analysis of the Black-Scholes equation shows that the dissipation and dispersion terms for a second order central difference approximation are $-\frac{\sigma^2 k^2}{2}\left(1 - \frac{\Delta S^2 k^2}{3}\right)$ and $\mu\left(1 - \frac{\Delta S^2 k^2}{6}\right)$ respectively [32]. We consider the distortion effects of a fourth order approximation of the Black-Scholes equation, without loss of generality:

Definition 2.7.3. We define the Fourier term as $e^{ikS} e^{(a+ib)t}$, where $L$ is a fourth order (HO1) central difference approximation of the Black-Scholes equation (the HO1 scheme is defined in the next Section), at time $t$ and spatial point $S$, $a$ and $b$ are arbitrary constants, $k$ is the wave number and $i$ is the imaginary unit.

Lemma 2.7.4. The numerical dissipation and dispersion of $L$ are $-\frac{\sigma^2 k^2}{2}\left(1 - \frac{\Delta S^4 k^4}{45}\right)$ and $\mu\left(1 - \frac{\Delta S^4 k^4}{30}\right)$ respectively. (A proof of a similar problem is provided in [22].)

Operator    Definition
$\delta_S$      $\dfrac{-\phi_{+2} + 8\phi_{+} - 8\phi_{-} + \phi_{-2}}{12\Delta S}$
$\delta_S^2$    $\dfrac{-\phi_{+2} + 16\phi_{+} - 30\phi + 16\phi_{-} - \phi_{-2}}{12\Delta S^2}$

Table 2.3: Definition of the fourth order HO1 central finite difference operators.
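The fourth order behaviour of these extended operators is easy to verify numerically. The sketch below, an illustration under the assumption of a smooth test function, measures the error of the first derivative operator of Table 2.3 as the grid is refined; the error should shrink like $\Delta S^4$.

```python
import numpy as np

def d1_ho1(f, i, dS):
    """Fourth order HO1 central approximation of f' at node i (Table 2.3)."""
    return (-f[i + 2] + 8 * f[i + 1] - 8 * f[i - 1] + f[i - 2]) / (12 * dS)

for n in (50, 100, 200):
    S = np.linspace(0.0, 1.0, n)
    dS = S[1] - S[0]
    f = np.exp(S)                    # smooth test function with f' = f
    i = n // 2
    err = abs(d1_ho1(f, i, dS) - f[i])
    print(f"dS = {dS:.4e}  error = {err:.3e}")   # ~16x smaller per halving
```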

We have shown that the effect of discretisation on numerical dissipation and dispersion does not change with the use of an $O(\Delta S^m)$ scheme on a coarser mesh of step size $\sqrt[m]{\Delta S}$, where $O\big((\sqrt[m]{\Delta S})^m\big) = O(\Delta S)$ and $\Delta S \le 1$. We proceed by considering three high order finite difference schemes based on modification of the stencil structure. We do not consider substitution based high order methods, such as Richardson's extrapolation, because our aim is to investigate the effect of stencil structure on the convergence properties of Krylov subspace solvers.

2.7.1 Operator Extension: HO1

Another simple and potentially highly scalable approach is to extend the central finite difference operators to $m$ terms, resulting in an $O(\Delta S^{m-1})$ accurate scheme.

Replacing the single asset central difference operators in equation 2.3.3 with the fourth order extended operators defined in Table 2.3, a generalised condition number can be determined for assets $s_d$, $\forall d \in [1, D]$:
$$k_{\max}^{HO1} = 5\sum_{d=1}^{D} v_{d1} a_d + 1, \tag{2.7.3}$$
where
$$v_{d1} = \frac{\Delta t}{\Delta s_d^2}, \qquad v_{d2} = \frac{\Delta t}{\Delta s_d}.$$

The structure of the matrix $A$ generated by this scheme is shown in Figure A.2 (on page 78) with Neumann boundary conditions.

2.7.2 High Order Approximation of the Truncation Error: HO2

The above method approximates each term of the Black-Scholes equation to fourth order accuracy. Another alternative is to approximate the equation using the standard central difference operators defined in Table 2.2 (on page 14) but with a high order operator for $T_e$. In general, a fourth order $T_e$ in $D$ spatial dimensions has the form
$$T_e = \sum_{d=1}^{D} \frac{\Delta s_d^2}{12}\left( a_d \frac{\partial^4 f}{\partial s_d^4} + 2 b_d \frac{\partial^3 f}{\partial s_d^3} \right). \tag{2.7.4}$$


Expressing the approximation of finite difference operators as described in [2], third and fourth order operators can be derived in terms of the second order central difference operators
$$\mu = \frac{E^{+1/2} + E^{-1/2}}{2}, \qquad \delta = E^{+1/2} - E^{-1/2}, \qquad E f_n = f_{n+1}.$$
From this definition the higher order operators can be expressed as
$$\frac{\partial^3 f}{\partial d^3} = \frac{\mu}{\Delta s_d^3}\big(\delta^3 + O(\delta^3)\big) f, \qquad \frac{\partial^4 f}{\partial d^4} = \frac{1}{\Delta s_d^4}\big(\delta - O(\delta^3)\big)^4 f.$$
Expanding the approximations to second order local accuracy, an overall fourth order accurate scheme is produced, due to the $\Delta s_d^2$ term in equation 2.7.4. Adopting the same approach as taken in the previous Section, an upper bound on the condition number in $D$ dimensions can be determined:
$$k^{HO2} = \frac{\sum_{d=1}^{D} (17 v_{d1} a_d + v_{d2} b_d) + 6}{\sum_{d=1}^{D} (v_{d1} a - v_{d2} b + 6)}. \tag{2.7.5}$$

2.7.3 High Order Compact Schemes: HOC

The above two approximations can be described as axial $n$ point schemes, where $n = 4d + 1$. HO2 also dampens the oscillations through the inclusion of a convection term in the eigenvalue bounds. The final class of high order schemes that shall be considered are high order compact finite difference schemes (HOC). HOC schemes can be derived by truncating the Taylor series of each operator to higher orders. It has been shown that they exhibit high accuracy and stability, whilst remaining computationally efficient [31]. The width of the stencil is marginally larger than that of CD, yet there is an increase in the order of truncation error and improved data locality in comparison with the other schemes considered. We proceed by defining a fourth order HOC scheme for the diagonalised log-transformed two asset Black-Scholes equation.

Definition 2.7.5. Let $L^{HOC}(\eta, \varepsilon, t)$ be the high order compact homogeneous finite difference approximation of the diagonalised log-normal Black-Scholes equation, with constant coefficients. Without loss of generality, we use $\eta$ and $\varepsilon$ to convey that $L^{HOC}$ can only be formed if there are no cross-derivative terms. The $\Lambda$th order truncation error $T_e^\Lambda$ is $O(d\eta^\Lambda, d\varepsilon^\Lambda, dt)$.

Lemma 2.7.6. $L^{HOC}$ approximates the Black-Scholes equation with $O(\Delta S^4)$ such that $w^{HOC} \approx w^{CD}$ for any number of dimensions. (A proof is provided in Section B.3 (on page 101).)

Each node of the $D$ dimensional HOC stencil is defined in Table 2.4, where $a_i$ and $b_i$ are diffusion and convection constants, and $C_i^x$ is the vector of nodal weightings in the finite difference operator for the $x$th order derivative in dimension $i$.


Table 2.4 defines the HOC stencil at time $t+1$ through three classes of node: the centre node $\phi(i, j, t+1)$, the axial nodes $\phi(i+m, j, t+1)$ and the diagonal nodes $\phi(i+m, j+n, t+1)$, each expressed in terms of the weightings $C_i^1$ and $C_i^2$, the coefficients $a_i$ and $b_i$, the grid intervals $\Delta s_i$, the time step $\Delta t$ and the rate $r$.

Table 2.4: Expressions for the terms of the HOC stencil at time $t+1$.

To simplify the proof, a fourth order compact scheme has been considered. In theory, higher orders can be achieved by this method. However, the definition becomes too convoluted for practical use. This is because the following discrete cross-derivatives still exist in the approximation, due to the representation of high order derivatives as lower order cross-derivatives. These are shown below for the two variable diagonalised Black-Scholes equation in $\varepsilon$ and $\eta$:
$$\delta^4_{\varepsilon^2,\eta^2};\quad \delta^3_{\varepsilon^2,\eta},\ \delta^3_{\varepsilon,\eta^2},\ \delta^3_{\varepsilon^2,t},\ \delta^3_{\eta^2,t};\quad \delta^2_{\varepsilon,\eta},\ \delta^2_{\varepsilon,t},\ \delta^2_{\eta,t}. \tag{2.7.6}$$

The HOC stencil is a box of $d(d+3)+1$ grid points. This produces a denser non-zero sparsity band in $A$, as shown in Figure A.3 (on page 79). Further research is required to eliminate these spatial and time cross derivatives, thereby enabling the scheme to scale to even higher orders. A further limitation is that the method is inherently implicit due to the cross derivatives in $t$. This prevents the use of the explicit method and multi-time step methods without incurring the added complexity of $n-1$ matrix multiplications, where $n$ is the number of time levels required to express the unknown vector $f^{t \pm a}$ uniquely.

By Gerschgorin's Theorem, bounds on the eigenvalues enable us to derive the maximum condition number
$$k^{HOC} = \frac{ a_i\left( \dfrac{3}{\delta s_i^2} + \dfrac{5}{12}\displaystyle\sum_{j \ne i}^{D} \dfrac{1}{\delta s_j^2} \right) + \dfrac{11}{12}\,\dfrac{1}{\delta t} - \left( r - \dfrac{b_i}{6} \right) }{ -a_d\left( \dfrac{-1}{\delta s_i^2} + \dfrac{1}{4}\displaystyle\sum_{j \ne i}^{D} \dfrac{1}{\delta s_j^2} \right) + \dfrac{3}{4}\,\dfrac{1}{\delta t} - \dfrac{1}{2}\left( r - \dfrac{b_i}{6} \right) }. \tag{2.7.7}$$

In deriving equation 2.7.7, we have assumed, as with the other high order methods considered, that the coefficients and discretisations are the same in each dimension. Sensitivity analysis shows that there is little change to this result for changes in the parameters over the ranges specified in Table 2.1 (on page 12). We will consider the comparative scalability of these approaches in the Section below.


2.7.4 Scalability of High Order Schemes

Referring to the graph in Figure 2.1 (on page 28), we observe that there is a distinct advantage in using a HOC scheme for higher dimensional problems. The maximum condition number of $A^{HOC}$ grows less rapidly with dimensionality, indicating that the problem scales to high dimensions. A further important consideration is the sparsity of $A$ for each scheme. The complexity of the solver is a function of the number of non-zero elements in each row. From the lower graph in Figure 2.1 (on page 28) it can be seen that the choice of scheme affects the sparsity less significantly as the number of dimensions increases.

The theoretical condition numbers of each scheme have been determined by considering any point $U_{i,j} \in \Omega$. In an unbounded domain with constant coefficients, the set of all eigenvalues $\{\lambda_i\}$, $i \in [1, n]$, is bounded by $G(A)$ with centre $a_{i,i}$. In a bounded domain, eigenvalues associated with the outlier stencils at $U_{i,j} \in \partial\Omega$ are no longer bounded by $G(A)$ but by a number of circles associated with a side or corner point.

We consider two important implications of the boundary conditions for the HO1 and HOC schemes: (i) the effect of the number of points $U_{i,j} \in \partial\Omega$ on the eigenvalue spectrum, and (ii) the effect of derivative boundary conditions on the associated eigenvalue bounds for the full input parameter ranges specified in Table 2.1 (on page 12).

Comparing Figure A.2 (on page 78) and Figure A.3 (on page 79), we observe that, as a consequence of $w^{HO1} > w^{HOC}$, there is a greater number of outliers. Further, we observe that, as a consequence of the use of a shifted stencil $S_{\pm a}$ when $U_{i,j} \in \Omega$ is $a$ grid points from the boundary, there is a greater variation of outlier projections. More precisely, there are $(\hat{n}-2)^D$ eigenvalues in $G(A^{HOC})$ and only $(\hat{n}-2a)^D$ eigenvalues in $G(A^{HO1})$. The remaining eigenvalues of $A^{HO1}$ are split into $(a-1)$ circles of $(\hat{n}-2)^D - (\hat{n}-2a)^D$ eigenvalues associated with $S_{\pm a}$. The remaining boundary points lead to varying outliers depending on the combination of dimensions associated with the boundary value.

A low condition number is not a sufficient condition for an improved convergence rate. The clustering of eigenvalues about unity, the singular values and the choice of preconditioner also have an important effect [31]. Paramount to finding a finite difference scheme that promises low algorithmic complexity, is stable and is appropriately accurate, is the study of the effect of stencil structure on the performance of preconditioned iterative solvers. The eigenvalues are less clustered if the grid is non-uniform and the pricing parameters are inhomogeneous. At boundary points, the change in stencil structure leads to outlier eigenvalues. In the next Chapter we will define a relationship between the number of these outlier eigenvalues and their distance from the main cluster of eigenvalues bounded by $G(A)$. The effect of boundary conditions on the approximated eigenvalue spectrum is presented in Chapter 5 (on page 55).


Figure 2.1: Top: Variation of the maximum condition number with the number of dimensions; Bottom: Comparison of the sparsity of A for different schemes with a modified pricing equation at boundary conditions (n = 10).


2.8 Summary

The purpose of this Chapter was to design a scalable bounded finite difference scheme for pricing multi-dimensional American options. We began by showing how the pricing process could be formulated as a linear complementarity problem in Section 2.1 (on page 10). In Section 2.2 (on page 11) we assigned numerical ranges to the parameters of the Black-Scholes equation and identified that the problem is highly diffusion dominant, concluding that option pricing problems are stable and amenable to numerical methods. Instead we must contend with volatility bias as a result of discretisation. In Section 2.3.3 (on page 15) we identified that higher accuracy with respect to the time step could reduce the complexity by a factor of $\sqrt{n_t}$. This reduction could not be achieved without application of the SOR solver, since the solution would not otherwise be self-consistent. In Section 2.3 (on page 12) we considered the finite difference method in more detail and identified the importance of consistency, stability and accuracy.

We considered three types of boundary conditions for option pricing in Section 2.5 (on page 18) and described how two of these methods, Neumann and second order, can be implemented to preserve the order of the truncation error. We identified two approaches to implementing these boundary conditions, either through direct discretisation of the boundary condition or through modification of the pricing equation. We considered the latter approach only.

In Section 2.6 (on page 20) we proved that the classification of the log-transformed Black-Scholes equation was, in part, dependent on the asset correlations. We then showed how to eliminate cross-derivatives by diagonalisation of $A$ in Section 2.6.1 (on page 21). In Section 2.6.2 (on page 22) we defined the mapping of stencil nodes in the $D$ spatial dimensional finite domain $\Omega$ to $A$. We then showed that the factor of algorithmic complexity reduction, when using a stable fourth spatial order scheme on a uniform grid compared with a second order spatial scheme, is $\hat{n}^{Dy/2}$. We showed how three different high order finite difference methods were derived, and applied Gerschgorin's Theorem and Fourier analysis to analyse stability and dynamic distortion in the presence of high order terms using a coarser mesh. Most importantly, we showed in Section 2.7.3 (on page 25) that the unbounded HOC scheme was theoretically much more stable at high dimensionality than the other high order schemes. We also discussed the effect of stencil compaction at boundary grid points on the eigenvalue distribution, and stated that extended operators result in a greater number of outlier eigenvalues. In the next Chapter we consider methods of optimally solving the schemes on distributed computers.


3 Scalable Solution to the Implicit Finite Difference Schemes

The purpose of this Chapter is to describe how higher order schemes can be implemented and solved using distributed computers. More specifically, we describe the development of the schemes in the framework of PETSc. We consider the following imperatives for a scalable high performance solution: (i) the time per iteration is reduced with an increasing number of processors, (ii) efficient processor and cache usage on each workstation, and (iii) preservation of convergence properties with an increase in the number of processors used. We describe the solver and identify the factors which limit its scalability on multiple processors, referred to as processor scalability, in the following Sections.

3.1 GMRES

GMRES, proposed by Saad and Schultz [28], minimises the residual $r_0 = b - Ax_0$ over the Krylov subspace $K := K(j, A, r_0) \equiv \mathrm{span}(r_0, Ar_0, A^2 r_0, \ldots, A^{J-1} r_0)$. The algorithm consists of two stages: (i) construction of an orthonormal basis and (ii) projection of this basis onto $K$. The method is classically implemented using the Arnoldi process. This is a modified Gram-Schmidt method specifically for Krylov subspaces, designed to orthonormalise the $J$ bases of $K$ [17]. Each new orthonormal basis vector is hereon referred to as a dimension or a recurrence.

The dimensionality of the subspace is often high when solving linear problems with nonsymmetric matrices. Since each dimension $i$ requires the storage of the basis vector $A^i r_0$, GMRES is usually restarted after $m$ iterations to reduce memory requirements. This approach is referred to as GMRES(m). At each restart point a vector $x_j$ is used as an initial estimate for the next $j$ iterations, where $1 \le j \le m$. Sub-optimal choices of $m$ can lead to stagnation points in the algorithm. The Arnoldi process is more formally defined below.

Definition 3.1.1. Choose $x_0 \in \mathbb{R}^n$ and let $r_0 = b - Ax_0$, such that at each iteration $j$ we minimise $\|r_j\| \in K$, where $j < m$, the restart parameter. Collecting the $m$ basis vectors of $K$ in $V = [v_0\; v_1 \ldots v_m]$, we arrive at the decomposition associated with the Arnoldi method, $AV_m = V_{m+1} H_m$, where $H$ is the upper Hessenberg matrix.

This process is effective for large, nonsymmetric sparse linear systems such as the class considered. The eigenvalues of $H$ are the Ritz values with respect to the $m$ dimensional Krylov subspace [19]. These converge to the eigenvalues of $A$ as each new recurrence $j$ is generated. However, not all of the set of Ritz values may convey the true size of the eigenvalues and can thus be misleading [14].
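The Arnoldi relation $AV_m = V_{m+1}H_m$ can be verified numerically in a few lines. The sketch below is a plain modified Gram-Schmidt Arnoldi iteration, an illustration only and not the PETSc implementation used later in this thesis.

```python
import numpy as np

def arnoldi(A, r0, m):
    """Build m orthonormal Krylov basis vectors and the (m+1) x m Hessenberg H."""
    n = len(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                 # modified Gram-Schmidt sweep
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50)) + 5 * np.eye(50)   # nonsymmetric test matrix
V, H = arnoldi(A, rng.standard_normal(50), m=10)
assert np.allclose(A @ V[:, :10], V @ H)             # A V_m = V_{m+1} H_m
```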


For completeness, a preconditioned version of the GMRES(m) algorithm is defined below. An implementation of the modified Gram-Schmidt process is also provided in Section C.2.1 (on page 144). The choice of preconditioner and its application to $A$ by multiplying by the preconditioning matrix $M$ will be described in subsequent Sections.

Algorithm 3.1.2. GMRES(m)
1:  while $\|b - Ax_m\| > \epsilon$ do
2:    $M r_0 = b - Ax_0$
3:    $\beta = \|r_0\|$
4:    $v_1 = r_0 / \beta$
5:    for j = 1 to m do
6:      $M\hat{w} = Av_j$
7:      for i = 1 to j do
8:        $h_{ij} = v_i^T \hat{w}$  {Calculate vector inner product}
9:        $\hat{w} = \hat{w} - h_{ij} v_i$
10:     end for i
11:     $h_{j+1,j} = \|\hat{w}\|_2$  {Update upper Hessenberg matrix}
12:     $v_{j+1} = \hat{w} / h_{j+1,j}$
13:   end for j
14:   $\varepsilon_{solver} = \|b - Ax\|$
15:   $\min_{y \in \mathbb{R}^m} \|\beta e_1 - H_m y\|$
16:   $x_m = x_0 + V_m y_m$
17:   $x_0 = x_m$
18: end while

Each element $h_{j,j+1}$ of $H$ indicates the level of orthogonalisation of the $j$ bases of $K$. The

upper Hessenberg least squares problem can be solved for $x$, once a satisfactory two norm residual has been computed, by QR factorisation, typically using Givens rotations. In practice, the least squares solution requires negligible time compared with the Arnoldi process specified on lines 5-13 of Algorithm 3.1.2 (on page 31). $\varepsilon_{solver}$ is chosen such that it is significantly less than the finite difference $T_e$. Analysis is performed in two stages, starting with (i) convergence analysis based on the properties of the set of distinct eigenvalues $\{\lambda_j\}$ $\forall j \in [1, J]$, and (ii) the processor scalability of the algorithm based on the non-zero sparsity pattern of $A$. We will be primarily concerned with the latter.
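For illustration, restarted GMRES with a simple preconditioner can be driven directly from SciPy; the incomplete LU choice below is an assumption for demonstration, not the preconditioner selected later in this thesis.

```python
import numpy as np
from scipy.sparse import random as sprandom, identity
from scipy.sparse.linalg import gmres, spilu, LinearOperator

n = 500
A = (identity(n) + 0.1 * sprandom(n, n, density=0.01, random_state=0)).tocsc()
b = np.ones(n)

ilu = spilu(A)                                   # incomplete LU preconditioner
M = LinearOperator((n, n), matvec=ilu.solve)

x, info = gmres(A, b, M=M, restart=30)           # GMRES(m) with m = 30
print("converged:", info == 0, " residual:", np.linalg.norm(b - A @ x))
```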


3.1.1 Convergence Analysis

Full convergence analysis, including the effect of preconditioners as a function of the properties of the stencil, is beyond the scope of this thesis. We will, however, make an important link between the description of the eigenvalue distribution in Section 2.7.4 (on page 27) for each scheme and a lower bound on the number of iterations required for convergence of GMRES. In finite dimensions the operator $A$ is a complex nonsingular matrix, where $A \in \mathbb{C}^{n \times n}$. The minimum number of iterations required for $A$ to converge $d_{\min}$
