Monte Carlo Methods: A Computational Pattern for Our Pattern Language ∗
Jike Chong
Ekaterina Gonina
Kurt Keutzer
University of California, Berkeley
University of California, Berkeley
University of California, Berkeley
[email protected]
[email protected]
[email protected]
ABSTRACT
The Monte Carlo methods are an important set of algorithms in computer science. They involve estimating results by statistically sampling a parameter space with thousands to millions of experiments. The algorithm requires a small set of parameters as input, with which it generates a large amount of computation, and outputs a concise set of aggregated results. The large amount of computation has many independent components with obvious boundaries for parallelization. While the algorithm is well suited for executing on a highly parallel computing platform, there still exist many challenges, such as: selecting a suitable random number generator with the appropriate statistical and computational properties, selecting a suitable distribution conversion method that preserves the statistical properties of the random sequences, leveraging the right abstraction for the computation in the experiments, and designing efficient data structures for a particular data working set. This paper presents the Monte Carlo Methods software programming pattern and focuses on the numerical, task, and data perspectives to guide software developers in constructing efficient implementations of applications based on Monte Carlo methods.

Categories and Subject Descriptors
[Software Engineering Patterns]: Monte Carlo Methods

General Terms
Monte Carlo Methods, Software Patterns, Parallelization

Keywords
Monte Carlo Methods, Parallelization, Computational Pattern, Random Number Generators

∗Also affiliated with Parasians, LLC, Sunnyvale, California. [email protected]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. A preliminary version of this paper was presented in a writers' workshop at the 2nd Annual Conference on Parallel Programming Patterns (ParaPLoP). ParaPLoP'10, March 30-31, 2010, Carefree, Arizona, USA. Copyright 2010 ACM 978-1-4503-0127-5 ...$10.00.

1. INTRODUCTION

The Monte Carlo Methods pattern is a computational software programming pattern in Our Pattern Language (OPL) [10]. A software programming pattern is a general set of solutions to a recurring problem in software design and implementation. It is a narrative in a natural language that communicates to its reader some tacit knowledge about software design. One can construct a pattern language from a set of related software programming patterns that, when used together, can concisely express the structure and organization of a piece of software. Software programming patterns allow software developers to communicate clearly and concisely with each other about the structure and organization of a piece of software. In architecting a software application, patterns can help highlight potential bottlenecks in a software architecture and focus proof-of-concept efforts; at the same time, a pattern language can recommend alternative related patterns to solve a given problem. To assist in the implementation, patterns can help orient the effort of a development team around clearly defined challenges in software application development. OPL [10] is an effort of a research community centered around the University of California, Berkeley. The community came together to discuss patterns that are useful for parallel programming. Its members are from a diverse set of international academic and industrial institutions. In OPL, the pattern language is organized into five categories of patterns:

1. Structural Patterns: Describe the overall organization of an application and the interactions of the computational elements that make up an application.

2. Computational Patterns: Describe the classes of computations that make up an application.

3. Parallel Algorithm Strategy Patterns: Define high-level strategies to exploit concurrency within a computation for execution on a parallel computer.

4. Implementation Strategy Patterns: Describe the structures that are realized in source code to support (a) how the program itself is organized and (b) common data structures specific to parallel programming.

5. Parallel Execution Patterns: Describe the approaches often embodied in a runtime system that support the execution of a parallel program.

The Monte Carlo Methods pattern is a computational pattern that resolves the recurring problem of constructing efficient analyses involving estimating results by statistically
sampling a parameter space with a large number of experiments. It is presented following the standard format of the OPL patterns, with a problem definition (Section 2), a context explaining the background in which the solution should be used (Section 3), a set of forces defining the design space (Section 4), the solution structure and considerations (Section 5), invariants that should be true in order to apply this pattern (Section 6), illustrative examples (Section 7), known uses (Section 8), and related patterns (Section 9).
2. PROBLEM
The results of some analysis can be best estimated by statistically sampling a parameter space with thousands to millions of experiments using different parameter settings. How do we efficiently construct and execute this large number of experiments, and estimate the result?
3. CONTEXT

How do you estimate the market risk your investment portfolio is exposed to? How do you determine if a particular molecular dynamics system will reach equilibrium? Often, it is easy to solve a problem by setting up experiments to sample the parameter space and analyze the distribution of the results, but hard to solve the problem analytically. For example, in a volatile financial market, it is crucial to keep track of the downside risks for an investment portfolio. Given a set of market trends and volatility measures, we can estimate the market Value-at-Risk (VaR) by simulating the effects of market trends on the portfolio value through a set of experiments. Specifically, the inputs are sets of market scenarios generated with a certain statistical distribution, each experiment simulates the value of the portfolio given one market scenario, and the result is an analysis of the set of portfolio values produced given the distribution of market scenarios. Market VaR measures the probability that a portfolio would suffer a loss greater than some threshold. For example, a VaR could be a 15% loss or more on the portfolio value with a probability of 1% over a day. Given changing market conditions, a portfolio's exposure to market risk also changes. A portfolio manager must regularly modify the distribution of the investments to adapt to changes in market conditions and meet the investors' risk tolerance. By statistically sampling a problem's parameter space, we can gain valuable insights into a complex problem that would be impossible or impractical to solve analytically. The quality of a solution estimated through sampling has several properties:

1. The experiments are independent and parallelizable: the approach assumes that experiments are independent and identically distributed (i.i.d.), such that the set of experiments provides a statistically valid sampling of the parameter space.

2. The process is computationally expensive: by the central limit theorem, the statistical significance of the solution, or more specifically, the statistical error (standard error) in the solution, is proportional to the inverse square root of the number of experiments, i.e., to achieve 10x more precision in the result, one needs to run 100x more experiments.

3. The approach is flexible: experiments can closely mimic business logic. It does not require significant simplifying assumptions as compared to many analytical solutions. This allows a more realistic model to be used, which reduces the risk of violating key assumptions in using analytical solutions.

We call the process of sampling the parameter space, simulating experiments, and aggregating the results the Monte Carlo Method (due to Stan Ulam, John von Neumann, and Enrico Fermi, the physicists who first used the approach for neutron diffusion calculations in the field of high energy physics [1]). The ease and intuitiveness of setting up the experiments makes the Monte Carlo Method a popular approach [16]. However, because of the large number of experiments required to achieve statistically significant solutions, it is extremely computationally intensive. Efficient setup and execution of the experiments can make this method practical for a wide range of applications.

4. FORCES

4.1 Universal forces

1. Complexity of parameter modeling: We want to use as simple a statistical model as possible to model the parameters. Simplicity leads to ease of understanding and programming, and allows for efficient computation. At the same time, the parameter model needs to be detailed and complex enough to properly represent the statistical variations in the input parameters.

2. Fast vs. rigorous random number generation: We want to quickly generate many scenarios for experiments. At the same time, the scenarios generated must be independent and identically distributed based on suitable sequences of random numbers [11] to achieve the intended coverage of the parameter space, requiring a more rigorous random number generation process than the typical random number generator in a standard math library.

4.2 Implementation forces

1. Easy experiment mapping vs. load balancing: Given the complete independence of the experiments, the easiest way to exploit this concurrency is to map each experiment to a unit of execution (UE). However, the experiments vary in size, so we need to consider task size to achieve a load-balanced execution. This requires a more complex experiment-to-UE mapping.

2. Experiment data working set size: We can run a single experiment on each UE to be sure its data working set fits in cache, but execution is then exposed to long memory latency. We can let multiple experiments share a processor to hide long-latency memory fetches, but the combined working set may not fit in the limited cache space.

3. Static vs. dynamic sample generation: The sampling process uses random number sequences. The sequence can be generated before running any experiments, or it can be generated as part of the experiment. Generating it before running any experiment can save computation time if the same sequence will be used multiple times, but it could take significant memory resources to store. Generating it as part of the experiment could be a more memory-efficient approach, but care must be taken as to how different threads share their random number generation state.
5. SOLUTION

The Monte Carlo method uses a simple solution structure where experiments are generated, executed, and the experimental outputs are aggregated to estimate the result. We describe the structure of the solution and illustrate the perspectives and considerations that go into the implementation of the solution. The solution is outlined here:

1. Solution Structure (5.1)

2. Solution Considerations (5.2)
   (a) Numerical-centric perspective (5.2.1)
       i. Random Number Generation (5.2.1.1)
       ii. Parameter Distribution Conversion (5.2.1.2)
       iii. Multi-parameter Correlation (5.2.1.3)
   (b) Task-centric perspective (5.2.2)
   (c) Data-centric perspective (5.2.3)
       i. Experiment Generation - RNG step (5.2.3.1)
       ii. Experiment Generation - Distribution conversion step (5.2.3.2)
       iii. Experiment Execution step (5.2.3.3)
       iv. Result Aggregation (5.2.3.4)

[Figure 1: Monte Carlo Methods solution structures. Both structures (a) and (b) loop over NumSimulations through three stages: Experiment Generation (set up experiments by providing parameters according to some stochastic model), Experiment Execution (compute the outcome of the experiment), and Result Aggregation (compute the expected result according to some metric over all experiment outcomes).]
5.1 Solution Structure
Solving a problem with the Monte Carlo method involves sampling the parameter space with thousands to millions of experiments. The number of independent experiments provides significant opportunities for parallelization, which high performance computing (HPC) experts often call "embarrassingly parallel". While this describes the ample parallelism opportunities, there are still many implementation decisions that need to be made to achieve efficient execution. Each experiment requires a three-step process:

1. Experiment generation: generating a random set of input parameters for an experiment

2. Experiment execution: executing the experiment with its input parameters

3. Result aggregation: aggregating the experimental results into statistically relevant solutions

As there are millions of experiments to be computed, there are many valid orderings of these computations. Figure 1 illustrates two orderings that enable dynamic or static generation of the random sequences: (a) illustrates the case with dynamic generation of the random sequence, where values are generated per experiment and immediately consumed; (b) illustrates the case with a priori generation of random sequences, which can be generated once and stored for multiple uses.
At the top level, a Monte Carlo simulation may involve not only one problem to be solved, but multiple very similar problems. For example, in π estimation there is a single value of π to be estimated, whereas in option pricing we often have a batch of thousands of options to be priced, potentially in parallel. When there is more than one problem to solve, we have multiple levels of parallelization opportunities to explore.
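As a concrete illustration of this three-step structure (in the Figure 1a arrangement, where random values are generated per experiment and immediately consumed), the following minimal C++ sketch shows one possible organization. The experiment body and the aggregation metric here are placeholders for illustration, not part of the pattern itself.

#include <cstddef>
#include <random>

// Minimal sketch of the Monte Carlo solution structure with dynamic
// generation of random values (Figure 1a).  A real application
// substitutes its own stochastic model and outcome computation.
double monte_carlo(std::size_t num_simulations, unsigned seed) {
    std::mt19937 rng(seed);                              // generator for experiment parameters
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sum = 0.0;                                    // result aggregation state
    for (std::size_t i = 0; i < num_simulations; ++i) {
        double parameter = uniform(rng);                 // 1. experiment generation
        double outcome = parameter * parameter;          // 2. experiment execution (placeholder)
        sum += outcome;                                  // 3. result aggregation
    }
    return sum / static_cast<double>(num_simulations);   // e.g., the sample mean
}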
5.2 Solution Considerations
The goal of using the Monte Carlo method is to quickly converge on a solution that is within a certain error bound. This motivates us to look at the numerical-centric perspective of the Monte Carlo method, which focuses on improving the convergence rate. The key is to select a parallelizable and scalable parameter generation technique that works well with the parallelization requirements of the rest of the implementation. Experiment execution has the most concurrency available, but also the most variation in problem size and computation requirements. This calls for the task-centric perspective, where the key is to be able to quickly re-factor the computation with patterns to take advantage of existing libraries and frameworks. The implementation of the Monte Carlo method is very data intensive. In the data-centric perspective, we focus on optimizing memory access patterns between the different stages of an implementation.
5.2.1 Numerical-centric perspective
There are many existing techniques for generating parameters for Monte Carlo experiments. The parameters are usually generated in two to three steps. The two required steps are Uniform Random Number Generation and Parameter Distribution Conversion. The third step is Multi-parameter Correlation, which may be performed separately or combined with Experiment Execution. Specifically, Uniform Random Number Generation produces an independent and identically distributed sequence of values uniformly over a sampling domain, e.g., a real value from 0.0 to 1.0. Parameter Distribution Conversion converts the sampled parameter to a distribution that matches the distribution of the real parameter, e.g., a normal distribution or a Poisson distribution. Multi-parameter Correlation correlates the multiple parameters of an experiment to produce scenarios that are coherent with the assumptions about the experiment input parameters. We now go into each of the three steps that make up the experiment generation module in more detail.

1. Uniform Random Number Generation: Uniform random number generators (RNGs) have been studied for several decades [8]. There are many existing RNG approaches and packages available [7]. The types of RNGs and their properties are described in Appendix A. To choose a suitable RNG, one must be aware of several important attributes of RNGs: a) random number type, b) state size, c) output value type. The detailed considerations for RNG selection are listed below.

a) Random number type: We distinguish between two types of random sequences: pseudo-random numbers and quasi-random numbers. Pseudo-random numbers are generated using an algorithm that deterministically produces sequences of random bits. Quasi-random number generators produce a special type of random sequence that satisfies the low-discrepancy criteria [8]. They allow a parameter space to be more evenly covered with fewer experiments. Choosing quasi-random number generation approaches can help improve the convergence rate of Monte Carlo methods (see Appendix A). The important parallelization consideration is that the sequence must support "skip-ahead", which allows the generator to arbitrarily skip N elements ahead in the quasi-random sequence such that a sequence can be deterministically generated in parallel.

b) State size: The ISO C standard specifies that the rand() function uses a state size of 32 bits, and provides a random sequence with a period of at most 2^32 (Appendix A illustrates how state is used in random number generation). This is often not large enough for experiment generation in Monte Carlo methods. Solving an option-pricing problem, for example, may require 16 variables per time step, a simulation scope of 1000 time steps, and the results of 1 million experiments to provide a result of the desired accuracy. To obtain independent and identically distributed experiments, this requires a sequence with a period of at least 2^34. To resolve this issue, there are special pseudo-random generators such as the Mersenne Twister, which provides a period of (2^19937 - 1). The name of the Mersenne Twister [13] is derived from the fact that the period length is a Mersenne prime. The generator has an internal state of 624 32-bit integers. The state size is important to the parallelization of random number generators as it determines the data working set of each random number generation stream. For example, on some parallel processors, the 624 32-bit integer internal state required by the Mersenne Twister method may be too large to keep in the L1 cache if we are using the vector unit to produce multiple sequences in parallel.

c) Output value type: Many uniform random sequences are generated as 32-bit unsigned integers through
logical bit operations (Appendix A illustrates how a random number generator outputs values). A typical usage model is to convert each value into a 32-bit single precision floating-point (SP FP) value from 0.0 to 1.0. One should be aware that there is a loss of precision in this conversion. An SP FP value only has 24 bits of mantissa, and the number of values it can represent between 0.0-0.5 vs. 0.5-1.0 is very different. Some algorithms may be sensitive to this difference.

2. Parameter Distribution Conversion: The RNG produces uniformly distributed sequences, whereas the input parameters of the experiments may require a variety of statistical distributions. For example, the volatility of a stock in the market can be modeled with a lognormal distribution in its price; the performance of a computer circuit element affected by manufacturing process variation can be modeled by a normal distribution in its delay. In performing the distribution conversion, it is important to provide an array of values to be converted where possible, as many library routines are optimized to use vector instructions to efficiently take advantage of instruction-level parallelism in processing an array of inputs. There are many off-the-shelf math libraries that are optimized for distribution conversion. One such example is the Intel Math Kernel Library (MKL) [9]. For implementations that use quasi-random sequences with good uniformity properties, it is also important for the distribution conversion method to preserve the uniformity of the quasi-random sequences. In [6], it was observed that, in practice, some conversion methods were better at conserving the uniformity properties and led to better convergence rates than others. The Box-Muller method [4] is based on a polar transformation and uses two of the coordinates of each random point in the unit N-dimensional hypercube to generate pairs of independent standard normal variates. This method can conserve the uniformity properties of a quasi-random number sequence.

float2 BoxMuller(float u0, float u1) {
    float r = sqrtf(-2.0f * logf(u0));
    float theta = 2.0f * PI * u1;
    return make_float2(r * sinf(theta), r * cosf(theta));
}
The above code illustrates the Box-Muller method. The method takes two FP values "u0" and "u1", both between 0.0 and 1.0, and produces a pair of floating-point values in the normal distribution. One should note that the uniformity property can only be inherited if "u0" and "u1" are produced by two independent random sequences of quasi-random numbers. If they come from the same sequence of quasi-random numbers, the uniformity property will be destroyed. Having to maintain two quasi-random sequences increases the complexity of parallelization. This complexity is manageable, as the internal state of a quasi-random number generator such as Sobol is only two 32-bit values (a 32-bit Gray code representation of the index and a 32-bit state) [2].
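To make these steps concrete, the sketch below feeds the Box-Muller conversion above from two independently seeded Mersenne Twister streams (the pseudo-random case). This is a minimal illustration under our own assumptions: a production implementation would more likely use an optimized library such as MKL [9], and a quasi-random implementation would need two independent low-discrepancy sequences as discussed above.

#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Box-Muller conversion (as in the text), returning a pair of independent
// standard normal variates from two uniforms in (0, 1].
std::pair<float, float> box_muller(float u0, float u1) {
    const float PI = 3.14159265358979f;
    float r = std::sqrt(-2.0f * std::log(u0));
    float theta = 2.0f * PI * u1;
    return {r * std::sin(theta), r * std::cos(theta)};
}

// Generate n standard normal variates using two independently seeded
// Mersenne Twister streams, one per Box-Muller input.
std::vector<float> normal_variates(std::size_t n, unsigned seed0, unsigned seed1) {
    std::mt19937 stream0(seed0), stream1(seed1);             // 624-word internal state each
    std::uniform_real_distribution<float> uniform(1e-7f, 1.0f);  // avoid log(0)
    std::vector<float> out;
    out.reserve(n);
    while (out.size() < n) {
        auto z = box_muller(uniform(stream0), uniform(stream1));
        out.push_back(z.first);
        if (out.size() < n) out.push_back(z.second);
    }
    return out;
}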
3. Multi-parameter Correlation: The cross-correlation between N parameters is usually handled by an N×N correlation matrix. This operation can be efficiently handled by the BLAS libraries available on various platforms. Specialized handling of the matrix operations can follow the Dense Linear Algebra Pattern.
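As an illustration of this step, the sketch below applies a precomputed lower-triangular Cholesky factor L of the N×N correlation matrix to a vector of independent standard normal variates. Using a Cholesky factor is one common technique that we assume here for illustration; in production the triangular matrix-vector (or matrix-matrix, across many scenarios) product would typically be delegated to a BLAS routine, following the Dense Linear Algebra Pattern.

#include <cstddef>
#include <vector>

// Correlate a vector of independent standard normals z using a precomputed
// lower-triangular Cholesky factor L of the N x N correlation matrix
// (row-major, L[i*N + j]).  In production this loop would be a BLAS
// triangular matrix-vector product.
std::vector<double> correlate(const std::vector<double>& L,
                              const std::vector<double>& z,
                              std::size_t N) {
    std::vector<double> correlated(N, 0.0);
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j <= i; ++j)   // lower-triangular part only
            correlated[i] += L[i * N + j] * z[j];
    return correlated;
}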
5.2.2 Task-centric perspective
Statistical sampling of the parameter space with the Monte Carlo method involves running thousands to millions of independent experiments. With respect to the structure of an implementation of the Monte Carlo method, the experiment execution step often has the most parallelism. When there is an analytical equation being applied in each scenario, the experiment execution step may be re-factored into a set of matrix operations using existing well-parallelized BLAS library routines, or by referencing the Dense Linear Algebra Pattern. Figure 2 illustrates a sample problem reformulation into a dense linear algebra routine. The problem is to calculate the values of m financial instruments using k experiments, each consisting of n coefficients and n random variables. We can formulate this problem using Dense Linear Algebra and use a BLAS3 library to solve it efficiently, as shown in the sketch below. For simple experiments, this step is often perceived as the "Map" part of a MapReduce Structural Pattern. The experiments can be trivially mapped onto separate execution units using the Task Parallelism Strategy Pattern. In this case the problem is expressed as a collection of explicitly defined tasks, and the solution is composed of techniques to farm out and load-balance the tasks on a set of parallel resources. However, the operations that determine the value of an instrument in a given experiment might not be as simple as an "add" and a "multiply"; some non-linear "max" and "min" operations may be required. In that case, it is hard to come up with a simple linear algebra formulation, and we need to use other parallelization strategies to improve the performance of the application.
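The sketch below is a minimal rendering of the reformulation in Figure 2: each row of V holds the n random variables of one experiment, each column of C holds the coefficients of one instrument, and a single matrix-matrix product yields the value of every instrument under every experiment. The plain triple loop stands in for a call to an optimized BLAS3 GEMM routine; the names and layouts are illustrative assumptions.

#include <cstddef>
#include <vector>

// values[e][j] = sum_i V[e][i] * C[i][j]: all k experiments applied to all
// m instruments at once.  The triple loop is a stand-in for one BLAS3 GEMM
// call on an optimized platform.
std::vector<double> evaluate_experiments(const std::vector<double>& V,  // k x n, row-major
                                         const std::vector<double>& C,  // n x m, row-major
                                         std::size_t k, std::size_t n, std::size_t m) {
    std::vector<double> values(k * m, 0.0);
    for (std::size_t e = 0; e < k; ++e)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < m; ++j)
                values[e * m + j] += V[e * n + i] * C[i * m + j];
    return values;
}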
5.2.3 Data-centric Perspective

A problem to be solved by the Monte Carlo Methods often requires a small set of parameters, involves a large amount of simulation, and outputs a concise aggregation of the results. In the data-centric perspective, we focus on optimizing the memory access patterns between the RNG step, the distribution conversion step, the experiment execution step, and the result analysis step.

1. RNG step: In the RNG step, one usually generates as many sequences of random numbers as there are parameters, as different parameters often require random sequences that are independent and identically distributed. This is often achieved using the same algorithm for random number generation, but with a different random seed for each parameter. This technique is ideal for parallelization using vector instructions with the SIMD Pattern. In this case, each SIMD lane generates one sequence of random numbers, and all lanes go through the state-transition function and output function (as shown in Figure 5) in a synchronized fashion. When the experiments require hundreds to thousands of parameters to be generated, there is enough concurrency in the application to parallelize the RNG on a highly parallel platform. However, when the experiments only require a few parameters, a different set of parallelization strategies is needed. With an RNG that provides skip-ahead capability, state-transition functions can be constructed that transition the internal state of the RNG n steps ahead. This allows multiple segments of the same random number sequence to be generated at the same time. The Sobol quasi-random number generator allows skip-ahead to be implemented [8]. With no skip-ahead capability, the most efficient output data layout stores different sequences into consecutive memory locations as they are produced. With skip-ahead capability, output data can be produced with consecutive elements storing data from the same sequence, or with consecutive elements storing data across different sequences.

2. Distribution conversion step: The distribution conversion step is usually a one-input, one-output operation where an algorithm converts one value 0 < v ≤ 1 according to the cumulative probability density function of a particular statistical distribution. If this is the case, this step is very flexible in terms of its data structure requirements. However, with the Box-Muller method [4], each computation requires two independent input values from two independent random sequences and produces two independent normally distributed values. One approach to meet this special need is to have each thread process two random sequences at a time. The implication of this approach is that the working set of the algorithm is doubled, which may increase memory resource contention during execution.

3. Experiment execution step: The experiment execution step is the most compute-intensive step. If there exist particular high-performance libraries that can accelerate this step well, the data structure should be based on the input requirements of the high-performance library routines. When no high-performance library routine can implement the experiments efficiently, the experiments can be mapped onto separate execution units using the Task Parallelism Strategy Pattern. In this case, each experiment can be mapped to a SIMD lane on an execution unit, and it is more efficient for the RNG to output random sequences from the same generator consecutively in memory.

4. Result analysis step: The results from all the experiments can be stored as a vector or can be reduced as they are produced. If the results are stored as a vector, they can be manipulated efficiently with a variety of data parallel operations.

In conclusion, for the data-centric perspective, the most efficient data layout depends on the implementation strategy of the experiment execution step. In this case, the Geometric Decomposition Pattern can be used to block the computations between the RNG step and the experiment execution step.
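To make the layout trade-off concrete, the sketch below shows two possible layouts for the output of s random sequences of length len, stored in one flat array: sequence-major (consecutive elements come from the same sequence, natural with skip-ahead or when each experiment reads its own sequence) and interleaved (consecutive elements come from different sequences, natural when SIMD lanes advance their generators in lock-step). The naming and indexing scheme are ours, for illustration only.

#include <cstddef>

// Element j of sequence i in a flat array of s sequences of length len.

// Sequence-major layout: one sequence occupies consecutive memory.
inline std::size_t seq_major(std::size_t i, std::size_t j, std::size_t len) {
    return i * len + j;
}

// Interleaved layout: consecutive memory holds one element from each
// sequence, matching s SIMD lanes producing values in lock-step.
inline std::size_t interleaved(std::size_t i, std::size_t j, std::size_t s) {
    return j * s + i;
}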
[Figure 2: An example of mapping the original problem of generating val_i values for k experiments, using n coefficients a...m for m financial instruments and n random variables v_i, onto a Dense Linear Algebra formulation. In the original problem, each experiment k computes Val_k0 = a_0*w_0 + a_1*w_1 + ... + a_n*w_n through Val_km = m_0*w_0 + ... + m_n*w_n; in the reformulation, all k experiments and m instruments are produced by a single matrix-matrix product of the matrix of random variables and the matrix of coefficients.]

6. INVARIANT

Precondition: Experiments are independent and identically distributed to achieve statistically valid sampling.

Invariant: The solution converges as more experiments are conducted.

7. EXAMPLES

We illustrate this pattern with four examples: π estimation, financial market Value-at-Risk estimation, an option pricing application, and a molecular dynamics application. The first is a pedagogical example demonstrating the implementation structure of a Monte Carlo method. The later examples are real application examples.

7.1 Example 1: π Estimation

7.1.1 Application Description

This is a simple example to illustrate the basic principles of Monte Carlo Methods. The problem is to estimate π by throwing darts at a square inscribed with a circle. We approximate π by calculating the percentage of the darts that fall in the circle. Figure 3 illustrates the problem setup. We use the formulas:

Area of square = (2r)^2 = 4
Area of circle = π * r^2 = π
π = 4 * (Circle Area / Square Area)

[Figure 3: π estimation problem. A circle of radius r = 1 centered at (0, 0) is inscribed in a square of area 4.]

7.1.2 Application Structure

1. Experiment generation: Uniformly choose x, y positions in a 2x2 square at random. The center of the square is co-located with the center of the circle at x = 0, y = 0.

2. Experiment execution: If x^2 + y^2 < 1, then the point is inside the circle.

3. Result aggregation: π = 4 * (# points inside) / (# points total)
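A minimal, self-contained C++ sketch of this structure is shown below; the choice of generator and the sample count are illustrative assumptions.

#include <cstddef>
#include <random>

// Monte Carlo estimate of pi: sample points uniformly in the 2x2 square
// centered at the origin and count the fraction that falls inside the unit circle.
double estimate_pi(std::size_t num_experiments, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> coord(-1.0, 1.0);
    std::size_t inside = 0;
    for (std::size_t i = 0; i < num_experiments; ++i) {
        double x = coord(rng);                   // experiment generation
        double y = coord(rng);
        if (x * x + y * y < 1.0) ++inside;       // experiment execution
    }
    return 4.0 * static_cast<double>(inside)     // result aggregation
               / static_cast<double>(num_experiments);
}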
7.2 Example 2: Financial Market Value-at-Risk Estimation
With the proliferation of algorithmic trading, derivative usage, and highly leveraged hedge funds, there is an increasing need to accelerate financial market Value-at-Risk (VaR) estimation to measure the severity of potential portfolio losses. VaR estimation of portfolios uses the Monte Carlo method. It requires a small set of parameters to set up the estimation process, involves a large amount of computation, and outputs a concise set of risk profiles as the result. As shown in Figure 4, there are four main steps in the application, and the parallelization involves many levels of optimization.
7.2.1 Application Description
The inputs to VaR are assumptions about the market and the portfolio, which are encoded in the distribution characteristics of the generated parameters and the coefficients (or influence) of the market risk factors. The outputs are the estimated portfolio prices when they are affected by hypothetical sets of market risk factors. The application follows the three steps described in the solution section. Figure 4a illustrates the steps and explicitly describes the experiment generation in two steps. Figure 4b illustrates the standard implementation, where each white circle represents the computation for each scenario at each algorithmic step. In the standard implementation, each step is performed separately, with the grouping shown with the dark boxes.
[Figure 4: Solution structure for the Value-at-Risk estimation application. (a) The four steps of a VaR computation based on Monte Carlo simulation: (1) Uniform Random Number Generation (URNG), (2) Parameter Distribution Transformation, (3) Loss Function Evaluation (Linear Algebra), and (4) Data Assimilation, applied over the number of scenarios to simulate. (b) A typical Monte Carlo simulation, with each step performed separately over all scenarios. (c) The Monte Carlo simulation of market VaR optimized for the GPU resource hierarchy, with steps merged and scenarios blocked.]

Mathematically, consider a portfolio whose value P(R_1, R_2, ..., R_N) is a non-linear function of N correlated market risk factors R_i(t) at time t. Under a delta-gamma approximation, the portfolio incurs a change of value on the profit and loss (P&L) account due to market movements of the form

dP = Σ_{i,j} [ ∆_i dR_i + (1/2) dR_i Γ_ij dR_j ],   (1)

where the first term is the delta term, the second term is the gamma term, and the first and second derivatives of the portfolio value with respect to the risk factors are denoted ∆_i = ∂P/∂R_i and Γ_ij = ∂²P/(∂R_i ∂R_j). Here i is the index for each risk factor, whose change in value dR_i over a chosen time period is a log-normal random variable. In a typical usage model on a global-scale portfolio, each experiment involves thousands of risk factors describing market parameters such as interest rate trends over time in various currencies. (A single-scenario sketch of evaluating this expression appears at the end of this example.)

1. Experiment generation: From the numerical-centric perspective, the application uses the Sobol' quasi-random low-discrepancy sequence, which has been shown in [15] to satisfy property A for uniformity for dimensions up to 16900. The Box-Muller method [4] is used to transform the uniform random variables into normally distributed random variables, while preserving the distribution uniformity property of the original sequence. The correlation between risk factors is taken care of during the experiment execution. From the data-centric perspective, the application requires enough random variables for each thread to generate its own Sobol quasi-random sequences. The distribution conversion step using the Box-Muller method is merged into the Sobol sequence generation step for reduced overhead. This optimization is illustrated in Figure 4c, where the dark boxes over steps 1 and 2 are merged.

2. Experiment execution: From the task-centric perspective, the application uses an analytical equation that is applied in each experiment. The experiment execution can successfully leverage the Dense Linear Algebra Pattern, and be re-factored into a sequence of calls to BLAS libraries to estimate the risk exposure. Further optimization involves reformulating the portfolio loss function to use a precomputed, deterministic part of the delta term. This precomputation enables the bottleneck matrix-matrix computation to be replaced with a matrix-vector operation, reducing computation by a factor of O(N), where N is the number of risk factors. The computation can be further optimized with the Geometric Decomposition Pattern, where the problem can be broken down into blocks that fit into memory. This is illustrated in Figure 4c, where the dark boxes over scenarios are separated into blocks.

3. Result aggregation: The results from the experiments with various market conditions are sorted, and the thresholds at the 5th or 1st percentile worst outcome are used to provide the VaR estimate for the amount of a 1-in-20 or 1-in-100 chance of portfolio loss.
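For concreteness, here is a minimal sketch of evaluating Equation (1) for a single scenario of risk-factor moves dR, written in the usual delta-gamma form with the delta term summed over i and the gamma term over i and j. In the application itself this computation is batched over many scenarios into BLAS calls, and the deterministic part of the delta term is precomputed, as described above; the names and layouts below are illustrative assumptions.

#include <cstddef>
#include <vector>

// Delta-gamma P&L for one scenario:
// dP = sum_i Delta[i]*dR[i] + 0.5 * sum_{i,j} dR[i]*Gamma[i][j]*dR[j],
// with Gamma stored N x N, row-major.  A sketch only; the real application
// batches many scenarios into matrix operations.
double delta_gamma_pnl(const std::vector<double>& Delta,
                       const std::vector<double>& Gamma,
                       const std::vector<double>& dR) {
    const std::size_t N = Delta.size();
    double dP = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        dP += Delta[i] * dR[i];                  // delta term
        double acc = 0.0;
        for (std::size_t j = 0; j < N; ++j)
            acc += Gamma[i * N + j] * dR[j];
        dP += 0.5 * dR[i] * acc;                 // gamma term
    }
    return dP;
}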
7.3 Example 3: Option Pricing
We are trying to determine the present-day value of a contract with this scenario: "In 3 months' time a buyer will have the option to purchase Microsoft Corp. shares from a seller at a price of $25 per share." The key point is that the buyer has the option to buy the shares. Three months from now, the buyer may check the market price and decide whether or not to exercise the option. The buyer would exercise the option if and only if the market price were greater than $25, in which case the buyer could immediately re-sell for an instant profit. This deal has no downside for the buyer once he/she pays the up-front cost of purchasing the option. Three months from now the buyer will either make a profit or walk away. The seller, on the other hand, has no potential gain and an unlimited potential loss. To compensate, the cost for the buyer to enter into the option contract must be carefully computed. The process by which we determine the cost of the up-front payment is the option-pricing problem. (Footnote 2: In 1973, Robert C. Merton published a paper [5] presenting a mathematical model which can be used to calculate a rational price for trading options. He later won a Nobel prize for his work. In that same year, options were first traded in the open market.) Merton's work expanded on that of two other researchers, Fischer Black and Myron Scholes, and the pricing model became known as the Black-Scholes model. The model depends on a constant (sigma), representing how volatile the market is for the given asset, as well as the continuously compounded interest rate r. The Monte Carlo Method approach takes the number of trials M as input, where M could range from 1,000 to 1,000,000 depending on the accuracy required for the result. The pseudo code for the simulation is shown below.

// A Sequential Monte Carlo simulation of the
// Black-Scholes model
for i = 0 to M - 1 do
    t := S * exp((r - 0.5*sigma^2)*T + sigma * sqrt(T) * randomNumber())
    trials[i] := exp(-r * T) * max{t - E, 0}
end for
mean := mean(trials)
stddev := stddev(trials, mean)

Where: S: asset value function, E: exercise price, r: continuously compounded interest rate, sigma: volatility of the asset, T: expiry time, M: number of trials.

The pseudo code also uses several internal variables:

• trials: array of size M, each element of which is an independent trial (iteration of the Black-Scholes Monte Carlo method)
• mean: arithmetic mean of the M entries in the trials array
• randomNumber(): when called, returns successive (pseudo)random numbers chosen from a Gaussian distribution
• mean(a): computes the arithmetic mean of the values in an array a
• stddev(a, mu): computes the standard deviation of the values in an array a whose arithmetic mean is mu
• confwidth: width of confidence interval
• confmin: lower bound of confidence interval
• confmax: upper bound of confidence interval
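A direct C++ rendering of this pseudo code is sketched below, using a Gaussian pseudo-random generator from the C++ standard library in place of randomNumber(). It is illustrative only: it returns the mean and computes the standard deviation of the trials, leaving the confidence-interval bookkeeping aside.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Sequential Monte Carlo pricing of the option under the Black-Scholes model.
// S: asset value, E: exercise price, r: continuously compounded interest rate,
// sigma: volatility, T: expiry time, M: number of trials.
double price_option(double S, double E, double r, double sigma, double T,
                    std::size_t M, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);        // plays the role of randomNumber()
    std::vector<double> trials(M);
    for (std::size_t i = 0; i < M; ++i) {
        double t = S * std::exp((r - 0.5 * sigma * sigma) * T
                                + sigma * std::sqrt(T) * gauss(rng));
        trials[i] = std::exp(-r * T) * std::max(t - E, 0.0); // discounted payoff
    }
    double mean = 0.0;
    for (double v : trials) mean += v;
    mean /= static_cast<double>(M);
    double var = 0.0;                                        // stddev(trials, mean)
    for (double v : trials) var += (v - mean) * (v - mean);
    double stddev = std::sqrt(var / static_cast<double>(M));
    (void)stddev;  // would feed confwidth / confmin / confmax
    return mean;   // estimated present-day value of the option
}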
at random. Metropolis Monte Carlo can be used to compute the set of canonical ensemble averages for the Ising model (footnote 3) in the field of Molecular Dynamics. More formally, Metropolis Monte Carlo is a computational approach for generating a set of N configurations of the system η1, η2, η3, ..., ηN such that lim_{N→+∞} N_η / N = P(η), where P(η) is a given probability distribution (the Boltzmann distribution) and N_η is the number of configurations η (e.g., the number of configurations generated with particular spins S(η) of the Ising model). The algorithm is as follows:
Pick any configuration eta_n
For N, where N is large:
    Pick a trial configuration eta_t
    Compute ratio R = eta_n / eta_t
    Generate random number p (between 0 and 1)
    If p
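The acceptance test compares p against the ratio R, accepting the trial configuration when p < R. A minimal sketch of such a Metropolis step for a 1D Ising chain is shown below; the single-spin-flip trial move, the coupling constant J, the temperature kT, and the use of the Boltzmann weight ratio exp(-dE/kT) as R are illustrative assumptions rather than details taken from the text.

#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Minimal Metropolis sketch for a 1D Ising chain with periodic boundaries.
// A trial configuration differs from the current one by a single spin flip;
// it is accepted with probability min(1, exp(-dE / kT)), the ratio of the
// Boltzmann weights of the trial and current configurations.
void metropolis(std::vector<int>& spins, double J, double kT,
                std::size_t num_steps, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> site(0, spins.size() - 1);
    std::uniform_real_distribution<double> accept(0.0, 1.0);
    const std::size_t L = spins.size();                      // assumes a non-empty lattice
    for (std::size_t n = 0; n < num_steps; ++n) {
        std::size_t i = site(rng);                           // pick a trial configuration (flip spin i)
        int left = spins[(i + L - 1) % L];
        int right = spins[(i + 1) % L];
        double dE = 2.0 * J * spins[i] * (left + right);     // energy change of the flip
        double R = std::exp(-dE / kT);                       // Boltzmann weight ratio
        if (accept(rng) < R)                                 // accept if p < R
            spins[i] = -spins[i];
    }
}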