An Introduction to Simulation Techniques


Abhay Pratap Singh M.Sc. (Statistics) Banaras Hindu University

A Project report On

AN INTRODUCTION TO SIMULATION TECHNIQUES

Submitted By ABHAY PRATAP SINGH Under the guidance of Dr. P. S. PUNDIR DST - CIMS BANARAS HINDU UNIVERSITY VARANASI


CERTIFICATE
This is to certify that the project report entitled “AN INTRODUCTION TO SIMULATION TECHNIQUES” has been prepared and submitted by Mr ABHAY PRATAP SINGH to DST-CIMS, Banaras Hindu University, in fulfilment of the winter internship program under my supervision. This project has been conducted and completed under my guidance and supervision.

Dr. P. S. PUNDIR DST - CIMS BANARAS HINDU UNIVERSITY


ACKNOWLEDGEMENT
My foremost thanks go to GOD, who guided me all the time, and to my parents for their blessings. I am very obliged and give my heartiest gratitude to Prof. Umesh Singh, co-ordinator of DST-CIMS, B.H.U., for having permitted us to carry out this project work on “AN INTRODUCTION TO SIMULATION TECHNIQUES” and especially for providing all the other facilities that were necessary for preparing this project report. My sincere thanks and regards go to all the teachers of DST-CIMS and the Department of Statistics, B.H.U., for their co-operative behaviour and persistent encouragement during the entire phase of my work. I take great pleasure in expressing my deep sense of gratitude and indebtedness to Dr. P. S. PUNDIR (DST-CIMS), my project supervisor, for his continuous guidance and precious suggestions at all stages during the course of this work. In his natural parental style, he has provided constant support and encouragement in the successful completion of this work in its present form. I would like to express my sincere thanks to Prof. S. K. Upadhyay (Department of Statistics), who inspired me to do analytical work, provided valuable suggestions, created more interest in the topic and made me aware of modern techniques with practical aspects of statistics. I am very grateful to the entire class for their cooperation. I am also thankful to all the official staff of DST-CIMS for their help throughout my work.

Abhay Pratap Singh


TABLE OF CONTENTS

1. INTRODUCTION
2. THE GENERATION OF RANDOM NUMBERS
3. METHODS OF GENERATING RANDOM NUMBERS
4. GENERATION OF NORMAL RANDOM VARIATES
5. GENERATION OF GAMMA RANDOM VARIATES
6. MONTE CARLO TECHNIQUES
7. MARKOV CHAIN MONTE CARLO
8. BOOTSTRAPPING
9. APPENDIX
10. BIBLIOGRAPHY


1. INTRODUCTION
First of all, what is simulation? Presumably a simple question, but the scientific community is far from a consensus as to the answer. A government administrator might decide to simulate the national effect of a voucher system by taking a single school district and implementing a voucher system there. To a geologist, a simulation might be a three-dimensional differential-integral equation dynamic (hence, four-dimensional) model of the implementation of tertiary recovery for an oil field. To a numerical analyst, a simulation might be an approximation-theoretic pointwise function evaluator of the geologist's dynamic model. To a combat theorist, a simulation might consist of using the Lanchester equations to conjecture the result of a battle under varying conditions. To a nonparametric bootstrapper, simulation might consist of resampling to obtain a 95% confidence interval for the correlation coefficient between two variables. While all of the above may be legitimate definitions of simulation, we shall concentrate on the notion of simulation as the generation of pseudo data on the basis of a model, a database, or the use of a model in the light of a database. Some refer to this as stochastic simulation, since such pseudo data tend to change from run to run. It is to be noted that the major component of simulation is neither stochasticity nor determinism, but rather analogy. Needless to say, our visions of reality are always other than reality itself.

The real world can be viewed as being composed of systems. A system is a set of related components or entities that interact with each other based on the rules or operating policies of the system. A model is an abstracted and simplified representation of a system at one point in time. A model? What is that? Again, there is no consensus. We shall take a model to be a mathematical summary of our best guess as to what is going on in a part of the real world. We should not regard a model as reality, or even as a stochastic perturbation of reality, but only as our current best guess as to a portion of reality.

Modelling is a powerful tool. With it, we can analyse, design, and operate complex systems. We use models to assess real-world processes too complex to analyse via spreadsheets or flowcharts, testing hypotheses at a fraction of the cost of undertaking the actual activities. As an efficient communication tool, modelling shows how an operation works and stimulates creative thinking about how to improve it. Models in industry, government, and educational institutions shorten design cycles, reduce costs, and enhance knowledge.

Why is simulation important? Simulation involves designing a model of a system and carrying out experiments on it as it progresses through time. Models enable us to see how a real-world activity will perform under different conditions and to test various hypotheses at a fraction of the cost of performing the actual activity. One of the principal benefits of a model is that you can begin with a simple approximation of a process and gradually refine the model as your understanding of the process improves. This “stepwise refinement” enables us to achieve good approximations of very complex problems surprisingly quickly. As we add refinements, the model more closely imitates the real-life process.

Definition of Simulation
Simulation is the imitation of the operation of a real-world process or system over time. The act of simulating something first requires that a model be developed; this model represents the key characteristics of the selected physical or abstract system or process. The model represents the system itself, whereas the simulation represents the operation of the system over time.
OR


Simulation is a quantitative procedure that describes a process by developing a model of that process and then conducting a series of organised trial-and-error experiments to predict the behaviour of the process over time.


Some diagrammatic representations:

SYSTEM
   Experiment with the actual system
   Experiment with a model of the system
      Physical model
      Mathematical model
         Analytical model
         Simulation model

SIMULATION: MODEL → DATA
MODEL ← DATA


2. THE GENERATION OF RANDOM NUMBERS
There are many views as to what constitutes simulation. To the statistician, simulation generally involves randomness as a key component. Engineers, on the other hand, tend to consider simulation as a deterministic process. If, for example, an engineer wishes to simulate tertiary recovery from an oil field, he or she will probably program a finite element approximation to a system of partial differential equations. If a statistician attacked the same problem, he or she might use a random walk algorithm for approximating the pointwise solution to the system of differential equations. Numbers chosen at random are useful in a variety of applications. For instance, in numerical analysis, random numbers are used to evaluate complicated integrals. In computer programming, random numbers are a good source of data for testing the effectiveness of computer algorithms. Random numbers also play an important role in cryptography. In order to generate random numbers from a given distribution, one first needs to be able to generate uniformly distributed random numbers, otherwise known as pseudo-random numbers. These pseudo-random numbers can be used either by themselves or to generate random numbers from different theoretical or empirical distributions, known as random variates or stochastic variates. There are mainly two types of random number generators:
1. Pseudo-random number generators (PRNGs)
2. True random number generators (TRNGs)


Pseudo-random numbers
As the word ‘pseudo’ suggests, pseudo-random numbers are not random in the way we might expect. Essentially, pseudo-random number generators (PRNGs) are algorithms that use mathematical formulae to produce sequences of numbers that appear random. Strictly speaking, there is no such thing as a single random number; rather, we speak of a sequence of random numbers that follow a specified theoretical or empirical distribution. There are two main approaches to generating random numbers. In the first approach, a physical phenomenon is used as a source of randomness from which random numbers can be generated. Random numbers generated in this way are called true random numbers. A true random number generator requires a completely unpredictable and non-reproducible source of randomness. Such sources can be found in nature, or they can be created from hardware and software. For instance, the elapsed time between emissions of particles during radioactive decay is a well-known random source. True random numbers are ideal for critical applications such as cryptography because of their unpredictable, genuinely random nature. However, because they cannot be reproduced and are slow to generate, they are of little use in computer simulation. An alternative approach, and the most popular one, is to use a mathematical algorithm. Efficient algorithms have been developed that can easily be implemented in a computer program to generate a string of random numbers. These algorithms produce numbers in a deterministic fashion: given a starting value, known as the seed, the same sequence of random numbers is produced each time, as long as the seed remains the same. Despite the deterministic way in which they are created, these numbers appear to be random, since they pass a number of statistical tests designed to check various properties of random numbers. In view of this, they are referred to as pseudo-random numbers.


An advantage of generating pseudo-random numbers in a deterministic fashion is that they are reproducible, since the same sequence of random numbers is produced each time we run a pseudo-random generator with the same seed. This is helpful when debugging a simulation program, as we typically want to reproduce the same sequence of events in order to verify the accuracy of the simulation. We note that the term pseudo-random number is typically reserved for random numbers that are uniformly distributed on the interval (0, 1). All other random numbers, including those that are uniformly distributed on any interval other than (0, 1), are referred to as random variates or stochastic variates. For simplicity, we will refer to pseudo-random numbers as random numbers. In general, an acceptable method for generating random numbers must yield sequences of numbers or bits that are:
1. Uniformly distributed
2. Statistically independent
3. Reproducible, and
4. Non-repeating for any desired length.
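As a small illustration of reproducibility, the sketch below uses Python's standard `random.Random` class (a Mersenne Twister PRNG, not one of the generators discussed in this report). The seeds 2024 and 2025 are arbitrary illustrative choices.

```python
import random

def draw_uniforms(seed, n):
    """Return n pseudo-random numbers in [0, 1) from a generator started at `seed`."""
    rng = random.Random(seed)  # independent generator object with its own state
    return [rng.random() for _ in range(n)]

run1 = draw_uniforms(2024, 5)
run2 = draw_uniforms(2024, 5)  # same seed: identical sequence
run3 = draw_uniforms(2025, 5)  # different seed: different sequence

print(run1 == run2)  # True
print(run1 == run3)  # False
```

Giving each simulation run its own seeded generator object, rather than relying on a global state, makes it easy to replay exactly the same stream of events when debugging.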

The characteristics of TRNGs are quite different from those of PRNGs. First, TRNGs are generally rather inefficient compared to PRNGs, taking considerably longer to produce numbers. They are also nondeterministic, meaning that a given sequence of numbers cannot be reproduced, although the same sequence may of course occur several times by chance. TRNGs have no period.

Comparison of PRNGs and TRNGs
The table below sums up the characteristics of the two types of random number generators.

Characteristic   Pseudo-Random Number Generators   True Random Number Generators
Efficiency       Excellent                         Poor
Determinism      Deterministic                     Nondeterministic
Periodicity      Periodic                          Aperiodic

These characteristics make TRNGs suitable for roughly the set of applications that PRNGs are unsuitable for, such as data encryption, games and gambling. Conversely, the poor efficiency and nondeterministic nature of TRNGs make them less suitable for simulation and modelling applications, which often require more data than is feasible to generate with a TRNG. The following table summarises which applications are best served by which type of generator:

Application                                            Most Suitable Generator
Lotteries and draws                                    TRNG
Games and gambling                                     TRNG
Random sampling (e.g., drug screening)                 TRNG
Simulation and modelling                               PRNG
Security (e.g., generation of data encryption keys)    TRNG

3. METHODS OF GENERATING RANDOM NUMBERS

The first method for creating random numbers with an arithmetic generator, proposed by von Neumann and Metropolis in the 1940s, is the mid-square method. The idea is to start with a four-digit positive integer, say X0, and square it to obtain an integer with up to 8 digits; if necessary, append zeros to the left to make it exactly an 8-digit number. Take the middle four digits of this 8-digit number as the next four-digit number, say X1. Place a decimal point at the left of X1 to obtain the first uniform random number U1. Then let X2 be the middle four digits of X1², let U2 be X2 with a decimal point at the left, and continue the procedure for as long as desired.


In practice, however, it does not work very well at all. One serious problem is its strong tendency to degenerate fairly rapidly to zero, where it then stays forever, so the resulting sequence is far from random. The mid-square method was relatively slow and statistically unsatisfactory, and it was later abandoned in favour of other algorithms.
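The mid-square procedure can be sketched in a few lines of Python. The seeds 1234, 10 and 6100 are illustrative choices, not taken from the report; the last two show the sequence collapsing to zero and falling into a short cycle.

```python
def middle_square_next(x):
    """One mid-square step: square a 4-digit number, pad to 8 digits, keep the middle four."""
    s = f"{x * x:08d}"   # left-pad with zeros to exactly 8 digits
    return int(s[2:6])   # digits 3 to 6 form the next number

def middle_square(seed, n):
    """Return X_1, ..., X_n starting from the four-digit seed X_0."""
    xs, x = [], seed
    for _ in range(n):
        x = middle_square_next(x)
        xs.append(x)
    return xs

xs = middle_square(1234, 3)      # [5227, 3215, 3362]
us = [x / 10_000 for x in xs]    # decimal point on the left: [0.5227, 0.3215, 0.3362]
print(xs, us)
print(middle_square(10, 4))      # [1, 0, 0, 0]: collapses to zero and stays there
print(middle_square(6100, 4))    # [2100, 4100, 8100, 6100]: a cycle of length 4
```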

Linear Congruential Method
The great majority of random number generators in use today are linear congruential generators. A linear congruential generator (LCG) is one of the oldest and best-known pseudo-random number generator algorithms. The theory behind LCGs is easy to understand, they are easily implemented and fast, and they produce pseudo-random numbers that are statistically acceptable for computer simulation. The congruential method uses the following recursive relationship to generate random numbers:

$$x_{i+1} = (a x_i + c) \bmod m,$$
where $x_i$, $a$, $c$ and $m$ are all non-negative integers:
m: the “modulus”
a: the “multiplier”
c: the “increment”
x0: the “seed” or “start value”.
These integer constants specify the generator. If c = 0, the generator is often called a multiplicative congruential generator, or Lehmer RNG; if c ≠ 0, it is called a mixed congruential generator. Given the previous random number $x_i$, the next random number $x_{i+1}$ is generated as follows: multiply $x_i$ by a and add c, then divide the result by m and set $x_{i+1}$ equal to the remainder of this division.


For example, if $x_0 = 0$, $a = c = 7$ and $m = 10$, we obtain the sequence 7, 6, 9, 0, 7, 6, 9, 0, …. The numbers generated by a congruential method lie between 0 and m − 1, and uniformly distributed random numbers between 0 and 1 are obtained simply by dividing each $x_i$ by m. The number of successively generated pseudo-random numbers after which the sequence starts repeating itself is called the period. The period can never exceed m; if it equals m, the generator is said to have a full period. Thus the larger the value of m, the larger the period that is possible. In particular, the following conditions on a, c and m guarantee a full period:
• m and c have no common divisor other than 1.
• $a \equiv 1 \pmod{r}$ for every prime factor r of m. That is, if r is a prime number (divisible only by itself and 1) that divides m, then it divides a − 1.
• $a \equiv 1 \pmod{4}$ if m is a multiple of 4.
(In the example above, m = 10 has the prime factor 5, which does not divide a − 1 = 6, so the period is only 4.) It is important to note that one should not use arbitrary values for a, c and m. Systematic testing of various values for these parameters has led to generators that have a full period and are statistically satisfactory. One such set of values is a = 314,159,269, c = 453,806,245 and $m = 2^{32}$ (for a 32-bit machine). In order to get a generator started, we further need an initial seed value $x_0$. For a full-period generator, every seed lies on the same single cycle of length m, so the seed only determines where in that cycle the sequence starts.
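A minimal Python sketch of the mixed congruential recursion $x_{i+1} = (a x_i + c) \bmod m$, with a helper that measures the period and another that checks the three full-period conditions listed above. The function names are illustrative, and the large constants are read as a = 314159269, c = 453806245, m = 2^32 (an interpretation of the garbled digits in the source).

```python
from itertools import islice
from math import gcd

def lcg(seed, a, c, m):
    """Yield x_1, x_2, ... from x_{i+1} = (a * x_i + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def lcg_uniforms(seed, a, c, m, n):
    """First n values of the LCG scaled to [0, 1) by dividing by m."""
    return [x / m for x in islice(lcg(seed, a, c, m), n)]

def period(seed, a, c, m):
    """Length of the cycle the sequence eventually enters (never more than m)."""
    seen = {}
    x, i = seed, 0
    while x not in seen:
        seen[x] = i
        x = (a * x + c) % m
        i += 1
    return i - seen[x]

def prime_factors(n):
    """Distinct prime factors of n by trial division."""
    factors, r = set(), 2
    while r * r <= n:
        while n % r == 0:
            factors.add(r)
            n //= r
        r += 1
    if n > 1:
        factors.add(n)
    return factors

def has_full_period(a, c, m):
    """Check the three full-period conditions on a, c and m."""
    if gcd(c, m) != 1:                                   # c and m coprime
        return False
    if any((a - 1) % r != 0 for r in prime_factors(m)):  # every prime factor of m divides a - 1
        return False
    if m % 4 == 0 and a % 4 != 1:                        # a = 1 (mod 4) when 4 divides m
        return False
    return True

print(list(islice(lcg(0, 7, 7, 10), 8)))             # [7, 6, 9, 0, 7, 6, 9, 0]
print(period(0, 7, 7, 10))                           # 4
print(has_full_period(7, 7, 10))                     # False
print(has_full_period(314159269, 453806245, 2**32))  # True
```

Python integers do not overflow, so `a * x + c` is exact; an implementation in a fixed-width language would need 64-bit arithmetic (or unsigned wrap-around when m = 2^32).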

The Inverse Transformation Method
The inverse transformation method generates sample numbers at random from any probability distribution, given its cumulative distribution function (cdf). Here we discuss techniques for generating random numbers with a specific distribution. Random numbers following a specific distribution are called random variates or stochastic variates, while pseudo-random numbers that are uniformly distributed are normally referred to simply as random numbers. Random numbers play a major role in the generation of stochastic variates. There are many techniques for generating random variates, and the inverse transformation method is one of the most commonly used. It is discussed below.

Suppose we wish to generate a continuous random variate X with distribution function F(x). F(x) is a nondecreasing function; let $F^{-1}$ denote its inverse. The method is based on the following proposition: let U be a uniform (0, 1) random variable; for any continuous distribution function F, the random variable X defined by
$$X = F^{-1}(U)$$
has distribution F. [$F^{-1}(u)$ is defined to be the value of x such that F(x) = u.]

Algorithm:
1. Generate U ~ U(0, 1).
2. Find X such that F(X) = U, and return $X = F^{-1}(U)$.

Example: for the exponential distribution the density is $f(x) = \lambda e^{-\lambda x}$, x ≥ 0, and so the cdf is $F(x) = 1 - e^{-\lambda x}$. Setting F(X) = U and solving for X:
$$1 - e^{-\lambda X} = U \;\Rightarrow\; e^{-\lambda X} = 1 - U \;\Rightarrow\; -\lambda X = \log(1 - U) \;\Rightarrow\; X = -\frac{\log(1 - U)}{\lambda}.$$
Algorithm:
• Generate a random number U ~ U(0, 1).
• Set $X = -(1/\lambda)\log(1 - U)$.
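The exponential example can be sketched directly from the formula $X = -\log(1 - U)/\lambda$. This minimal version uses Python's `random.Random` as the source of U; the rate λ = 2 and the seed are arbitrary illustrative choices.

```python
import math
import random

def exponential_inverse(u, lam):
    """Map u in [0, 1) to an Exponential(rate=lam) variate via X = -log(1 - u) / lam."""
    return -math.log(1.0 - u) / lam   # 1 - u lies in (0, 1], so the log is finite

def exponential_sample(lam, n, rng):
    """n independent exponential variates by the inverse transformation method."""
    return [exponential_inverse(rng.random(), lam) for _ in range(n)]

rng = random.Random(1)
xs = exponential_sample(2.0, 100_000, rng)
print(sum(xs) / len(xs))   # close to the mean 1/lam = 0.5
```

Since 1 − U is itself uniform on (0, 1), $X = -\log(U)/\lambda$ would work equally well; using 1 − U here simply avoids evaluating log(0) when the generator returns exactly 0.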


In the Discrete Case
Suppose X is discrete with cumulative distribution function (CDF) F(x) = P(X ≤ x) for all real numbers x and probability mass function p(xi) = P(X = xi), where x1 < x2 < … are the possible values that X can take. Then the algorithm is:

Algorithm:
1. Generate U ~ U(0, 1).
2. Find the smallest positive integer i such that U ≤ F(xi).
3. Return X = xi.
Disadvantages:
1. In the continuous case it requires a closed-form expression for F⁻¹ (or a numerical way of inverting F).
2. It may not be the fastest way to generate random variables from a given distribution; it is often very slow because of the number of comparisons required.
Advantages: since X = F⁻¹(U) is a monotone function of U, the inverse transform method preserves monotonicity and correlation, which helps in
1. Variance reduction methods.
2. Generating truncated distributions.
3. Order statistics.
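A minimal sketch of the discrete algorithm, assuming the possible values are supplied in increasing order together with their probabilities. The pmf below is a made-up example, and the search for the smallest i with U ≤ F(xi) is done by binary search (`bisect.bisect_left`) rather than a linear scan, which addresses the "number of comparisons" concern for distributions with many support points.

```python
import bisect
import itertools
import random

def make_discrete_sampler(values, probs):
    """Return a function u -> x_i, where i is the smallest index with u <= F(x_i)."""
    cdf = list(itertools.accumulate(probs))     # F(x_1), F(x_2), ... for increasing values
    def inverse(u):
        i = bisect.bisect_left(cdf, u)          # binary search for the first cdf entry >= u
        return values[min(i, len(values) - 1)]  # clamp in case rounding leaves cdf[-1] < 1
    return inverse

sampler = make_discrete_sampler([1, 2, 3], [0.2, 0.5, 0.3])
print([sampler(u) for u in (0.1, 0.2, 0.5, 0.95)])   # [1, 1, 2, 3]

rng = random.Random(7)
draws = [sampler(rng.random()) for _ in range(100_000)]
print(draws.count(2) / len(draws))                    # close to P(X = 2) = 0.5
```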


Composition Method:
The composition technique applies when the distribution function F from which we wish to generate can be expressed as a convex combination of other distribution functions F1, F2, …. We would hope to be able to sample from the Fj's more easily than from the original distribution function F. Specifically, assume that for all x,
$$F(x) = \sum_{j=1}^{\infty} p_j F_j(x), \qquad p_j \ge 0, \quad \sum_{j} p_j = 1.$$

Algorithm:
1. Generate a positive random integer J such that P(J = j) = pj.
2. Return X with CDF FJ; that is, given J = j, generate X ~ Fj by any appropriate method, independently of how J was generated.

It is useful for generating from compound distributions such as the hyperexponential, or from compound Poisson processes. It is also frequently used to design specialized algorithms for generating from complicated densities. The idea is to partition the area under the complicated density into pieces, where piece j has area pj. To generate X, first select a piece (choose piece j with probability pj), then draw a random point uniformly over that piece and project it onto the horizontal axis. If the partition is defined so that it is fast and easy to generate from the large pieces, then X will be returned very quickly most of the time. The rejection method with a squeeze is often used to generate from some of the pieces.

For example, consider generation from the double exponential (Laplace) density. The density
$$f(x) = 0.5\,e^{-|x|} = 0.5\,e^{x} I_{(-\infty,0)}(x) + 0.5\,e^{-x} I_{(0,\infty)}(x)$$
is a mixture, with $p_1 = p_2 = 0.5$, of the density $e^{x}$ on $(-\infty, 0)$ and the density $e^{-x}$ on $(0, \infty)$.

Algorithm:
1. Generate U1 and U2 iid U(0, 1).
2. If U1 ≤ 0.5, return X = log U2; if U1 > 0.5, return X = −log U2.
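The Laplace composition algorithm translates almost line for line into Python. This sketch draws U1 to pick the mixture component and U2 for the exponential piece; taking U2 as `1.0 - rng.random()` (so it lies in (0, 1]) is a small implementation choice to keep log U2 finite.

```python
import math
import random

def laplace_composition(rng):
    """One draw from f(x) = 0.5 * exp(-|x|) by the composition method."""
    u1 = rng.random()          # chooses the piece: P(J = 1) = P(J = 2) = 0.5
    u2 = 1.0 - rng.random()    # in (0, 1], so log(u2) is finite
    if u1 <= 0.5:
        return math.log(u2)    # left piece: density e^{x} on x < 0
    return -math.log(u2)       # right piece: density e^{-x} on x > 0

rng = random.Random(3)
xs = [laplace_composition(rng) for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
print(mean, var)   # close to 0 and 2, the Laplace mean and variance
```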


Convolution method
A number of distributions can be expressed in terms of the (possibly weighted) sum of two or more random variables from other distributions. (The distribution of the sum is the convolution of the distributions of the individual random variables.) Suppose the desired random variable X has the same distribution as Y1 + Y2 + ... + Ym, where the Yj's are IID and m is fixed and finite, so that we can write X = Y1 + Y2 + ... + Ym. Suppose also that it is easy to generate the Yj's. Then it is straightforward to generate a value of X:
Algorithm:
1. Generate Y1, Y2, ..., Ym independently from their common distribution G (Yj ~ G).
2. Return X = Y1 + Y2 + ... + Ym (i.e., set X ← Σj Yj).
For example, if X ~ m-Erlang with mean b > 0, express X = Y1 + Y2 + ... + Ym, where the Yj's are IID exponential with mean b/m. Note that the running time of this algorithm grows linearly with m, so its speed is not robust to the parameter m.
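A sketch of the Erlang example: each Yj is an exponential with mean b/m generated by the inverse transform, and X is their sum. The values m = 3 and b = 6 are illustrative. The loop makes the linear dependence of the cost on m visible.

```python
import math
import random

def erlang(m, b, rng):
    """m-Erlang with mean b as the sum of m iid exponentials, each with mean b/m."""
    total = 0.0
    for _ in range(m):                       # cost grows linearly with m
        u = 1.0 - rng.random()               # in (0, 1], keeps log finite
        total += -(b / m) * math.log(u)      # Exponential(mean b/m) by inverse transform
    return total

rng = random.Random(4)
xs = [erlang(3, 6.0, rng) for _ in range(100_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
print(mean, var)   # close to b = 6 and b**2 / m = 12
```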

Acceptance-Rejection Method
The rejection method is based on the following fundamental property of densities. Let X be a random variable with density f on ℝ and let U be an independent U(0, 1) random variable. Then (X, cUf(X)) is uniformly distributed on the region
$$A = \{(x, y) : x \in \mathbb{R},\ 0 \le y \le c f(x)\},$$
where c > 0 is an arbitrary constant. Conversely, if (X, Y) is a random vector on ℝ² uniformly distributed on A, then X has density f on ℝ.


Basic version of the rejection method
The basic version assumes the existence of a density g and knowledge of a constant C ≥ 1 such that
$$f(x) \le C g(x) \quad \text{for all } x.$$
Random variates with density f can then be obtained by the following algorithm:
Repeat
   Generate two independent random variates X ~ g and U ~ U(0, 1)
   Set T ← C g(X) / f(X)
Until U T ≤ 1
Return X.

Three things are needed before we can apply the rejection method:
• A dominating density g.
• A simple method for generating from g (i.e., an easy generator for g).
• Knowledge of C.
In other words, g must have tails at least as heavy as those of f, and it must dominate any infinite peaks of f. In some situations we can determine Cg(x) for an entire class of densities f. Finally, Cg(x) must be such that the algorithm is efficient.

Let N be the number of iterations in the algorithm, i.e., the number of pairs (X, U) required before the algorithm halts, and let p be the probability of accepting X in a single iteration. Then
$$p = P\big(U C g(X) \le f(X)\big) = \int g(x)\,\frac{f(x)}{C g(x)}\,dx = \frac{1}{C}\int f(x)\,dx = \frac{1}{C}.$$
Since N has a geometric distribution with parameter p, it can also be shown that
$$P(N = i) = (1 - p)^{i-1} p \ \ (i \ge 1), \qquad E(N) = \frac{1}{p} = C, \qquad \operatorname{Var}(N) = \frac{1 - p}{p^2} = C^2 - C.$$

The method has almost unlimited potential. Generally speaking, g is chosen from a class of easy densities (e.g., uniform, triangular, gamma, t/split-t, normal/multivariate normal). There are two major techniques for determining C and g when f is given. In the first, one studies the form of f and derives a set of inequalities that bound it. In the second, one starts with a family of dominating densities g and chooses the member of that family for which C is smallest; this approach is structured but often leads to a tough optimization problem. For given densities f and g, the rejection constant C must be at least
$$\sup_x \frac{f(x)}{g(x)}.$$
In most, if not all, situations nothing is lost by setting C equal to this supremum, because this ensures that the curves of f and Cg touch each other somewhere. Instead of letting g be determined by some inequality, it is often wiser to take the best $g_\theta$ in a family of densities parameterized by θ: define the optimal constant
$$C_\theta = \sup_x \frac{f(x)}{g_\theta(x)},$$
and choose θ to minimize $C_\theta$ (if possible, making it close to unity).
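A minimal sketch of the basic rejection algorithm, accepting when U T ≤ 1 with T = C g(X)/f(X). The test case is an assumption chosen for illustration: the Beta(2, 2) density f(x) = 6x(1 − x) on (0, 1) with the uniform envelope g(x) = 1 and C = sup f/g = 1.5. The function also returns N, so the average number of iterations can be compared with E(N) = C.

```python
import random

def rejection_sample(f, g_sample, g_density, C, rng):
    """One draw from density f using the envelope C*g; also returns N, the number of (X, U) pairs used."""
    n = 0
    while True:
        n += 1
        x = g_sample(rng)     # X ~ g
        u = rng.random()      # U ~ U(0, 1), independent of X
        # U*T <= 1 with T = C g(X) / f(X), rearranged to avoid dividing by f(X)
        if u * C * g_density(x) <= f(x):
            return x, n

def beta22_density(x):
    """Beta(2, 2) density on (0, 1); its maximum is 1.5 at x = 0.5."""
    return 6.0 * x * (1.0 - x)

rng = random.Random(42)
draws = [rejection_sample(beta22_density, lambda r: r.random(), lambda x: 1.0, 1.5, rng)
         for _ in range(100_000)]
xs = [x for x, _ in draws]
ns = [n for _, n in draws]
print(sum(xs) / len(xs))   # close to 0.5, the Beta(2, 2) mean
print(sum(ns) / len(ns))   # close to E(N) = C = 1.5
```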

Generalization of the rejection method
There are several generalizations of the rejection method. One such generalization assumes a [0, 1]-valued function ψ and writes
$$f(x) = C\,\psi(x)\,g(x),$$
where g is an easy density, often worked out by the principles given earlier, and C is a normalizing constant at least equal to unity.

Algorithm:
1. Repeat
2. Generate independent random variates X and U with X ~ g and U ~ U(0, 1)
3. Until U ≤ ψ(X)
4. Return X.


4. GENERATION OF NORMAL RANDOM VARIATES
We have not yet seen how to generate normal random variables, though they are of course very important in practice. Apart from their importance in their own right, we need to be able to generate normal random variables so that we can then generate lognormal random variables. Note that if Z ~ N(0, 1), then
$$X = \mu + \sigma Z \sim N(\mu, \sigma^2).$$

So we need only worry about generating N(0, 1) random variables. One possible generation method is the inverse transform method, but since $F^{-1}(x)$ cannot be found in closed form we would have to use numerical methods, so it is not very efficient.

Generation of standard normal variates by the rejection method using the Laplace density
We know that the standard normal density is
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty,$$
and that the Laplace density is
$$g(x) = \frac{1}{2}\, e^{-|x|}, \qquad -\infty < x < \infty,$$
so it can be used as an envelope function. We can write
$$\frac{f(x)}{g(x)} = \sqrt{\frac{2}{\pi}}\; e^{-x^2/2 + |x|} = \sqrt{\frac{2}{\pi}}\; e^{1/2}\, e^{-(|x| - 1)^2/2},$$
and therefore
$$\frac{f(x)}{g(x)} \le \sqrt{\frac{2e}{\pi}}.$$
This is exactly what the rejection method requires, i.e., $f(x) \le C g(x)$ with
$$C = \sqrt{\frac{2e}{\pi}} \approx 1.3155,$$
and hence the Laplace density can be used as an envelope density for generating random samples from the standard normal distribution. In this case $T = C g(X)/f(X) = e^{(|X| - 1)^2/2}$, so the final algorithm can be written as:
1. Repeat
2. Generate independent U ~ U(0, 1) and X ~ Laplace (i.e., from g(x))
3. Set $T = e^{(|X| - 1)^2/2}$
4. Until U T ≤ 1
5. Return X.

Once standard normal variates have been generated, we can easily generate samples from some other specific distributions (such as the lognormal, chi-square, t and F distributions).

Generating lognormal random variables
Suppose $X \sim N(\mu, \sigma^2)$. Then $Y = e^X$ has a lognormal distribution, i.e., $Y \sim LN(\mu, \sigma^2)$.

Generating chi-square random variables
Suppose $X \sim N(0, 1)$. Then $X^2$ has a chi-square distribution with 1 degree of freedom, i.e., $X^2 \sim \chi^2_1$.

Suppose now that $X_i \sim \chi^2_1$ independently for $i = 1, \ldots, n$. Then $Y = X_1 + \cdots + X_n$ has a chi-square distribution with n degrees of freedom, i.e., $Y \sim \chi^2_n$.

Generating t random variables
Suppose $X \sim N(0, 1)$ and $Y \sim \chi^2_n$, with X and Y independent. Then
$$T = \frac{X}{\sqrt{Y/n}}$$
follows a t distribution with n degrees of freedom.
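The Laplace-envelope rejection sampler and the derived lognormal, chi-square and t generators can be sketched together. Assumptions: C = √(2e/π) and T = exp((|X| − 1)²/2) as derived from the standard normal and Laplace densities; the acceptance test U T ≤ 1 is written in the equivalent form U ≤ exp(−(|X| − 1)²/2) so that `math.exp` cannot overflow for large |X|. Function names and seeds are illustrative.

```python
import math
import random

C = math.sqrt(2 * math.e / math.pi)   # rejection constant, about 1.3155

def laplace(rng):
    """Laplace (double exponential) variate by the composition method."""
    e = -math.log(1.0 - rng.random())   # Exponential(1); 1 - random() lies in (0, 1]
    return e if rng.random() > 0.5 else -e

def std_normal(rng):
    """N(0, 1) by rejection from the Laplace envelope."""
    while True:
        x = laplace(rng)
        u = rng.random()
        # U*T <= 1 with T = exp((|X| - 1)^2 / 2), rewritten to avoid overflow
        if u <= math.exp(-((abs(x) - 1.0) ** 2) / 2.0):
            return x

def lognormal(mu, sigma, rng):
    """Y = e^X with X ~ N(mu, sigma^2)."""
    return math.exp(mu + sigma * std_normal(rng))

def chi_square(n, rng):
    """Sum of n independent chi-square(1) variates, each the square of a N(0, 1)."""
    return sum(std_normal(rng) ** 2 for _ in range(n))

def student_t(n, rng):
    """X / sqrt(Y/n) with X ~ N(0, 1) and Y ~ chi-square(n) independent."""
    z = std_normal(rng)
    y = chi_square(n, rng)
    return z / math.sqrt(y / n)

rng = random.Random(11)
zs = [std_normal(rng) for _ in range(50_000)]
print(sum(zs) / len(zs))   # close to 0
```

On average about C ≈ 1.32 Laplace candidates are needed per normal variate, in line with E(N) = C for the rejection method.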

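A normal random variable can also be constructed from two independent uniform random variables. This transformation is well known as the Box-Muller (1958) transformation and is as follows. Let U1 and U2 be uniform random variables between zero and one, with U1 independent of U2. Consider the transformation
$$X_1 = \sqrt{-2\log U_1}\,\cos(2\pi U_2), \qquad X_2 = \sqrt{-2\log U_1}\,\sin(2\pi U_2),$$
where we have $0 < U_1 < 1$, $0 < U_2 < 1$ and $-\infty < X_1, X_2 < \infty$. Then the inverse transformation is given by
$$U_1 = \exp\!\left(-\frac{X_1^2 + X_2^2}{2}\right), \qquad U_2 = \frac{1}{2\pi}\tan^{-1}\!\left(\frac{X_2}{X_1}\right),$$
with the angle taken in the appropriate quadrant. From this transformation the Jacobian is obtained as
$$|J| = \left|\det\begin{pmatrix} \dfrac{\partial u_1}{\partial x_1} & \dfrac{\partial u_1}{\partial x_2} \\[6pt] \dfrac{\partial u_2}{\partial x_1} & \dfrac{\partial u_2}{\partial x_2} \end{pmatrix}\right| = \left|\det\begin{pmatrix} -x_1 u_1 & -x_2 u_1 \\[4pt] -\dfrac{x_2}{2\pi(x_1^2 + x_2^2)} & \dfrac{x_1}{2\pi(x_1^2 + x_2^2)} \end{pmatrix}\right| = \frac{u_1}{2\pi} = \frac{1}{2\pi}\exp\!\left(-\frac{x_1^2 + x_2^2}{2}\right).$$
Let $f_{12}(x_1, x_2)$ be the joint density of X1 and X2, and $f_U(u_1, u_2)$ the joint density of U1 and U2. Since U1 and U2 are assumed to be independent, we have
$$f_U(u_1, u_2) = f_1(u_1)\, f_2(u_2),$$
where $f_1$ and $f_2$ are the density functions of U1 and U2, respectively. Because U1 and U2 are uniform random variables between zero and one, $f_1(u_1) = f_2(u_2) = 1$, and accordingly the joint density of X1 and X2 is
$$f_{12}(x_1, x_2) = |J|\, f_1(u_1)\, f_2(u_2) = \frac{1}{2\pi}\exp\!\left(-\frac{x_1^2 + x_2^2}{2}\right) = \left(\frac{1}{\sqrt{2\pi}}\, e^{-x_1^2/2}\right)\left(\frac{1}{\sqrt{2\pi}}\, e^{-x_2^2/2}\right).$$
This is the product of two standard normal densities. Thus, X1 and X2 are mutually independent normal random variables with mean zero and variance one.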
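A direct sketch of the Box-Muller transformation. Taking U1 as `1.0 - rng.random()` (in (0, 1]) is an implementation choice that keeps log U1 finite; the seed and sample size are arbitrary. The empirical check looks at the means, variances and cross-moment of the two outputs.

```python
import math
import random

def box_muller_transform(u1, u2):
    """Map (U1, U2) to two independent N(0, 1) values."""
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

def box_muller(rng):
    """One pair of independent standard normals from two uniforms."""
    u1 = 1.0 - rng.random()   # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    return box_muller_transform(u1, u2)

rng = random.Random(5)
pairs = [box_muller(rng) for _ in range(50_000)]
x1s = [p[0] for p in pairs]
x2s = [p[1] for p in pairs]
print(sum(x1s) / len(x1s), sum(x2s) / len(x2s))   # both close to 0
```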

5. GENERATION OF GAMMA RANDOM VARIATES

Generating gamma random numbers is an old and very important problem in the statistical literature. It has gained further importance recently because of the popularity of MCMC techniques. Several methods are available in the literature to generate gamma random numbers, and the available algorithms are usually divided into two distinct cases. Case 1: shape parameter < 1; Case 2: shape parameter > 1. Although several methods are available for Case 2, for Case 1 mainly two methods are well known: (a) the very simple and most popular method proposed by Ahrens and Dieter, and (b) the modification of Ahrens and Dieter's method proposed by Best. Both methods mainly use majorizing functions and the acceptance-rejection principle. It is known that Best's method has a lower rejection proportion than Ahrens and Dieter's method. Since nowadays, particularly for MCMC sampling, a large number of gamma random numbers often needs to be generated, one naturally prefers an acceptance-rejection method with a lower rejection proportion, which consumes fewer uniform random numbers per gamma variate and therefore uses up less of the period of the underlying generator.

We denote the density function of a gamma random variable with scale parameter β = 1 and shape parameter α by
$$f(x) = \frac{1}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x}, \qquad x > 0.$$
When α = 1, this is simply the exponential distribution. When 2α = n is a positive integer, X has the same distribution as half of a $\chi^2_n$ variate, so generation reduces to chi-square generation.

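When α < 1
Since $e^{-x} \le 1$ for $0 < x \le 1$ and $x^{\alpha - 1} \le 1$ for $x > 1$, the gamma density is majorized by
$$t(x) = \begin{cases} \dfrac{x^{\alpha - 1}}{\Gamma(\alpha)}, & 0 < x \le 1, \\[8pt] \dfrac{e^{-x}}{\Gamma(\alpha)}, & x > 1. \end{cases}$$
So now we find a normalizing constant C such that $r(x) = t(x)/C$ is a density, i.e., $\int_0^\infty t(x)\,dx = C$. Solving for C gives
$$C = \frac{1}{\Gamma(\alpha)}\left(\frac{1}{\alpha} + \frac{1}{e}\right) = \frac{b}{\alpha\,\Gamma(\alpha)}, \qquad b = \frac{e + \alpha}{e}.$$
Therefore the density r(x) can be written as
$$r(x) = \begin{cases} 0, & x \le 0, \\[4pt] \dfrac{\alpha}{b}\, x^{\alpha - 1}, & 0 < x \le 1, \\[8pt] \dfrac{\alpha}{b}\, e^{-x}, & x > 1, \end{cases}$$
and so the corresponding CDF can be written as
$$R(x) = \begin{cases} \dfrac{x^{\alpha}}{b}, & 0 \le x \le 1, \\[8pt] 1 - \dfrac{\alpha}{b}\, e^{-x}, & x > 1. \end{cases}$$
Setting R(x) = u with u ~ U(0, 1) and solving, $R^{-1}(u) = x$ gives
$$X = \begin{cases} (b u)^{1/\alpha}, & u \le 1/b, \\[4pt] -\log\!\left(\dfrac{b(1 - u)}{\alpha}\right), & \text{otherwise}. \end{cases}$$
A variate X drawn from r is then accepted with probability $f(X)/(C\,r(X))$, which equals $e^{-X}$ when X ≤ 1 and $X^{\alpha - 1}$ when X > 1.

Algorithm:
1. Generate u ~ U(0, 1) and an independent v ~ U(0, 1).
2. If u ≤ 1/b, set $X = (bu)^{1/\alpha}$; if $v \le e^{-X}$, return X.
3. If u > 1/b, set $X = -\log(b(1 - u)/\alpha)$; if $v \le X^{\alpha - 1}$, return X.
4. Otherwise, go back to step 1.
Therefore we can generate independent random samples from the gamma distribution using the above algorithm when α < 1.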
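A sketch of the Ahrens-Dieter approach for 0 < α < 1: sample from the majorizing density r(x) by inverting $R(x) = x^\alpha/b$ on (0, 1] and $R(x) = 1 - (\alpha/b)e^{-x}$ beyond 1, with b = (e + α)/e, then accept with probability $e^{-x}$ (for x ≤ 1) or $x^{\alpha-1}$ (for x > 1). This is an illustrative implementation under those formulas, not a tuned library routine; α = 0.5 and the seed are arbitrary.

```python
import math
import random

def gamma_small_shape(alpha, rng):
    """One Gamma(alpha, scale 1) draw for 0 < alpha < 1 by Ahrens-Dieter style rejection."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("this sketch assumes 0 < alpha < 1")
    b = (math.e + alpha) / math.e               # b = 1 + alpha / e
    while True:
        u = rng.random()
        v = rng.random()
        if u <= 1.0 / b:
            x = (b * u) ** (1.0 / alpha)        # invert R(x) = x**alpha / b on (0, 1]
            if v <= math.exp(-x):               # accept with probability e^{-x}
                return x
        else:
            x = -math.log(b * (1.0 - u) / alpha)   # invert R(x) = 1 - (alpha/b) e^{-x}
            if v <= x ** (alpha - 1.0):            # accept with probability x^{alpha - 1}
                return x

rng = random.Random(2024)
xs = [gamma_small_shape(0.5, rng) for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
print(mean, var)   # both close to alpha = 0.5
```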


6. MONTE CARLO TECHNIQUES

Monte Carlo officially refers to an administrative area of the Principality of Monaco, specifically the ward of Monte Carlo/Spélugues, where the Monte Carlo Casino is located. Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results, i.e., on running simulations many times over in order to estimate quantities such as probabilities heuristically, much like actually playing and recording your results in a real casino: hence the name. They are often used in physical and mathematical problems and are most suited to situations in which it is impossible to obtain a closed-form expression or infeasible to apply a deterministic algorithm. Monte Carlo methods are mainly used in three distinct classes of problems: optimization, numerical integration, and generation of samples from a probability distribution. The modern version of the Monte Carlo method was invented in the late 1940s by Stanislaw Ulam while he was working on nuclear weapon projects at the Los Alamos National Laboratory. It was named by Nicholas Metropolis after the Monte Carlo Casino, where Ulam's uncle often gambled. Immediately after Ulam's breakthrough, John von Neumann understood its importance and programmed the ENIAC computer to carry out Monte Carlo calculations. Monte Carlo is often the preferred method for evaluating integrals over high-dimensional domains.

MONTE CARLO INTEGRATION The original Monte Carlo approach was a method developed by physicists to use random number generation to compute integrals. Let g(x) be a function and suppose we wanted to compute where ∫ 32 | P a g e

To compute the value of, note that if U is uniformly distributed over (0, 1), then we can express as

If are independent uniform(0,1) random variables, it thus follows that the random variables g(U1), g(U2),…, g(Uk) are independent and identically distributed random variables having mean . Therefore, by the strong law of large numbers, it follows that, with probability 1, ∑



E[g(U)] =

as

Hence we can approximate θ by generating a large number of random numbers u_i and taking the average value of g(u_i) as our approximation. This approach to approximating integrals is called the Monte Carlo approach.

Alternatively, suppose we wish to compute a complex integral

\int_a^b h(x)\, dx.

If we can decompose h(x) into the product of a function f(x) and a probability density function p(x) defined over the interval (a, b), then note that

\int_a^b h(x)\, dx = \int_a^b f(x)\, p(x)\, dx = E_{p(x)}[f(x)],

so that the integral can be expressed as an expectation of f(x) over the density p(x). Thus, if we draw a large number x_1, x_2, …, x_n of random variables from the density p(x), then

\int_a^b h(x)\, dx = E_{p(x)}[f(x)] \simeq \frac{1}{n} \sum_{i=1}^{n} f(x_i).

This is referred to as Monte Carlo integration.
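Both forms of Monte Carlo integration can be sketched in a few lines of Python. The sketch averages g(U_i) over uniform draws, and it averages f(x_i) over draws from a density p. The two test integrands are illustrative choices, not taken from the text. The first is ∫_0^1 e^x dx = e - 1. The second is ∫_0^∞ x² e^{-x} dx = 2, decomposed as f(x) = x² times the exponential density p(x) = e^{-x}.

```python
import math
import random


def mc_integrate_unit(g, k, seed=None):
    """Estimate theta = integral_0^1 g(x) dx by the average of g(U_i), U_i ~ uniform(0, 1)."""
    rng = random.Random(seed)
    return sum(g(rng.random()) for _ in range(k)) / k


def mc_integrate(f, sample_p, n, seed=None):
    """Estimate integral h = f * p as the sample mean of f(x_i), with x_i drawn from p."""
    rng = random.Random(seed)
    return sum(f(sample_p(rng)) for _ in range(n)) / n


# Example 1: integral_0^1 e^x dx = e - 1
est1 = mc_integrate_unit(math.exp, 100_000, seed=1)

# Example 2: integral_0^inf x^2 e^{-x} dx = 2, with f(x) = x^2 and p(x) = e^{-x}
est2 = mc_integrate(lambda x: x * x, lambda rng: rng.expovariate(1.0), 100_000, seed=2)

print(est1, math.e - 1)
print(est2, 2.0)
```

Here (a, b) = (0, ∞) in the second example. The only requirement is that we can draw from p(x).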


Monte Carlo integration can be used to approximate posterior (or marginal posterior) distributions required for a Bayesian analysis. Consider the integral

I(y) = \int f(y \mid x)\, p(x)\, dx,

which we approximate by

\hat{I}(y) = \frac{1}{n} \sum_{i=1}^{n} f(y \mid x_i),

where the x_i are draws from the density p(x).

In a simulation study we are interested in determining a parameter θ connected with some stochastic model. To estimate θ, the model is simulated to obtain the output X such that θ = E[X]. The simulation study is terminated after n runs, yielding outputs X_1, X_2, …, X_n. The estimate of θ is then given by

\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.

Because this is an unbiased estimate of θ, it follows that its mean square error is equal to its variance. That is,

MSE = E[(\bar{X} - \theta)^2] = Var(\bar{X}) = \frac{Var(X)}{n}.

Hence if we can obtain a different unbiased estimator of θ having a smaller variance than X̄, we would obtain an improved estimator.
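A small sketch can make both formulas concrete: the estimate Î(y) and its standard error sqrt(Var(X)/n). The sketch assumes a hypothetical conjugate toy model, chosen only because its exact answer is known. The prior is p(x) = N(0, 1) and the likelihood is f(y | x) = N(y; x, 1), so the marginal of y is exactly N(0, 2). Var(X) is estimated by the sample variance of the simulated outputs.

```python
import math
import random
import statistics


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def marginal_density(y, n, seed=None):
    """Approximate I(y) = integral f(y|x) p(x) dx by (1/n) * sum f(y|x_i), x_i ~ p(x).

    Assumed toy model: prior p(x) = N(0, 1), likelihood f(y|x) = N(y; x, 1).
    Returns the estimate X-bar and its estimated standard error sqrt(S^2 / n).
    """
    rng = random.Random(seed)
    outputs = [normal_pdf(y, mean=rng.gauss(0.0, 1.0)) for _ in range(n)]
    estimate = statistics.fmean(outputs)
    std_error = statistics.stdev(outputs) / math.sqrt(n)
    return estimate, std_error


est, se = marginal_density(1.0, 50_000, seed=3)
exact = normal_pdf(1.0, sd=math.sqrt(2.0))  # under this model y ~ N(0, 2) exactly
print(est, se, exact)
```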

Variance Reduction Techniques
Variance reduction is a procedure used to increase the precision of the estimates that can be obtained for a given number of iterations. Some variance reduction techniques in simulation are:
1. Use of antithetic variables.
2. Use of control variates.
3. Variance reduction by conditioning.
4. Stratified sampling.
5. Importance sampling.

Antithetic variables
Suppose we are interested in using simulation to estimate θ = E[X], and suppose we have generated X_1 and X_2, identically distributed random variables having mean θ. Then

Var\left(\frac{X_1 + X_2}{2}\right) = \frac{1}{4}\left[Var(X_1) + Var(X_2) + 2\, Cov(X_1, X_2)\right].

Hence it would be advantageous (in the sense that the variance would be reduced) if X_1 and X_2, rather than being independent, were negatively correlated. A negative covariance term makes the variance smaller than in the independent case.
• One possible approach to arrange for negatively correlated estimates is as follows:
1. Suppose X_1 is based on random numbers U_1, U_2, …, U_m, say X_1 = g(U_1, U_2, …, U_m).
2. Then X_2 can be another estimate based on the random numbers 1 - U_1, 1 - U_2, …, 1 - U_m, so that X_2 = g(1 - U_1, 1 - U_2, …, 1 - U_m).
• Both U and 1 - U are uniformly distributed on (0, 1) and are clearly negatively correlated.
• It can be shown that X_1 and X_2 will be negatively correlated, and hence we obtain a variance reduction, so long as g is a monotone function (either increasing or decreasing) of each of its arguments.
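The antithetic idea can be sketched for θ = E[e^U] = e - 1, an illustrative choice of monotone g. Both estimators below average two evaluations of g, so their costs match. The plain estimator uses two independent uniforms. The antithetic estimator uses U and 1 - U. Comparing the sample variances of the two estimators shows the effect of the negative covariance.

```python
import math
import random
import statistics


def plain_estimates(g, n, rng):
    """n estimates, each the average of g over two independent uniforms."""
    return [(g(rng.random()) + g(rng.random())) / 2 for _ in range(n)]


def antithetic_estimates(g, n, rng):
    """n estimates, each (g(U) + g(1 - U)) / 2 using one uniform and its antithetic partner."""
    out = []
    for _ in range(n):
        u = rng.random()
        out.append((g(u) + g(1.0 - u)) / 2)
    return out


rng = random.Random(4)
plain = plain_estimates(math.exp, 20_000, rng)
anti = antithetic_estimates(math.exp, 20_000, rng)
print("plain:     ", statistics.fmean(plain), statistics.variance(plain))
print("antithetic:", statistics.fmean(anti), statistics.variance(anti))
```

Since e^u is increasing, Cov(e^U, e^{1-U}) < 0, and the antithetic variance comes out far smaller.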

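The use of control variates
Steps:
• Consider the simple problem of estimating θ = E[X], where X is drawn from a simulation.
• Suppose there is another random variable Y with known expectation E(Y) = μ_y. Then for any constant c, the quantity X + c(Y - μ_y) is also an unbiased estimator of θ.
• Consider its variance:
Var(X + c(Y - \mu_y)) = Var(X) + c^2\, Var(Y) + 2c\, Cov(X, Y).
• It can be shown that this variance is minimized when c is equal to
c^* = -\frac{Cov(X, Y)}{Var(Y)}.
• The variance of the new estimator is
Var(X + c^*(Y - \mu_y)) = Var(X) - \frac{[Cov(X, Y)]^2}{Var(Y)}.
• Y is called the control variate for the simulation estimator X.
• We can re-express this by dividing both sides by Var(X):
\frac{Var(X + c^*(Y - \mu_y))}{Var(X)} = 1 - \frac{[Cov(X, Y)]^2}{Var(X)\, Var(Y)} = 1 - \rho^2,
where \rho = \frac{Cov(X, Y)}{\sqrt{Var(X)\, Var(Y)}} is the correlation between X and Y.
• The variance is therefore reduced by 100ρ² percent.
• The controlled estimator is therefore
\bar{X} + c^*(\bar{Y} - \mu_y),
and its variance is given by
Var(\bar{X} + c^*(\bar{Y} - \mu_y)) = \frac{1}{n}\left(Var(X) - \frac{[Cov(X, Y)]^2}{Var(Y)}\right).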
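The control-variate recipe can be sketched for θ = E[e^U]. The illustrative control is Y = U, whose mean μ_y = 1/2 is known. In practice Cov(X, Y) and Var(Y) are usually unknown, so the sketch estimates c* from the same simulated pairs. That is an assumption, and it introduces a small bias. The sketch also reports the estimated 100ρ² percent reduction.

```python
import math
import random
import statistics


def sample_cov(xs, ys):
    """Sample covariance with divisor n - 1 (written out to avoid requiring Python 3.10+)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


rng = random.Random(5)
n = 20_000
mu_y = 0.5                             # known mean of the control variate Y = U
us = [rng.random() for _ in range(n)]  # Y_i = U_i
xs = [math.exp(u) for u in us]         # X_i = e^{U_i}, so theta = E[X] = e - 1

c_star = -sample_cov(xs, us) / statistics.variance(us)
controlled = [x + c_star * (u - mu_y) for x, u in zip(xs, us)]

rho_sq = sample_cov(xs, us) ** 2 / (statistics.variance(xs) * statistics.variance(us))
print("raw estimate:       ", statistics.fmean(xs))
print("controlled estimate:", statistics.fmean(controlled))
print("variance reduction: %.1f%%" % (100 * rho_sq))
```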

Variance reduction by conditioning
• By the law of iterated expectations, E[X] = E[E[X | Y]] = θ.
• This implies that the estimator E[X | Y] is also an unbiased estimator of θ.
• Now, recall the conditional variance formula
Var(X) = E[Var(X | Y)] + Var(E[X | Y]).
• Clearly, both terms on the right are non-negative, so that we have
Var(X) \ge Var(E[X | Y]).
• This implies that the estimator obtained by conditioning, E[X | Y], has a variance no larger than that of the raw estimator X, and is therefore superior.
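A classic illustration, not from the text, is estimating π by conditioning. Let V_1 and V_2 be uniform on (-1, 1), and let X = 4·I{V_1² + V_2² ≤ 1}, so E[X] = π. Given V_1, the probability that V_2² ≤ 1 - V_1² is sqrt(1 - V_1²). Hence E[X | V_1] = 4·sqrt(1 - V_1²) is an unbiased estimator with smaller variance.

```python
import math
import random
import statistics

rng = random.Random(5)
n = 50_000

# Raw estimator: X = 4 * I{V1^2 + V2^2 <= 1}, with V1, V2 ~ uniform(-1, 1), so E[X] = pi.
raw = []
for _ in range(n):
    v1, v2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    raw.append(4.0 if v1 * v1 + v2 * v2 <= 1 else 0.0)

# Conditioned estimator: E[X | V1] = 4 * P(|V2| <= sqrt(1 - V1^2)) = 4 * sqrt(1 - V1^2).
cond = []
for _ in range(n):
    v1 = rng.uniform(-1, 1)
    cond.append(4.0 * math.sqrt(1.0 - v1 * v1))

print("raw:        ", statistics.fmean(raw), statistics.variance(raw))
print("conditioned:", statistics.fmean(cond), statistics.variance(cond))
```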

Importance Sampling
Consider the case of estimating

\theta = E[h(X)] = \int h(x)\, f(x)\, dx,

where f(x) denotes the density of X. Suppose g(x) is another density function such that f(x) = 0 whenever g(x) = 0; essentially, both densities have the same domain. We can rewrite θ as

\theta = \int \frac{h(x) f(x)}{g(x)}\, g(x)\, dx = E_g\left[\frac{h(X) f(X)}{g(X)}\right],

where E_g denotes the expectation evaluated under the density g. Thus, to estimate θ, we can simulate X_i from the density g(x) and use

\bar{\theta} = \frac{1}{n} \sum_{i=1}^{n} \frac{h(X_i) f(X_i)}{g(X_i)}.

• The method of importance sampling attempts to reduce the variance by changing the probability distribution from which the values are generated.
• If it is possible to select a density g(x) so that h(X)f(X)/g(X) has smaller variance than the original (raw) estimator h(X), then the new estimator will be considered more efficient.

It would appear that a criterion is to select this density so that the ratio f(x)/g(x) is large for small values of h(x) and vice versa (small for large values of h(x)). This intuitively seems to help reduce the variance. The weight or ratio f(x)/g(x) is called the likelihood ratio or Radon–Nikodym derivative at x.
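Importance sampling pays off most for rare events. The following sketch estimates the illustrative target θ = P(X > 3) for X ~ N(0, 1). Here h(x) = I{x > 3} and f is the N(0, 1) density. The proposal g = N(3, 1) is an assumed choice that puts mass where h is non-zero. Since g(x) > 0 everywhere, the condition above holds. The raw estimator sees only a handful of exceedances, while the weighted one does not.

```python
import math
import random
import statistics


def normal_pdf(x, mean=0.0):
    """Density of N(mean, 1)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


rng = random.Random(6)
n = 100_000
a = 3.0  # theta = P(X > a) = E_f[h(X)] with h(x) = I{x > a} and f = N(0, 1)

# Raw estimator: draw X from f itself; almost every draw gives h(X) = 0.
raw = [1.0 if rng.gauss(0.0, 1.0) > a else 0.0 for _ in range(n)]

# Importance sampling: draw X from g = N(a, 1) and weight h(X) by f(X) / g(X).
weighted = []
for _ in range(n):
    x = rng.gauss(a, 1.0)
    weighted.append(normal_pdf(x) / normal_pdf(x, mean=a) if x > a else 0.0)

exact = 0.5 * math.erfc(a / math.sqrt(2.0))  # P(Z > a) for a standard normal Z
print("raw:  ", statistics.fmean(raw), "variance", statistics.variance(raw))
print("IS:   ", statistics.fmean(weighted), "variance", statistics.variance(weighted))
print("exact:", exact)
```

In the region x > 3, the likelihood ratio f(x)/g(x) = exp(-3x + 4.5) is below 1, so each weighted draw contributes a small, stable amount.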

8. MARKOV CHAIN MONTE CARLO

Markov chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from probability distributions. They work by constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a large number of steps is then used as a sample from the desired distribution, and the quality of the sample improves as the number of steps increases. A major limitation towards more widespread implementation of Bayesian approaches is that obtaining the posterior distribution often requires the integration of high-dimensional functions. This can be computationally very difficult, but several approaches short of direct integration have been proposed (reviewed by Smith 1991, Evans and Swartz 1995, Tanner 1996). Among these are Markov chain Monte Carlo (MCMC) methods, which attempt to simulate direct draws from some complex distribution of interest. MCMC approaches are so named because one uses the previous sample value to randomly generate the next sample value, generating a Markov chain (the transition probabilities between sample values are only a function of the most recent sample value). In the early 1990s (Gelfand and Smith 1990) it was realized that one particular MCMC method, the Gibbs sampler, is very widely applicable to a broad class of Bayesian problems. This realization has sparked a major increase in the application of Bayesian analysis, and this interest is likely to continue expanding for some time to come. MCMC methods have their roots in the Metropolis algorithm (Metropolis and Ulam 1949, Metropolis et al. 1953). This was an attempt by physicists to compute complex integrals by expressing them as expectations for some distribution and then estimating these expectations by drawing samples from that distribution. The Gibbs sampler (Geman and Geman 1984) has its origins in image processing.


Markov Chain: A stochastic process in which future states are independent of past states given the present state. Consider a draw θ^{(t)} to be the state at iteration t. The next draw θ^{(t+1)} depends only on the current draw θ^{(t)}, and not on any past draws. This satisfies the Markov property:

P(\theta^{(t+1)} \mid \theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(t)}) = P(\theta^{(t+1)} \mid \theta^{(t)})

Suppose our objective is to draw a sample from a distribution π(θ), where θ = (θ_1, θ_2, θ_3, …, θ_k), k ≥ 2, belongs to the parameter space Θ, and sampling directly from π(θ) is difficult. Suppose, however, that we can construct a Markov chain with state space Θ that is straightforward to simulate from and whose equilibrium or stationary distribution is π(θ). Then, if we run the chain for a long time, simulated values of the chain can be used as a basis for summarising features of π(θ) of interest. Under suitable regularity conditions, asymptotic results exist which clarify the sense in which the sample output from the chain can be used to mimic a random sample from π(θ). Equivalently, the output can be used to estimate the expected value of some function with respect to the equilibrium distribution π(θ). If θ^{(1)}, θ^{(2)}, θ^{(3)}, …, θ^{(t)}, … is a realisation from the chain, then the results are:
1. \theta^{(t)} \xrightarrow{d} \theta \sim \pi(\theta) \quad \text{as } t \to \infty.
That means θ^{(t)} converges in distribution to θ as t tends to ∞.
2. \frac{1}{t} \sum_{i=1}^{t} f(\theta^{(i)}) \to E_{\pi}[f(\theta)] \quad \text{almost surely as } t \to \infty.
This is the ergodic average. (It is a kind of running average, since t is not fixed.)
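The ergodic average in result 2 can be illustrated with a chain whose stationary distribution is known exactly. The sketch uses a hypothetical two-state chain with transition matrix P = [[0.9, 0.1], [0.5, 0.5]]. Solving π = πP gives π = (5/6, 1/6). With f equal to the indicator of state 1, the running average along a single realisation should settle near π(1) = 1/6.

```python
import random

# Hypothetical two-state chain on {0, 1} with transition matrix P.
# Solving pi = pi P gives the stationary distribution pi = (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]


def ergodic_average(f, steps, start=0, seed=None):
    """Running average (1/t) * sum f(theta^(i)) along one realisation of the chain."""
    rng = random.Random(seed)
    state, total = start, 0.0
    for _ in range(steps):
        # move to state 1 with probability P[state][1], otherwise to state 0
        state = 1 if rng.random() < P[state][1] else 0
        total += f(state)
    return total / steps


# f = indicator of state 1, so the ergodic average should approach pi(1) = 1/6.
avg = ergodic_average(lambda s: 1.0 if s == 1 else 0.0, 200_000, seed=7)
print(avg, 1 / 6)
```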

So Really, What Is MCMC?
MCMC is a class of methods in which we can simulate draws that are slightly dependent and are approximately from a (posterior) distribution. We then take those draws and calculate quantities of interest for the (posterior) distribution. In Bayesian statistics, there are generally two MCMC algorithms that we use: the Gibbs sampler and the Metropolis–Hastings algorithm.

GIBBS SAMPLING
Gibbs sampling is named after the physicist J. W. Gibbs, in reference to an analogy between the sampling algorithm and statistical physics. The algorithm was described by brothers Stuart and Donald Geman in 1984, some eight decades after the death of Gibbs. In its basic version, Gibbs sampling is a special case of the Metropolis–Hastings algorithm. Gibbs sampling is applicable when the joint distribution is not known explicitly or is difficult to sample from directly, but the conditional distribution of each variable is known and is easy (or at least, easier) to sample from. The Gibbs sampling algorithm generates an instance from the distribution of each variable in turn, conditional on the current values of the other variables. It can be shown that the sequence of samples constitutes a Markov chain, and the stationary distribution of that Markov chain is just the sought-after joint distribution. Gibbs sampling is particularly well-adapted to sampling the posterior distribution of a Bayesian network, since Bayesian networks are typically specified as a collection of conditional distributions.

Suppose we have a joint distribution p(θ_1, …, θ_k) that we want to sample from (for example, a posterior distribution). We can use the Gibbs sampler to sample from the joint distribution if we know the full conditional distribution of each parameter. For each parameter, the full conditional distribution is the distribution of the parameter conditional on the known information and all the other parameters: p(θ_j | θ_{-j}, y).

Steps to Calculating Full Conditional Distributions
Suppose we have a posterior p(θ|y). To calculate the full conditionals for each θ, we do the following:
1. Write out the full posterior, ignoring constants of proportionality.
2. Pick a block of parameters (for example, θ_1) and drop everything that doesn't depend on θ_1.
3. Use our knowledge of distributions to figure out what the normalizing constant is (and thus what the full conditional distribution p(θ_1 | θ_{-1}, y) is).
4. Repeat steps 2 and 3 for all parameter blocks.

Gibbs Sampler Steps
Let's suppose that we are interested in sampling from the posterior p(θ|y), where θ is a vector of three parameters, θ_1, θ_2, θ_3. The steps of a Gibbs sampler (and the analogous steps in the MCMC process) are:
1. Pick a vector of starting values θ^{(0)} (by defining a starting distribution p^{(0)} and drawing θ^{(0)} from it).
2. Start with any θ_j (order does not matter, but we'll start with θ_1 for convenience). Draw a value θ_1^{(1)} from the full conditional p(θ_1 | θ_2^{(0)}, θ_3^{(0)}, y).
3. Draw a value θ_2^{(1)} (again, order does not matter) from the full conditional p(θ_2 | θ_1^{(1)}, θ_3^{(0)}, y). Note that we must use the updated value θ_1^{(1)}.
4. Draw a value θ_3^{(1)} from the full conditional p(θ_3 | θ_1^{(1)}, θ_2^{(1)}, y), using both updated values.
5. Draw θ^{(2)} using θ^{(1)}, continually using the most recently updated values.
6. Repeat until we get t draws, with each draw being a vector θ^{(i)}.
Our result is a Markov chain with a set of draws of θ that are approximately from our posterior. We can do Monte Carlo integration on those draws to get quantities of interest.
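The Gibbs steps can be sketched on a two-parameter target whose full conditionals are known in closed form. The assumed target is a bivariate normal with zero means, unit variances and correlation ρ = 0.8. For this target, θ_1 | θ_2 ~ N(ρθ_2, 1 - ρ²), and symmetrically for θ_2. Discarding an initial burn-in of 1000 draws is an extra assumption not stated in the steps above. The sample means and correlation of the retained draws should be close to 0, 0 and 0.8.

```python
import math
import random


def gibbs_bivariate_normal(rho, t, burn_in=1_000, seed=None):
    """Gibbs sampler for (theta1, theta2) ~ bivariate normal with zero means,
    unit variances and correlation rho.

    Full conditionals:
        theta1 | theta2 ~ N(rho * theta2, 1 - rho^2)
        theta2 | theta1 ~ N(rho * theta1, 1 - rho^2)
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    th1, th2 = 0.0, 0.0                 # step 1: starting values theta^(0)
    draws = []
    for i in range(burn_in + t):
        th1 = rng.gauss(rho * th2, sd)  # step 2: draw theta1 given the current theta2
        th2 = rng.gauss(rho * th1, sd)  # step 3: draw theta2 using the *updated* theta1
        if i >= burn_in:                # keep draws only after the burn-in period
            draws.append((th1, th2))
    return draws


draws = gibbs_bivariate_normal(0.8, 50_000, seed=8)
n = len(draws)
m1 = sum(a for a, _ in draws) / n
m2 = sum(b for _, b in draws) / n
v1 = sum((a - m1) ** 2 for a, _ in draws) / n
v2 = sum((b - m2) ** 2 for _, b in draws) / n
corr = sum((a - m1) * (b - m2) for a, b in draws) / (n * math.sqrt(v1 * v2))
print(m1, m2, corr)
```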

Metropolis–Hastings Algorithm
Suppose we have a posterior π(θ|y) that we want to sample from, but:
• The posterior doesn't look like any distribution we know (no conjugacy).
• The posterior consists of more than 2 parameters.
• Some (or all) of the full conditionals do not look like any distributions we know (no Gibbs sampling for those parameters whose full conditionals we don't know).
If all else fails, we can use the Metropolis–Hastings algorithm. It only requires that the posterior can be evaluated up to a normalizing constant, so it applies far more generally.

The Metropolis Algorithm
Given a posterior distribution π(θ|y), we can construct a Markov chain having π(θ|y) as its equilibrium distribution. Let q(θ, φ) denote a (fairly arbitrary) symmetric Markov kernel. If θ is the current realized value of the chain, a value φ generated from q(θ, ·) is proposed as the next realized value. Now accept φ with probability

\alpha(\theta, \phi) = \min\left\{1, \frac{\pi(\phi \mid y)}{\pi(\theta \mid y)}\right\};

otherwise take θ as the next realized value. This is the Metropolis version of the algorithm. If q is not symmetric, then the acceptance probability becomes

\alpha(\theta, \phi) = \min\left\{1, \frac{\pi(\phi \mid y)\, q(\phi, \theta)}{\pi(\theta \mid y)\, q(\theta, \phi)}\right\}.

Suppose the current value θ generates a candidate value φ from q(θ, ·) as our proposal for the next value. However, we introduce a further randomisation at this stage as follows. With probability α(θ, φ), we accept the value φ and the chain is allowed to move. Otherwise we reject the value generated from q(θ, ·), and the chain remains at θ. This setup defines a Markov chain with transition probability

p(\theta, \phi) = \begin{cases} q(\theta, \phi)\, \alpha(\theta, \phi), & \phi \neq \theta, \\ 1 - \sum_{\phi' \neq \theta} q(\theta, \phi')\, \alpha(\theta, \phi'), & \phi = \theta, \end{cases}

where α(θ, φ) is the acceptance probability defined above.

The remaining problem is to define the Markov kernel q; common choices for q are the multivariate normal, multivariate-t, split-t and rectangular…
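A minimal random-walk Metropolis sketch makes the accept/reject step concrete. The target (assumed for illustration) is N(0, 1), supplied only as an unnormalised log density, since the normalizing constant cancels in α(θ, φ). The kernel is the symmetric proposal q(θ, ·) = N(θ, 2.4²). The step size 2.4 and a burn-in of 1000 draws are assumptions, not prescriptions from the text.

```python
import math
import random


def log_target(theta):
    """Assumed target for illustration: N(0, 1), known only up to a constant."""
    return -0.5 * theta * theta


def random_walk_metropolis(log_pi, t, theta0=0.0, step=2.4, burn_in=1_000, seed=None):
    """Metropolis sampler with the symmetric proposal q(theta, .) = N(theta, step^2)."""
    rng = random.Random(seed)
    theta, lp = theta0, log_pi(theta0)
    draws, accepted = [], 0
    for i in range(burn_in + t):
        phi = theta + rng.gauss(0.0, step)  # candidate drawn from q(theta, .)
        lp_phi = log_pi(phi)
        # Accept with probability min{1, pi(phi) / pi(theta)}, compared on the log scale.
        if lp_phi >= lp or rng.random() < math.exp(lp_phi - lp):
            theta, lp = phi, lp_phi
            accepted += 1
        # On rejection the chain stays at theta, which is recorded again.
        if i >= burn_in:
            draws.append(theta)
    return draws, accepted / (burn_in + t)


draws, acc_rate = random_walk_metropolis(log_target, 100_000, seed=9)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var, acc_rate)  # mean near 0, variance near 1
```

For a non-symmetric kernel, the acceptance test would also need the ratio q(φ, θ) / q(θ, φ) from the Metropolis–Hastings formula.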


9. BOOTSTRAPPING

A problem of statistical inference often involves estimating some aspect of a probability distribution F on the basis of a random sample drawn from F. The empirical distribution function, which we will call F̂, is a simple estimate of the entire distribution F. An obvious way to estimate some interesting aspect of F, like its mean, median or correlation, is to use the corresponding aspect of F̂. This is the “plug-in principle”. The bootstrap method is a direct application of the plug-in principle. Suppose we have observed a random sample of size n from a probability distribution F,

F \to (x_1, x_2, \ldots, x_n).

The empirical distribution function F̂ is defined to be the discrete distribution that puts probability 1/n on each value x_i, i = 1, 2, …, n. In other words, F̂ assigns to a set A in the sample space of x its empirical probability

\widehat{\mathrm{Prob}}\{A\} = \#\{x_i \in A\} / n,

the proportion of the observed sample x = (x_1, x_2, …, x_n) occurring in A.

The Plug-in Principle
The plug-in principle is a simple method of estimating parameters from samples. The plug-in estimate of a parameter θ = t(F) is defined to be

\hat{\theta} = t(\hat{F}).

In other words, we estimate the function θ = t(F) of the probability distribution F by the same function of the empirical distribution F̂. The nonparametric bootstrap method is an application of the plug-in principle. By nonparametric we mean that only the X's are known (observed) and no prior knowledge of the population distribution F is available. The idea was originally introduced by Efron in 1979 to compute the standard error of arbitrary estimators, and the basic idea remains the same today. We want to estimate the accuracy of an arbitrary estimate θ̂ knowing only one sample drawn from an unknown population F, and the bootstrap may be used for doing this. The bootstrap is a computer-intensive resampling technique; it is in fact a data-based simulation method for drawing the desired statistical inference. The basic idea of the bootstrap is to use the sample data to compute a statistic and then to estimate its sampling distribution without any model assumptions. No theoretical calculations of measures of accuracy are needed, so it does not matter how mathematically complex the estimator θ̂ is. A bootstrap sample x* = (x_1*, x_2*, …, x_n*) is obtained by randomly sampling n times, with replacement, from the original data points x_1, x_2, …, x_n.

The Bootstrap Algorithm for Estimating Standard Errors
1. Select B independent bootstrap samples x^{*1}, x^{*2}, …, x^{*B}, each consisting of n data values drawn with replacement from x (the original sample).
2. Evaluate the bootstrap replication corresponding to each bootstrap sample,
\hat{\theta}^*(b) = s(x^{*b}), \quad b = 1, 2, \ldots, B.
3. Estimate the standard error se_F(θ̂) by the sample standard deviation of the B replications,
\widehat{se}_B = \left\{ \frac{\sum_{b=1}^{B} [\hat{\theta}^*(b) - \hat{\theta}^*(\cdot)]^2}{B - 1} \right\}^{1/2}, \quad \text{where } \hat{\theta}^*(\cdot) = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^*(b).

This is the bootstrap algorithm for estimating the standard error of a statistic θ̂ = s(x). Each bootstrap sample is an independent random sample of size n from F̂, and B is the number of bootstrap replications used for estimating the standard error. As B → ∞, \widehat{se}_B approaches the plug-in estimate of se_F(θ̂).
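The three steps of the algorithm can be sketched with Python's standard library. The data are hypothetical: 50 simulated N(10, 2²) values, generated only for illustration. For the sample mean, the plug-in standard error that \widehat{se}_B approaches is known in closed form, sqrt(Σ(x_i - x̄)² / n) / sqrt(n). This gives a check on the bootstrap value. The median is included to show that the same code works for a statistic with no simple standard-error formula.

```python
import math
import random
import statistics


def bootstrap_se(x, statistic, B=2_000, seed=None):
    """Bootstrap standard error of statistic(x), following the three steps above."""
    rng = random.Random(seed)
    n = len(x)
    # Steps 1-2: B bootstrap samples of size n drawn with replacement from x
    # (i.e. from F-hat), and the replication statistic(x*b) for each one.
    reps = [statistic(rng.choices(x, k=n)) for _ in range(B)]
    # Step 3: sample standard deviation of the B replications (divisor B - 1).
    return statistics.stdev(reps)


# Hypothetical data: 50 simulated N(10, 2^2) values, used only for illustration.
data_rng = random.Random(10)
x = [data_rng.gauss(10.0, 2.0) for _ in range(50)]

se_mean = bootstrap_se(x, statistics.fmean, seed=11)
se_median = bootstrap_se(x, statistics.median, seed=12)

# Plug-in standard error of the mean, the limit of se_B as B grows.
plug_in_se_mean = math.sqrt(statistics.pvariance(x) / len(x))
print(se_mean, plug_in_se_mean, se_median)
```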


[Figure: A schematic diagram of the bootstrap as it applies to one-sample problems]

The schematic diagram shows the bootstrap as it applies to one-sample problems. In the real world, the unknown probability distribution F gives the data x = (x_1, x_2, …, x_n) by random sampling; from x we calculate the statistic of interest θ̂ = s(x). In the bootstrap world, F̂ generates x* by random sampling, giving θ̂* = s(x*). There is only one observed value of θ̂, but we can generate as many bootstrap replications θ̂* as we can afford. The crucial step in the bootstrap process is the process by which we construct from x an estimate F̂ of the unknown population F.


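10. APPENDIX
GENERATING RANDOM NUMBERS IN R
Program in R using LCG method

```r
rm(list = ls(all = TRUE))          # remove all objects from the workspace

# Linear congruential generator: z[i] = (a * z[i - 1] + c) mod m, u[i] = z[i] / m
a <- 31; c <- 453; m <- 74748383   # multiplier, increment and modulus
n <- 3000                          # number of values to generate
z <- numeric(n)
z[1] <- 25                         # initial seed value
for (i in 2:n) {
  z[i] <- (a * z[i - 1] + c) %% m
}
u <- z / m                         # u is the uniform(0, 1) random number
z
u
hist(u)                            # output: histogram of u by LCG method
```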

[Figure: Histogram of u generated by the LCG method (x-axis: u, 0.0 to 1.0; y-axis: Frequency, 0 to 300)]

GENERATION FROM EXPONENTIAL USING INVERSE TRANSFORM

exp1