STAT 517: Random Numbers and Simulation

Hitchcock

Random Numbers and Simulation

Generating random numbers:
It is typically impossible or infeasible to obtain truly random numbers.
Programs have been developed to generate pseudo-random numbers: values generated from a complicated deterministic algorithm, designed to pass standard statistical tests for randomness.
The generated values appear to be independent and identically distributed.
Random number generators for common distributions are built into R.
For less common distributions, more complicated methods have been developed (e.g., Accept-Reject Sampling, the Metropolis-Hastings Algorithm). STAT 740 covers these.
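As a small illustration of R's built-in generators, the sketch below draws from three common distributions and shows that resetting the seed reproduces the same pseudo-random values. The seed value and the distribution parameters are arbitrary choices made here for illustration.

```r
# Pseudo-random draws are deterministic given the seed.
set.seed(517)                            # arbitrary seed, chosen for illustration
u <- runif(5)                            # 5 draws from Uniform(0, 1)
z <- rnorm(5, mean = 0, sd = 1)          # 5 draws from N(0, 1)
k <- rbinom(5, size = 10, prob = 0.3)    # 5 draws from Binomial(10, 0.3)

# Resetting the seed restarts the same deterministic sequence.
set.seed(517)
u.again <- runif(5)
print(identical(u, u.again))
```

Setting the seed at the top of a simulation script is a common way to make results reproducible.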

University of South Carolina


(Monte Carlo) Simulation

Some Common Uses of Simulation
1. Optimization (Example: Finding MLEs)
2. Calculating Definite Integrals (Ex: Finding Posterior Distributions)
3. Approximating the Sampling Distribution of a Statistic (Ex: Constructing CIs)


1. Finding the $x$ that maximizes (or minimizes) a complicated function $h(x)$ can be difficult analytically.

Find $x$ to maximize $h(x)$.

Situation even tougher if $x$ is multidimensional.


OTHER OPTIONS:

Simple Stochastic Search: If the maximum is to take place over a bounded region, say $[a, b]$, then:
Generate many uniform random observations in that region, plug each into $h$, and pick the one that gives the largest $h(x)$.

Advantage: Easy to program.
Disadvantage: Very slow, especially for multidimensional problems. Requires much computation.

Example: Maximize […] over […].
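The simple stochastic search described above can be sketched in a few lines of R. The objective function `h`, the search region, and the number of draws are hypothetical choices made here for illustration; they are not the example from the slide.

```r
# Simple stochastic search: evaluate the objective at many uniform random
# points in a bounded region and keep the best one.
set.seed(517)

# Hypothetical objective (an assumption); its maximum is h(1, -0.5) = 0.
h <- function(x, y) -(x - 1)^2 - (y + 0.5)^2

n.draws <- 100000
x <- runif(n.draws, min = -2, max = 2)   # uniform draws over [-2, 2] x [-2, 2]
y <- runif(n.draws, min = -2, max = 2)

h.vals <- h(x, y)                        # plug each point into h
best <- which.max(h.vals)                # index of the largest h value
search.result <- c(x = x[best], y = y[best], h = h.vals[best])
print(search.result)
```

The number of draws needed for a given accuracy grows quickly with the dimension of the region, which is the slowness the slide warns about.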


More advanced: Gradient Methods, which use derivative information to determine which area of the region to search next.
Rule: “go up the slope”
Disadvantage: Can get stuck on local maxima

Simulated Annealing: Tries a sequence of $x$ values: $x_0, x_1, x_2, \ldots$ (with $h$ the function being maximized)
If $h(x_{i+1}) \ge h(x_i)$, “move” to $x_{i+1}$.
If $h(x_{i+1}) < h(x_i)$, “move” to $x_{i+1}$ with a certain probability which depends on $h(x_{i+1}) - h(x_i)$.
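A minimal simulated annealing sketch follows. The objective `h`, the starting value, the proposal standard deviation, the acceptance rule `exp(delta / temp)`, and the geometric cooling schedule are all assumptions chosen here for illustration; they are one common set of choices, not necessarily the ones used in the course.

```r
# Simulated annealing for a one-dimensional maximization problem.
set.seed(517)

# Hypothetical objective (an assumption): global maximum h(0) = 2,
# with smaller local maxima on either side.
h <- function(x) -x^2 + 2 * cos(5 * x)

n.iter <- 5000
x.cur <- 3          # starting value
x.best <- x.cur     # best value visited so far
temp <- 2           # initial "temperature" (assumed tuning value)

for (i in seq_len(n.iter)) {
  x.prop <- x.cur + rnorm(1, mean = 0, sd = 0.5)   # candidate near the current value
  delta <- h(x.prop) - h(x.cur)
  # Always move uphill; move downhill with probability exp(delta / temp).
  if (delta >= 0 || runif(1) < exp(delta / temp)) {
    x.cur <- x.prop
  }
  if (h(x.cur) > h(x.best)) x.best <- x.cur
  temp <- temp * 0.999   # geometric cooling (an assumption)
}

print(c(x.best = x.best, h.best = h(x.best)))
```

Because downhill moves are sometimes accepted while the temperature is high, the search can leave a local maximum, which a pure "go up the slope" rule cannot do.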








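R functions that perform optimization

optimize(): one-dimensional optimization
optim(): general-purpose optimization (handles multidimensional problems)

Example:
optim(par = c(0,0), fn = my.fcn, control = list(fnscale = -1, maxit = 100000))   # Nelder-Mead optimization (the default method)

Here fnscale = -1 tells optim() to maximize rather than minimize, and maxit belongs inside the control list.

Other choices: method="CG", method="BFGS", method="SANN"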
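To make the calls above concrete, here is a runnable sketch. The function `my.fcn` is a hypothetical two-parameter objective defined here for illustration (the slides do not show its definition), and the one-dimensional function passed to `optimize()` is likewise an assumption.

```r
# Hypothetical objective (an assumption): maximum at (1, 2), where it equals 0.
my.fcn <- function(p) -(p[1] - 1)^2 - (p[2] - 2)^2

# Nelder-Mead is the default method; fnscale = -1 turns minimization into maximization.
fit <- optim(par = c(0, 0), fn = my.fcn,
             control = list(fnscale = -1, maxit = 100000))
print(fit$par)           # estimated maximizer
print(fit$value)         # objective value at the maximizer
print(fit$convergence)   # 0 indicates successful convergence

# Same problem with a gradient-based method.
fit.bfgs <- optim(par = c(0, 0), fn = my.fcn, method = "BFGS",
                  control = list(fnscale = -1))

# One-dimensional problems can use optimize() over an interval.
opt1 <- optimize(function(x) -(x - 3)^2, interval = c(0, 10), maximum = TRUE)
print(opt1$maximum)
```

Checking `fit$convergence` is worthwhile, since optim() returns a result even when the iteration limit is reached.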


Calculating Definite Integrals

In statistics, we often have to calculate difficult definite integrals (posterior distributions, expected values):

$I = \int_a^b h(x)\,dx$ (here, $x$ could be multidimensional)

Example 1: Find: […]

Example 2: Find: […]


Hit-or-Miss Method

Example 1: […]

To approximate $I = \int_a^b h(x)\,dx$ for a nonnegative $h$:
Determine $c$ such that $c \ge h(x)$ across the entire region of interest.
Generate $n$ random uniform $X_i$’s from $(a, b)$ and $n$ random uniform $Y_i$’s from $(0, c)$, giving $n$ $(X_i, Y_i)$ pairs.
Count the number of times (call this $m$) that $Y_i$ is less than $h(X_i)$.
Then $I \approx c\,(b - a)\,\dfrac{m}{n}$.
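The hit-or-miss steps can be sketched as follows. The integrand used here, $h(x) = 4\sqrt{1 - x^2}$ on $(0, 1)$ with bound $c = 4$, is an illustrative assumption chosen because its integral is known to equal $\pi$; it is not necessarily the slide's Example 1.

```r
# Hit-or-miss Monte Carlo integration.
set.seed(517)

h <- function(x) 4 * sqrt(1 - x^2)   # illustrative integrand (an assumption)
a <- 0                               # lower limit of integration
b <- 1                               # upper limit of integration
c.bound <- 4                         # c >= h(x) everywhere on (a, b)
n <- 100000                          # number of (X, Y) pairs

x <- runif(n, min = a, max = b)          # X_i uniform on (a, b)
y <- runif(n, min = 0, max = c.bound)    # Y_i uniform on (0, c)

m <- sum(y < h(x))                       # number of "hits" under the curve
I.hat <- c.bound * (b - a) * m / n       # box area times proportion of hits
print(c(estimate = I.hat, truth = pi))
```

The estimate is the area of the bounding box multiplied by the fraction of random points that fall under the curve.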






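Note that if $f$ is a density, then $\int h(x)\,f(x)\,dx = E_f[h(X)]$.

Take $n$ random values $X_1, \ldots, X_n$ from $f$ (i.e., with density $f$), over whatever the support of $f$ is.

Then $E_f[h(X)] \approx \dfrac{1}{n} \sum_{i=1}^{n} h(X_i)$.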
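A short sketch of estimating an expected value by averaging over random draws is given below. The choices $X \sim N(0, 1)$ and $h(x) = x^2$ are assumptions made here because the true value, $E[X^2] = 1$, is known and can be compared with the estimate.

```r
# Monte Carlo estimate of E[h(X)] by averaging h over draws from the density of X.
set.seed(517)

h <- function(x) x^2          # illustrative function (an assumption)
n <- 100000
x <- rnorm(n)                 # draws from the N(0, 1) density

mc.est <- mean(h(x))          # (1/n) * sum of h(X_i)
mc.se <- sd(h(x)) / sqrt(n)   # estimated standard error of the average
print(c(estimate = mc.est, std.error = mc.se))
```

Reporting the standard error alongside the estimate gives a rough sense of how many draws are needed.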


Examples

Example 3: If $X$ is a random variable with a […] distribution, find […].

Example 4: If $Y$ is a beta random variable with parameters […] and […], find […].

Some more advanced methods of integration using simulation exist (e.g., Importance Sampling).

Note: The R function integrate() does numerical integration for functions of a single variable (not using simulation techniques).

adapt() in the “adapt” package does multivariate numerical integration (that package has since been archived on CRAN; adaptIntegrate() in the “cubature” package is an alternative).
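For comparison with the simulation-based approaches, here is a brief sketch of `integrate()` on two single-variable integrals with known answers. The integrands and the Beta(2, 3) parameters are assumptions chosen here for illustration, not the values from the examples above.

```r
# Deterministic numerical integration of single-variable functions.

# P(-1.96 < Z < 1.96) for a standard normal Z; approximately 0.95.
res <- integrate(dnorm, lower = -1.96, upper = 1.96)
print(res$value)

# E[Y] for Y ~ Beta(2, 3) (hypothetical parameters); the exact value is 2 / (2 + 3) = 0.4.
ey <- integrate(function(y) y * dbeta(y, shape1 = 2, shape2 = 3),
                lower = 0, upper = 1)
print(ey$value)
```

The returned object also carries an estimate of the absolute error in `res$abs.error`.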


Approximating the Sampling Distribution of a Statistic

To perform inference based on sample statistics, we typically need to know the sampling distribution of the statistics.

Example: $X_1, \ldots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$.

$T = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$ has a $t_{n-1}$ distribution.

If $\sigma$ known, $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ has a $N(0, 1)$ distribution.

Then we can use these sampling distributions for inference (CIs, hypothesis tests).
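As an illustration of using the $t_{n-1}$ sampling distribution for inference, the sketch below builds a 95% confidence interval for $\mu$ by hand and compares it with the interval from `t.test()`. The simulated data (sample size, mean, and standard deviation) are hypothetical.

```r
# 95% t-based confidence interval for a normal mean, by formula and via t.test().
set.seed(517)
n <- 15
x <- rnorm(n, mean = 10, sd = 2)    # hypothetical normal sample

xbar <- mean(x)
s <- sd(x)
t.crit <- qt(0.975, df = n - 1)     # upper 2.5% point of the t distribution with n - 1 df

ci.manual <- xbar + c(-1, 1) * t.crit * s / sqrt(n)
ci.builtin <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

print(rbind(manual = ci.manual, builtin = ci.builtin))
```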


What if the data’s distribution is not normal?
1. Large sample: Central Limit Theorem
2. Small sample: Nonparametric procedures based on permutation distribution


If the population distribution is known, we can approximate the sampling distribution with simulation.

Repeatedly ($B$ times) generate a random sample of size $n$ from the population distribution.
Calculate the statistic (say, $T$) each time.
The empirical distribution of the $T$-values approximates its true distribution.


Example 1: $X_1, \ldots, X_n \overset{iid}{\sim}$ […]

What is the sampling distribution of $\bar{X}$?

What is the sampling distribution of the sample midrange?
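The two questions above can be explored by simulation. In the sketch below, the Uniform(0, 1) population, the sample size, and the number of repetitions are all assumptions chosen for illustration; the population in the original example may differ.

```r
# Approximate the sampling distributions of the sample mean and the sample
# midrange by repeated sampling from a known population.
set.seed(517)

n <- 10          # sample size (assumed)
B <- 10000       # number of simulated samples (assumed)

# Each column is one random sample of size n from Uniform(0, 1).
samples <- matrix(runif(n * B), nrow = n, ncol = B)

xbar.vals <- colMeans(samples)
midrange.vals <- apply(samples, 2, function(s) (min(s) + max(s)) / 2)

# Summaries of the two empirical sampling distributions.
print(c(mean.xbar = mean(xbar.vals), sd.xbar = sd(xbar.vals)))
print(c(mean.midrange = mean(midrange.vals), sd.midrange = sd(midrange.vals)))

# hist(xbar.vals); hist(midrange.vals)   # uncomment to view the shapes
```

For a uniform population the midrange varies less than the mean, which the two standard deviations show.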


What if we don’t know the exact population distribution (more likely)?

Can use bootstrap methods: Resample (randomly select $n$ values from the original sample, with replacement). These “bootstrap samples” together mimic the population.
For each of the, say, $B$ bootstrap samples, calculate the statistic of interest.
These $B$ values will approximate the sampling distribution.

Example 2: Observe […] from an unknown population type.
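The resampling procedure can be written directly with `sample()`. The data vector `obs`, the choice of the median as the statistic, and the number of bootstrap samples are hypothetical, chosen only to make the sketch runnable.

```r
# Nonparametric bootstrap by hand: resample the observed data with replacement.
set.seed(517)

# Hypothetical observed sample (an assumption, not the slide's data).
obs <- c(2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 6.1, 3.3, 4.0)
n <- length(obs)
B <- 5000        # number of bootstrap samples (assumed)

# For each bootstrap sample, compute the statistic of interest (here, the median).
boot.medians <- replicate(B, median(sample(obs, size = n, replace = TRUE)))

# Simple percentile interval from the bootstrap distribution.
ci.perc <- quantile(boot.medians, probs = c(0.025, 0.975))
print(ci.perc)
```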


Bootstrap sampling is built into R in the “boot” package. Try library(boot); help(boot) for details.

If you know the form of the population distribution, but not the parameters, a parametric bootstrap can be used.
Simple bootstrap CIs have some drawbacks.
More complicated “bias-corrected” bootstrap methods have been developed.
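A minimal sketch of the “boot” package workflow follows. The data vector and the statistic function `mean.stat` are hypothetical; `boot()` expects a statistic that takes the data and a vector of resampled indices, and `boot.ci()` can report both a simple percentile interval and a bias-corrected ("bca") interval.

```r
library(boot)
set.seed(517)

# Hypothetical observed sample (an assumption).
obs <- c(2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 6.1, 3.3, 4.0,
         5.2, 2.9, 3.6, 4.8, 3.1, 7.3, 2.4, 4.1, 3.9, 5.0)

# The statistic receives the data and the indices chosen for one bootstrap sample.
mean.stat <- function(data, idx) mean(data[idx])

boot.out <- boot(data = obs, statistic = mean.stat, R = 2000)
ci.out <- boot.ci(boot.out, conf = 0.95, type = c("perc", "bca"))
print(ci.out)
```

Comparing the "perc" and "bca" intervals on the same data is one way to see how much the bias correction matters for a given statistic.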
