STAT 517: Random Numbers and Simulation. Hitchcock. Random ...
Approximating the Sampling Distribution of a Statistic (Ex: Constructing CIs).
University of ...
STAT 517: Random Numbers and Simulation
Hitchcock
Random Numbers and Simulation Generating random numbers: Typically impossible/unfeasible to obtain truly random numbers Programs have been developed to generate pseudo-random numbers: Values generated from a complicated deterministic algorithm, which can pass any statistical test for randomness They appear to be independent and identically distributed. Random number generators for common distributions are built into R. For less common distributions, more complicated methods have been developed (e.g., Accept-Reject Sampling, Metropolis-Hastings Algorithm) STAT 740 covers these.
University of South Carolina
Page 1
STAT 517: Random Numbers and Simulation
Hitchcock
(Monte Carlo) Simulation Some Common Uses of Simulation 1. Optimization (Example: Finding MLEs) 2. Calculating Definite Integrals (Ex: Finding Posterior Distributions) 3. Approximating the Sampling Distribution of a Statistic (Ex: Constructing CIs)
University of South Carolina
Page 2
STAT 517: Random Numbers and Simulation
can be
that maximizes (or minimizes) a complicated function
1. Finding the
Hitchcock
difficult analytically
to maximize
Find
is multidimensional
Situation even tougher if
University of South Carolina
Page 3
STAT 517: Random Numbers and Simulation
Hitchcock
OTHER OPTIONS: Simple Stochastic Search: If the maximum is to take place over a bounded region,
, then:
say
,
and pick the one that gives the largest
Generate many uniform random observations in that region, plug each into
Advantage: Easy to program. Disadvantage: Very slow, especially for multidimensional problems. Requires much computation. "
"
.
over
!
University of South Carolina
Example: Maximize
Page 4
STAT 517: Random Numbers and Simulation
Hitchcock
More advanced: Gradient Methods, which use derivative information to determine which area of the region to search next. Rule: “go up the slope” Disadvantage: Can get stuck on local maxima
$
.
$
&
$
%
%
'
$
$
$
, “move” to
If
, “move” to
If
#
values:
Simulated Annealing: Tries a sequence of
with a certain probability which depends on
%
%
$
$
.
%
University of South Carolina
Page 5
STAT 517: Random Numbers and Simulation
Hitchcock
R functions that perform optimization
(
optim() optimize()
one-dimensional optimization
optim(par = c(0,0), fn=my.fcn, control=list(fnscale=-1), maxit=100000) Example:
# Nelder-Mead optimization Other choices:
method="CG", method="BFGS", method="SANN"
University of South Carolina
Page 6
STAT 517: Random Numbers and Simulation
Hitchcock
Calculating Definite Integrals In statistics, we often have to calculate difficult definite integrals (Posterior distributions, expected values) *
)
,
+
(here,
could be multidimensional)
Example 1: Find:
,
#
Example 2: Find:
-
,
,
#
#
University of South Carolina
Page 7
STAT 517: Random Numbers and Simulation
Hitchcock
Hit-or-Miss Method Example 1:
.
) and
3
(here,
76
$
4 5
3
’s from
0
pairs,
3
) $
0
$
is less than the
) that the
2
8
Count the number of times (call this 4
6
.
; :
)9
Then
$
2
$
0 1
.
3
$
(here,
)
2
’s from
random uniform
/
Generate
across entire region of interest. (Here,
such that
.&
.
Determine
0
over whatever the support of
F
random values
,
E
/
Take
0 1
?
Note
(i.e., with density ) >
Then < $
0 1
9
0 1
?
/
=$
University of South Carolina
Page 10
STAT 517: Random Numbers and Simulation
Hitchcock
Examples
.
2
6
4
and
01
?
distribution, find
is a beta random variable with parameters
Example 4: If
G
is a random variable with a
0
Example 3: If
H
, find
J
2 L K
I
?
.
Some more advanced methods of integration using simulation (Importance Sampling) Note: R function
integrate() does numerical integration for functions of a
single variable (not using simulation techniques)
adapt() in the “adapt” package does multivariate numerical integration
University of South Carolina
Page 11
STAT 517: Random Numbers and Simulation
Hitchcock
Approximating the Sampling Distribution of a Statistic To perform inference based on sample statistics, we typically need to know the sampling distribution of the statistics. .
Example:
G 7Q
O
O,
M < N
0
0
P S 0 P
R /
T @ U
Q
If
distribution.
/
V
has a
known, S 0 P
W /
Q @ U
distribution.
G
has a
Then we can use these sampling distributions for inference (CIs, hypothesis tests).
University of South Carolina
Page 12
STAT 517: Random Numbers and Simulation
Hitchcock
What if the data’s distribution is not normal? 1. Large sample: Central Limit Theorem 2. Small sample: Nonparametric procedures based on permutation distribution
University of South Carolina
Page 13
STAT 517: Random Numbers and Simulation
Hitchcock
If population distribution is known, can approximate sampling distribution with simulation. times) generate random sample of size
/
8
Repeatedly (
from population distribu-
tion. X
Calculate statistic (say,
) each time.
University of South Carolina
X
The empirical distribution of
-values approximates its true distribution.
Page 14
STAT 517: Random Numbers and Simulation
7[Z
/
?
M Y
0
0
Example 1:
Hitchcock
S 0
What is the sampling distribution of
?
What is the sampling distribution of sample midrange?
University of South Carolina
Page 15
STAT 517: Random Numbers and Simulation
Hitchcock
What if we don’t know the exact population distribution (more likely)? /
Can use bootstrap methods: Resample (randomly select
values from the original
sample, with replacement). These “bootstrap samples” together mimic the population.
8
These
bootstrap samples, calculate the statistic of interest.
8
For each of the, say,
values will approximate the sampling distribution. \
from an unknown population type.
_
^
-
"
University of South Carolina
] \
Example 2: Observe
Page 16
STAT 517: Random Numbers and Simulation
Hitchcock
Bootstrap sampling built into R in the “boot” package. Try library(boot);
help(boot) for details.
If you know the form of the population distribution, but not the parameters, a parametric bootstrap can be used. Simple bootstrap CIs have some drawbacks More complicated “bias-corrected” bootstrap methods have been developed
University of South Carolina
Page 17