Jun 18, 2008 - Department of Energy Computational Science Graduate Fellow ... G.A.F. Seber and C.J. Wild, Nonlinear Regression, John Wiley & Sons,. New York .... We wish to solve for Ëθ. Ëθ = arg min θâR. M. +. J(θ) = arg min θâR. M.
Quantifying Uncertainty in the Estimation of Probability Distributions Jimena Lamanda Davis Department of Energy Computational Science Graduate Fellow In Collaboration with H.T. Banks
June 18, 2008
Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
1 / 20
References H.T. Banks, V.A. Bokil, S. Hu, A.K. Dhar, R.A. Bullis, C.L. Browdy, and F.C.T. Allnutt, “Modeling shrimp biomass and viral infection for production of biological countermeasures,” CRSC-TR05-45; NC State University, December, 2005; Mathematical Biosciences and Engineering, 3, pp. 635-660, 2006. H.T. Banks, L.W. Botsford, F. Kappel, and C. Wang, “Modeling and estimation in size structured population models,” LCDS-CC Rpt. 87-13, Brown University; Proceedings 2nd Course on Mathematical Ecology, (Trieste, December 8-12, 1986) World Press, Singapore, pp. 521-541, 1988. H.T. Banks and J.L. Davis, “A comparison of approximation methods for the estimation of probability distributions on parameters,” CRSC-TR05-38; NC State University, October, 2005; Applied Numerical Mathematics, 57, pp. 753-777, 2007.
Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
2 / 20
References
H.T. Banks and J.L. Davis, “Quantifying uncertainty in the estimation of probability distributions with confidence bands,” CRSC-TR07-21; NC State University, December, 2007; Mathematical Biosciences and Engineering (accepted). H.T. Banks, J.L. Davis, S.L. Ernstberger, S. Hu, A.K. Dhar, and C.L. Browdy, “Comparison of probabilistic and stochastic approaches in modeling growth uncertainty and variability,” CRSC-TR08-03; NC State University, February, 2008; Journal of Biological Dynamics (accepted). G.A.F. Seber and C.J. Wild, Nonlinear Regression, John Wiley & Sons, New York, 1989. J.W. Sinko and W. Streifer, “A new model for age-size structure for a population,” Ecology, 48, pp. 910-918, 1967.
Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
3 / 20
Motivation
Development of an inverse problem computational methodology for the estimation of functional parameters in the presence of model and data uncertainty Applications involve the estimation of growth rate distributions in size-structured marine populations (Type II problem - aggregate or population level longitudinal data) Extension of the asymptotic standard error theory for finite-dimensional ordinary least squares (OLS) estimators to “functional” confidence bands that will aid in quantifying the uncertainty in estimated probability distributions
Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
4 / 20
Application: Size-Structured Shrimp Population
Use of shrimp as a scaffold organism to produce large amounts of a vaccine rapidly in response to a toxic attack on populations [Banks et. al. 2006] Joint project with ABN (Advanced Bionutrition Corporation) involving the development of a hybrid model of the shrimp biomass/countermeasure production system Being able to accurately model the dynamics of the size-structured shrimp population is important since the output of the biomass model will serve as input to the vaccine production model
Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
5 / 20
Sinko-Streifer (SS) Model for Size-Structured Populations (1967) Widely used to describe various age and size-structured populations (cells, plants, and marine species) vt (t, x; g) + (g(t, x)v(t, x; g))x = −µ(t, x)v(t, x; g),
x < x < x¯ (1)
Initial Condition v(0, x) = v0 (x; g) Boundary Condition g(t, x )v(t, x ; g) =
Z
x
x¯
K (t, ξ)v(t, ξ; g)dξ
Deterministic Growth Rate Model dx = g(t, x) dt Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
6 / 20
Example of Aggregate Type Longitudinal Shrimp Data Previous size-structured population data from a group in Texas indicates variability in size that could be a result of variability in growth rates and might suggest the use of GRD model (2) Raceway 1: April 28
45
Raceway 1: May 7
12
40 10 35
8
Frequency
Frequency
30
25
20
6
15
4
10 2 5
0
0
0.1
0.2
0.3
0.4
0.5
x (size)
0.6
0.7
0.8
0.9
0
1
Raceway 1: May 15
8
0
0.1
0.2
0.3
0.4
0.5
x (size)
0.6
0.7
0.8
0.9
1
0.7
0.8
0.9
1
Raceway 1: May 23
6
7 5 6 4
Frequency
Frequency
5
4
3
3 2 2 1 1
0
0
0.1
0.2
Jimena L. Davis (CRSC/NCSU)
0.3
0.4
0.5
x (size)
0.6
0.7
0.8
0.9
1
0
0
0.1
0.2
0.3
0.4
DOE CSGF Annual Fellows’ Conference
0.5
x (size)
0.6
June 18, 2008
7 / 20
Growth Rate Distribution (GRD) Model (1988) Deterministic growth model of (1) is not biologically reasonable when modeling populations that exhibit a great deal of variability in aggregate type longitudinal data as time progresses GRD model, a modification of the SS model, was developed by Banks et. al. [Banks et. al. 1988] to account for the variability observed in populations, such as size-structured mosquitofish populations that exhibit both dispersion and bifurcation in time Assumption of GRD: Individual growth rates vary across the population u(t, x; P) =
Jimena L. Davis (CRSC/NCSU)
Z
v(t, x; g)dP(g)
(2)
G
DOE CSGF Annual Fellows’ Conference
June 18, 2008
8 / 20
Early Growth Dynamics of Shrimp We assume that mortality rate and reproduction rate in (1) are both zero We also assume that the size-dependent growth rate function of the shrimp has the form g(x; b, c) = b(x + c), which was shown to provide reasonable fits to average size data for 50 randomly sampled shrimp in [Banks et. al. 2008] Intrinsic growth rate b is a random variable taking values in a compact set B Analysis of previous data also suggested that the assumption of a normal distribution on the intrinsic growth rates leads to a lognormal distribution in size We choose a truncated normal distribution with mean µ b and standard deviation σb Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
9 / 20
Standard Parametric Approach - PAR(M, N) In the parametric approach, we assume that we know the distribution of the growth rates Assuming P is (absolutely) continuous ( dP db = p), the population density from the GRD model (2) is given by Z u(t, x; θ) = v(t, x; g(x; b))p(b; θ)db, B
where θ ∈ RM + represents the parameters (µ b , σb ) that are associated with the a priori probability density and distribution M represents the number of parameters in θ and N represents the number of quadrature nodes used to approximate the integral above Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
10 / 20
Parameter Estimation with PAR(M,N)
Ordinary Least Squares Formulation (assuming constant variance noise model) We wish to solve for θˆ θˆ = arg min J(θ) = arg min θ∈RM +
θ∈RM +
X
ˆij |2 |u(ti , xj ; θ) − u
i,j
We use MATLAB fmincon to determine the optimal values of θ = (µb , σb ) used to generate the estimated probability density and distribution
Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
11 / 20
Confidence Intervals... Confidence Bands Since we reduced infinite dimensional estimation problem to finite dimensional problem for θ, we are able to compute standard errors based on the established asymptotic standard error theory for OLS estimators [Seber and Wild 1989] Standard errors are used to compute confidence intervals to quantify the uncertainty in the estimated finite dimensional parameter θ
Known and Estimated Probability Density with Confidence Region 0.9 Known Estimated 0.8 p− 0.7 p +
0.6 0.5 0.4 0.3 0.2
0.9 0.8 0.7 0.6 0.5 0.4 0.3
Known Estimated P
0.2
0.1 0 3
Known and Estimated Probability Distribution with Confidence Bands 1
Probability Distribution
Probabililty Density p(b)
How does one use the confidence intervals computed in the finite dimensional setting to construct confidence bands in the infinite dimensional setting?
−
0.1 3.5
Jimena L. Davis (CRSC/NCSU)
4 4.5 5 Intrinsic Growth Rates b
5.5
6
0 3
P+ 3.5
4 4.5 5 Intrinsic Growth Rate b
DOE CSGF Annual Fellows’ Conference
5.5
6
June 18, 2008
12 / 20
Monte Carlo Sampling Study to Aid in the Design of Experiments Goal: Determine the sampling size Ns and sampling frequency Nt needed to obtain reliable estimates of the probabilistic growth rate parameters in the GRD model (2) - experiments to be carried out at ABN and SCDNR (South Carolina Department of Natural Resources) [Banks et. al. 2008] Population data (total number of shrimp in each size class) used in inverse problem calculations NGRD (t, x; θ) ≈ u(t, x; θ)∆x, where ∆x is the length of the size class interval We simulated population data, where Ns varied from 25, 50, 75 to 100 and Nt varied from twice a week, once a week to once every two weeks
Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
13 / 20
Monte Carlo Sampling Study Results 0.2
RE(µb) for centered bins, with σ −scale = 0.9
RE(σ ) for centered bins, with σ −scale = 0.9 b
N =3
N =3
t
t
Nt=6
0.18
Nt=6
0.35
N =12
NT=12
T
0.16
0.3
0.14 0.25
RE(σb)
RE(µb)
0.12 0.1 0.08
0.2 0.15
0.06
0.1
0.04 0.05
0.02 0 25
50 75 Number of samples at each time point
0 25
100
50 75 Number of samples at each time point
95% Confidence Bounds for µ , with σ −scale = 0.9 0.08
N =3 t
N =6
0.07
t
0.07
N =12 t
95% Confidence Bounds
95% Confidence Bounds
0.06
0.05
0.04
0.03
N =6 t
N =12 σb
0.05 0.04 0.03 0.02 0.01
0.02
25
Nt=3
t
µb
0.06
100
95% Confidence Bounds for σb, with σ −scale = 0.9
b
50 75 Number of samples at each time point
100
0 25
50 75 Number of samples at each time point
100
Conclusions: Most desirable experiment involved using Ns = 100 once a week; however there appears to be little loss in accuracy if one uses Ns = 50 Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
14 / 20
Parameter Estimation Results with ABN Data
Inverse problem with data (subsequently) collected from shrimp cultured in tanks at ABN Fifty shrimp were randomly sampled and measured once a week under relatively constant tank conditions Using our methodology, we determined estimates of the growth rate distribution and quantified the uncertainty associated with these estimates with confidence bands
Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
15 / 20
PAR(2,128) Results with Complete December Data µ ˆ b ± 1.96SE(ˆ µb ) : 0.0010 ± 0.0535 σ ˆb ± 1.96SE(ˆ σb ) : 0.0324 ± 0.0313 J ∗ = 574.8315, σ ˆ 2 = 17.4191 ABN Data − December 4
45
ABN Data versus Estimated Population Data − December 11
30
ABN Data Estimated Pop.
40 25
35 20
Frequency
Frequency
30 25 20
15
10
15 10
5
5 0
0
0.05
0.1
0.15
0.2
0.25
Size of Shrimp
0.3
0.35
0
0.4
ABN Data versus Estimated Population Data − December 18
18 16
0
0.05
0.1
0.15
0.2
0.25
Size of Shrimp
0.3
0.35
0.4
ABN Data versus Estimated Population Data − December 24
14
ABN Data Estimated Pop.
ABN Data Estimated Pop.
12
14 10
Frequency
Frequency
12 10 8
8
6
6 4 4 2
2 0
0
0.05
0.1
0.15
0.2
0.25
0.3
Size of Shrimp
0.35
0
0.4
0
0.05
0.1
0.15
0.2
0.25
Size of Shrimp
0.3
0.35
0.4
ABN Data versus Estimated Population Data − December 31
10
ABN Data Estimated Pop.
9 8
Frequency
7 6 5 4 3 2 1 0
Jimena L. Davis (CRSC/NCSU)
0
0.05
0.1
0.15
0.2
0.25
Size of Shrimp
0.3
0.35
0.4
DOE CSGF Annual Fellows’ Conference
June 18, 2008
16 / 20
PAR(2,128) Results Excluding December 11 Data
µ ˆ b ± 1.96SE(ˆ µb ) : 0.0369 ± 0.0027 σ ˆb ± 1.96SE(ˆ σb ) : 0.0159 ± 0.0030 J ∗ = 87.4619, σ ˆ 2 = 3.1236 Estimated Probability Density with Confidence Region
35
0.9
+
30
p
−
0.8
Probability Distribution
25
Probability Density
Estimated Probability Distribution with Confidence Bands
1
Estimated p
20
15
10
0.7 0.6 0.5 0.4 0.3
Estimated M P+
0.2 5
0
M
P−
0.1 0
0.01
0.02
Jimena L. Davis (CRSC/NCSU)
0.03
0.04
0.05
0.06
0.07
Intrinsic Growth Rates, b
0.08
0.09
0.1
0
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
Intrinsic Growth Rates, b
DOE CSGF Annual Fellows’ Conference
0.08
0.09
0.1
June 18, 2008
17 / 20
PAR(2,128) Results Excluding December 11 Data ABN Data − December 4
16
35
14
30
12
25 20
ABN Data Estimated Pop.
10 8
15
6
10
4 2
5 0
ABN Data versus Estimated Population Data − December 18
18
40
Frequency
Frequency
45
0
0.05
0.1
0.15
0.2
0.25
Size of Shrimp
0.3
0.35
0
0.4
ABN Data versus Estimated Population Data − December 24
12
0
ABN Data Estimated Pop.
0.05
0.1
0.15
0.2
0.25
Size of Shrimp
0.3
0.35
0.4
ABN Data versus Estimated Population Data − December 31
9
ABN Data Estimated Pop.
8
10 7 6
Frequency
Frequency
8
6
4
5 4 3 2
2 1 0
0
0.05
0.1
0.15
0.2
0.25
Size of Shrimp
Jimena L. Davis (CRSC/NCSU)
0.3
0.35
0.4
0
0
0.05
0.1
DOE CSGF Annual Fellows’ Conference
0.15
0.2
0.25
Size of Shrimp
0.3
0.35
0.4
June 18, 2008
18 / 20
Summary and Ongoing Work We have demonstrated how mathematical and statistical tools can be used to gain insight into the early growth dynamics of shrimp. We are working on improving the model predictions to the shrimp population data by considering different parametric and non-parametric approaches [Banks and Davis 2005] in the GRD model. Following the work of Seber and Wild, we are also working on fully developing the mathematical and asymptotic statistical theory (“functional” confidence bands) for OLS inverse problems where the parameter of interest is a probability distribution. We would also like to determine if the confidence bands constructed in the non-parametric approximation methods (not discussed here today) are converging to some “true” smooth confidence bands. Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
19 / 20
Acknowledgements Department of Energy Computational Science Graduate Fellowship and the Krell Institute Staff Collaborators: NCSU Dr. H.T. Banks Dr. Shuhua Hu Stacey Ernstberger
Advanced Bionutrition Corporation Dr. Elena Artimovich Dr. Arun K. Dhar
Marine Resources Research Institute Dr. Craig L. Browdy
Jimena L. Davis (CRSC/NCSU)
DOE CSGF Annual Fellows’ Conference
June 18, 2008
20 / 20