Quantifying Uncertainty in the Estimation of Probability ... - Krell Institute

2 downloads 83 Views 544KB Size Report
Jun 18, 2008 - Department of Energy Computational Science Graduate Fellow ... G.A.F. Seber and C.J. Wild, Nonlinear Regression, John Wiley & Sons,. New York .... We wish to solve for ˆθ. ˆθ = arg min θ∈R. M. +. J(θ) = arg min θ∈R. M.
Quantifying Uncertainty in the Estimation of Probability Distributions Jimena Lamanda Davis Department of Energy Computational Science Graduate Fellow In Collaboration with H.T. Banks

June 18, 2008

Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

1 / 20

References H.T. Banks, V.A. Bokil, S. Hu, A.K. Dhar, R.A. Bullis, C.L. Browdy, and F.C.T. Allnutt, “Modeling shrimp biomass and viral infection for production of biological countermeasures,” CRSC-TR05-45; NC State University, December, 2005; Mathematical Biosciences and Engineering, 3, pp. 635-660, 2006. H.T. Banks, L.W. Botsford, F. Kappel, and C. Wang, “Modeling and estimation in size structured population models,” LCDS-CC Rpt. 87-13, Brown University; Proceedings 2nd Course on Mathematical Ecology, (Trieste, December 8-12, 1986) World Press, Singapore, pp. 521-541, 1988. H.T. Banks and J.L. Davis, “A comparison of approximation methods for the estimation of probability distributions on parameters,” CRSC-TR05-38; NC State University, October, 2005; Applied Numerical Mathematics, 57, pp. 753-777, 2007.

Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

2 / 20

References

H.T. Banks and J.L. Davis, “Quantifying uncertainty in the estimation of probability distributions with confidence bands,” CRSC-TR07-21; NC State University, December, 2007; Mathematical Biosciences and Engineering (accepted). H.T. Banks, J.L. Davis, S.L. Ernstberger, S. Hu, A.K. Dhar, and C.L. Browdy, “Comparison of probabilistic and stochastic approaches in modeling growth uncertainty and variability,” CRSC-TR08-03; NC State University, February, 2008; Journal of Biological Dynamics (accepted). G.A.F. Seber and C.J. Wild, Nonlinear Regression, John Wiley & Sons, New York, 1989. J.W. Sinko and W. Streifer, “A new model for age-size structure for a population,” Ecology, 48, pp. 910-918, 1967.

Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

3 / 20

Motivation

Development of an inverse problem computational methodology for the estimation of functional parameters in the presence of model and data uncertainty Applications involve the estimation of growth rate distributions in size-structured marine populations (Type II problem - aggregate or population level longitudinal data) Extension of the asymptotic standard error theory for finite-dimensional ordinary least squares (OLS) estimators to “functional” confidence bands that will aid in quantifying the uncertainty in estimated probability distributions

Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

4 / 20

Application: Size-Structured Shrimp Population

Use of shrimp as a scaffold organism to produce large amounts of a vaccine rapidly in response to a toxic attack on populations [Banks et. al. 2006] Joint project with ABN (Advanced Bionutrition Corporation) involving the development of a hybrid model of the shrimp biomass/countermeasure production system Being able to accurately model the dynamics of the size-structured shrimp population is important since the output of the biomass model will serve as input to the vaccine production model

Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

5 / 20

Sinko-Streifer (SS) Model for Size-Structured Populations (1967) Widely used to describe various age and size-structured populations (cells, plants, and marine species) vt (t, x; g) + (g(t, x)v(t, x; g))x = −µ(t, x)v(t, x; g),

x < x < x¯ (1)

Initial Condition v(0, x) = v0 (x; g) Boundary Condition g(t, x )v(t, x ; g) =

Z

x



K (t, ξ)v(t, ξ; g)dξ

Deterministic Growth Rate Model dx = g(t, x) dt Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

6 / 20

Example of Aggregate Type Longitudinal Shrimp Data Previous size-structured population data from a group in Texas indicates variability in size that could be a result of variability in growth rates and might suggest the use of GRD model (2) Raceway 1: April 28

45

Raceway 1: May 7

12

40 10 35

8

Frequency

Frequency

30

25

20

6

15

4

10 2 5

0

0

0.1

0.2

0.3

0.4

0.5

x (size)

0.6

0.7

0.8

0.9

0

1

Raceway 1: May 15

8

0

0.1

0.2

0.3

0.4

0.5

x (size)

0.6

0.7

0.8

0.9

1

0.7

0.8

0.9

1

Raceway 1: May 23

6

7 5 6 4

Frequency

Frequency

5

4

3

3 2 2 1 1

0

0

0.1

0.2

Jimena L. Davis (CRSC/NCSU)

0.3

0.4

0.5

x (size)

0.6

0.7

0.8

0.9

1

0

0

0.1

0.2

0.3

0.4

DOE CSGF Annual Fellows’ Conference

0.5

x (size)

0.6

June 18, 2008

7 / 20

Growth Rate Distribution (GRD) Model (1988) Deterministic growth model of (1) is not biologically reasonable when modeling populations that exhibit a great deal of variability in aggregate type longitudinal data as time progresses GRD model, a modification of the SS model, was developed by Banks et. al. [Banks et. al. 1988] to account for the variability observed in populations, such as size-structured mosquitofish populations that exhibit both dispersion and bifurcation in time Assumption of GRD: Individual growth rates vary across the population u(t, x; P) =

Jimena L. Davis (CRSC/NCSU)

Z

v(t, x; g)dP(g)

(2)

G

DOE CSGF Annual Fellows’ Conference

June 18, 2008

8 / 20

Early Growth Dynamics of Shrimp We assume that mortality rate and reproduction rate in (1) are both zero We also assume that the size-dependent growth rate function of the shrimp has the form g(x; b, c) = b(x + c), which was shown to provide reasonable fits to average size data for 50 randomly sampled shrimp in [Banks et. al. 2008] Intrinsic growth rate b is a random variable taking values in a compact set B Analysis of previous data also suggested that the assumption of a normal distribution on the intrinsic growth rates leads to a lognormal distribution in size We choose a truncated normal distribution with mean µ b and standard deviation σb Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

9 / 20

Standard Parametric Approach - PAR(M, N) In the parametric approach, we assume that we know the distribution of the growth rates Assuming P is (absolutely) continuous ( dP db = p), the population density from the GRD model (2) is given by Z u(t, x; θ) = v(t, x; g(x; b))p(b; θ)db, B

where θ ∈ RM + represents the parameters (µ b , σb ) that are associated with the a priori probability density and distribution M represents the number of parameters in θ and N represents the number of quadrature nodes used to approximate the integral above Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

10 / 20

Parameter Estimation with PAR(M,N)

Ordinary Least Squares Formulation (assuming constant variance noise model) We wish to solve for θˆ θˆ = arg min J(θ) = arg min θ∈RM +

θ∈RM +

X

ˆij |2 |u(ti , xj ; θ) − u

i,j

We use MATLAB fmincon to determine the optimal values of θ = (µb , σb ) used to generate the estimated probability density and distribution

Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

11 / 20

Confidence Intervals... Confidence Bands Since we reduced infinite dimensional estimation problem to finite dimensional problem for θ, we are able to compute standard errors based on the established asymptotic standard error theory for OLS estimators [Seber and Wild 1989] Standard errors are used to compute confidence intervals to quantify the uncertainty in the estimated finite dimensional parameter θ

Known and Estimated Probability Density with Confidence Region 0.9 Known Estimated 0.8 p− 0.7 p +

0.6 0.5 0.4 0.3 0.2

0.9 0.8 0.7 0.6 0.5 0.4 0.3

Known Estimated P

0.2

0.1 0 3

Known and Estimated Probability Distribution with Confidence Bands 1

Probability Distribution

Probabililty Density p(b)

How does one use the confidence intervals computed in the finite dimensional setting to construct confidence bands in the infinite dimensional setting?



0.1 3.5

Jimena L. Davis (CRSC/NCSU)

4 4.5 5 Intrinsic Growth Rates b

5.5

6

0 3

P+ 3.5

4 4.5 5 Intrinsic Growth Rate b

DOE CSGF Annual Fellows’ Conference

5.5

6

June 18, 2008

12 / 20

Monte Carlo Sampling Study to Aid in the Design of Experiments Goal: Determine the sampling size Ns and sampling frequency Nt needed to obtain reliable estimates of the probabilistic growth rate parameters in the GRD model (2) - experiments to be carried out at ABN and SCDNR (South Carolina Department of Natural Resources) [Banks et. al. 2008] Population data (total number of shrimp in each size class) used in inverse problem calculations NGRD (t, x; θ) ≈ u(t, x; θ)∆x, where ∆x is the length of the size class interval We simulated population data, where Ns varied from 25, 50, 75 to 100 and Nt varied from twice a week, once a week to once every two weeks

Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

13 / 20

Monte Carlo Sampling Study Results 0.2

RE(µb) for centered bins, with σ −scale = 0.9

RE(σ ) for centered bins, with σ −scale = 0.9 b

N =3

N =3

t

t

Nt=6

0.18

Nt=6

0.35

N =12

NT=12

T

0.16

0.3

0.14 0.25

RE(σb)

RE(µb)

0.12 0.1 0.08

0.2 0.15

0.06

0.1

0.04 0.05

0.02 0 25

50 75 Number of samples at each time point

0 25

100

50 75 Number of samples at each time point

95% Confidence Bounds for µ , with σ −scale = 0.9 0.08

N =3 t

N =6

0.07

t

0.07

N =12 t

95% Confidence Bounds

95% Confidence Bounds

0.06

0.05

0.04

0.03

N =6 t

N =12 σb

0.05 0.04 0.03 0.02 0.01

0.02

25

Nt=3

t

µb

0.06

100

95% Confidence Bounds for σb, with σ −scale = 0.9

b

50 75 Number of samples at each time point

100

0 25

50 75 Number of samples at each time point

100

Conclusions: Most desirable experiment involved using Ns = 100 once a week; however there appears to be little loss in accuracy if one uses Ns = 50 Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

14 / 20

Parameter Estimation Results with ABN Data

Inverse problem with data (subsequently) collected from shrimp cultured in tanks at ABN Fifty shrimp were randomly sampled and measured once a week under relatively constant tank conditions Using our methodology, we determined estimates of the growth rate distribution and quantified the uncertainty associated with these estimates with confidence bands

Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

15 / 20

PAR(2,128) Results with Complete December Data µ ˆ b ± 1.96SE(ˆ µb ) : 0.0010 ± 0.0535 σ ˆb ± 1.96SE(ˆ σb ) : 0.0324 ± 0.0313 J ∗ = 574.8315, σ ˆ 2 = 17.4191 ABN Data − December 4

45

ABN Data versus Estimated Population Data − December 11

30

ABN Data Estimated Pop.

40 25

35 20

Frequency

Frequency

30 25 20

15

10

15 10

5

5 0

0

0.05

0.1

0.15

0.2

0.25

Size of Shrimp

0.3

0.35

0

0.4

ABN Data versus Estimated Population Data − December 18

18 16

0

0.05

0.1

0.15

0.2

0.25

Size of Shrimp

0.3

0.35

0.4

ABN Data versus Estimated Population Data − December 24

14

ABN Data Estimated Pop.

ABN Data Estimated Pop.

12

14 10

Frequency

Frequency

12 10 8

8

6

6 4 4 2

2 0

0

0.05

0.1

0.15

0.2

0.25

0.3

Size of Shrimp

0.35

0

0.4

0

0.05

0.1

0.15

0.2

0.25

Size of Shrimp

0.3

0.35

0.4

ABN Data versus Estimated Population Data − December 31

10

ABN Data Estimated Pop.

9 8

Frequency

7 6 5 4 3 2 1 0

Jimena L. Davis (CRSC/NCSU)

0

0.05

0.1

0.15

0.2

0.25

Size of Shrimp

0.3

0.35

0.4

DOE CSGF Annual Fellows’ Conference

June 18, 2008

16 / 20

PAR(2,128) Results Excluding December 11 Data

µ ˆ b ± 1.96SE(ˆ µb ) : 0.0369 ± 0.0027 σ ˆb ± 1.96SE(ˆ σb ) : 0.0159 ± 0.0030 J ∗ = 87.4619, σ ˆ 2 = 3.1236 Estimated Probability Density with Confidence Region

35

0.9

+

30

p



0.8

Probability Distribution

25

Probability Density

Estimated Probability Distribution with Confidence Bands

1

Estimated p

20

15

10

0.7 0.6 0.5 0.4 0.3

Estimated M P+

0.2 5

0

M

P−

0.1 0

0.01

0.02

Jimena L. Davis (CRSC/NCSU)

0.03

0.04

0.05

0.06

0.07

Intrinsic Growth Rates, b

0.08

0.09

0.1

0

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

Intrinsic Growth Rates, b

DOE CSGF Annual Fellows’ Conference

0.08

0.09

0.1

June 18, 2008

17 / 20

PAR(2,128) Results Excluding December 11 Data ABN Data − December 4

16

35

14

30

12

25 20

ABN Data Estimated Pop.

10 8

15

6

10

4 2

5 0

ABN Data versus Estimated Population Data − December 18

18

40

Frequency

Frequency

45

0

0.05

0.1

0.15

0.2

0.25

Size of Shrimp

0.3

0.35

0

0.4

ABN Data versus Estimated Population Data − December 24

12

0

ABN Data Estimated Pop.

0.05

0.1

0.15

0.2

0.25

Size of Shrimp

0.3

0.35

0.4

ABN Data versus Estimated Population Data − December 31

9

ABN Data Estimated Pop.

8

10 7 6

Frequency

Frequency

8

6

4

5 4 3 2

2 1 0

0

0.05

0.1

0.15

0.2

0.25

Size of Shrimp

Jimena L. Davis (CRSC/NCSU)

0.3

0.35

0.4

0

0

0.05

0.1

DOE CSGF Annual Fellows’ Conference

0.15

0.2

0.25

Size of Shrimp

0.3

0.35

0.4

June 18, 2008

18 / 20

Summary and Ongoing Work We have demonstrated how mathematical and statistical tools can be used to gain insight into the early growth dynamics of shrimp. We are working on improving the model predictions to the shrimp population data by considering different parametric and non-parametric approaches [Banks and Davis 2005] in the GRD model. Following the work of Seber and Wild, we are also working on fully developing the mathematical and asymptotic statistical theory (“functional” confidence bands) for OLS inverse problems where the parameter of interest is a probability distribution. We would also like to determine if the confidence bands constructed in the non-parametric approximation methods (not discussed here today) are converging to some “true” smooth confidence bands. Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

19 / 20

Acknowledgements Department of Energy Computational Science Graduate Fellowship and the Krell Institute Staff Collaborators: NCSU Dr. H.T. Banks Dr. Shuhua Hu Stacey Ernstberger

Advanced Bionutrition Corporation Dr. Elena Artimovich Dr. Arun K. Dhar

Marine Resources Research Institute Dr. Craig L. Browdy

Jimena L. Davis (CRSC/NCSU)

DOE CSGF Annual Fellows’ Conference

June 18, 2008

20 / 20

Suggest Documents