BAYESIAN METHODS IN PROBABILITY OF DETECTION ESTIMATION AND MODEL-ASSISTED PROBABILITY OF DETECTION EVALUATION

John C. Aldrin², Jeremy S. Knopp¹ and Harold A. Sabbagh³

¹Air Force Research Laboratory, Wright-Patterson AFB, OH 45433, USA
²Computational Tools, Gurnee, IL 60031, USA
³Victor Technologies LLC, Bloomington, IN 47401, USA

ABSTRACT. In this paper, the application of Bayesian methods for probability of detection (POD) estimation and the model-assisted probability of detection methodology is explored. A demonstration of Bayesian estimation for an eddy current POD evaluation case study is presented and compared with conventional approaches. Hierarchical Bayes models are introduced for estimating parameters including random variables in physics-based models. Results are presented that demonstrate the feasibility of simultaneously estimating model calibration parameters, model random variables and measurement error. Keywords: Bayesian Methods, Eddy Current, Model-assisted POD, POD Evaluation PACS: 02.50.-r, 81.70.-q

INTRODUCTION

A model-assisted strategy for the design and execution of POD studies for NDE has been under development [1-2], with several demonstrations to date [2-5]. Recent work has indicated the potential for reducing the POD sample count by better managing experimental variation using more accurate statistical models and by designing experiments based on prior experience and data [6]. Leveraging validated models that include all significant sources of variation in the measurement response is the key element for sample and experimental test reduction [5]. However, to fully address the challenge of performing a complete quality evaluation with limited experimental samples, a model-based assessment must incorporate the variations of the most significant input factors and appropriately integrate simulated and experimental results. A block diagram of the model-assisted POD evaluation process is presented in Figure 1. This process was developed by Bruce Thompson and Chuck Annis and is found in Appendix H of MIL-HDBK-1823A [1]. As part of this process, there are several critical components in the evaluation that concern the propagation of varying conditions and uncertainty through the model, or the updating of parameter estimates in the POD assessment. Bayesian analysis is a statistical approach for evaluating a “posterior” distribution of model parameters by combining information in the prior distribution with new evidence according to Bayes’ theorem. Bayesian methods can be leveraged here for three key components of the methodology: (1) evaluation of the variability of an input factor or condition as a probability density function (pdf), (2) model calibration [7], and (3) revision of measurement model estimates with experimental data. In practice, the posterior distribution can be evaluated, providing a refinement of the original prior distribution, through numerical methods such as Markov Chain Monte Carlo (MCMC) simulation [7-9].

In prior work on NDE reliability, Bayesian methods have been proposed in general terms [10-11], with several examples for the evaluation of statistical POD models for hit/miss data [12-13] and ahat-versus-a reliability studies [14-15]. In particular, Bayesian methods are advantageous for addressing complex estimation problems and for cases where prior information is available. For example, Li, Meeker and Hovey [14] used Bayesian methods to simultaneously evaluate the noise interference model for POD and the crack size distribution. In this paper, the focus is on the application of Bayesian methods for probability of detection (POD) estimation for models of increasing complexity, laying the foundation for the model-assisted probability of detection (MAPOD) evaluation.
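The Bayesian updating and MCMC machinery referred to above can be illustrated with a minimal sketch. The example below is hypothetical and not from the paper: it applies a random-walk Metropolis sampler (the simplest MCMC scheme) to the posterior of a detection probability given made-up hit/miss counts, with a uniform Beta(1, 1) prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hit/miss data: 18 detections in 20 trials at one flaw size
hits, trials = 18, 20

def log_posterior(p):
    # Uniform Beta(1, 1) prior plus binomial likelihood, on the log scale
    if not 0.0 < p < 1.0:
        return -np.inf
    return hits * np.log(p) + (trials - hits) * np.log(1.0 - p)

# Random-walk Metropolis: propose a small step, accept with the MH rule
samples, p = [], 0.5
for _ in range(20000):
    prop = p + 0.05 * rng.standard_normal()
    if np.log(rng.uniform()) < log_posterior(prop) - log_posterior(p):
        p = prop
    samples.append(p)

post = np.array(samples[5000:])           # discard burn-in
print(post.mean())                        # close to (hits+1)/(trials+2) = 0.864
print(np.percentile(post, [2.5, 97.5]))   # 95% credible interval
```

The conjugate Beta-Binomial posterior is available in closed form here, which makes the MCMC result easy to check; the point of MCMC is that the same sampler applies unchanged when the likelihood comes from a physics-based model with no closed form.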

[Figure 1 components: (1) assess key factors (joint distributions) using Bayesian methods, feeding uncertainty propagation through input parameter variability (distributions), model error and stochastic models; (2) model ‘calibration’ with confidence bounds (limited samples); (3) revise model estimates with experimental data using Bayesian methods.]

FIGURE 1. Model-Assisted POD model building process with complete approach to uncertainty propagation in MAPOD from MIL-HDBK-1823A, Appendix H (2009) [1].

BAYESIAN METHOD FOR EVALUATING MULTIPARAMETER REGRESSION MODELS

A simulated POD demonstration study for the eddy current inspection of surface-breaking cracks is revisited here [5], contrasting several different POD evaluation methodologies. This simulated study follows the same experimental measurement system, sample conditions and flaw characteristics as the prior work [5]. The primary controlling factors for this POD evaluation study are crack length (a1), crack width (a2) and probe liftoff (a3). VIC-3D© was applied to simulate the impedance measurement response. Two models for the eddy current measurement response, â, are considered to evaluate their fit and associated residual error. The first is a single-parameter linear model:

â = β0 + β1 a1 + ε,   ε ~ N(0, σε²),   (1)

and the second is a two-parameter linear statistical model including crack length and depth:

â = β0 + β1 a1 + β2 a2 + ε,   ε ~ N(0, σε²),   (2)

where the βi represent the model fit parameters and ε represents the residual error of the fit (a Gaussian random variable). Through application of a detection criterion as part of the NDE procedure, these models can be used to evaluate the POD curve and false call rate. Clearly, the nature of the confidence bounds on the POD model will depend on both the number of test samples used and the quality of the model fit. Unlike ‘hit/miss’ analysis, the bounds will depend on the ability of the model to represent all controlled and uncontrolled variation during a POD study. Statistical model fits considering only crack length (Eq. 1) and both crack length and depth (Eq. 2) were compared, following the work of Hoppe [6]. Table 1 presents a comparison of the metrics on the statistical evaluation fits using maximum-likelihood estimation (MLE) and Bayesian methods. The inclusion of crack depth, a2, in the model was found to reduce the residual variance, σε², by 20%. For the two-parameter fit evaluations, three different methods for assessing confidence bounds were investigated: the delta method (Wald), Monte Carlo (MC) simulation and Bayesian methods (MCMC). All three were found to be in good agreement.

TABLE 1. Metrics on statistical model evaluation including MLE and Bayesian method comparisons.

               MLE [survreg/Wald]   MLE [survreg/Wald]   MLE [glm()/MC]   Bayes/MCMC
               model: Eq. (1)       model: Eq. (2)       model: Eq. (2)   model: Eq. (2)
β0             -0.05780             -0.05986             -0.05986         -0.05983
β1              5.39532              2.77503              2.77503          2.77668
β2              0.00000              6.65178              6.65178          6.64630
δ = σε²         0.02538              0.02001              0.02001          0.02061
â threshold     0.10000              0.10000              0.10000          0.10000
σ00             0.00003              0.00002              0.00002          0.00002
σ11             0.00912              0.11971              0.12346          0.12687
σ22             0.00000              0.73494              0.75790          0.78247
a50             0.02925              0.03204              0.03204          0.03203
a90             0.03529              0.03719              0.03721          0.03732
a90/95          0.03616              0.03720              0.03837          0.03733
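The POD quantities tabulated above follow from the fitted â-versus-a model and the detection threshold. The sketch below uses the single-parameter (Eq. 1) column values and treats the reported δ as the residual standard deviation (an assumption made so the inversion reproduces the tabulated a50/a90); it inverts the POD curve POD(a) = P(â > threshold) to recover the characteristic flaw sizes.

```python
import numpy as np
from scipy.stats import norm

# Eq. (1) column of Table 1; treating delta = 0.02538 as the residual
# standard deviation is an assumption of this sketch
beta0, beta1, sigma, a_thr = -0.05780, 5.39532, 0.02538, 0.10

def pod(a):
    # POD(a) = P(ahat > threshold) with ahat = beta0 + beta1*a + N(0, sigma^2)
    return norm.cdf((beta0 + beta1 * a - a_thr) / sigma)

def a_p(p):
    # Invert the POD curve: flaw size detected with probability p
    return (a_thr - beta0 + norm.ppf(p) * sigma) / beta1

print(round(a_p(0.50), 5))  # ~0.02925, matching a50 in Table 1
print(round(a_p(0.90), 5))  # ~0.03528, close to the tabulated a90
```

The a90/95 value additionally requires a confidence bound on the fitted parameters (Wald, MC or MCMC, as compared in the table), which is omitted here.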

BAYESIAN APPROACH FOR EVALUATING POD USING PHYSICS-BASED MODELS

In prior work [5], Matlab was linked with R and WinBUGS to perform the statistical fit with Bayesian methods. Extensions using Python have also been developed to facilitate Bayesian analysis for POD evaluation. For both approaches, WinBUGS and OpenBUGS were used to perform the MCMC simulations [16]. However, to address more sophisticated physics-based model fits, a new capability was needed to better link the numerical models within the MCMC simulation. At this time, using WinBUGS and OpenBUGS was deemed not practical going forward for a MAPOD evaluation. Matlab was chosen as a demonstration platform to perform a POD analysis incorporating physics-based models. An important prerequisite for fitting the physics-based model was developing a fast and accurate surrogate model to be called from Matlab. A series of numerical models was constructed and solved using VIC-3D© to address a variety of crack lengths, crack depths, aspect ratios, crack locations, and probe liftoffs. These data sets were organized, and a function call including an interpolation scheme was developed in Matlab to evaluate the numerical data files. The second critical component was a capability to perform Markov Chain Monte Carlo (MCMC) simulations in Matlab. The Matlab MCMC toolbox DRAM (Delayed Rejection Adaptive Metropolis)

[17] was leveraged for this task. The case study in the prior section was repeated, and good agreement was demonstrated between the Markov Chain Monte Carlo (MCMC) simulations in Matlab and WinBUGS. With the ability to evaluate parameters using MCMC with physics-based models, a second case study problem, the eddy current inspection of fastener sites for fatigue cracks, was revisited [3] to contrast the POD results for a single parameter linear model:

â = β0 + β1 a1 + ε,   ε ~ N(0, σε²),   (3)

and a calibrated physics-based model including crack length and depth:

â = β0 + β1 f(a1, a2) + ε,   ε ~ N(0, σε²),   (4)

where f(·) is a function call for a physics-based model, β0 and β1 are model calibration parameters, and the crack aspect ratio, a2 = b/a, is fixed. Results for the linear statistical model and the physics-based model evaluation are presented in Figure 2. Several observations can be made by examining the model fit and residual plots. First, the physics-based model fit provides a better match with the data, and the residuals are generally reduced. There is a minor exception at the far right of the residual plot; however, these few data points may simply reflect greater variance for larger flaw responses, or they may be outliers due to poor surface conditions. It should also be noted that the MCMC simulation naturally provides the bounds on the parameter estimates being evaluated. Thus, for more complex models, Bayesian evaluations using MCMC simulation are quite advantageous. For the POD curves shown in Figures 2(e) and 2(f), the result for the physics-based model appears to better represent the data and actually produces a more conservative POD model fit.
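The surrogate-plus-MCMC workflow described above can be sketched in miniature. Everything below is illustrative: the tabulated "physics" response, the synthetic data, and the priors are invented stand-ins for the VIC-3D© data sets, and a plain random-walk Metropolis sampler stands in for DRAM (which adds delayed rejection and adaptation on top of this basic scheme).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical surrogate: tabulated responses f(a1) on a grid, interpolated
# so the "physics model" is a fast function call inside the MCMC loop
grid_a1 = np.linspace(0.0, 0.2, 21)
grid_f = np.sqrt(grid_a1)               # placeholder physics response

def f_surrogate(a1):
    return np.interp(a1, grid_a1, grid_f)

# Synthetic "experimental" data of the Eq. (4) form: ahat = b0 + b1*f + eps
a1_obs = rng.uniform(0.02, 0.18, 60)
ahat = 0.01 + 0.9 * f_surrogate(a1_obs) + 0.01 * rng.standard_normal(60)

def log_post(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)
    resid = ahat - (b0 + b1 * f_surrogate(a1_obs))
    # Flat priors on b0, b1 and log-sigma (an assumption of this sketch)
    return -len(ahat) * np.log(s) - 0.5 * np.sum(resid**2) / s**2

# Random-walk Metropolis over (beta0, beta1, log sigma)
theta = np.array([0.0, 1.0, np.log(0.05)])
chain = []
for _ in range(30000):
    prop = theta + rng.standard_normal(3) * [0.003, 0.02, 0.05]
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    chain.append(theta.copy())

b0, b1, log_s = np.array(chain[10000:]).T
print(b0.mean(), b1.mean(), np.exp(log_s).mean())  # near 0.01, 0.9, 0.01
```

The chain simultaneously yields calibration estimates and their credible bounds, which is the "bounds for free" advantage of MCMC noted in the text.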


FIGURE 2. Results for (a) linear statistical model and (b) physics-based model fits to experimental data. Corresponding residual plots are presented in (c) and (d) respectively. POD curves for the (e) linear statistical model and (f) physics-based model fits.

HIERARCHICAL BAYESIAN METHODS FOR EVALUATING PHYSICS-BASED MODELS WITH UNCERTAIN STOCHASTIC PARAMETERS

The estimation of both the calibration parameters of a physics-based model and the distribution of select model factors, including crack aspect ratio and liftoff, was studied. Matlab code was developed to call a surrogate measurement model representing the numerical results from VIC-3D©. The following model estimation problem was studied:

â = β0 + β1 f(a1, β2, β3) + ε,   ε ~ N(0, σε²),   (5)

where f(·) is a function call for a physics-based model, β0 and β1 are model calibration parameters, β2 is a random variable associated with crack aspect ratio (b/a), and β3 is a random variable associated with liftoff variation. An initial study was performed in which only β0, β1 and β2 were estimated using a Bayesian (MCMC) approach. A plot of the simulated data is given in Figure 3 for corner flaws located at the faying surface of the first and second layers. Initial results were found to be mixed due to the ill-posedness of estimating β1 and β2 simultaneously. Improvements were made by ensuring the surrogate model was smooth over the entire parameter space and by applying informative priors based on expert knowledge of the parameter constraints in the inversion process. To fully evaluate this model, there is also a need to provide a true estimate of the variance of the crack aspect ratio random variable. Parametrized eddy current models that include aspect ratio as a random variable, as shown in Figure 3, can reasonably address both the mean response and the non-constant variance trends observed in experimental results. The evaluation of these stochastic model parameters can be achieved through hierarchical Bayesian models. Gelman et al. introduced hierarchical Bayesian models for these classes of problems [8,18] and presented several examples implemented using WinBUGS. Progress was made in the effort to implement hierarchical Bayesian models in Matlab (in order to leverage surrogate models); however, the estimation results were found to not be as consistent as with the WinBUGS code. Thus, WinBUGS was used for the final demonstration in this paper.
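The ill-posedness of jointly estimating β1 and β2, and how an informative prior resolves it, can be shown with a toy example. The surrogate below is hypothetical (not the VIC-3D© response): it is built so that β1 and the aspect ratio β2 enter only through their product, a classic non-identifiable pairing.

```python
import numpy as np

# Hypothetical surrogate in which the calibration slope b1 and aspect ratio
# b2 only enter through the product b1*(0.5 + 0.6*b2)
def f(a1, b2):
    return a1 * (0.5 + 0.6 * b2)

a1 = np.linspace(0.02, 0.18, 40)

# Two very different parameter pairs with the same effective slope 0.95:
y_a = 1.0 * f(a1, 0.75)
y_b = 1.4 * f(a1, (0.95 / 1.4 - 0.5) / 0.6)
print(np.max(np.abs(y_a - y_b)))  # ~0: the data alone cannot separate them

# An informative prior b2 ~ N(0.75, 0.05^2), e.g. from expert knowledge of
# plausible crack aspect ratios, breaks the tie: its log-density heavily
# penalizes the implausible aspect ratio in the second pair
def log_prior_b2(b2):
    return -0.5 * ((b2 - 0.75) / 0.05) ** 2

print(log_prior_b2(0.75))                      # 0.0
print(log_prior_b2((0.95 / 1.4 - 0.5) / 0.6))  # strongly negative
```

Adding such a prior to the MCMC log-posterior collapses the likelihood ridge to a well-defined mode, which is the mechanism behind the improvement reported above.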

[Figure 3 plots â versus a1 for first- and second-layer cracks (crack dimensions a, b at depth z), simulated with parameters: β0 = 0.0, β1 = 1.0, µ_β2 = 0.75, σ_β2 = 0.12, σ_ε = 0.0.]

FIGURE 3. Simulated results from the surrogate eddy current model response as a function of crack length based on VIC-3D® numerical simulations for first and second layer cracks at a fastener site. The aspect ratio is considered a Gaussian random variable with the mean and standard deviation prescribed.
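Data of the kind shown in Figure 3 can be generated with a short sketch. The surrogate f() below is a hypothetical stand-in for the VIC-3D© response (growing with crack length and aspect ratio); the parameter values are those prescribed in the figure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Parameters prescribed in Figure 3
beta0, beta1 = 0.0, 1.0
mu_b2, sigma_b2 = 0.75, 0.12
sigma_eps = 0.0

def f(a1, b2):
    # Hypothetical surrogate: response grows with length and aspect ratio
    return a1 * (0.5 + 0.6 * b2)

a1 = np.linspace(0.005, 0.16, 50)
b2 = rng.normal(mu_b2, sigma_b2, a1.size)      # random aspect ratio per flaw
ahat = beta0 + beta1 * f(a1, b2) + sigma_eps * rng.standard_normal(a1.size)

# Removing the mean trend exposes the fan-shaped, non-constant variance:
# the spread of the residuals grows with crack length
resid = ahat - beta1 * f(a1, mu_b2)
print(np.std(resid[:15]), np.std(resid[-15:]))
```

Because the aspect ratio multiplies the length-dependent response, its randomness produces exactly the increasing-variance trend the hierarchical model of the next section is designed to capture.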

Based on the eddy current case study problem, an example is presented here to investigate the ability to simultaneously evaluate the model fit parameters and the variance terms for both the measurement noise and the stochastic variance in the model slope. The physics-based hierarchical NDE measurement model is given as follows:

â = β0 + β1 f(a1; β2) + ε_â,   (6)
ε_â ~ N(0, σ_â²),   β2 ~ N(µ_β2, σ_β2²),   (7)

where f(·) is a function call for a physics-based model, β0 and β1 are model calibration parameters, and β2 is a random variable associated with crack aspect ratio (b/a) with unknown mean and variance. A simplified test case hierarchical NDE measurement model is given as follows:

â = β0 + (β1 + η) a1 + ε_â,   η = ε_η,   (8)
ε_â ~ N(0, σ_â²),   ε_η ~ N(0, σ_η²),   (9)

where β0 and β1 are the offset and mean slope terms of the model respectively, η is a random variable associated with the varying slope of the model, and σ_η² is the variance of the slope parameter. Here, the random variable η is used to simulate the increasing variance with increasing flaw size present in the physics-based model shown in Figure 3. The goal of this study is to assess how accurately these four parameters, β0, β1, σ_η² and σ_â², can be simultaneously estimated with respect to known values. Three example estimation problems are presented in Figure 4. Test case values were selected to closely represent examples in prior experimental data (see Figure 2) and simulated results (Figure 3). The first case (a) investigated the condition where variance as a function of flaw size dominates the measurement noise (i.e., the variance independent of flaw size). Using only 100 samples, the estimates for the two variance terms, σ_η and σ_â, were found to be 0.2580 and 0.00139 respectively, close to the exact values of 0.300 and 0.0010. However, the 95% credible bounds for both estimates just missed containing the true values. This error may have been due to the specific random sample used with the limited number of points, or it could reflect something systematic in the estimation problem. A second case study (b) with the same known parameters was repeated but with 1000 samples. As shown in Figure 4(b), the estimated parameter results are in much better agreement. In both cases, there appears to be a repeated underestimate of the slope variance term, σ_η, and an overestimate of the measurement noise term, σ_â. Note that the estimates for the calibration parameters, β0 and β1, were found to be in good agreement with the true values for these case studies. The variance terms appear to be the more challenging parameters to estimate in the hierarchical model. A third case (c) investigated the condition where variance as a function of flaw size is of a similar order as the measurement noise (the variance independent of flaw size). Using only 100 samples, the estimates for the two variance terms, σ_η and σ_â, were again found to be quite close to the exact values of 0.100 and 0.0050 respectively, and the credible bounds included the true values of the parameters. Both variance parameters are slightly overestimated, while the calibration parameter β1 is slightly underestimated. All in all, these case studies demonstrate the potential of simultaneously estimating the model calibration parameters, model random variables and measurement error.
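The four-parameter estimation problem of Eqs. (8)-(9) can be sketched by marginalizing over η, which gives â_i ~ N(β0 + β1 a_i, (σ_η a_i)² + σ_â²). The paper evaluates this with WinBUGS MCMC; the sketch below instead maximizes the marginal likelihood directly (a deliberate simplification), and treating the reported 0.300 and 0.0010 as the slope and noise standard deviations is an assumption.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Simulate Eq. (8)-(9): ahat = b0 + (b1 + eta)*a1 + eps, per-sample
# eta ~ N(0, s_eta^2), eps ~ N(0, s_a^2); case (b) sample size of 1000
b0_t, b1_t, s_eta_t, s_a_t, n = 0.0, 1.0, 0.300, 0.0010, 1000
a1 = rng.uniform(0.01, 0.2, n)
ahat = b0_t + (b1_t + rng.normal(0, s_eta_t, n)) * a1 + rng.normal(0, s_a_t, n)

def neg_loglik(theta):
    b0, b1, log_se, log_sa = theta
    # Marginal model: variance grows quadratically with flaw size
    var = (np.exp(log_se) * a1) ** 2 + np.exp(log_sa) ** 2
    r = ahat - (b0 + b1 * a1)
    return 0.5 * np.sum(np.log(var) + r**2 / var)

res = minimize(neg_loglik, [0.0, 1.0, np.log(0.1), np.log(0.01)],
               method="Nelder-Mead",
               options={"maxiter": 20000, "maxfev": 20000,
                        "xatol": 1e-9, "fatol": 1e-9})
b0, b1 = res.x[0], res.x[1]
s_eta, s_a = np.exp(res.x[2]), np.exp(res.x[3])
print(round(b1, 3), round(s_eta, 3))  # near the true 1.0 and 0.30
```

A Bayesian treatment replaces the optimizer with an MCMC sampler over the same marginal likelihood plus priors, which additionally yields the credible bounds reported in Figure 4.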

[Figure 4 plots â versus a1 for each test case, with parameter estimates tabulated as follows.]

(a) Ns = 100:
Parameter   True value   Bayesian estimate   95% credible bounds
β0          0.000        -0.00024            (-0.00119, 0.00364)
β1          1.000         1.0312             (0.9848, 1.0810)
σ_η         0.300         0.2580             (0.2286, 0.2923)
σ_ε         0.00100       0.00139            (0.00103, 0.00210)

(b) Ns = 1000:
Parameter   True value   Bayesian estimate   95% credible bounds
β0          0.000        -0.00011            (-0.00052, 0.00027)
β1          1.000         1.0145             (0.9952, 1.0330)
σ_η         0.300         0.2966             (0.2853, 0.3084)
σ_ε         0.00100       0.00113            (0.00101, 0.00139)

(c) Ns = 100:
Parameter   True value   Bayesian estimate   95% credible bounds
β0          0.000         0.00193            (0.00007, 0.00382)
β1          1.000         0.9716             (0.9371, 1.007)
σ_η         0.100         0.1014             (0.0795, 0.1248)
σ_ε         0.00500       0.00545            (0.00432, 0.00677)

FIGURE 4. Hierarchical model test cases estimating calibration parameters and variation in measurement noise and stochastic model slope. Test cases shown for strong model slope variation with (a) 100 samples and (b) 1000 samples, and (c) for the case where slope variation and measurement noise are both significant factors, with 100 samples.

CONCLUSIONS AND FUTURE WORK

Progress was presented on the feasibility of using Bayesian methods to estimate model calibration parameters, model random variables and measurement error in MAPOD evaluations. Future work is planned to perform hierarchical Bayesian evaluations with physics-based NDE measurement models, to better address random variable parameter estimation, and to investigate appropriate diagnostics to verify the evaluation assumptions. The Uncertainty Quantification (UQ) community is currently developing a broad Bayesian framework for the use of computational models with observational data, but challenges remain. First, one must properly address model discrepancy and not treat it as random error [7]; simply calibrating a wrong model will give physical parameter estimates that are wrong. Gaussian Process (GP) models can be used to fit model discrepancy. Second, the use of prior information in a Bayesian framework can greatly help mitigate the challenge of estimating parameters. Through the elicitation of expert opinion and the application of reasonable constraints on the estimation process, these challenges can feasibly be managed. Lastly, there is a need to better leverage model-form uncertainty evaluation approaches. Recent work has demonstrated the benefits of using Bayesian methods [13,19] to identify the best models and minimize issues associated with model discrepancy.

ACKNOWLEDGEMENTS

The authors wish to thank the Air Force Office of Scientific Research (AFOSR) and Dr. David Stargel in particular for supporting this research under task number 11RX15COR. The authors thank Chuck Annis (Statistical Engineering), David Forsyth (TRI/Austin) and Eric Lindgren (AFRL) for useful discussions on the subject. The authors also thank Chuck Annis for his work on the mh1823 software in R and David Forsyth and Chris Coughlin of TRI/Austin for work on the POD Toolkit. For more information on using Bayesian tools for POD evaluation, see www.computationaltools.com/Bayes.

REFERENCES

1. U.S. Department of Defense, Handbook, Nondestructive Evaluation System Reliability Assessment, MIL-HDBK-1823A, (7 April 2009).
2. Thompson, R. B., “A unified approach to the model-assisted determination of probability of detection,” Materials Evaluation, Vol. 66, pp. 667-673, (2008).
3. Aldrin, J. C., Knopp, J. S., Lindgren, E. A., and Jata, K. V., “Model-assisted Probability of Detection (MAPOD) Evaluation for Eddy Current Inspection of Fastener Sites,” Review of Progress in QNDE, Vol. 28, AIP, pp. 1784-1791, (2009).
4. Dominguez, N., Feuillard, V., Jenson, F., and Willaume, P., “Simulation assisted POD of a Phased Array Ultrasonic Inspection in Manufacturing,” Review of Progress in QNDE, Vol. 31, AIP, pp. 1765-1772, (2012).
5. Aldrin, J. C., Sabbagh, H. A., Murphy, R. K., Sabbagh, E. H., Knopp, J. S., Lindgren, E. A., and Cherry, M. R., “Demonstration of model-assisted probability of detection evaluation methodology for eddy current nondestructive evaluation,” Review of Progress in QNDE, Vol. 31, AIP, pp. 1565-1572, (2012).
6. Hoppe, W. C., “Parametric probability of detection (POD) estimation for eddy current crack detection,” Electromagnetic Nondestructive Evaluation, Dayton, OH, (July 21-23, 2009).
7. Kennedy, M. C. and O’Hagan, A., “Bayesian calibration of computer models,” J. R. Statist. Soc. B, Vol. 63, pp. 425-464, (2001).
8. Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B., Bayesian Data Analysis, (CRC Press, 2003).
9. Christensen, R., Johnson, W., and Branscum, A., Bayesian Ideas and Data Analysis: An Introduction for Scientists and Statisticians, (CRC Press, 2010).
10. Meeker, W. Q. and Escobar, L. A., “Introduction to the Use of Bayesian Methods for Reliability Data,” in Statistical Methods for Reliability Data, Wiley, pp. 343-368, (1998).
11. Thompson, R. B., “A Bayesian Approach to the Inversion of NDE and SHM Data,” Review of Progress in QNDE, Vol. 29, AIP, pp. 679-686, (2010).
12. Leemans, D. V. and Forsyth, D., “Bayesian Approaches to Using Field Test Data in Determining the Probability of Detection,” Materials Evaluation, Vol. 62, pp. 855-859, (2004).
13. Knopp, J. S. and Zeng, L., “Statistical Analysis of Hit/Miss Data using Bayes Factors,” Materials Evaluation, (2012, accepted for publication).
14. Li, M., Meeker, W. Q., and Hovey, P., “Joint Estimation of NDE Inspection Capability and Flaw-size Distribution for In-service Aircraft Inspections,” Research in NDE, Vol. 23, pp. 104-123, (2012).
15. Kanzler, D., Muller, C., Pitkanen, J., and Ewert, U., “Bayesian Approach for the Evaluation of the Reliability of Non-Destructive Testing Methods,” World Conference on NDT, (2012).
16. Lunn, D., Spiegelhalter, D., Thomas, A., and Best, N., “The BUGS project: Evolution, critique and future directions,” Statistics in Medicine, Vol. 28, pp. 3049-3067, (2009).
17. Haario, H., Laine, M., Mira, A., and Saksman, E., “DRAM: Efficient adaptive MCMC,” Statistics and Computing, Vol. 16, pp. 339-354, (2006).
18. Gelman, A. and Hill, J., Data Analysis Using Regression and Multilevel/Hierarchical Models, (Cambridge University Press, 2007).
19. Park, I., Amarchinta, H. K., and Grandhi, R. V., “A Bayesian approach for quantification of model uncertainty,” Reliability Engineering & System Safety, Vol. 95, pp. 777-785, (2010).