Michael Cardiff

Poster H53B-0917

Bayesian Inference of Site-Specific Petrophysical Relations from Noisy Data
Michael Cardiff (mcardiff@stanford.edu) and Peter K. Kitanidis
Stanford University, Department of Civil and Environmental Engineering

1. Overview

To use geophysical images to parameterize hydrologic models, measured geophysical properties must often be converted to hydrologically meaningful properties through a petrophysical relation. Unfortunately, both the hydrologic and the geophysical parameters being related are often subject to significant noise. In this work, we develop a probabilistic method for inferring site-specific petrophysical relation curves from noisy data. Our method has the following features:

1) It explicitly accounts for the fact that all measured properties are subject to noise (we term this omni-directional noise).
2) It uses a probabilistically derived objective function that can be shown, under basic assumptions, to be proportional to the likelihood of the curve given the data.
3) Because it is probabilistically derived, the method can produce not only best (i.e., maximum likelihood) estimates of the curve but also conditional realizations, and it supports model selection.

3. Conceptual Outline & Key Formulas

5. Example Performance

The methodology is tested on three synthetic datasets. Datapoints were randomly generated from the “true” underlying relation, and significant omni-directional noise was added:

Suppose we are given a set of datapoints known to contain significant omni-directional noise, together with a proposed curve representing the petrophysical relation. For any single datapoint, there are many ways in which it could have been generated from the curve under omni-directional noise (see (a) at right).


We can define an “averaged” likelihood of the datapoint being generated from the curve by considering all possible ways in which that datapoint could have been generated (see (b) at right). The objective function derived from this averaging implicitly links the length-scale of data error to the length of the curve.

(a) Different ways a measurement with omni-directional noise could be obtained.
(b) Average area under curve determines likelihood of the datapoint given the curve.

Example (synthetic) datasets, showing true curve (triangle / dashed line), measured datapoints (dots), and representative 95% confidence ellipses (solid line) for measurements.

After averaging over the curve, the likelihood of the curve given all datapoints can be obtained and maximized. Equivalently, the negative log-likelihood (NLL) can be minimized. This can be derived as:

Formula 1
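As a rough numerical sketch of the averaging idea (not the poster's exact Formula 1), the following assumes Gaussian measurement errors with a 2x2 covariance Ri for each datapoint and a uniform average over the curve parameter t in [0, 1]; the poster's formula may instead average along curve length. The names gaussian_density and averaged_nll are hypothetical.

```python
import math

def gaussian_density(y, c, R):
    """2-D Gaussian density of datapoint y centered at curve point c, covariance R."""
    (a, b), (_, d) = R  # symmetric 2x2 covariance [[a, b], [b, d]]
    det = a * d - b * b
    dx, dy = y[0] - c[0], y[1] - c[1]
    # Quadratic form (y - c)^T R^{-1} (y - c) via the closed-form 2x2 inverse.
    q = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def averaged_nll(points, covs, curve, n_samples=200):
    """Negative log-likelihood with each datapoint's likelihood averaged over the curve.

    Assumption: the average is uniform in t and approximated by the midpoint rule.
    """
    ts = [(j + 0.5) / n_samples for j in range(n_samples)]
    curve_pts = [curve(t) for t in ts]
    nll = 0.0
    for y, R in zip(points, covs):
        avg = sum(gaussian_density(y, c, R) for c in curve_pts) / n_samples
        nll -= math.log(avg)
    return nll

# Illustrative (made-up) data lying close to the line y = x.
points = [(0.1, 0.12), (0.5, 0.45), (0.9, 0.93)]
covs = [[[0.01, 0.0], [0.0, 0.01]]] * len(points)

nll_good = averaged_nll(points, covs, lambda t: (t, t))
nll_bad = averaged_nll(points, covs, lambda t: (t, t + 0.5))
```

In this toy case the curve passing near the data yields the smaller NLL, which is the quantity an optimizer would minimize over the curve parameters p.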

Using Formula 1, the fit to each dataset was optimized for a range of Bezier splines (0- to 3rd-order), shown as the squares/solid curves below. Using Formula 2, a (b+1)-order spline is selected over a b-order spline only if it improves the NLL by more than 2, which corresponds to roughly 95% confidence in the more highly parameterized model.

2. Key Problems

min(NLL) for each example dataset and spline order:

             0-order Spline   1st-order Spline   2nd-order (quadratic)   3rd-order (cubic)
             (point)          (line)             Spline                  Spline
Example 1    26.4868          26.3251            26.2594                 26.2156
Example 2    61.6383          27.2226            25.8167                 25.6955
Example 3    558.2207         137.0340           128.4446                121.3037
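The selection rule stated above (accept the next spline order only if the NLL improves by more than 2) can be applied to the min(NLL) values listed above. This is a minimal sketch under two assumptions: the twelve values are read in groups of four per example, ordered from 0-order to 3rd-order, and the search stops at the first order that fails the test. The function name select_order is hypothetical.

```python
def select_order(min_nll, threshold=2.0):
    """Return the selected spline order, given min(NLL) for orders 0, 1, 2, ...

    Assumption: orders are tested sequentially and the search stops at the
    first increase in order that does not improve the NLL by more than threshold.
    """
    order = 0
    for b in range(len(min_nll) - 1):
        if min_nll[b] - min_nll[b + 1] > threshold:
            order = b + 1
        else:
            break
    return order

# min(NLL) values from the poster; the grouping by example is an assumption.
nll_table = {
    "Example 1": [26.4868, 26.3251, 26.2594, 26.2156],
    "Example 2": [61.6383, 27.2226, 25.8167, 25.6955],
    "Example 3": [558.2207, 137.0340, 128.4446, 121.3037],
}

selected = {name: select_order(values) for name, values in nll_table.items()}
# selected == {"Example 1": 0, "Example 2": 1, "Example 3": 3}
```

Under these assumptions the rule picks a point, a line, and a cubic, which matches the relations used to generate the three datasets (none, linear, cubic).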

yi = coordinates for datapoint i
Ri = covariance matrix for datapoint i measurement errors
C(p,t) = a particular parametric curve model, where p are optimizable curve parameters and t defines the location along the given curve
n = number of datapoints
k = dimension of problem (2 for all problems presented)

Common approaches such as least-squares regression and kriging are maximum likelihood estimators ONLY if one of the properties can be assumed to be noise-free. Applying this assumption to data with omni-directional noise can create both uncertainty and bias:

Another key question is model selection: how complex a curve model should we use? Since our objective function is based on a probabilistically derived likelihood measure, the likelihood ratio statistic may be employed for model selection. Assuming nested models, the chi-squared statistic of Formula 2 can be used.

4. Bezier Spline Parameterization

min(NLL)

Example 2: Linear Relation

min(NLL)

Bezier splines are one flexible family of models that can be used for petrophysical relations. They are well suited to this application because they are nested parametric models: any b-order spline can be reproduced exactly by a (b+1)-order spline.

General formula for the x and y components of a b-order Bezier spline. A 1st-order spline is a line; 2nd- and higher-order splines create progressively more complex curves.
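For reference, the standard Bernstein form of a b-order Bezier curve is given below. It is assumed, not confirmed, that this matches the formula shown on the poster; P_j = (P_{j,x}, P_{j,y}) denotes the j-th control point and t runs from 0 to 1.

```latex
C_x(p,t) = \sum_{j=0}^{b} \binom{b}{j} (1-t)^{\,b-j}\, t^{\,j}\, P_{j,x},
\qquad
C_y(p,t) = \sum_{j=0}^{b} \binom{b}{j} (1-t)^{\,b-j}\, t^{\,j}\, P_{j,y},
\qquad 0 \le t \le 1 .
```

Setting b = 1 gives the straight line (1-t) P_0 + t P_1, and b = 0 reduces the curve to the single point P_0.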

Example 1: No underlying relation

Formula 2

p1, p2 = vectors of parameters of curve model 1 (C1) and curve model 2 (C2), respectively
m = difference in the number of parameters between curve model 1 and curve model 2
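The textbook likelihood ratio statistic for nested models, written with the symbols defined above, takes the following form. This is a sketch of the standard result rather than a reproduction of the poster's Formula 2, and it assumes that C2 is the more highly parameterized model and that each NLL is evaluated at its minimizing parameters.

```latex
\Lambda \;=\; 2\left[\, \min_{p_1} \mathrm{NLL}(p_1 \mid C_1) \;-\; \min_{p_2} \mathrm{NLL}(p_2 \mid C_2) \,\right],
\qquad
\Lambda \;\sim\; \chi^2_m \quad \text{(approximately, if the simpler model } C_1 \text{ is adequate)} .
```

For reference, the 95% quantile of the chi-squared distribution with one degree of freedom is about 3.84, so an NLL improvement of 2 (a statistic of 4) sits close to the 95% level when m = 1.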

Issue 1 (Uncertainty): Regression on omni-directionally noisy data (true relation = dashed, best fit = solid, with representative error ellipse). Results depend highly on which coordinate is assumed error-free.

Issue 2 (Biases): Regression-fit models using one coordinate may lead to “infinite” misfit along another coordinate.
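Issue 1 above can be illustrated with a small simulation. This is a sketch on made-up synthetic data, not the poster's datasets: both coordinates of a linear relation with slope 1 receive independent Gaussian noise, and ordinary least squares is run twice, once treating x as error-free and once treating y as error-free. The helper ols_slope and all numeric settings are assumptions chosen for illustration.

```python
import random

def ols_slope(u, v):
    """Least-squares slope of v regressed on u (u is treated as error-free)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    return suv / suu

rng = random.Random(0)  # fixed seed so the run is repeatable
true_slope = 1.0
latent = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Noise of standard deviation 0.5 is added to BOTH coordinates.
x = [s + rng.gauss(0.0, 0.5) for s in latent]
y = [true_slope * s + rng.gauss(0.0, 0.5) for s in latent]

slope_y_on_x = ols_slope(x, y)        # assumes x is error-free
slope_x_on_y = 1.0 / ols_slope(y, x)  # assumes y is error-free, expressed as dy/dx
```

With these settings the first fit underestimates the true slope and the second overestimates it, so the answer depends on which coordinate is assumed noise-free.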


Example 3: Cubic Relation

The parameters (P) of a Bezier spline model can be thought of as “pulling points”: the spline passes through P0 and Pb and is “pulled” towards the intermediate points.
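The “pulling point” behavior can be checked numerically. The sketch below evaluates a Bezier curve in the standard Bernstein form (assumed to match the poster's parameterization) and uses the standard degree-elevation construction to show why the models are nested; the function names bezier and elevate and the control points are hypothetical.

```python
from math import comb

def bezier(control_points, t):
    """Evaluate a b-order Bezier curve (b = number of control points - 1) at t in [0, 1]."""
    b = len(control_points) - 1
    x = y = 0.0
    for j, (px, py) in enumerate(control_points):
        w = comb(b, j) * (1.0 - t) ** (b - j) * t ** j  # Bernstein weight
        x += w * px
        y += w * py
    return (x, y)

def elevate(control_points):
    """Degree elevation: the same curve expressed with one additional control point."""
    b = len(control_points) - 1
    new_points = [control_points[0]]
    for j in range(1, b + 1):
        a = j / (b + 1)
        (x0, y0), (x1, y1) = control_points[j - 1], control_points[j]
        new_points.append((a * x0 + (1.0 - a) * x1, a * y0 + (1.0 - a) * y1))
    new_points.append(control_points[-1])
    return new_points

quadratic = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]  # P0, P1, P2
start = bezier(quadratic, 0.0)   # equals P0
end = bezier(quadratic, 1.0)     # equals P2
middle = bezier(quadratic, 0.5)  # pulled towards P1 without reaching it

cubic = elevate(quadratic)       # 4 control points tracing the same curve
```

The curve starts at P0, ends at the last control point, and only approaches the intermediate point. The elevated cubic traces the same curve as the quadratic, which is what makes a lower-order spline a special case of the next order.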

min(NLL)