Proceedings of IAMG’05: GIS and Spatial Analysis, Vol.2, 749-755

Stochastic Simulation of Complex Dependency Pattern of Petrophysical Properties Using T-copulas

Martín Díaz-Viera, Ricardo Casar-González
Instituto Mexicano del Petróleo, Eje Central Lázaro Cárdenas 152, 07730, México D.F.
E-mail: [email protected]

1. Abstract

Copulas are a new way of modeling the correlation structure between variables. Over the past forty years copulas have played an important role in several areas of statistics, but only recently have they become popular in simulation models. Copulas are functions that describe dependencies among variables, and they provide a way to create distributions that model correlated multivariate data. Using a copula, a data analyst can construct a multivariate distribution by specifying the marginal univariate distributions and choosing a particular copula to provide the correlation structure between variables. The choice of a particular copula in an application may be based on actual observed data, or different copulas may be used to determine the sensitivity of simulation results to the input distribution. In the present work we explore how to use copulas to simulate dependent bivariate random data with different correlation patterns using a Monte Carlo approach. In particular, we use a bivariate t-copula and the empirical model for the marginal distributions in conjunction with two association measures, Kendall’s tau and Spearman’s rho. A brief discussion of how to produce dependent geostatistical cosimulations using copulas is presented. The extension of this procedure to a higher number of dimensions is straightforward. Some numerical examples of the application of this methodology to petrophysical data at wells in a petroleum reservoir are shown.

2. A Very Short Introduction to Copulas

A copula is a function that joins or “couples” a multivariate distribution function (d.f.) to its one-dimensional marginal d.f.’s (Nelsen, 1999). It can be shown (Sklar’s theorem) that if $H(x,y)$ is a bivariate d.f. with margins $F(x)$ and $G(y)$, then there exists a copula $C$ such that $H(x,y) = C(F(x), G(y))$. In short, a copula is simply a multivariate probability distribution whose marginal distributions are uniform on $[0,1]$. Copulas can be defined informally as follows. Let $X$ and $Y$ be continuous random variables (r.v.’s) with d.f.’s $F(x) = P(X \le x)$ and $G(y) = P(Y \le y)$, and joint d.f. $H(x,y) = P(X \le x, Y \le y)$. For every $(x,y)$ in $[-\infty,\infty]^2$, consider the point in $[0,1]^3$ with coordinates $(F(x), G(y), H(x,y))$. The resulting mapping from $[0,1]^2$ to $[0,1]$, which sends $(F(x), G(y))$ to $H(x,y)$, is a copula.
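As a minimal illustration of this mapping, suppose $X$ and $Y$ are independent. Then $H(x,y) = F(x)G(y)$, so every point produced by the construction above has the form $(u, v, uv)$:

$$H(x,y) = F(x)\,G(y) \;\Longrightarrow\; C(u,v) = u\,v, \qquad (u,v) \in [0,1]^2,$$

which is the independence (product) copula.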


For the purposes of this paper, only the bivariate case is considered. Given two r.v.’s $X$ and $Y$ whose marginal d.f.’s are continuous and strictly increasing, the copula $C$ of their joint d.f. may be obtained by evaluating $C(u,v) = H(F^{-1}(u), G^{-1}(v))$, where $F^{-1}(u)$ and $G^{-1}(v)$ are the quantile functions of the margins. One of the most important properties of copulas is that they allow the dependence between r.v.’s to be specified completely separately from their marginal distributions. In other words, through copulas one can specify the dependence between uniform r.v.’s and afterwards transform them to any desired marginal distributions. There are several families of copulas; among them are the Gaussian and Student’s t copulas, which belong to the broader class of elliptical copulas. In this paper we focus on t-copulas and their capacity for modeling complex dependency patterns of bivariate petrophysical data. The main reason for this choice is that a recent paper (Breymann et al., 2003) showed that the t-copula is generally superior to the Gaussian copula in capturing the phenomenon of dependent extreme values.
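The separation of margins and dependence can be checked empirically. The sketch below assumes NumPy and continuous data without ties; the function name `pseudo_observations` and the rank scaling by $n+1$ are illustrative choices, not taken from the paper. It replaces each margin by its scaled ranks, which gives a sample from the empirical copula, and checks that strictly increasing changes of the margins leave that sample untouched.

```python
import numpy as np

def pseudo_observations(x, y):
    """Scaled ranks u_i = rank(x_i) / (n + 1): a sample from the empirical copula (no ties assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    u = (np.argsort(np.argsort(x)) + 1) / (n + 1)   # argsort of argsort gives 0-based ranks
    v = (np.argsort(np.argsort(y)) + 1) / (n + 1)
    return u, v

rng = np.random.default_rng(0)
z = rng.standard_normal((500, 2))
x = z[:, 0]
y = 0.7 * z[:, 0] + 0.3 * z[:, 1]        # dependent pair with normal margins

u1, v1 = pseudo_observations(x, y)
# Replace the margins by strictly increasing transforms: the copula sample does not change.
u2, v2 = pseudo_observations(np.exp(x), y ** 3)
print(np.allclose(u1, u2) and np.allclose(v1, v2))   # True
```

Only the ranks enter the copula sample, which is why any choice of marginal distribution can later be imposed without altering the dependence.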

3. Modeling of Dependency Pattern Using T-Copulas

The t-copula can be thought of as representing the dependence structure implicit in a multivariate t-distribution. The n-variate t-copula can be defined by

$$C^{t}_{\nu,\mu,\Sigma}(u_1,\ldots,u_n) = t_{n,\nu}\left(t_{1,\nu}^{-1}(u_1),\ldots,t_{1,\nu}^{-1}(u_n)\right) \tag{1}$$

where $t_{n,\nu} \equiv t_{n,\nu}(\mu,\Sigma)$ is the n-variate t-distribution with $\nu$ degrees of freedom, mean vector $\mu$ and positive-definite dispersion matrix $\Sigma$, and $t_{1,\nu}^{-1}$ is the quantile function of the univariate standard t-distribution with $\nu$ degrees of freedom. Since a copula remains invariant under a standardization of the marginal distributions, the copula of a $t_{n,\nu}(\mu,\Sigma)$ distribution is identical to that of a $t_{n,\nu}(0,R)$ distribution, where $R$ is the correlation matrix implied by the dispersion matrix $\Sigma$. Thus, the bivariate t-copula can be expressed by

$$C^{t}_{\nu,r}(u_1,u_2) = t_{2,\nu}\left(t_{1,\nu}^{-1}(u_1),\, t_{1,\nu}^{-1}(u_2)\right) \tag{2}$$

where $r$ is the linear correlation coefficient, i.e., the off-diagonal element of the matrix $R$. The algorithm for numerical simulation of a bivariate distribution with a given correlation coefficient $r$ and given marginal distributions by means of a t-copula is particularly simple. The general procedure is as follows:

i. Generate a bivariate t-distributed random pair $(x_1, x_2) \sim t_{2,\nu}(0,R)$.
ii. Obtain a uniformly distributed pair $u_1 = t_{1,\nu}(x_1)$ and $u_2 = t_{1,\nu}(x_2)$.
iii. Transform $y_1 = F_1^{-1}(u_1)$ and $y_2 = F_2^{-1}(u_2)$, where $F_i^{-1}(u_i)$, $i = 1, 2$, are the quantile functions of the margins.
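Steps i-iii can be sketched in Python with NumPy and SciPy. The bivariate t pair is built as correlated normals divided by a common $\sqrt{W/\nu}$ with $W \sim \chi^2_\nu$, a standard construction of the multivariate t-distribution. The helper names, $\nu = 4$, the dispersion matrix and the lognormal and gamma margins used in step iii are assumptions chosen only for illustration; the helper `dispersion_to_correlation` shows how $R$ follows from $\Sigma$.

```python
import numpy as np
from scipy import stats

def dispersion_to_correlation(sigma):
    """Correlation matrix R implied by a dispersion matrix: R = D^(-1/2) Sigma D^(-1/2)."""
    d = np.sqrt(np.diag(sigma))
    return sigma / np.outer(d, d)

def t_copula_uniforms(r, nu, m, rng):
    """Steps i-ii: draw (x1, x2) ~ t_{2,nu}(0, R) and return u_i = t_{1,nu}(x_i)."""
    R = np.array([[1.0, r], [r, 1.0]])
    L = np.linalg.cholesky(R)
    z = rng.standard_normal((m, 2)) @ L.T      # correlated normals N(0, R)
    w = rng.chisquare(nu, size=(m, 1))         # one chi-square draw shared by both coordinates
    x = z / np.sqrt(w / nu)                    # bivariate t with nu degrees of freedom
    return stats.t.cdf(x, df=nu)               # probability integral transform of each margin

rng = np.random.default_rng(42)
sigma = np.array([[4.0, 1.2], [1.2, 1.0]])     # illustrative dispersion matrix
r = dispersion_to_correlation(sigma)[0, 1]     # 1.2 / (2.0 * 1.0) = 0.6
u = t_copula_uniforms(r, nu=4, m=10_000, rng=rng)

# Step iii: any quantile functions can be applied; these margins are illustrative only.
y1 = stats.lognorm.ppf(u[:, 0], s=0.5)
y2 = stats.gamma.ppf(u[:, 1], a=2.0)
```

Because step iii applies strictly increasing functions, the rank dependence of `(y1, y2)` is the same as that of `u`.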

Copulas capture the “scale-invariant” nature of the association between random variables. The most widely known scale-invariant measures of association are the population versions of Kendall’s tau and Spearman’s rho, both of which measure a form of dependence known as concordance. Geometrically, two distinct points $(x_1, y_1)$ and $(x_2, y_2)$ in the plane are concordant if the line segment connecting them has positive slope, and discordant if it has negative slope. Kendall’s tau ($\tau_K$) can be defined as the difference between the probabilities of concordance and discordance for two independent pairs $(X_1, Y_1)$ and $(X_2, Y_2)$, each with bivariate distribution $H$. Spearman’s rho ($\rho_S$), on the other hand, is defined as three times the difference between the probabilities of concordance and discordance of the vectors $(X_1, Y_1)$ and $(X_2, Y_3)$, where $(X_1, Y_1)$, $(X_2, Y_2)$ and $(X_3, Y_3)$ are three independent random vectors with a common joint distribution function $H$.

Roughly speaking, these rank correlations measure the degree to which large or small values of one random variable are associated with large or small values of another. However, unlike the linear correlation coefficient, they measure the association only in terms of ranks. As a consequence, rank correlation is preserved under any strictly increasing transformation of the variables. In particular, the transformation method just described preserves the rank correlation. Therefore, knowing the rank correlation of the bivariate t-distribution exactly determines the rank correlation of the final transformed r.v.’s. Whereas $r$ is still needed to parameterize the underlying bivariate t-distribution, Kendall’s $\tau_K$ and Spearman’s $\rho_S$ are more useful for describing the dependence between r.v.’s, because they are invariant to the choice of marginal distributions. It turns out that there is a simple mapping between Kendall’s $\tau_K$ or Spearman’s $\rho_S$ and the linear correlation coefficient $r$. The relation for $\tau_K$ holds exactly for the t-distribution (indeed for any elliptical distribution), whereas the relation for $\rho_S$ is exact for the Gaussian distribution and only approximate for the t-distribution:

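$$\tau_K = \frac{2}{\pi}\arcsin(r) \quad \text{or} \quad r = \sin\left(\frac{\pi\,\tau_K}{2}\right) \tag{3}$$

$$\rho_S = \frac{6}{\pi}\arcsin\left(\frac{r}{2}\right) \quad \text{or} \quad r = 2\sin\left(\frac{\pi\,\rho_S}{6}\right) \tag{4}$$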
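The sample versions of both rank measures and the inverse mappings of eqs. (3)-(4) are short enough to write out. This sketch assumes NumPy and data without ties, counts concordant and discordant pairs directly (quadratic in the sample size, so only suitable for small samples), and uses hypothetical function names.

```python
import numpy as np
from itertools import combinations

def kendall_tau(x, y):
    """Sample Kendall's tau: (concordant - discordant pairs) / total pairs; assumes no ties."""
    conc = disc = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])   # positive slope -> concordant, negative -> discordant
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / (len(x) * (len(x) - 1) / 2)

def spearman_rho(x, y):
    """Sample Spearman's rho: linear correlation of the ranks; assumes no ties."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

def r_from_kendall(tau):
    """Eq. (3): r = sin(pi * tau / 2)."""
    return np.sin(np.pi * tau / 2)

def r_from_spearman(rho):
    """Eq. (4): r = 2 sin(pi * rho / 6)."""
    return 2 * np.sin(np.pi * rho / 6)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 4.0])
print(kendall_tau(x, y), spearman_rho(x, y))   # approximately 0.667 and 0.8
# Strictly increasing transforms of the margins leave the rank measures unchanged.
print(kendall_tau(np.exp(x), y ** 3) == kendall_tau(x, y))   # True
print(round(r_from_kendall(0.895), 3), round(r_from_spearman(0.980), 3))   # 0.986 0.982
```

The last line applies eqs. (3) and (4) to the sample values $\tau_K^* = 0.895$ and $\rho_S^* = 0.980$ and recovers the linear correlations $r_K^* = 0.986$ and $r_S^* = 0.982$ used for the simulations of Fig. 1.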

4. The Problem of Modeling Dependency Pattern of Petrophysical Properties

The most common way to estimate permeability profiles in uncored wells is through a permeability predictor, typically in the form of an empirical equation (Balan et al., 1995). This normally requires a calibration data set represented by one or more key wells where comprehensive core and log data are available. This calibration data set is used to build the predictor and to test the reliability of its results.

The regression approach, using a statistical instead of a deterministic formalism, tries to predict a conditional average, or expectation, of permeability corresponding to a given set of parameters. A different predictive equation must be established for each new area or field. The main drawbacks of this method are that the variability of the data (in terms of variance and standard deviation) cannot be captured, and that the predicted permeability profile is ineffective at estimating extreme values. On the other hand, model-free function estimators such as artificial neural networks are very flexible tools for recognizing and reproducing the pattern of the permeability distribution, but they require a time-consuming “learning” process that depends strongly on the amount and quality of the available data.
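The loss of variability under regression can be made concrete. For a least-squares line with an intercept, the predictions satisfy $\mathrm{std}(\hat{k}) = |r|\,\mathrm{std}(k)$, so whenever $|r| < 1$ the predicted profile is smoother than the data and its extremes are pulled toward the mean. The sketch below uses synthetic data, not the well logs of Section 6.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = rng.normal(0.06, 0.02, size=200)               # synthetic porosity-like predictor
k = 2.0 * phi + rng.normal(0.0, 0.03, size=200)     # synthetic noisy response

slope, intercept = np.polyfit(phi, k, deg=1)        # least-squares regression line
k_pred = slope * phi + intercept                    # conditional-mean predictions

r = np.corrcoef(phi, k)[0, 1]
# For least squares with an intercept, std(k_pred) / std(k) equals |r| exactly,
# so the predictions are less variable than the data whenever |r| < 1.
print(k_pred.std() / k.std(), abs(r))
```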

5. Conditional Simulation of Permeability with Simulated Annealing Using a T-Copula

The method proposed here is a modification of the methodology for geostatistical modeling of permeability with annealing cosimulation first introduced by Deutsch and Cockerham (1994). Basically, the procedure can be described as a two-stage algorithm. The first stage consists of a joint Monte Carlo simulation of porosity (PHI) and permeability (K) using a t-copula that reproduces the empirical dependence pattern between K and PHI as well as their marginals (Perkins and Lane, 2003). In the second stage, a conditional simulation of permeability is carried out with the simulated annealing method (Deutsch and Journel, 1998). A more detailed description of the algorithm is as follows:

i. Estimate the sample rank correlation $\tau_K^*$ ($\rho_S^*$) of K and PHI.
ii. Obtain the corresponding linear correlation coefficient $r_K^*$ ($r_S^*$) by applying the mapping of eqs. (3)-(4).
iii. Generate m correlated pairs by applying the algorithm of Section 3 with the empirical cumulative distributions of K and PHI.
iv. Simulate permeability conditioned on the known permeability values, using the porosity values as a secondary variable in the same fashion as Deutsch and Cockerham (1994), but taking as input data the bivariate distribution simulated in step iii instead of the empirical one.
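Steps i-iii might be sketched as follows with SciPy (`scipy.stats.multivariate_t` requires SciPy 1.6 or later). Inverting the empirical cumulative distributions with NumPy's linearly interpolated `np.quantile` is an assumption about how the empirical margins are used; with this choice the simulated values cannot leave the observed range. The synthetic data, $\nu = 5$ and the function names are hypothetical, and step iv is not shown.

```python
import numpy as np
from scipy import stats

def empirical_quantile(sample, u):
    """Quantile function of the empirical d.f., linearly interpolated between order statistics."""
    return np.quantile(sample, u)

def simulate_pairs(phi, k, m, nu, rng):
    # Step i: sample rank correlation (Kendall's tau) of K and PHI.
    tau = stats.kendalltau(phi, k)[0]
    # Step ii: eq. (3), r = sin(pi * tau / 2).
    r = np.sin(np.pi * tau / 2)
    # Step iii: bivariate t draws -> uniforms -> empirical margins.
    t2 = stats.multivariate_t(loc=[0.0, 0.0], shape=[[1.0, r], [r, 1.0]], df=nu)
    x = t2.rvs(size=m, random_state=rng)
    u = stats.t.cdf(x, df=nu)
    return empirical_quantile(phi, u[:, 0]), empirical_quantile(k, u[:, 1]), r

# Synthetic stand-ins for the well-log values (not the paper's data).
rng = np.random.default_rng(7)
phi = rng.uniform(0.0001, 0.109, size=214)
k = 0.5 * (phi / 0.109) ** 2 * rng.uniform(0.5, 1.0, size=214)
phi_sim, k_sim, r = simulate_pairs(phi, k, m=1000, nu=5, rng=rng)
```

Using Spearman's rho instead only changes steps i-ii: `stats.spearmanr` for the sample value and eq. (4) for the mapping.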

6. Numerical Experiments

The numerical experiments were carried out on interpreted porosity (PHI) and permeability (K) well log data taken from a turbiditic sandstone-shale formation. The original data were first averaged every meter and then sampled every five meters, resulting in a working data set of 214 points. Fig. 1 shows that the 1000 values simulated by the Monte Carlo method employing t-copulas with the sample rank correlations ($\tau_K^*$ and $\rho_S^*$) reproduce quite well the dependency pattern, the correlation coefficient and even the extreme values of the original sample data (Fig. 2).
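A minimal sketch of the preprocessing described above, assuming 1 m bins of the form [d, d + 1) and keeping every fifth averaged value; the log, its sampling interval and the depth range are synthetic, and `average_per_meter` is a hypothetical helper.

```python
import numpy as np

def average_per_meter(depth, values):
    """Average log samples within consecutive 1 m depth bins [d, d + 1)."""
    bins = np.floor(depth).astype(int)
    edges = np.unique(bins)
    means = np.array([values[bins == b].mean() for b in edges])
    return edges.astype(float), means

# Hypothetical log sampled every 0.1524 m over 500 m (not the paper's data).
depth = np.arange(1000.0, 1500.0, 0.1524)
values = np.sin(depth / 15.0)
d1, v1 = average_per_meter(depth, values)
d5, v5 = d1[::5], v1[::5]          # keep one averaged value every five meters
```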

[Fig. 1: cross plots of K versus PHI with histograms for the simulated values. Panel A, simulation with $\tau_K^* = 0.895$ ($r_K^* = 0.986$); panel B, simulation with $\rho_S^* = 0.980$ ($r_S^* = 0.982$). Estimated correlation coefficients reported in the panels: $r^* = 0.6375$ and $r^* = 0.6474$. Summary statistics (PHI, K), both panels: Min 0.0001, 0.000; Max 0.109, 0.5448. Std: 0.02222, 0.08116 (A); 0.02231, 0.07395 (B).]

Fig. 1. Cross plot distributions, histograms and estimated correlation coefficients of PHI and K simulated values using: A- Kendall’s tau, B- Spearman’s rho.

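[Fig. 2, panel A: cross plot of K versus PHI with histograms for the sample values, $r^* = 0.6281$; summary statistics (PHI, K): Min 0.0001, 0.000; Max 0.109, 0.5448; Std 0.022, 0.07624. Panel B: sample variogram of K with fitted model $\gamma(h) = 0.0014 + 0.0044\,\mathrm{Sph}_{20}(h)$.]

Fig. 2. A- Cross-plot distribution, histograms and estimated correlation coefficient of PHI and K sample values. B- Sample variogram and its fitted model for K sample values, where $\mathrm{Sph}_{20}(h)$ denotes a spherical variogram model with a range of 20 meters.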
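The fitted model of Fig. 2B combines a nugget of 0.0014 with a spherical structure of sill contribution 0.0044 and range 20 m. The sketch below writes out the standard spherical model and the classical experimental semivariogram estimator on a regular 1D grid; the function names and default arguments are illustrative assumptions.

```python
import numpy as np

def spherical(h, a):
    """Unit-sill spherical model Sph_a(h): 1.5 h/a - 0.5 (h/a)^3 for h <= a, 1 beyond the range."""
    s = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return 1.5 * s - 0.5 * s ** 3

def gamma_model(h, nugget=0.0014, sill=0.0044, a=20.0):
    """Nugget + spherical structure, as in Fig. 2B; gamma(0) = 0 by convention."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + sill * spherical(h, a), 0.0)

def experimental_variogram(z, lags):
    """gamma*(h) = 1 / (2 N(h)) * sum (z(x + h) - z(x))^2 on a regular grid; lags in nodes, >= 1."""
    z = np.asarray(z, dtype=float)
    return np.array([0.5 * np.mean((z[h:] - z[:-h]) ** 2) for h in lags])

print(gamma_model([10.0, 20.0, 40.0]))   # approximately 0.004425, 0.0058, 0.0058
```

With data sampled every five meters, a lag of `h` nodes corresponds to a distance of 5h meters.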

The previously simulated values were used as the bivariate distribution of PHI and K in the annealing conditional simulation of permeability. The simulation was performed on a grid with a mesh size of one meter. The porosity values, which are known at the nodes of the same simulation mesh, are included as secondary data. The variogram, the correlation coefficient and the cross-plot were included in the objective function of the simulated annealing method. The results of the annealing conditional simulation of permeability (Figs. 3 and 4) show very good agreement in terms of histogram, variogram, correlation and cross-plot reproduction. Note that the extreme values and the standard deviation (Std) of permeability are also reproduced in the simulations.
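A heavily simplified sketch of swap-based simulated annealing in the spirit of the annealing cosimulation of Deutsch and Cockerham (1994) may help to fix ideas. It is not their program or the GSLIB implementation: it works on a 1D grid, keeps only the variogram and correlation terms of the objective function (the cross-plot term is omitted), and every name, weight, cooling schedule and data value is hypothetical.

```python
import numpy as np

def variogram(z, lags):
    """Experimental semivariogram on a regular 1D grid (lags in grid nodes, >= 1)."""
    return np.array([0.5 * np.mean((z[h:] - z[:-h]) ** 2) for h in lags])

def objective(z, sec, lags, gamma_target, r_target):
    """Relative variogram mismatch plus squared mismatch of the correlation with the secondary data."""
    g = variogram(z, lags)
    r = np.corrcoef(z, sec)[0, 1]
    return np.sum(((g - gamma_target) / gamma_target) ** 2) + (r - r_target) ** 2

def anneal(z0, fixed, sec, lags, gamma_target, r_target, rng,
           t0=1.0, cooling=0.9, n_per_temp=150, n_temps=40):
    """Swap-based annealing: conditioning nodes (fixed=True) never move."""
    z = z0.copy()
    free = np.flatnonzero(~fixed)
    obj = objective(z, sec, lags, gamma_target, r_target)
    best, best_obj, t = z.copy(), obj, t0
    for _ in range(n_temps):
        for _ in range(n_per_temp):
            i, j = rng.choice(free, size=2, replace=False)
            z[i], z[j] = z[j], z[i]                      # a swap keeps the histogram intact
            new = objective(z, sec, lags, gamma_target, r_target)
            if new <= obj or rng.random() < np.exp((obj - new) / t):
                obj = new                                # Metropolis acceptance rule
                if obj < best_obj:
                    best, best_obj = z.copy(), obj
            else:
                z[i], z[j] = z[j], z[i]                  # rejected: undo the swap
        t *= cooling                                     # geometric cooling schedule
    return best, best_obj

# Hypothetical 1D example on a 1 m grid; all numbers are illustrative only.
rng = np.random.default_rng(3)
n = 200
sec = 0.06 + 0.01 * np.cumsum(rng.standard_normal(n)) / np.sqrt(n)   # smooth "porosity" profile
z0 = rng.lognormal(-3.0, 0.8, size=n)          # stand-in for K values from the copula step
fixed = np.zeros(n, dtype=bool)
fixed[::20] = True                             # conditioning (known K) nodes
lags = np.array([1, 2, 5, 10, 20])
s = np.minimum(lags / 20.0, 1.0)
gamma_target = z0.var() * (0.24 + 0.76 * (1.5 * s - 0.5 * s ** 3))   # nugget + spherical, range 20
obj0 = objective(z0, sec, lags, gamma_target, 0.63)
best, best_obj = anneal(z0, fixed, sec, lags, gamma_target, 0.63, rng)
```

Because the only perturbation is a swap of two non-conditioning values, the histogram of the initial values, here standing in for the copula-simulated K values, is matched automatically and the conditioning data are honored exactly.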

[Fig. 3, panel A: cross plot of K versus PHI with histograms for the simulated values, $r^* = 0.6257$; summary statistics (PHI, K): Min 0.0001, 0.000; Max 0.1254, 0.5448; Std 0.02181, 0.07798. Panel B: variogram of simulated K with model $\gamma(h) = 0.0014 + 0.0042\,\mathrm{Sph}_{20}(h)$.]

Fig. 3. Simulated values of PHI and K using Kendall’s tau after conditional annealing simulation. A- Cross-plot distribution, histograms and estimated correlation coefficient. B- Sample variogram and its model for K simulated values, where $\mathrm{Sph}_{20}(h)$ denotes a spherical variogram model with a range of 20 meters.

[Fig. 4, panel A: cross plot of K versus PHI with histograms for the simulated values, $r^* = 0.6043$; summary statistics (PHI, K): Min 0.0001, 0.000; Max 0.1254, 0.5448; Std 0.02181, 0.07672. Panel B: variogram of simulated K with model $\gamma(h) = 0.0014 + 0.0042\,\mathrm{Sph}_{20}(h)$.]

Fig. 4. Simulated values of PHI and K using Spearman’s rho after conditional annealing simulation. A- Cross-plot distribution, histograms and estimated correlation coefficient. B- Sample variogram and its model for K simulated values, where $\mathrm{Sph}_{20}(h)$ denotes a spherical variogram model with a range of 20 meters.

7. Conclusions

The method presented here has several features that make it very competitive for modeling complex dependency patterns of petrophysical properties. In fact, it could serve as an alternative to traditional methods such as linear regression, since it does not require assuming a linear dependency model between the variables. At the same time, this method reproduces quite well the extreme values and the variability observed in the data samples, overcoming the main drawbacks of the linear regression method. Moreover, in comparison with artificial neural networks, this approach has the advantage of not requiring a time-consuming “learning” process.

Additionally, this method offers the following benefits:
− There is no need for a logarithmic transformation of permeability; consequently, the potential bias due to the back-transformation procedure is avoided.
− Because the bivariate distribution simulated with the t-copula is included in the annealing cosimulation, the histogram of permeability is matched automatically.
− There are no limitations on its application to problems of higher dimension.

A further comparative study of the relative performance of Kendall’s tau and Spearman’s rho in reproducing complex dependency structures is required. The application of optimally fitted copulas, instead of empirical copulas, in geostatistical cosimulations is an open and very promising line of research.

8. References

Balan, B., Mohaghegh, S., Ameri, S., 1995. State-of-the-Art in Permeability Determination From Well Log Data: Part 1 - A Comparative Study, Model Development. SPE 30978.
Breymann, W., Dias, A., Embrechts, P., 2003. Dependence structures for multivariate high-frequency data in finance. Quantitative Finance 3, 1–14.
Deutsch, C.V., Cockerham, P.W., 1994. Geostatistical Modeling of Permeability with Annealing Cosimulation (ACS). SPE 28413.
Deutsch, C.V., Journel, A.G., 1998. GSLIB: Geostatistical Software Library and User's Guide, Second Edition. Oxford University Press, 369 pp.
Nelsen, R.B., 1999. An Introduction to Copulas. Lecture Notes in Statistics, Vol. 139, Springer-Verlag, 216 pp.
Perkins, P., Lane, T., 2003. Monte-Carlo Simulation in MATLAB Using Copulas. MATLAB News & Notes, www.mathworks.com, November 2003.