Computers & Geosciences
Contents lists available at ScienceDirect
Computers & Geosciences journal homepage: www.elsevier.com/locate/cageo
HOSIM: A high-order stochastic simulation algorithm for generating three-dimensional complex geological patterns
Hussein Mustapha, Roussos Dimitrakopoulos
COSMO – Stochastic Mine Planning Laboratory, Department of Mining and Materials Engineering, McGill University, 3450 University St., Montreal, Que., Canada H3A 2A7
Article info
Article history: Received 15 April 2010; received in revised form 26 August 2010; accepted 7 September 2010
Keywords: Sequential simulation; High-order spatial cumulants; Legendre polynomials

Abstract
The three-dimensional high-order simulation algorithm HOSIM is developed to simulate complex non-linear and non-Gaussian systems. HOSIM is an alternative to the current multiple-point (MP) approaches and is based upon new high-order spatial connectivity measures, termed high-order spatial cumulants. The HOSIM algorithm implements a sequential simulation process, where local conditional distributions are generated using weighted orthonormal Legendre polynomials, which in turn define the so-called Legendre cumulants. The latter are high-order conditional spatial cumulants inferred from both the available data and training images. This approach is data-driven and reconstructs both high- and lower-order spatial complexity in simulated realizations, while it only borrows from training images information that is not available in the data used. However, the three-dimensional implementation of the algorithm is computationally very intensive. To address this issue, the contribution of high-order conditional spatial cumulants is assessed in this paper through the number of Legendre cumulants with respect to the order of approximation used to estimate a conditional distribution and the number of data used within the respective neighbourhood. This leads to discarding the terms of Legendre cumulants with negligible contributions and allows an efficient simulation algorithm to be developed. The current version of the HOSIM algorithm is several orders of magnitude faster than the original version of the algorithm. Application and comparisons in a controlled environment show the excellent performance and efficiency of the HOSIM algorithm. © 2010 Elsevier Ltd. All rights reserved.
1. Introduction

The quantification of spatial uncertainty in the characteristics of a diversity of natural phenomena is typically based upon the stochastic simulation of stationary and ergodic random fields conditional to available data and information. Since the 1990s, several new simulation frameworks have been developed to address the limits of well-known conventional simulation methods (e.g. Rosenblatt, 1985; Journel and Alabert, 1989; Journel, 1994; Goovaerts, 1998; Tjelmeland, 1998; Chilès and Delfiner, 1999; Bernardeau et al., 2002; Deutsch, 2002; Dimitrakopoulos and Luo, 2004; Remy et al., 2009). These limits include their ineffectiveness in dealing with spatial complexity, largely because they are limited to the two-point or second-order spatial statistical moments of the corresponding random field models employed. New simulation frameworks and approaches include: the multiple point simulation algorithms, such as snesim (Strebelle, 2002), filtersim (Zhang et al., 2006; Wu et al., 2008) and simpat (Arpat and Caers, 2007), and others (De Vries et al., 2008; Chiginova and Hu, 2008; Mehrdad and Caers, 2010);
The code will be available at http://cosmo.mcgill.ca. Corresponding author: H. Mustapha.
Markov random field model-based multipoint approaches (Daly, 2004; Tjelmeland and Eidsvik, 2004); new kernel-based approaches (Scheidt and Caers, 2009); and multi-scale simulations based on discrete wavelet decomposition (Gloaguen and Dimitrakopoulos, 2009; Chatterjee et al., in press). Vargas-Guzmán (2008, 2009) has used cumulants to account for high-order residual terms in permeability estimation and for resources-in-place calculations, although the technical aspect of the related work is incomplete. Recently, high-order spatial cumulants were also introduced as a means to consistently describe complex spatial architectures and configurations (Dimitrakopoulos et al., 2010). High-order cumulants are combinations of moment statistical parameters characterizing non-Gaussian random variables (Billinger and Rosenblatt, 1966). Nikias and Petropulu (1993) provide new definitions and terms, in a systematic way, for approaches that are widespread in the signal processing literature, including the use of high-order multivariate cumulants for identification of noisy closed loop systems (Delopoulos and Giannakis, 1996) and in non-linear signal processing (Zhang, 2005). For more details, we refer to Dimitrakopoulos et al. (2010). In the context of spatial random fields, the first- and second-order cumulants are the well-known mean and covariance; in general, spatial cumulants may be seen as an extension of the well-known covariance function. The systematic definitions of non-Gaussian spatial random fields and their high-order spatial statistics are given
0098-3004/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.cageo.2010.09.007
Please cite this article as: Mustapha, H., Dimitrakopoulos, R., HOSIM: A high-order stochastic simulation algorithm for generating threedimensional complex geological patterns. Computers and Geosciences (2010), doi:10.1016/j.cageo.2010.09.007
in Dimitrakopoulos et al. (2010), and the algorithm to efficiently calculate high-order spatial cumulants and the related source code are detailed in Mustapha and Dimitrakopoulos (2010a). A finding from the above work, related to the information described by cumulants, is that spatial cumulants of orders three to five capture directional multiple-point periodicity, as well as multiple-point connectivity including connectivity of extreme values, in addition to geometric characteristics and spatial architectures in two- and three-dimensional datasets. The stochastic simulation based on high-order spatial cumulants for continuous variables is developed in Mustapha and Dimitrakopoulos (2010b), and it is founded upon a sequential framework, where a non-parametric Legendre polynomial series approximation (Lebedev, 1965) is used to estimate local conditional densities. The estimation of these conditional densities is based on coefficients calculated in terms of spatial high-order cumulants; it makes no distributional assumptions and requires no data transformations before or after simulation, while working well for complex non-Gaussian and non-linear systems. An important aspect of this new framework, as shown in the above reference, is that the specific relations between the order of the spatial cumulants and the lower-order moments, as they exist in the dataset used, are maintained by the simulated realizations. This makes the simulation process consistent over a series of orders, that is, spatial cumulants are not some randomly selected moments. This is also a main difference from the existing multiple point methods (Boucher, 2009) and leads to realizations where all the lower-order statistics in a dataset are reproduced, something that the existing multipoint methods do not ensure; thus conflicts appear when the number of hard data increases (Strebelle, 2002). Note that existing multiple point methods reproduce the statistics of the training image employed.
Despite the advantages shown in past work, simulation based on high-order spatial cumulants is computationally demanding, particularly as the number of dimensions considered increases. This is the area where this manuscript contributes, by assessing the contribution of various terms of Legendre polynomials so as to reduce computational needs without loss of important information, as well as by providing a full three-dimensional algorithm for the practical use of the method in various applications. In the following sections, firstly, the sequential simulation method using high-order spatial conditional cumulants is reviewed and the related general algorithm (HOSIM) introduced; the computational complexity and sensitivities of this algorithm are also presented and explored in some detail. Then, the specific input parameters for HOSIM are described. Subsequently, applications show how HOSIM performs using different options, and conclusions follow.
2. The HOSIM algorithm and related computational aspects

The proposed HOSIM algorithm (Mustapha and Dimitrakopoulos, 2010b) proceeds in two steps: (1) cumulant calculation and (2) sequential simulation. HOSIM first calculates cumulants of different orders using template data obtained from scanning a training image (TI) and the available data. Next, the grid points are simulated sequentially using a non-parametric method based on Legendre polynomials with coefficients calculated from the cumulant maps in (1).

2.1. High-order cumulants

Cumulants are obtained by scanning TIs and available data as shown in Dimitrakopoulos et al. (2010). The algorithm used to accomplish this stage is based on the HOSC algorithm developed in Mustapha and Dimitrakopoulos (2010a). Denote by $x_0$ an unsampled location subject to simulation using a set of its closest neighbours, as shown in Fig. 1. The spatial correlation between the variable $Z_0$ defined at $x_0$ and two random variables $Z_i$, $Z_j$ defined, respectively, at neighbours $x_i$, $x_j$ is expressed using a third-order cumulant $\mathrm{cum}(Z_0, Z_i, Z_j)$. Assuming that $Z_0$, $Z_i$ and $Z_j$ are zero-mean random variables, $\mathrm{cum}(Z_0, Z_i, Z_j)$, identified as the bivariate distance function $\mathrm{cum}(h_1,h_2)$, is calculated as

$$\mathrm{cum}(h_1,h_2)=\frac{1}{N_{h_1,h_2}}\sum_{k=1}^{N_{h_1,h_2}} Z(x_k)\,Z(x_k+h_1)\,Z(x_k+h_2),\qquad \{x_k,\;x_k+h_1,\;x_k+h_2\}\in T^{3}_{h_1,h_2}, \tag{1}$$

Fig. 1. An unknown value is at the location $x_0$, and values at the locations $x_1, x_2, \ldots, x_n$ in a neighbourhood Ω of $x_0$ are assumed to be known.

where $T^{3}_{h_1,h_2}$ is the associated spatial template of order 3 and is defined (considering a spatial location $x$ as a reference) as

$$T^{3}_{h_1,h_2}(h_1,h_2,a_1,a_2)=\{(x,\;x+h_1,\;x+h_2)\ \text{such that the points}\ \{x,\;x+h_i,\ i=1,2\}\ \text{are a set of the original points distribution}\}. \tag{2}$$

In Eq. (2), distances $h_1$ and $h_2$ are directed along two directions $\{\vec{d}_i,\ i=1,2\}$ that are supported by the direction angles $\{a_1, a_2\}$. Finally, the elements in $T^{3}_{h_1,h_2}$ are searched from the TI and available data. The fourth-order cumulant, expressed as a trivariate distance function $\mathrm{cum}(h_1,h_2,h_3)$, is calculated using the template $T^{4}_{h_1,h_2,h_3}$ as

$$\begin{aligned}
\mathrm{cum}(h_1,h_2,h_3) = {} & \frac{1}{N_{h_1,h_2,h_3}}\sum_{k=1}^{N_{h_1,h_2,h_3}} Z(x_k)\,Z(x_k+h_1)\,Z(x_k+h_2)\,Z(x_k+h_3) \\
& - \frac{1}{(N_{h_1,h_2,h_3})^{2}}\left[\left(\sum_{k=1}^{N_{h_1,h_2,h_3}} Z(x_k)\,Z(x_k+h_1)\right)\left(\sum_{k=1}^{N_{h_1,h_2,h_3}} Z(x_k+h_2)\,Z(x_k+h_3)\right)\right] \\
& - \frac{1}{(N_{h_1,h_2,h_3})^{2}}\left[\left(\sum_{k=1}^{N_{h_1,h_2,h_3}} Z(x_k)\,Z(x_k+h_2)\right)\left(\sum_{k=1}^{N_{h_1,h_2,h_3}} Z(x_k+h_1)\,Z(x_k+h_3)\right)\right] \\
& - \frac{1}{(N_{h_1,h_2,h_3})^{2}}\left[\left(\sum_{k=1}^{N_{h_1,h_2,h_3}} Z(x_k)\,Z(x_k+h_3)\right)\left(\sum_{k=1}^{N_{h_1,h_2,h_3}} Z(x_k+h_1)\,Z(x_k+h_2)\right)\right],
\end{aligned} \tag{3}$$

$$\{x_k,\;x_k+h_1,\;x_k+h_2,\;x_k+h_3\}\in T^{4}_{h_1,h_2,h_3},$$

where $N_{h_1,h_2}$ and $N_{h_1,h_2,h_3}$ are the number of elements of $T^{3}_{h_1,h_2}$ and $T^{4}_{h_1,h_2,h_3}$, respectively. Cumulants of order higher than four are
similarly calculated. For more details, we refer to Appendix A and references therein.

2.2. Sequential simulation

HOSIM utilizes the classic sequential simulation paradigm (Deutsch and Journel, 1998). At each node $x$ along the random path visiting the simulation grid, a search template TEMP is used to extract the conditioning data event. The conditional probability density function is estimated using a series of Legendre polynomials and coefficients expressed in terms of cumulants. Consider a stationary and ergodic random field $Z(x_i)$ or $Z_i$, $x_i \in \Omega \subseteq R^{r}$ ($r$ = 1, 2 or 3) for $i = 0,\ldots,N$, where $N$ is the number of points in a discrete grid ($D_N$), and a set of conditioning data $d_n = \{Z(x_\alpha),\ \alpha = 1,\ldots,n\}$. Assume that $x_0$ is the first node visited and its neighbours are found within a certain neighbourhood (Fig. 1). The HOSIM algorithm calculates the conditional probability density function (cpdf) $f_{Z_0}$ given $d_n$ using cumulant maps and Legendre polynomials as follows:

$$f_{Z_0}(z_0\mid d_n)=\frac{f_Z(z_0,z_1,\ldots,z_n)}{\int_D f_Z(x)\,dz_0}=\frac{1}{\int_D f_Z(x)\,dz_0}\sum_{i_0=0}^{\infty}\cdots\sum_{i_{N-1}=0}^{\infty}\sum_{i_N=0}^{\infty}\tilde{L}_{i_0,i_1,\ldots,i_{N-1},i_N}\,\bar{P}_{i_0}(z_0), \tag{4}$$

where $\bar{P}_m$ are the normalized Legendre polynomials defined as $\bar{P}_m(z)=\sqrt{2m+1}\,P_m(z)/\sqrt{2}$, $\tilde{L}_{i_0,\ldots,i_{N-1},i_N}=L_{i_0,\ldots,i_{N-1},i_N}\,\bar{P}_{i_1}(z_1)\cdots\bar{P}_{i_{N-1}}(z_{N-1})\,\bar{P}_{i_N}(z_N)$, and $L_{i_0,\ldots,i_{N-1},i_N}$ are defined in terms of cumulants as shown in Mustapha and Dimitrakopoulos (2010b) and Appendix B. Here, we note that $\{z_1,\ldots,z_n\}$ are the values of the samples around $x_0$.
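As a one-dimensional illustration of the series in Eq. (4), the sketch below estimates a density on [-1, 1] from samples using the normalized Legendre polynomials. It is a simplified stand-in, not the HOSIM implementation: the coefficients are plain sample means of the normalized polynomials rather than cumulant-based Legendre cumulants, and the function names and toy data are assumptions introduced here.

```python
import numpy as np
from numpy.polynomial import legendre


def normalized_legendre(m, z):
    """Orthonormal Legendre polynomial: sqrt((2m + 1) / 2) * P_m(z)."""
    coeffs = np.zeros(m + 1)
    coeffs[m] = 1.0
    return np.sqrt((2 * m + 1) / 2.0) * legendre.legval(z, coeffs)


def legendre_density(samples, order, z):
    """Truncated Legendre-series estimate of a density on [-1, 1].

    Each coefficient is the sample mean of the normalized polynomial,
    i.e. an estimate of E[Pbar_m(Z)] (an assumption of this 1D sketch).
    """
    z = np.asarray(z, dtype=float)
    density = np.zeros_like(z)
    for m in range(order + 1):
        coeff = np.mean(normalized_legendre(m, samples))
        density += coeff * normalized_legendre(m, z)
    return density


rng = np.random.default_rng(0)
samples = rng.beta(2.0, 5.0, size=20000) * 2.0 - 1.0  # toy data rescaled to [-1, 1]
grid = np.linspace(-1.0, 1.0, 401)
estimate = legendre_density(samples, order=8, z=grid)
```

Because only the zeroth-order term has a non-zero integral over [-1, 1], the estimate integrates to one for any truncation order; it may, however, dip slightly below zero, which is a known property of truncated orthogonal series.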
2.3. The complexity of the HOSIM algorithm

2.3.1. Current computational complexity

Only Legendre series coefficients $L_{i_0,\ldots,i_{N-1},i_N}$, usually called Legendre cumulants, of order smaller than or equal to $\omega$ are used; the density function $f_{Z_0}$ in Eq. (4) can then be approximated as follows:

$$f_{Z_0}(z_0\mid d_n)=\frac{f_Z(z_0,z_1,\ldots,z_n)}{\int_D f_Z(x)\,dz_0}\approx\tilde{f}_{Z_0,\omega}(z_0\mid d_n)=\frac{1}{\int_D f_Z(x)\,dz_0}\sum_{i_0=0}^{\omega}\cdots\sum_{i_{N-1}=0}^{i_{N-2}}\sum_{i_N=0}^{i_{N-1}}\tilde{L}_{\bar{i}_0,\bar{i}_1,\ldots,\bar{i}_{N-1},i_N}\,\bar{P}_{\bar{i}_0}(z_0), \tag{5}$$

where $\bar{i}_k = i_k - i_{k+1}$ for $k<N$. The coefficients $L_{i_0,\ldots,i_{N-1},i_N}$ are functions of spatial cumulants, as explained earlier, and are inferred from the TI and samples as presented in Mustapha and Dimitrakopoulos (2010b). The variation of the number of Legendre cumulants, $N_{\mathrm{coeff}}$, to be calculated with respect to (1) the order of approximation $\omega$ and (2) the number of neighbours considered to simulate a visited node is studied in this section. The number of terms involved in Eq. (5) is equal to

$$N_{\mathrm{coeff}}=\frac{(\omega+1)(\omega+2)\cdots(\omega+N+1)}{(N+1)!}. \tag{6}$$

For example, $N_{\mathrm{coeff}}=28$ for an order $\omega=2$ and a number of samples $N=5$; the variations of $N_{\mathrm{coeff}}$ with respect to $\omega$ for a fixed $N$, and with respect to $N$ for a fixed $\omega$, are shown in Fig. 2. Reducing $N_{\mathrm{coeff}}$ with an appropriate method is discussed next.

2.3.2. Reducing computational complexity

The Legendre polynomials are defined on $]-1,1[$; the moments (or cumulants) $E(Z_0^{i_0}Z_1^{i_1}\cdots Z_k^{i_k})$ can therefore be neglected for $i_0, i_1,\ldots,i_k > \mathrm{MAX}$, where MAX is a number higher than 1, because $E(Z_0^{i_0}Z_1^{i_1}\cdots Z_k^{i_k})\to 0$. Table 1 shows the variation of $N_{\mathrm{coeff}}$ with respect to MAX for fixed $\omega=3$ and $N=5$. A way of reducing the number of terms consists of explicitly calculating only seven cumulants to estimate the conditional density in Eq. (5). For $N=10$, only 11 terms are needed. Then, at most $(N+2)$ terms are calculated for $N$ samples used to simulate a value at $x_0$. For $N=5$, the seven coefficients to be calculated are

(1) $\mathrm{cum}(Z_0^{0},Z_1^{0},\ldots,Z_k^{0})$,
(2) $\mathrm{cum}(Z_0^{1})$,
(3) $\mathrm{cum}(Z_0^{1},Z_1^{1})$,
(4) $\mathrm{cum}(Z_0^{1},Z_1^{1},Z_2^{1})$,
(5) $\mathrm{cum}(Z_0^{1},Z_1^{1},Z_2^{1},Z_3^{1})$,
(6) $\mathrm{cum}(Z_0^{1},Z_1^{1},Z_2^{1},Z_3^{1},Z_4^{1})$,
(7) $\mathrm{cum}(Z_0^{1},Z_1^{1},Z_2^{1},Z_3^{1},Z_4^{1},Z_5^{1})$.
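The term count of Eq. (6) is a binomial coefficient, C(ω + N + 1, N + 1), so the figures quoted in this section can be checked in a few lines. The sketch below is only a numerical check; the function names are introduced here for illustration. The Table 1 values for MAX = 0 to 3 coincide with the same expression evaluated at ω = MAX.

```python
from math import comb


def n_coeff(order, n_samples):
    """Number of Legendre cumulants in Eq. (6): C(order + N + 1, N + 1)."""
    return comb(order + n_samples + 1, n_samples + 1)


def reduction_ratio(order, n_samples):
    """Ratio of the full term count to the (N + 2) explicitly calculated terms."""
    return n_coeff(order, n_samples) / (n_samples + 2)


# Term counts and reduction ratios for N = 5 and a few orders of approximation.
for omega in (2, 3, 10):
    print(omega, n_coeff(omega, 5), round(reduction_ratio(omega, 5), 1))
```

For ω = 10 and N = 5 the ratio is 8008 / 7 = 1144, which is the "about three orders of magnitude" quoted for the simplified algorithm.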
Table 1
HOSIM algorithm for 2D continuous variable simulations.

MAX    Ncoeff
0      1
1      7
2      28
3      84

Fig. 2. Variation of $N_{\mathrm{coeff}}$ with respect to $\omega$ (i.e. $N=5$ is fixed) in (1) and with respect to $N$ (i.e. $\omega=3$ is fixed) in (2).

The coefficient (3) represents the covariance between $Z_0$ and $Z_1$; the covariance between $Z_0$ and $Z_2$ is implicitly included in the third-order cumulant, i.e. coefficient (4). The same holds for the covariance between $Z_0$ and $Z_i$, $i$ = 3–5. The third-order cumulant of $Z_0$, $Z_1$, $Z_2$ is given by coefficient (4); however, the third-order cumulants of $Z_0$ and any other two variables $Z_i$, $Z_j$, $i,j$ = 1–5, are considered in the higher-order cumulants. Then, coefficients (1)–(7) cover the correlations
between the unsampled location and the samples, and also between samples, up to order six. The terms of the form $\mathrm{cum}(Z_0^{n_0},Z_1^{1})$ tend to zero if $n_0$ is large (i.e. $n_0 \gg 1$); however, if $n_0$ is close to 1, then $\mathrm{cum}(Z_0^{n_0},Z_1^{1})$ can be approximated by coefficient (3), i.e. $\mathrm{cum}(Z_0^{1},Z_1^{1})$. The same approximation is used to calculate the other coefficients in the series. For example, $\mathrm{cum}(Z_0^{n_0},Z_1^{n_1},Z_2^{1})$ may tend to zero or to $\mathrm{cum}(Z_0^{1},Z_1^{1},Z_2^{1})$, i.e. coefficient (4), depending on the values of $n_0$ and $n_1$. The set of $(N+2)$ cumulants represents a basis for all spatial correlations up to order 6. The first two terms represent the integral of the density (cumulant of order zero) and the mean (cumulant of order one). The third term (cumulant of order two) is the covariance and represents a two-point spatial correlation between the unsampled location and one sample. The three-point spatial correlation between the unsampled location and two samples is represented by the fourth term, i.e. the cumulant of order three. Similarly, terms five to seven represent the four-point to six-point spatial correlations between the unsampled location and, respectively, three to five samples. It is important to note that the number of coefficients, and consequently the computational complexity, is reduced by a factor of $N_{\mathrm{coeff}}/(N+2)$. For example, for $N=5$, the variation of this ratio with respect to $\omega$ is illustrated in Fig. 3. This figure shows that for $\omega=10$ and $N=5$, the simplified version of the algorithm is about three orders of magnitude faster than the original version that uses the full set of cumulants. Finally, using Eq. (5), the main steps of the HOSIM algorithm are presented below:

Step 1: Scan the training image and the sample data and store the spatial cumulants calculated in a global tree.
Step 2: Define a random path visiting once all unsampled nodes.
Step 3: Define the template shape T for each unsampled location $x_0$ using its neighbours. The conditioning data available within T are then searched. The high-order spatial cumulants are read from the global tree in Step 1, and are used to calculate the coefficients of the Legendre series. These coefficients are used to build the cpdf of $Z_0$ using Eq. (5).
Step 4: Draw a uniform random value in [0,1] to read from the conditional distribution a simulated value, $Z(x_0)$, at $x_0$.
Step 5: Add $x_0$ to the set of sample hard data and the previously simulated values.
Step 6: Repeat Steps 3–5 for the next points in the random path defined in Step 2.

Repeat Steps 2–6 to generate different realizations using different random paths. The random path defined in Step 2 concerns only the unsampled locations. Thus, the final realization obtained after Step 6 honours the conditioning data.
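Steps 2 to 6 can be sketched on a one-dimensional grid as follows. This is a minimal illustration of the sequential loop and of the inverse-CDF draw in Step 4 only: `local_cpdf` is a hypothetical callback standing in for the Legendre-series cpdf of Eq. (5), and `toy_cpdf` is a placeholder density, not the HOSIM estimate. Clipping negative density values before accumulating is a choice made in this sketch, not a step described in the text.

```python
import numpy as np


def draw_from_cpdf(z_grid, pdf, u):
    """Step 4: read a value from the conditional distribution at probability u."""
    pdf = np.clip(pdf, 0.0, None)          # a truncated series can dip below zero
    cdf = np.cumsum(pdf)
    cdf = cdf / cdf[-1]
    return float(np.interp(u, cdf, z_grid))


def sequential_simulation(n_nodes, hard_data, local_cpdf, z_grid, seed):
    """Steps 2-6 on a 1D grid; `local_cpdf` is a hypothetical stand-in for Eq. (5)."""
    rng = np.random.default_rng(seed)
    values = dict(hard_data)                               # node index -> value
    unsampled = [i for i in range(n_nodes) if i not in values]
    for node in rng.permutation(unsampled):                # Step 2: random path
        node = int(node)
        pdf = local_cpdf(node, values, z_grid)             # Step 3: build the cpdf
        values[node] = draw_from_cpdf(z_grid, pdf, rng.uniform())  # Step 4
        # Step 5: the simulated value is stored in `values` and conditions later nodes
    return np.array([values[i] for i in range(n_nodes)])


def toy_cpdf(node, values, z_grid):
    """Placeholder density centred on the nearest known value (not HOSIM's cpdf)."""
    nearest = min(values, key=lambda j: abs(j - node))
    return np.exp(-0.5 * ((z_grid - values[nearest]) / 0.1) ** 2)


z_grid = np.linspace(-1.0, 1.0, 201)
hard = {0: -0.5, 9: 0.5}
realization = sequential_simulation(10, hard, toy_cpdf, z_grid, seed=90054)
```

Since the random path only visits unsampled nodes, the hard data at nodes 0 and 9 are returned unchanged, and a different seed gives a different path and hence a different realization.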
3. HOSIM program

HOSIM reads three input files: a parameter file, a grid file and a data file; it generates one output file containing the simulations, i.e. the realizations generated. The source code is written in C++, and HOSIM is organized by classes. HOSIM can be compiled using the nmake command on a Windows machine and contains three main classes: (1) class_initialization, (2) class_cumulants_map and (3) class_hosim. They are described in the following.

3.1. class_initialization

In this class, the parameters required for the program are provided by reading the input files. The input files are shown in Tables 2–4.

3.1.1. Parameters file description (Table 2)

Line 2: the problem dimension Dim; Dim = k for a k-dimensional problem, k = 1, 2 or 3.
Line 4: number of realizations to be generated.
Line 6: random number seed.
Line 8: grid file, i.e. TI, name.
Line 10: data file name.
Line 12: output file name.
Line 14: debugging file name.

Table 2
A parameter file example of the HOSIM program.

Line 1:   # Dimension of the problem: 1 = 1D, 2 = 2D and 3 = 3D
Line 2:   3
Line 3:   # Number of realizations to be generated
Line 4:   5
Line 5:   # Random number seed
Line 6:   90054
Line 7:   # File with training image
Line 8:   File_name.grid
Line 9:   # File with data
Line 10:  File_name.data
Line 11:  # File for realization (S)
Line 12:  File_name.out
Line 13:  # File for debugging output
Line 14:  File_name.dbg
Line 15:  # Coordinate system of the TI
Line 16:  100 0 1 # nx, xmn, xsiz
Line 17:  50 0 1 # ny, ymn, ysiz
Line 18:  30 0 1 # nz, zmn, zsiz
Line 19:  # Template search dimensions
Line 20:  10 5 5
Line 21:  # Minimal & maximal number of neighbours
Line 22:  4 5
Line 23:  # Order of approximation of the series and max
Line 24:  3 3
Fig. 3. Variation of the ratio $N_{\mathrm{coeff}}/(N+2)$ with respect to the order of the series $\omega$ for a fixed $N=5$.

Table 3
A grid (TI) file example of the HOSIM program.

Line 1:  # Number of grid nodes in the TI
Line 2:  150,000
Line 3:  # For each grid node: ind, x, y, z and value
Line 4:  1 1 1 1 4.06
Line 5:  2 2 1 1 2.95
Line 6:  3 3 1 1 4.34
Line 7:  4 4 1 1 5.48
Line 8:  5 5 1 1 5.64
⋮
150,000 100 50 30 2.34
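The node rows shown in Table 3 are consistent with an ordering in which x varies fastest, then y, then z: nodes 1 to 5 lie at x = 1 to 5 with y = z = 1, and node 150,000 lies at (100, 50, 30). The sketch below encodes that ordering as an assumption inferred from those few rows; the text does not state it, and the function names are introduced here for illustration.

```python
def node_index(x, y, z, nx, ny):
    """1-based node index with x varying fastest, then y, then z (assumed order)."""
    return x + (y - 1) * nx + (z - 1) * nx * ny


def node_coordinates(ind, nx, ny):
    """Inverse mapping: 1-based index -> 1-based (x, y, z) grid positions."""
    zero_based = ind - 1
    x = zero_based % nx + 1
    y = (zero_based // nx) % ny + 1
    z = zero_based // (nx * ny) + 1
    return x, y, z


# Grid dimensions taken from the parameter file example (nx = 100, ny = 50, nz = 30).
last_node = node_index(100, 50, 30, nx=100, ny=50)
```

Under this assumption the last node of the 100 × 50 × 30 grid has index 150,000, matching the final row of Table 3.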
The coordinate system of the TI is defined by specifying the centre coordinates of the first, i.e. the lower left, point (xmn, ymn, zmn), the number of grid nodes (nx, ny, nz), and the spacing of the nodes (xsiz, ysiz, zsiz).

Line 16: (nx, xmn, xsiz) the grid, i.e. TI, definition in the x-direction.
Line 17: (ny, ymn, ysiz) the grid definition in the y-direction.
Line 18: (nz, zmn, zsiz) the grid definition in the z-direction.
Line 20: template search dimensions. In the current version of HOSIM, the template is a 2D and/or 3D rectangle.
Line 22: minimal and maximal number of neighbours to simulate a visited node.
Line 24: order of approximation ω of the Legendre series, and the term MAX defined in Section 2.3.

3.1.2. Grid file description (Table 3)

The grid file contains the training image (TI), which is assumed to be a 2D or 3D rectangle with values defined on a regular grid as shown in Fig. 4(1). This file is described as follows:

Line 2: total number of nodes (NN) used in the TI geometry description.
Lines 4 to end of file: node information. For each node p (p = 1,…,NN), we provide its coordinates Xp, Yp, Zp and V[p], the value at p. For a 2D problem Zp is zero.

3.1.3. Data file description (Table 4)

The data file contains the sample information (Fig. 4(2)). It is described as follows:

Line 2: total number of samples (NS).
Lines 4–12: sample information. For each sample s (s = 1,…,NS), we provide its coordinates Xs, Ys, Zs and V[s], the value at s. For a 2D problem Zs is zero.

Table 4
A data file example of the HOSIM program.

Line 1:   # Number of samples in the data file
Line 2:   9
Line 3:   # For each sample: ind, x, y, z and value
Line 4:   1 10 12 3
Line 5:   2 14 16 25
Line 6:   3 8 6 17
Line 7:   4 22 33 2
Line 8:   5 55 5 12
Line 9:   6 87 7 3
Line 10:  7 75 41 23
Line 11:  8 62 15 15
Line 12:  9 33 9 7

3.2. class_cumulants_map

Given the order of the series and the template size, this class calculates all cumulants required in Eq. (5), as detailed in Section 2.1. The cumulants are calculated using TIs and data. Global maps are used to store the cumulant coefficients. The algorithm used in this part is inspired by the HOSC algorithm developed by Mustapha and Dimitrakopoulos (2010a).

3.3. class_hosim

This class is the main part of the program. It uses the cumulant maps calculated in the previous class and Eq. (5) to calculate the conditional PDF at each visited node.
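To make the parameter-file layout of Table 2 concrete, the sketch below parses a file with that layout into a dictionary. It is an illustration only: the actual reader in class_initialization is written in C++ and is not shown in this paper, so the function name, the dictionary keys and the rule of discarding everything after `#` are assumptions made here.

```python
EXAMPLE = """\
# Dimension of the problem: 1 = 1D, 2 = 2D and 3 = 3D
3
# Number of realizations to be generated
5
# Random number seed
90054
# File with training image
File_name.grid
# File with data
File_name.data
# File for realization (S)
File_name.out
# File for debugging output
File_name.dbg
# Coordinate system of the TI
100 0 1 # nx, xmn, xsiz
50 0 1 # ny, ymn, ysiz
30 0 1 # nz, zmn, zsiz
# Template search dimensions
10 5 5
# Minimal & maximal number of neighbours
4 5
# Order of approximation of the series and max
3 3
"""


def parse_parameter_file(text):
    """Parse a parameter file laid out as in Table 2 (assumed layout)."""
    rows = []
    for line in text.splitlines():
        content = line.split("#", 1)[0].strip()   # drop comments, keep the values
        if content:
            rows.append(content.split())
    return {
        "dim": int(rows[0][0]),
        "n_realizations": int(rows[1][0]),
        "seed": int(rows[2][0]),
        "grid_file": rows[3][0],
        "data_file": rows[4][0],
        "output_file": rows[5][0],
        "debug_file": rows[6][0],
        # One (n, min, size) triple per axis, in x, y, z order.
        "grid": [(int(n), float(mn), float(siz)) for n, mn, siz in rows[7:10]],
        "template": [int(v) for v in rows[10]],
        "neighbours": (int(rows[11][0]), int(rows[11][1])),
        "order": int(rows[12][0]),
        "max_power": int(rows[12][1]),
    }


params = parse_parameter_file(EXAMPLE)
n_nodes = params["grid"][0][0] * params["grid"][1][0] * params["grid"][2][0]
```

With the example values, the product of nx, ny and nz is 150,000, which agrees with the node count on Line 2 of the grid file in Table 3.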
4. Numerical results
Fig. 4. (1) Training image; hard data locations in (2).
A three-dimensional image of channels of different sizes (Fig. 5(1)) is considered and is used here to provide insight into simulations and interpretations of HOSIM in three dimensions. This image shows the porosity variation in a 100 × 130 × 30 m³ field. The data set used (syn_poro.out) is available in the Stanford V Reservoir Data Set (Mao and Journel, 1999). Fig. 5(1) shows the exhaustive image to be simulated from the sample data set (DS) in Fig. 5(2). DS is combined with the training image (Mustapha and Dimitrakopoulos, 2010b) to infer the high-order spatial cumulants
Fig. 5. Simulation of a 3D fluvial reservoir: (1) Exhaustive image: true image (390,000 points) and (2) DS: 500 sample data (≈0.13% of the total number of points).
that are needed for the estimation of the local pdfs. Fig. 5(1) is used as a TI in this example. Sensitivity analysis of HOSIM to the TIs used, unconditional realizations and comparisons with other algorithms are shown in Mustapha and Dimitrakopoulos (2010b). The full set and the truncated set of cumulants, discussed above, are both employed to generate five realizations using HOSIM, where about 14 nearby data are selected on average for the simulation of any single node. The main features of our approach are illustrated through the following discussion. All runs are performed on a 3.2 GHz Intel(R) Xeon(TM) PC with 2 GB of RAM. The full set is compared to the truncated set by simulating the exhaustive data set shown in Fig. 5(1) using the data set (DS) in Fig. 5(2). Realizations using the two sets of cumulants are generated with the same seed number, as shown in Fig. 6. The figure shows similar results. Fig. 7 shows that the histogram and variograms of the samples are well reproduced by both realizations. High-order statistics of the samples are also reproduced by HOSIM realizations, as shown by the third- and fourth-order cumulants in Figs. 8 and 9. Fig. 10 compares two cross-sections along z (Fig. 10, first column) of the exhaustive image to the corresponding cross-sections (Fig. 10, second and third columns) of two HOSIM realizations generated using the truncated set. The figure shows a good reproduction of the channels in the exhaustive image. The statistics, i.e. histograms, variograms and high-order cumulants, of the realizations and samples are very close, as shown in Figs. 11–13. Finally, it is important to note that the results presented in this section are calculated with a very small number of terms and compare very well with the results obtained by using the full set of cumulant terms. In addition, the algorithm is not sensitive to the MAX variable: appropriate results are obtained by working with only a few cumulant terms.
5. Conclusions

A 3D stochastic simulation method (HOSIM), conceived for simulating complex geological patterns, has been presented. The method uses high-order Legendre polynomials with coefficients calculated from cumulants. The HOSIM algorithm is validated by simulating a three-dimensional domain of complex channels. In addition, the computational costs of HOSIM are discussed. A method to reduce HOSIM’s complexity is presented, where only a reference subset of cumulants is calculated; the other cumulants are inferred from the reference subset using the convergence property of the sequence. The modified algorithm is several orders of magnitude faster than the original version of the algorithm that calculates explicitly all spatial
Fig. 6. A realization generated by HOSIM using DS by calculating (1) explicitly the full set of terms and (2) only the truncated set of (N+2) terms discussed in Section 2.3.
Fig. 7. Histograms (1), variograms (2–4) of two HOSIM realizations using truncated and full set of cumulants, respectively. Circles refer to data set and the solid lines refer to realizations.
Fig. 8. Third-order cumulant maps of DS (1,4,7) and two HOSIM realizations using truncated set of cumulants (2,5,8) and full set of cumulants (3,6,9), respectively.
Fig. 9. Fourth-order cumulant maps of DS (1) and two HOSIM realizations using truncated set of cumulants (2) and full set of cumulants (3), respectively.
Fig. 10. (1a) and (1b) two cross-sections along z of exhaustive image and (2a, 2b) and (3a, 3b) corresponding cross-sections along z in two realizations generated by HOSIM using DS and truncated set of cumulants.
Fig. 11. Histograms (1), variograms (2–4) of five HOSIM realizations using truncated set of cumulants. Circles refer to data set and the solid lines refer to realizations.
cumulants required. In addition, the modified algorithm provides results very similar to those obtained by explicitly calculating all the cumulants required. The HOSIM algorithm, with different options, generates accurate realizations that are in good agreement with the data set used. The realizations reproduce very well the channels in the true image. Furthermore, data statistics, including the histogram, variogram and high-order cumulants, are also reproduced by the realizations. It is important to note that the HOSIM algorithm is independent of the input information. The algorithm requires the estimation of high and
Fig. 12. Third-order cumulant maps of two HOSIM realizations using truncated set of cumulants. (1,2) XY-cumulant, (3,4) XZ-cumulant and (5,6) YZ-cumulant.
Fig. 13. Fourth-order cumulant maps of two HOSIM realizations using truncated set of cumulants. (1) Realization 1: XYZ-cumulant and (2) Realization 2: XYZ-cumulant.
low order cumulants. Any source of information can be used to accomplish this, including multiple training images. Additional computational costs are strongly related to the number of terms to be calculated. If the number of terms is the same, no additional
computational costs are required; otherwise it would be related to the purpose of using multiple training images. Finally, the HOSIM algorithm can easily be integrated into any open source code including the SGeMS platform (Remy et al., 2009).
Acknowledgements
The work in this paper was funded by NSERC CDR Grant 335696 and BHP Billiton, as well as NSERC Discovery Grant 239019. Thanks are in order to Brian Baird, Peter Stone, and Gavin Yates of BHP Billiton, as well as BHP Billiton Diamonds and, in particular, Darren Dyck, for their support, collaboration, as well as technical comments.
Appendix A. Calculation of high-order spatial cumulants
The translation of high-order moments to high-order cumulants, and vice versa, can be obtained recursively (Dimitrakopoulos et al., 2010; Mustapha and Dimitrakopoulos, 2010a, 2010b) as

$$E(Z_1^{i_1} \cdots Z_n^{i_n}) = m_{i_1 \ldots i_n} = \sum_{j_1=0}^{i_1} \cdots \sum_{j_{n-1}=0}^{i_{n-1}} \sum_{j_n=0}^{i_n-1} \binom{i_1}{j_1} \cdots \binom{i_{n-1}}{j_{n-1}} \binom{i_n-1}{j_n}\, c_{i_1-j_1,\ldots,i_{n-1}-j_{n-1},\,i_n-j_n}\, m_{j_1,\ldots,j_{n-1},j_n} \tag{A.1}$$

and

$$\mathrm{Cum}(Z_1^{i_1} \cdots Z_n^{i_n}) = c_{i_1 \ldots i_n} = \sum_{j_1=0}^{i_1} \cdots \sum_{j_{n-1}=0}^{i_{n-1}} \sum_{j_n=0}^{i_n-1} \binom{i_1}{j_1} \cdots \binom{i_{n-1}}{j_{n-1}} \binom{i_n-1}{j_n}\, m_{i_1-j_1,\ldots,i_{n-1}-j_{n-1},\,i_n-j_n}\, c_{j_1,\ldots,j_{n-1},j_n} \tag{A.2}$$

Assuming $Z(x)$ is a zero-mean ergodic stationary random field indexed in $R^n$, then the $r$th-order moment of the random field is defined as $E(Z(x)Z(x+h_1) \cdots Z(x+h_{r-1}))$. The moments depend only on $h_1, \ldots, h_{r-1}$. Similarly, the $r$th-order cumulant can be denoted as $c_{i_1,\ldots,i_n}(h_1, \ldots, h_{r-1})$, where $r = i_1 + \cdots + i_n$. For example, the second-order cumulant of a non-centered random function $Z(x)$, known as the covariance, is given using (A.2) by

$$c_{1,1}(h_1) = E(Z(x)Z(x+h_1)) - E(Z(x))^2 \tag{A.3}$$

Its third-order cumulant is given by

$$\begin{aligned} c_{1,1,1}(h_1, h_2) ={}& E(Z(x)Z(x+h_1)Z(x+h_2)) - E(Z(x))\,E(Z(x+h_1)Z(x+h_2)) \\ &- E(Z(x))\,E(Z(x+h_1)Z(x+h_3)) - E(Z(x))\,E(Z(x+h_2)Z(x+h_3)) + 2E(Z(x))^3, \end{aligned} \tag{A.4}$$

where $h_3$ is along the difference between the vectors supporting $h_1$ and $h_2$. The cumulants are invariant to additive constants; thus, if a given process $Z(x)$ is not zero-mean, its cumulants can be computed as the cumulants of $Z(x) - E(Z(x))$ (Nikias and Petropulu, 1993). It can be computationally convenient to consider zero-mean random functions as some of the terms vanish. Then, the following expression is used by Mustapha and Dimitrakopoulos (2010) to calculate the $(j_0+j_1+j_2+j_3+j_4)$th-order cumulant

$$\begin{aligned} \mathrm{cum}&(Z_0^{j_0}, Z_1^{j_1}, Z_2^{j_2}, Z_3^{j_3}, Z_4^{j_4}) = \frac{1}{N}\sum_{k=1}^{N} Z(x_k)^{j_0} Z(x_k+h_1)^{j_1} Z(x_k+h_2)^{j_2} Z(x_k+h_3)^{j_3} Z(x_k+h_4)^{j_4} \\ &- \frac{1}{N^2}\left[\left(\sum_{k=1}^{N} Z(x_k)^{j_0} Z(x_k+h_1)^{j_1}\right)\left(\sum_{k=1}^{N} Z(x_k+h_2)^{j_2} Z(x_k+h_3)^{j_3} Z(x_k+h_4)^{j_4}\right)\right] \\ &- \frac{1}{N^2}\left[\left(\sum_{k=1}^{N} Z(x_k)^{j_0} Z(x_k+h_2)^{j_2}\right)\left(\sum_{k=1}^{N} Z(x_k+h_1)^{j_1} Z(x_k+h_3)^{j_3} Z(x_k+h_4)^{j_4}\right)\right] \\ &- \frac{1}{N^2}\left[\left(\sum_{k=1}^{N} Z(x_k)^{j_0} Z(x_k+h_3)^{j_3}\right)\left(\sum_{k=1}^{N} Z(x_k+h_1)^{j_1} Z(x_k+h_2)^{j_2} Z(x_k+h_4)^{j_4}\right)\right] \\ &- \frac{1}{N^2}\left[\left(\sum_{k=1}^{N} Z(x_k)^{j_0} Z(x_k+h_4)^{j_4}\right)\left(\sum_{k=1}^{N} Z(x_k+h_1)^{j_1} Z(x_k+h_2)^{j_2} Z(x_k+h_3)^{j_3}\right)\right] \\ &- \frac{1}{N^2}\left[\left(\sum_{k=1}^{N} Z(x_k+h_1)^{j_1} Z(x_k+h_2)^{j_2}\right)\left(\sum_{k=1}^{N} Z(x_k)^{j_0} Z(x_k+h_3)^{j_3} Z(x_k+h_4)^{j_4}\right)\right] \\ &- \frac{1}{N^2}\left[\left(\sum_{k=1}^{N} Z(x_k+h_1)^{j_1} Z(x_k+h_3)^{j_3}\right)\left(\sum_{k=1}^{N} Z(x_k)^{j_0} Z(x_k+h_2)^{j_2} Z(x_k+h_4)^{j_4}\right)\right] \\ &- \frac{1}{N^2}\left[\left(\sum_{k=1}^{N} Z(x_k+h_1)^{j_1} Z(x_k+h_4)^{j_4}\right)\left(\sum_{k=1}^{N} Z(x_k)^{j_0} Z(x_k+h_2)^{j_2} Z(x_k+h_3)^{j_3}\right)\right] \\ &- \frac{1}{N^2}\left[\left(\sum_{k=1}^{N} Z(x_k+h_2)^{j_2} Z(x_k+h_3)^{j_3}\right)\left(\sum_{k=1}^{N} Z(x_k)^{j_0} Z(x_k+h_1)^{j_1} Z(x_k+h_4)^{j_4}\right)\right] \\ &- \frac{1}{N^2}\left[\left(\sum_{k=1}^{N} Z(x_k+h_2)^{j_2} Z(x_k+h_4)^{j_4}\right)\left(\sum_{k=1}^{N} Z(x_k)^{j_0} Z(x_k+h_1)^{j_1} Z(x_k+h_3)^{j_3}\right)\right] \\ &- \frac{1}{N^2}\left[\left(\sum_{k=1}^{N} Z(x_k+h_3)^{j_3} Z(x_k+h_4)^{j_4}\right)\left(\sum_{k=1}^{N} Z(x_k)^{j_0} Z(x_k+h_1)^{j_1} Z(x_k+h_2)^{j_2}\right)\right], \end{aligned} \tag{A.5}$$

where $N$ is the number of elements in the set $T_{h_1,h_2,h_3,h_4}$ defined by Eq. (2).

Appendix B. Legendre series approximation

The determination of a joint PDF, given its cumulants up to order $n$, is a well-known problem, i.e. the cumulants problem (Kendall and Stuart, 1977), which has been studied extensively from a theoretical point of view. Examples of solving this type of problem can be found in Edgeworth (1905), Lebedev (1965), Welling (1999) and Gaztanaga et al. (2000). The approximation used here is based on Legendre series (Lebedev, 1965; Liao and Pawlak, 1996; Yap and Paramesran, 2005; Hosny, 2007) with coefficients calculated in terms of high-order spatial cumulants. The method is first reviewed for the univariate case; subsequently, the approximation developed for the general multivariate case is introduced. A 1D square-integrable and real piecewise smooth function $f$ defined on $D = [-1, 1]$ can be formally written in a series of Legendre polynomials

$$f(z) = \sum_{m=0}^{\infty} L_m \frac{P_m(z)}{\|P_m\|} \tag{B.1}$$

where $P_m(z)$ is the $m$th-order Legendre polynomial (Fig. 14), with norm $\|P_m\|$, defined as (Lebedev, 1965; Spiegel, 1968)

$$P_m(z) = \frac{1}{2^m m!} \frac{d^m}{dz^m} (z^2 - 1)^m = \sum_{i=0}^{m} a_{i,m} z^i, \quad -1 \le z \le 1. \tag{B.2}$$

The Legendre polynomials $P_m(z)$ obey the following recursive relation:

$$P_{m+1}(z) = \frac{2m+1}{m+1}\, z P_m(z) - \frac{m}{m+1}\, P_{m-1}(z), \tag{B.3}$$

where $P_0(z) = 1$, $P_1(z) = z$, and $m \ge 1$. The set of Legendre polynomials $\{P_m(z)\}_m$ forms a complete orthogonal basis set on the interval $[-1, 1]$. The orthogonality property is defined as

$$\int_D P_m(z) P_n(z)\, dz = \begin{cases} 0, & m \ne n \\ \dfrac{2}{2m+1}, & m = n \end{cases} \tag{B.4}$$

The discrete Legendre polynomials also satisfy

$$\sum_{i=1}^{k} P_m(z_i) P_n(z_i)\, \Delta z = \frac{2}{2m+1}\, \delta_{mn}, \quad \forall m, n \ge 0 \tag{B.5}$$
Fig. 14. Legendre polynomials up to order 4 (curves P0 to P4, X axis from -1 to 1).
where $\Delta z = z_i - z_{i-1} = 2/k$ is a space step, $k$ is the number of steps, $\{z_i\}$ is a uniform discretisation of $[-1, 1]$, and $\delta_{mn}$ is the Kronecker delta. To avoid numerical instability in polynomial computation, we normalized the Legendre polynomials by utilizing the square norm. The set of normalized Legendre polynomials is defined as

$$\bar{P}_m(z) = \sqrt{\frac{2m+1}{2}}\, P_m(z).$$

In this case, the orthogonality condition becomes

$$\sum_{i=1}^{k} \bar{P}_m(z_i) \bar{P}_n(z_i)\, \Delta z = \delta_{mn}, \quad \forall m, n \ge 0 \tag{B.6}$$
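As an illustration only, the recursion (B.3), the normalization by the square norm, and the discrete orthogonality condition (B.6) can be checked numerically with a short Python sketch. The function names, the choice of cell midpoints for the grid $\{z_i\}$, and the value k = 2000 are assumptions made for this example and are not taken from the HOSIM implementation.

```python
import math

def legendre_values(z, max_order):
    """Return [P_0(z), ..., P_max_order(z)] using the recursion (B.3)."""
    values = [1.0]
    if max_order >= 1:
        values.append(z)
    for m in range(1, max_order):
        values.append(((2 * m + 1) * z * values[m] - m * values[m - 1]) / (m + 1))
    return values

def normalized_legendre_values(z, max_order):
    """Scale P_m by sqrt((2m + 1) / 2) so that each polynomial has unit norm."""
    return [math.sqrt((2 * m + 1) / 2.0) * p
            for m, p in enumerate(legendre_values(z, max_order))]

def discrete_gram(max_order, k):
    """Left-hand side of (B.6) for every pair (m, n), summed over k grid cells."""
    dz = 2.0 / k
    size = max_order + 1
    gram = [[0.0] * size for _ in range(size)]
    for i in range(k):
        z = -1.0 + (i + 0.5) * dz  # cell midpoint: an assumed choice of z_i
        p = normalized_legendre_values(z, max_order)
        for m in range(size):
            for n in range(size):
                gram[m][n] += p[m] * p[n] * dz
    return gram

# The result should be close to the identity matrix (the Kronecker delta).
gram = discrete_gram(max_order=4, k=2000)
```

On a finite grid the sum in (B.6) is a quadrature rule, so the identity holds only up to a discretisation error that shrinks as k grows; the midpoint grid is one simple way to keep that error small.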
The coefficients $L_m$ in Eq. (B.1) of the Legendre series, the so-called Legendre cumulants, can be determined using the orthogonality property in (B.6) as

$$L_m = \int_D \bar{P}_m(z) f(z)\, dz = g_m(c_i), \quad i = 0, \ldots, m \ \text{and} \ m = 0, 1, 2, \ldots \tag{B.7}$$

where $c_i$ is the $i$th-order cumulant of $f$. The expression of the right-hand side function $g_m$ and other details about cumulants are given in Appendix A. Theoretically, the series (B.1), with coefficients $L_m$ calculated from (B.7), converges to $f(z)$ at every continuity point of $f(z)$ as demonstrated by Lebedev (1965). Finally, if only cumulants of order smaller than or equal to $\omega$ are given, then the function $f(z)$ in Eq. (B.1) can be approximated as follows:

$$f(z) \approx \tilde{f}_\omega(z) = \sum_{m=0}^{\omega} L_m \bar{P}_m(z) \tag{B.8}$$
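The truncated series (B.8) can be illustrated with a minimal Python sketch. Note the assumptions: the coefficients $L_m$ are obtained here by numerical integration of the integral form in (B.7) for a known test density, not from cumulants through the function $g_m$ as in HOSIM; the test density $f(z) = 0.75(1 - z^2)$, the function names, and the grid size are all hypothetical choices for this example.

```python
import math

def normalized_legendre_values(z, max_order):
    """Normalized Legendre polynomials of orders 0..max_order at z, via (B.3)."""
    values = [1.0, z][: max_order + 1]
    for m in range(1, max_order):
        values.append(((2 * m + 1) * z * values[m] - m * values[m - 1]) / (m + 1))
    return [math.sqrt((2 * m + 1) / 2.0) * p for m, p in enumerate(values)]

def legendre_coefficients(f, omega, k=2000):
    """Approximate L_m = integral of normalized P_m times f over [-1, 1] (B.7)."""
    dz = 2.0 / k
    coeffs = [0.0] * (omega + 1)
    for i in range(k):
        z = -1.0 + (i + 0.5) * dz  # midpoint quadrature node (assumed grid)
        fz = f(z)
        for m, p in enumerate(normalized_legendre_values(z, omega)):
            coeffs[m] += p * fz * dz
    return coeffs

def truncated_series(coeffs, z):
    """Evaluate the truncated approximation (B.8) at z."""
    basis = normalized_legendre_values(z, len(coeffs) - 1)
    return sum(c * p for c, p in zip(coeffs, basis))

def example_density(z):
    """Hypothetical test density on [-1, 1]; it integrates to one."""
    return 0.75 * (1.0 - z * z)

coeffs = legendre_coefficients(example_density, omega=2)
approx = truncated_series(coeffs, 0.3)
```

Because the test density is a polynomial of degree two, truncating at $\omega = 2$ recovers it up to quadrature error; for a general density the approximation improves as $\omega$ increases, at the cost of more coefficients to estimate.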
References
Arpat, B., Caers, J., 2007. Stochastic simulation with patterns. Mathematical Geosciences 39, 177–203.
Bernardeau, F., Colombi, S., Gaztanaga, E., Scoccimarro, R., 2002. Large-scale structure of the universe and cosmological perturbation theory. Physics Reports 367 (1), 1–248.
Brillinger, D.R., Rosenblatt, M., 1966. Asymptotic theory of kth-order spectra. In: Harris, B. (Ed.), Spectral Analysis of Time Series. John Wiley, New York, pp. 153–188.
Boucher, A., 2009. Considering complex training images with search tree partitioning. Computers & Geosciences 35, 1151–1158.
Chatterjee, S., Dimitrakopoulos, R., Mustapha, H., in press. Fast wavelet based conditional simulation with training images. Mathematical Geosciences.
Chilès, J.P., Delfiner, P., 1999. Geostatistics: Modeling Spatial Uncertainty. John Wiley & Sons, New York, 720 pp.
Chugunova, T., Hu, L.Y., 2008. Multiple-point simulations constrained by continuous auxiliary data. Mathematical Geosciences 40, 133–146.
Daly, C., 2004. Higher order models using entropy, Markov random fields and sequential simulation. In: Leuangthong, O., Deutsch, C.V. (Eds.), Geostatistics Banff 2004. Springer, Dordrecht, pp. 215–224.
De Vries, L.M., Carrera, J., Falivene, O., Gratacós, O., Slooten, L.J., 2008. Application of multiple point geostatistics to non-stationary images. Mathematical Geosciences 41, 29–42, doi:10.1007/s11004-008-9188-y.
Deutsch, C.V., 2002. Geostatistical Reservoir Modelling. Oxford University Press, Oxford, 384 pp.
Deutsch, C.V., Journel, A.G., 1998. GSLIB: Geostatistical Software Library and User's Guide. Oxford University Press, New York.
Delopoulos, A., Giannakis, G.B., 1996. Cumulant based identification of noisy closed loop systems. International Journal of Adaptive Control and Signal Processing 10, 303–317.
Dimitrakopoulos, R., Mustapha, H., Gloaguen, E., 2010. High-order statistics of spatial random fields: exploring spatial cumulants for modelling complex, non-Gaussian and non-linear phenomena. Mathematical Geosciences 42 (1), 65–99.
Dimitrakopoulos, R., Luo, X., 2004. Generalized sequential Gaussian simulation. Mathematical Geosciences 36, 567–591.
Edgeworth, F.Y., 1905. The law of error. Transactions of Cambridge Society 20, 113–141.
Gaztanaga, E.P., Fosalba, P., Elizalde, E., 2000. Gravitational evolution of the large-scale probability density distribution. The Astrophysical Journal 539, 522–531.
Goovaerts, P., 1998. Geostatistics for Natural Resources Evaluation. Oxford University Press, New York, 496 pp.
Gloaguen, E., Dimitrakopoulos, R., 2009. Two-dimensional conditional simulations based on the wavelet decomposition of training images. Mathematical Geosciences 41, 679–701.
Hosny, K.M., 2007. Exact Legendre moment computation for gray level images. Pattern Recognition 40, 3597–3605.
Journel, A.G., 1994. Modelling uncertainty: some conceptual thoughts. In: Dimitrakopoulos, R. (Ed.), Geostatistics for the Next Century. Kluwer Academic Publishers, Dordrecht, Holland, pp. 30–43.
Journel, A.G., Alabert, F., 1989. Non-Gaussian data expansion in the earth sciences. Terra Nova 1, 123–134.
Kendall, M.G., Stuart, A., 1977. The Advanced Theory of Statistics, fourth ed. Macmillan, New York, 700 pp.
Lebedev, N.N., 1965. Special Functions and their Applications. Prentice-Hall Inc., New York, 308 pp.
Liao, S.X., Pawlak, M., 1996. On image analysis by moments. IEEE Transactions on Pattern Analysis and Machine Intelligence 18, 254–266.
Mao, S., Journel, A.G., 1999. Generation of a reference petrophysical and seismic 3D data set: the Stanford V reservoir. Report-Stanford Center for Reservoir Forecasting Annual Meeting, Stanford, CA, USA. URL: http://ekofisk.stanford.edu/SCRF.html.
Mehrdad, H., Caers, J., 2010. Stochastic simulation of patterns using distance-based pattern modeling. Mathematical Geosciences 42, 487–517.
Mustapha, H., Dimitrakopoulos, R., 2010a. A new approach for geological pattern recognition using high-order spatial cumulants. Computers & Geosciences 36, 313–334.
Mustapha, H., Dimitrakopoulos, R., 2010b. High-order stochastic simulation of complex spatially distributed natural phenomena. Mathematical Geosciences 42, 457–485.
Nikias, C.L., Petropulu, A.P., 1993. Higher-order Spectra Analysis: A Nonlinear Signal Processing Framework. Prentice-Hall PTR, Upper Saddle River, NJ, 538 pp.
Remy, N., Boucher, A., Wu, J., 2009. Applied Geostatistics with SGeMS: A User's Guide. Cambridge University Press, New York, 284 pp.
Rosenblatt, M., 1985. Stationary Sequences and Random Fields. Birkhäuser, Boston, Stuttgart, 258 pp.
Scheidt, C., Caers, J., 2009. Representing spatial uncertainty using distances and kernels. Mathematical Geosciences 41, 397–419.
Spiegel, M.R., 1968. Mathematical Handbook of Formulas and Tables. McGraw-Hill Book Co., New York, 278 pp.
Strebelle, S., 2002. Conditional simulation of complex geological structures using multiple-point statistics. Mathematical Geology 34, 1–21.
Tjelmeland, H., 1998. Markov random fields with higher order interactions. Scandinavian Journal of Statistics 25, 415–433.
Tjelmeland, H., Eidsvik, J., 2004. Directional Metropolis-Hastings updates for conditionals with nonlinear likelihoods. In: Leuangthong, O., Deutsch, C.V. (Eds.), Geostatistics Banff 2004. Springer, Netherlands, pp. 95–104.
Vargas-Guzmán, J.A., 2008. Unbiased resource evaluations with kriging and stochastic models of heterogeneous rock properties. Natural Resources Research 17, 245–254.
Vargas-Guzmán, J.A., 2009. Unbiased estimation of intrinsic permeability with cumulants beyond the lognormal assumption. SPE Journal 14, 805–810.
Welling, M., 1999. Robust series expansions for probability density estimation. Report-California Institute of Technology, Computational Vision Lab, p. 22.
Wu, J., Boucher, A., Zhang, T., 2008. SGeMS code for pattern simulation of continuous and categorical variables: FILTERSIM. Computers & Geosciences 34, 1863–1876.
Yap, P.T., Paramesran, R., 2005. An efficient method for the computation of Legendre moments. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 1996–2002.
Zhang, F., 2005. A high order cumulants based multivariate nonlinear blind source separation method source. Machine Learning Journal 61, 105–127.
Zhang, T., Switzer, P., Journel, A.G., 2006. Filter-based classification of training image patterns for spatial simulation. Mathematical Geology 38, 63–80.