Data within this "fall-off" axe not representative of the distribution and should be excluded from the ... more small values than large values in a power-law.
TECTONOPHYSICS ELSEVIER
Tectonophysics 248 (1995) 1-20
Sampling power-law distributions G. Pickering *, J.M. Bull, D.J. Sanderson Department of Geology, University of Southampton, Highfield, Southampton, S017 1BJ, UK
Received 30 August 1994; accepted 10 March 1995
Abstract
Power-law distributions describe many phenomena related to rock fracture. Data collected to measure the parameters of such distributions only represent samples from some underlying population. Without proper consideration of the scale and size limitations of such data, estimates of the population parameters, particularly the exponent D, are likely to be biased. A Monte Carlo simulation of the sampling and analysis process has been made, to test the accuracy of the most common methods of analysis and to quantify the confidence interval for D. The cumulative graph is almost always biased by the scale limitations of the data and can appear non-linear, even when the sample is ideally power law. An iterative correction procedure is outlined which is generally successful in giving unbiased estimates of D. A standard discrete frequency graph has been found to be highly inaccurate, and its use is not recommended. The methods normally used for earthquake magnitudes, such as a discrete frequency graph of logs of values and various maximum likelihood formulations can be used for other types of data, and with care accurate results are possible. Empirical equations are given for the confidence limits on estimates of D, as a function of sample size, the scale range of the data and the method of analysis used. The predictions of the simulations are found to match the results from real sample D-value distributions. The application of the analysis techniques is illustrated with data examples from earthquake and fault population studies.
1. Introduction
Many objects that occur naturally over different scales show a power-law distribution of relative abundance (Schroeder, 1991). This is particularly true for phenomena related to rock fracture such as earthquakes (Gutenberg and Richter, 1954), fault displacements (Kakimi, 1980), fault and fracture trace lengths (Heifer and Bevan,
* Corresponding author
1990) and fracture apertures (Barton and Zoback, 1992). In all of these papers the data represent samples from some underlying distribution. It is the parameters describing this distribution that are required to help our theoretical understanding of the processes involved and enable predictions from such theory. Therefore, it is crucial to know whether the analysis of samples can give us unbiased estimates of the distribution parameters, and how precise these unbiased estimates are. This paper gives a detailed description of the common sample biases and how these affect the analysis of samples from power-law distributions. Using Monte Carlo simulation of the sampling
0040-1951/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved SSDI 0040-1951(95)00030-5
2
G. Pickering et aL / Tectonophysics 248 (1995) 1-20
process, the parameters derived from samples can be tested for bias and the random error quantified. With real data there is also the problem of testing whether the power-law distribution is an appropriate model. This question is addressed and the results of the simulation are illustrated using examples from earthquake and fault population studies.
2. Power-law distributions
There are three commonly used definitions of a power-law distribution. In many fault population studies the cumulative distribution Junction is used e.g., Childs et al. (1990), Walsh et al. (1991) and Jackson and Sanderson (1992):
N = a l u-b,
(1)
where u is a measure of size and N is the cumulative number of values > u, a~ is a constant and b I is the exponent (Fig. la). An alternative is to define the model in terms of the discrete frequency distribution, e.g., Kakimi (1980), Heifer and Bevan (1990) and Barton and Zoback (1992): n = a2 u - b 2
(2)
where n is the frequency of values in an interval u + 3u (Fig. lb). A third definition is to use the
discrete frequency of log u: log ntg = a 3 - b31og u
(3)
where nlg is the frequency of values in an interval log u _+ 8(log u) (Fig. lc). This is most common in earthquake studies, and is probably better known as the G u t e n b e r g - R i c h t e r relationship (Gutenberg and Richter, 1954), where earthquake magnitude, M is substituted for log u. Note that this is not equivalent to the logarithmic of (2) as the definition of n in the two equations is different. Any attempt to fit data to the model given by (2) must use a finite interval size 8u. Definition (2) is an idealised form where 6u --* 0. The function for n can be derived from N assuming a
finite interval 8u and using the binomial expansion, neglecting all second-order terms and above:
n=~N=N>_,-N~,,+a,,~=u
b _ ( u +Su) -~
(4) by assuming 8u 200, and the maximum likelihood method for all sample sizes >_ 100. In the following section any method which gives a bias of more than 5% for most data, will bc rejected as too inaccurate. From a synthesis of all the results of the simulation a simple, if approximate, relationship was found linking the standard deviation of the sample D-value distribution with sample size and population D-value, i.e:
D >_ 1"
1 size cr= k D V/ sample
(19a)
D,
_~__~L_*
=i
i!lli
i il
~r
~
95
--~
__
i
I
'i
68
i ¸'
i
ii
i
~ ~
!MII
!i!
,I
(b)
(a)
so ®
.-> 3 2 i
J
, ,
E
1
5
E (9 T h r o w (s)
0.001
1000
1
1
1000 ---~
i
T h r o w (s)
0.001
..........
~
i ':
' '
~ ~ 1 D !
(c)
!i
il
:°°2
=,
E
= 1.07
~ k
]
JO E
~--~ :
! i! "~\
i
i
~[
j i i
i I
*i
: !
~ I
r N
ilij
j i ~',I I il2!il
: ~i
0 1
1
1
0.001
T h r o w (s)
1
'
0.001
:i,,
, ~,ii:!
II'
i!i
i :l
T h r o w (s)
' i
1
Fig. 14. G r a p h s of fault throw data derived from UKCS Q u a d 53. The m e a s u r e m e n t s were made at the Base Zechstein horizon in the Permian, picked on migrated sections of good quality. The sections were all dip lines giving a 1D sample. (a) Uncorrected cumulative frequency. (b) Log-normal probability plot (see Fig. 10b). (c) Corrected cumulative frequency graph. (d) Log-interval graph.
G. Pickering et aL / Tectonophysics 248 (1995) 1-20
tive graph (Fig. 14c), the log-interval graph (Fig. 14d) and Page's maximum likelihood method are shown in Table 5. The narrow scale range reduces the chance of bias, but leads to an increase in uncertainty, reflected in the wide confidence interval.
8. Conclusions (1) The three commonly used definitions of the power-law distribution are compatible but not equivalent. The cumulative frequency definition is preferred as it is consistent with the two alternative definitions and embodies the hierarchial nature of the power law. (2) All data held to be power-law distributed represent samples from some underlying population. As these samples often cover a narrower scale range than that of the population as a whole they are truncated, and any analysis must recognise and allow for this. (3) In some cases the samples may be censored, have a truncation "fall-off" or contain other additional sources of bias, that must be resolved before the data are analysed. (4)Truncated data can be analysed by several methods: (a) cumulative frequency graphs; (b) discrete frequency graphs; (c) log-interval discrete frequency graph; and (d) maximum likelihood formulations. All of the methods are in some degree affected by the truncation of the data. The cumulative graph often gives biased D-values and appears to show deviations from a power-law distribution solely due to the scale range limitations of the data. This "finite-range" effect can be corrected and a procedure is outlined which is generally effective. The discrete frequency graph and log-interval graph are not affected by this bias and can be used as long as those intervals which straddle the truncation are ignored. Of the two commonly used maximum likelihood formulations, Utsu's approximate form is only suitable for data with a wide scale range ( > two orders of magnitude). Page's formulation, which was developed for truncated data, can be applied in all cases. (5) These methods were tested by making a
19
Monte Carlo simulation of the sampling and analysis process. The results of the simulations gave an empirical relationship between the confidence intervals, sample size and D-value, with a constant term (k) controlled by the scale range of the sample and the method of analysis. (6) The corrected cumulative graph, log-interval graph and Page's maximum likelihood method gave minimal bias and similar values for k. The discrete graph was rejected as it produced biased results with values of k two or three times those given by the other methods. (7) Sub-samples from large earthquake magnitude and fault throw data sets showed sample D-value distributions equivalent to the simulated distributions, giving added confidence in the resuits of the simulations. (8) The proper consideration of sampling effects and selection of analysis methods can give consistent results, as illustrated by analysing several published and unpublished data sets.
Acknowledgements The authors wish to thank David Peacock for access to the fault data from St. Bees Head and Flamborough Head, and his help with the Harvard CMT catalogue. The authors also wish to thank Seismograph Service Ltd. and SHELF for permission to use the SSL-MF89 survey. Constructive reviews by Patience Cowie, Michael Gross and Chris Scholz helped to improve the final version. This research is funded by a studentship from Mobil North Sea Ltd.
References Aki, K., 1965. Maximum likelihood estimate of b in the formula log N = a - brn and its confidence limits. Bull. Earthquake Res. Inst., Tokyo Univ., 43: 237-239. Barton, C.A. and Zoback, M.D., 1992. Self similar distribution and properties of macroscopic fractures at depth in crystalline rock in the Cajon Pass Scientific borehole. J. Geophys. Res., 97: 5181-5200. Bath, M., 1979. Earthquakes in Sweden 1951-1976. Swed. Geol. Surv. NTIS, C 750. Bath, M., 1981. Earthquake magnitude --recent research and current trends. Earth-Sci. Rev., 17: 315-398.
20
G. Pickering et al. / Tectonophysics 248 (1995) 1-20
Bender, B., 1983. Maximum likelihood estimation of b values for magnitude grouped data. Bull. Seismol. Soc. Am., 73: 831-85 i. Childs, C., Walsh, J.J. and Watterson, J., 1990. A method for estimation of the density of fault displacements below the limits of seismic resolution in reservoir formations. In: A.T. Buller (Editor), North Sea Oil and Gas Reservoirs II. Graham and Trotman, London, pp. 309-318. Cruden, D.M., 1977. Describing the size of discontinuities. Int. J. Rock Mech. Min. Sci. Geomech. Abstr., 14: 133-137. Einstein, H.H. and Baecher, G.B., 1983. Probabilistic and statistical methods in engineering geology, specific methods and examples Part I: Exploration. Rock Mech. Rock Eng., 16: 39-72. Frochlich, C. and Davis, S.D., 1993. Teleseismic b values: or much ado about 1.0. J. Geophys. Res., 98: 631-644. Gutenberg, B. and Richter, C.F., 1954. Seismicity of the Earth and Associated Phenomena (2nd ed.). Princetown Univ. Press, Princetown, NJ. Heifer, K. and Bevan, T., 1990. Scaling relationships in natural fractures-data, theory and applications. Proc. Eur. Petrol. Conf., 2:367-376 (SPE Pap. 20981). Jackson, P. and Sanderson, D.J., 1992. Scaling of fault displacements from the Badajoz Cordoba shear zone, SW Spain. Tectonophysics, 210: 179-190. Kakimi, T., 1980. Magnitude frequency relation for displacement of minor faults and its significance in crustal deformation. Bull. Geol. Soc. Jpn., 31: 467-487. Laslett, G.M., 1982. Censoring and edge effects in areal and line transect sampling of rock joint traces. Math. Geol.. 14: 125-140. Marrett, R. and Allmendinger, R.W., 1991. Estimates of strain due to brittle faulting: sampling of fault populations. J. Struct. Geol., 13: 735-738. Marrett, R. and Allmendinger, R.W., 1992. Amount of extension on "small" faults: An example from the Viking graben. Geology, 20: 47-50. Nelson, W. and Hahn, G.J., 1972. Linear estimation of a regression relationship from censored data --Part I. Simple methods and their application. Technometrics, 14: 247-269. Nelson, W. and Hahn, G.J., 1973. Linear estimation of a regression relationship from censored data --Part I1. Best linear unbiased estimation and theory. Technometrics, 15: 133-150. Pacheco, J.F., Scholz, C.H. and Sykes, L.R., 1992. Changes in
frequency-size relationship from small to large earthquakes. Nature, 355: 71-73. Page, R., 1968. Aftershocks and microaftershocks of the Great Alaska earthquake of 1964. Bull. Seismol. Soc. Am., 58: 1131-1168. Peacock, D.C.P. and Sanderson, D.J., 1994. Strain and scaling of faults in the chalk at Flamborough Head, U.K.J. Struct. Geol., 16: 97-107. Pickering, G., Bull, J.M., Sanderson, D.J. and Harrison, P.V., 1994. Fractal fault displacements: A case study from the Moray Firth, Scotland. In: J.H. Kruhl (Editor), Fractals and Dynamic Systems in Geosciences. Springer-Verlag, Berlin, pp. 105-120. Pickering, G., Bull, J.M. and Sanderson, D.J., in press. Scaling of fault displacements and implications for the estimation of sub-seismic strain. In: P. Buchanan and D.A. Nieuwland (Editors), Modern Developments in Structural Interpretation. Spec. Publ. Geol. Soc. London. Press, W.H., Flannery, B.P., Teukolsky, S.A. and Vetterling, W.T., 1986. Numerical Recipes: The Art of Scientific Computing. Cambridge Univ. Press, Cambridge. Priest, S.D. and Hudson, J.A., 1981. Estimation of discontinuity spacing and trace length using scanline surveys. Int. J. Rock Mech. Min. Sci. Geomech. Abstr., 18: 183-197. Rubinstein, R.Y., 1981. Simulation and the Monte Carlo Method. Wiley, New York, NY. Schroeder, M., 1991. Fractals, Chaos, Power-laws: Minutes from a Infinite Paradise. Freeman, New York. Utsu, T., 1965. A method for determining the value of b in a formula log n = a - bM showing the magnitude-frequency relation for earthquakes. Geophys. Bull. Hokkaido Univ., 13: 99-103. Walsh, J.J. and Watterson, J., 1988. Analysis of the relationship between displacements and dimensions of faults. J. Struct. Geol., 10: 239-247. Walsh, J.J., Watterson, J. and Yielding, G., 1991. The importance of small scale faulting in regional extension. Nature, 351: 391-393. Walsh, J.J., Watterson, J. and Yielding, G., 1994. Determination and interpretation of fault size populations: procedures and problems. In: North Sea Oil and Gas Reservoirs --III. Norw. Inst. Technol., pp. 141-155. Yielding, G., Walsh, J.J. and Watterson, J., 1992. The prediction of small-scale faulting in reservoirs. First Break, 10: 449-460.