Computers & Geosciences 26 (2000) 153–166
EZ-ROSE: a computer program for equal-area circular histograms and statistical analysis of two-dimensional vectorial data
Jaco H. Baas*
Department of Earth Sciences, University of Leeds, Leeds LS2 9JT, United Kingdom
Received 28 January 1999; received in revised form 16 June 1999; accepted 16 June 1999
Abstract
EZ-ROSE 1.0 is a computer program for the statistical analysis of populations of two-dimensional vectorial data and their presentation in equal-area rose diagrams. The program is implemented as a Microsoft® Excel workbook containing worksheets for the input of directional (circular) or lineational (semi-circular) data and their automatic processing, which includes the calculation of a frequency distribution for a selected class width, statistical analysis, and the construction of a rose diagram in CorelDraw™. The statistical analysis involves tests of uniformity for the vectorial population distribution, such as the nonparametric Kuiper and Watson tests and the parametric Rayleigh test. The statistics calculated include the vector mean, its magnitude (length) and strength (data concentration); the Batschelet circular standard deviation as an alternative measure of vectorial concentration; and a confidence sector for the vector mean. The statistics together with the frequency data are used to prepare a Corel Script™ file that contains all the instructions necessary to draw an equal-area circular frequency histogram (rose diagram) automatically in CorelDraw™. The advantages of EZ-ROSE, compared to other software for circular statistics, are: (1) the ability to use an equal-area scale in rose diagrams; (2) the wide range of tools for a comprehensive statistical analysis; (3) the ease of use, as Microsoft® Excel and CorelDraw™ are widely known to users of Microsoft® Windows; and (4) the high degree of flexibility due to the application of Microsoft® Excel and CorelDraw™, which offer a whole range of tools for possible addition of other statistical methods and changes of the rose-diagram layout. © 2000 Elsevier Science Ltd. All rights reserved.
Keywords: Rose diagram; Vectorial statistics; Statistical tests; PC software; Spreadsheets
Code available from http://www.iamg.org/CGEditor/index.htm.
* Tel.: +44-113-223-6624; fax: +44-113-233-5259. E-mail address: [email protected] (J.H. Baas).
0098-3004/00/$ - see front matter. PII: S0098-3004(99)00072-2
1. Introduction
Data populations of two-dimensional orientation measurements are widely used in the Earth Sciences. Typical examples include azimuthal measurements of the orientation of fault traces, fold axes and other linear features on geological maps or aerial photographs; the direction of glacial striations, drumlins and aeolian seif ridges; the direction of river channels and scour troughs; the dip direction of sedimentary cross-strata or bedform slipfaces; the orientation of grooves, flutes and elongate objects in sedimentary deposits; and the orientation of
elongate sediment grains or mineral crystals in thin sections. Two types of orientation data are distinguished (Potter and Pettijohn, 1963; Davis, 1986; Swan and Sandilands, 1995): directional data represent features measured as vectors, that is, lines of which one end is distinguishable from the other and which point in one direction (arrows), whereas lineational data represent features measured as common lines, rather than arrows. Both types of data can be analyzed as vectorial, but the distribution range of directional data is considered to be circular (360°), whereas that of the lineational data is semi-circular (180°), with the smaller of the coupled azimuth values used by convention.
The statistical analysis of vectorial data differs from conventional Gaussian statistics by using trigonometrical functions to calculate the mean vector and the dispersion of data (e.g., Watson, 1966; Mardia, 1972; Batschelet, 1981; Cheeney, 1983; Davis, 1986; Swan and Sandilands, 1995). This renders vectorial statistics algebraically more complex, and apparently less attractive to earth scientists. Vectorial statistics are seldom used in geological research, although there is no other objective way to summarize and evaluate sets of azimuth data. Instead, geological studies have relied mainly or totally on a visual assessment of circular frequency histograms, known as rose diagrams. Furthermore, the problem of an inadequate analytical approach is compounded by the fact that the vast majority of rose diagrams are constructed incorrectly (see discussion by Nemec, 1988), which pertains also to the available specialized software. As a consequence, researchers draw subjective conclusions from the shape of false roses.
Rose diagrams are circular histograms in which the frequency of vectorial data in predefined azimuthal classes is plotted as sectors of circles ('pie slices') with a common origin. As in a common histogram, the area of each circular sector should be proportional to the frequency, or density, of the data it contains (Nemec, 1988). Therefore, it is crucial that a rose diagram be constructed using an equal-area frequency scale, with the sector area, rather than length, made proportional to the class frequency. The use of an equal-length (arithmetic) frequency scale results in false diagrams, with the high frequencies exaggerated and low frequencies diminished, which may create preferential directions in perfectly random distributions and lead to serious misinterpretations, particularly when not accompanied by proper statistical analysis. Unfortunately, most geological textbooks and field studies dealing with azimuth data use linear-scale rose diagrams (e.g., Collinson and Thompson, 1982; Lindholm, 1987; Bell, 1991; Le Roux, 1992; Yagishita et al., 1992; Benn, 1994; Glover and O'Beirne, 1994; Major, 1998). Likewise, the specialized computer programs commonly fail to account for the equal-area frequency scale of a rose diagram, thus misleading the users (examples include RockWorks99 by RockWare Inc. [1]; Stereographic Projections 2.00 by MWSoftware [2]; Stereo Nett 2.20 by J.P. Duyster [3]; StereoNet for Windows 3.03 by Geological Software [4]; WinTek 2.0 by A. Peterek & S. Krumm [5]). Other PC programs, although providing an equal-area scaling facility, lack the statistical procedures needed for data-set evaluation (e.g., GEOrient 7.2 by R.J. Holcombe [6]; QuickPlot for Windows 3.00 by D. Van Everdingen [7]; Rose 1.03 by Thompson & Thompson [8]). Some other relevant software simply appears too costly for an occasional user of azimuth data (e.g., Oriana for Windows 1.04 by Kovach Computing Services [9]; SpheriStat™ 2.2 by Pangaea Scientific [10]).
[1] Rockworks99 by RockWare Inc. http://www.rockware.com/catalog/rockware/rockworks/rockworks_specifications.html.
[2] Stereographic Projections 2.00 by MWSoftware. http://freespace.virgin.net/martin.walters/mwsoft1.html.
[3] Stereo Nett 2.20 by J.P. Duyster. http://homepage.ruhr-uni-bochum.de/Johannes.P.Duyster/stereo/stereo1.htm.
[4] StereoNet for Windows 3.03 by Geological Software. ftp://darwin.ibg.uit.no/pub/stereo/www/stereo.htm.
[5] WinTek 2.0 by A. Peterek and S. Krumm. http://www.geol.uni-erlangen.de/html/software/wintek/wintekmanual.html.
[6] GEOrient 7.2 by R.J. Holcombe. http://www.earthsciences.uq.edu.au/~rodh/software/GeorientWhatsNew.html.
[7] QuickPlot for Windows 3.00 by D. Van Everdingen. http://www.geomem.co.uk/geomem/products/quikplot.html.
[8] Rose 1.03 by Thompson + Thompson. http://php.indiana.edu/~tthomps/programs/home.htm.
[9] Oriana for Windows 1.04 by Kovach Computing Services. http://www.kovcomp.co.uk/oriana/oribroc.html.
[10] SpheriStat™ 2.2 by Pangaea Scientific. http://www.pangaeasci.com/_sswin2.htm.
The present paper introduces a new computer program, labelled EZ-ROSE 1.0, for a comprehensive analysis of two-dimensional vectorial data. This program combines the computational facilities of Microsoft® Excel and the drawing facilities of CorelDraw™, both widely available applications for Microsoft® Windows. EZ-ROSE employs templates that automate the calculation of statistical parameters from azimuthal data sets and their presentation as equal-area rose diagrams. The templates are user-friendly and require little more than a basic knowledge of MS® Excel and CorelDraw™. Furthermore, the diverse tools available in these software packages increase the flexibility of EZ-ROSE beyond that of any of the programs mentioned in the footnotes. For example, other statistics or statistical tests can be added without the need for advanced programming, and the display of the rose diagrams (colour, size, explanatory labels) can be adjusted as desired. The paper first describes the vectorial statistics and test procedures employed in the EZ-ROSE program, and then explains the attributes and the application of the program.
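The equal-area scaling argued for in the Introduction can be illustrated with a short sketch. It is not the EZ-ROSE implementation (which uses Excel and Corel Script); the function name `sector_radii` and its parameters are hypothetical. Because the area of a circular sector grows with the square of its radius, making the area proportional to the class frequency means making the radius proportional to the square root of the frequency.

```python
import math

def sector_radii(counts, equal_area=True, max_radius=1.0):
    """Return one sector radius per azimuth class (hypothetical helper).

    The largest class is drawn with radius max_radius in both scalings.
    """
    peak = max(counts)
    if peak == 0:
        return [0.0 for _ in counts]
    if equal_area:
        # Sector area = 0.5 * r**2 * class_width, so area ~ count needs r ~ sqrt(count).
        return [max_radius * math.sqrt(c / peak) for c in counts]
    # Equal-length (arithmetic) scale: r ~ count, so area ~ count**2.
    return [max_radius * c / peak for c in counts]

counts = [1, 4, 16, 4]  # illustrative class frequencies, not data from the paper
equal_area_radii = sector_radii(counts)
linear_radii = sector_radii(counts, equal_area=False)
print(equal_area_radii)
print(linear_radii)
```

With the linear scale, the smallest class is drawn with 1/16 of the peak radius and therefore 1/256 of the peak area, which is the exaggeration of high frequencies described above.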
2. Statistical treatment of two-dimensional vectorial data
Azimuth data cannot be treated with the same statistical methods as used for common, non-orientation data (Mardia, 1972; Cheeney, 1983; Davis, 1986; Swan and Sandilands, 1995). Suffice it to consider the following simple example. Values 1 and 359 on a linear scale are far apart, and have an arithmetical mean of 180 that is perfectly valid. In contrast, values 1° and 359° on a circular scale are close to each other, and the arithmetical mean of 180° does not apply. The problem is solved by considering measurements like 1° and 359° as unit vectors. In this case, the correct mean of 0° (or 360°) is given by the resultant vector direction, derived from the vectorial sum of the unit vectors (Watson, 1966; Davis, 1986). A similar reasoning applies to semi-circular (or lineational) data with azimuth values in the range from 0° to 180°. In a general case, the resultant vector of a population of two-dimensional directional data is represented by the following two coordinates, Xr and Yr:
\[ X_r = \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} \cos\theta_i, \qquad Y_r = \sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \sin\theta_i \tag{1} \]
where n is the number of measurements, i is an index denoting the ith value in the data population and θi is the azimuth value itself. The direction of the resultant (mean) vector, M, is then obtained from the following trigonometrical relationship:
\[ M = \arctan\left(\frac{Y_r}{X_r}\right). \tag{2} \]
Since the lineation-type data are limited by the measurement convention to an azimuthal range of 0° to 180°, they have to be spread modulo 360° for the purpose of statistical calculations. This is done by multiplying the individual azimuth values by a factor of 2 prior to the calculations. The resulting mean azimuth is subsequently divided by two.
The vector mean, calculated by Eq. (2), has a statistical meaning only if the data set can be considered to derive from a vectorial population that is non-uniform, characterized by some unimodal preferential orientation. Formally, the vectorial population should have a circular-normal frequency distribution, known as the von Mises distribution, just as the frequency distribution in common statistics should be bell-shaped, normal (Gaussian) for the mean and variance to be calculated. It is therefore desirable to begin the statistical analysis by testing the vectorial population for preferential distribution, using one of the so-called tests of uniformity. This procedure tests the following null hypothesis (H0) against the corresponding alternative (H1):
H0: The vectorial population sampled has a uniform, non-preferential distribution.
H1: The vectorial population sampled has a non-uniform, preferential distribution.    (3)
The 'null' hypothesis postulates a lack of difference between the distribution of the studied population and a uniform ('chaotic') reference distribution. The three most frequently used tests of uniformity are, in order of increasing strength: the nonparametric Kuiper test, the nonparametric Watson test and the parametric Rayleigh test. (In a 'parametric' test, the statistical hypothesis is formulated in terms of the population distribution's parameters, rather than the frequency distribution as a whole; see below.)
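The resultant-vector calculation of Eqs. (1) and (2), including the doubling of lineational azimuths, can be sketched in a few lines of Python. This is an illustrative sketch, not the EZ-ROSE worksheet logic; the function name `vector_mean` and its arguments are hypothetical. One deliberate substitution: the sketch uses the two-argument arctangent (`math.atan2`) instead of the plain arctan of Eq. (2), because arctan(Yr/Xr) alone does not identify the quadrant of the mean direction.

```python
import math

def vector_mean(azimuths_deg, lineational=False):
    """Mean direction (degrees) plus the resultant coordinates X_r and Y_r.

    Hypothetical helper: lineational data (0 to 180 degrees) are doubled
    before the calculation and the mean is halved afterwards.
    """
    factor = 2.0 if lineational else 1.0
    angles = [math.radians(factor * a) for a in azimuths_deg]
    x_r = sum(math.cos(t) for t in angles)  # Eq. (1), cosine sum
    y_r = sum(math.sin(t) for t in angles)  # Eq. (1), sine sum
    # atan2 resolves the quadrant that arctan(Y_r / X_r) leaves ambiguous.
    mean = math.degrees(math.atan2(y_r, x_r)) % 360.0
    return mean / factor, x_r, y_r

# The example from the text: 1 and 359 degrees average to 0 (equivalently 360),
# not to the arithmetic mean of 180.
mean_dir, x_r, y_r = vector_mean([1.0, 359.0])
print(mean_dir)

# Lineational data: 10 and 170 degrees lie 20 degrees apart as undirected lines.
mean_lin, _, _ = vector_mean([10.0, 170.0], lineational=True)
print(mean_lin)
```

Because of floating-point rounding, a mean that lies exactly on 0° may be reported as a value just below 360° (or, for lineational data, just below 180°); these denote the same direction.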
2.1. The Kuiper test
The vectorial data, θi (modulo 360°), are first sorted from the smallest to the largest value and given indices from i = 1 to n. The unit deviation, θi/360 − i/n, is subsequently calculated for each azimuth value. The Kuiper test then uses the largest positive and negative deviations to calculate the following statistic:
\[ V_n = \max\left(\frac{\theta_i}{360} - \frac{i}{n}\right) - \min\left(\frac{\theta_i}{360} - \frac{i}{n}\right) + \frac{1}{n}, \qquad i = 1, \ldots, n \tag{4} \]
where the prefixes 'max' and 'min' denote the largest positive and the largest negative deviation, respectively. The higher the Vn value, the more preferential the distribution of the vectorial population. The critical value of the test statistic, Vα, is calculated (for a selected risk-of-error value, α, referred to as the significance level) as follows:
\[ V_\alpha = \frac{V^{*}_{\alpha}}{\sqrt{n} + 0.155 + 0.24/\sqrt{n}} \tag{5} \]
where V*α is 1.75 for α = 5% and 2.00 for α = 1%. If Vn ≥ Vα, the null hypothesis in Eq. (3) is rejected, which means that the vectorial population can be considered to have a preferential orientation, at a confidence level of (100 − α)%. Otherwise, the null hypothesis cannot be rejected and the alternative hypothesis cannot be accepted.
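The Kuiper procedure described in this section can be sketched as follows. This is a hedged illustration rather than the EZ-ROSE worksheet itself: the function name `kuiper_test` and the dictionary `CRITICAL_V_STAR` are hypothetical, and the critical value is assumed to take the form Vα = V*α / (√n + 0.155 + 0.24/√n), with V*α equal to 1.75 (α = 5%) or 2.00 (α = 1%) as quoted in the text.

```python
import math

# Tabulated constants quoted in the text for the 5% and 1% significance levels.
CRITICAL_V_STAR = {0.05: 1.75, 0.01: 2.00}

def kuiper_test(azimuths_deg, alpha=0.05, lineational=False):
    """Return (V_n, V_alpha, reject_H0) for a sample of azimuths in degrees.

    Hypothetical helper; the form of the critical value is an assumption
    stated in the accompanying paragraph.
    """
    factor = 2.0 if lineational else 1.0  # spread lineational data modulo 360
    u = sorted(((factor * a) % 360.0) / 360.0 for a in azimuths_deg)
    n = len(u)
    # Unit deviations theta_i/360 - i/n for the sorted data, i = 1..n.
    dev = [u_i - i / n for i, u_i in enumerate(u, start=1)]
    v_n = max(dev) - min(dev) + 1.0 / n
    root_n = math.sqrt(n)
    v_alpha = CRITICAL_V_STAR[alpha] / (root_n + 0.155 + 0.24 / root_n)
    return v_n, v_alpha, v_n >= v_alpha

# Evenly spaced azimuths: uniformity should not be rejected.
even = [i * 18.0 for i in range(20)]
print(kuiper_test(even))

# Tightly clustered azimuths: uniformity should be rejected.
clustered = [85.0, 88.0, 90.0, 92.0, 95.0] * 4
print(kuiper_test(clustered))
```

The two synthetic samples are illustrative only; they are not data from the paper.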
2.2. The Watson test
This nonparametric test is somewhat stronger, as it involves all the azimuth data, θi (modulo 360°), to calculate the following statistic:
u
2
u 0n 1 ÿ 0n1
:
2 3
:
2
1
0:8
n
6
,
q
R 1n X Y 2 r
2 3
n yi 2 2 X n yi 1 X n yi X ÿ i n i1 360 n i1 360 i1 360 8 9