Computers & Geosciences Vol. 11, No. 6, pp. 725-766, 1985
0098-3004/85 $3.00 + .00
© 1985 Pergamon Press Ltd.
Printed in the U.S.A.
SPHERE: A CONTOURING PROGRAM FOR SPHERICAL DATA

PETER J. DIGGLE
CSIRO Division of Mathematics and Statistics, P.O. Box 1965, Canberra, ACT 2601, Australia
and
NICHOLAS I. FISHER
CSIRO Division of Mathematics and Statistics, P.O. Box 218, Lindfield, NSW 2070, Australia
(Received 17 May 1984; accepted 12 July 1985)

Abstract. This paper describes a method for displaying a sample of spherical data by computing an "optimally" smoothed estimate of the underlying distribution and making a stereographic projection of the contours of this estimate. An interactive FORTRAN program which applies this method is supplied and described, and examples of its use are given.
Key Words: Axial data, Cross-validation, Density estimation, Nonparametric method, Spherical data, Vectorial data
INTRODUCTION

In several branches of the Earth sciences, data arise as lines in three-dimensional space. We refer to such data as vectors or axes according as each line is directed (i.e. has a sense) or undirected. These two sorts of data can be represented as points on the surface of a unit-radius sphere or hemisphere, respectively. The usual method of displaying the data is to project each hemisphere onto a disc and to plot the projections of the individual data points. For vectorial data, the two hemispheres are distinguished either by using separate discs or by using different plotting symbols. A useful supplement to this raw data plot, particularly for abundant data, is a contour map giving an estimate of the probability density at each point on the surface of the sphere.

In this paper, we describe a nonparametric probability density estimator for spherical data and present a FORTRAN program for its implementation. The estimator is of the kernel type (Rosenblatt, 1956), in which an equal probability mass is centered on each data point but smeared over the surface of the sphere to an extent determined by the sample size and the statistical properties of the data. Van Alstine (1980) uses essentially the same estimator but does not provide an objective method of determining the amount of smearing or smoothing. In particular, he does not indicate that this should depend on the sample size: large samples need less smoothing. Schaeben (1982) has derived a kernel depending on the sample size, but his recommendation seems to differ substantially from ours (this is discussed further in the next section). Watson (1983, p. 37-39) suggests an objective method based on cross-validation. The kernel estimator is a refinement of the widely used method of counting the number of data points in a moving spherical cap of fixed size. A basic assumption in this work is that the data arise as a set of independent measurements from some common underlying distribution.

Our program calculates summary statistics, tests for significant departure from a uniform distribution on the sphere or hemisphere, and produces a raw data plot or contour map as requested by the user. It prompts the user to supply values for various options that control the manner in which the results are displayed. There is provision for the user to supply further code in the form of subroutines, for example to compute Fisherian statistics for a unimodal sample of vectors. Also, estimates of density can be written to a file if the user wishes to create a shade plot.

The next section gives the technical details of our implementation of the kernel method of density estimation. The section on Program Operation describes the various options in the program. Another section presents a number of illustrative examples using simulated and real data sets. The program listing is given in the Appendix.

A KERNEL ESTIMATOR FOR SPHERICAL DATA
General considerations
The problem of nonparametric probability density estimation for linear data has generated an extensive literature. Currently available methods include kernel (Rosenblatt, 1956), k-nearest neighbor (Loftsgaarden and Quesenberry, 1965), orthogonal series (Kronmal and Tarter, 1968) and penalized maximum likelihood (Good and Gaskins, 1971) estimators. Fryer (1977) reviews developments up to that date; Silverman (1981) mentions some more recent contributions and, in particular, discusses bivariate density estimation. It seems that for each method, the most critical factor in ensuring a successful implementation is the selection of a numerical value for a constant which determines the extent to which the raw data are smoothed. With this important proviso, each method can give good results.

At least in principle, all of the methods of density estimation developed for linear data can be adapted to the spherical situation. In practice, only the kernel estimator seemingly has been so adapted (Van Alstine, 1980; Watson, 1983). We have selected the kernel estimator for two reasons: it is computationally convenient, and, to a limited extent, theoretical results are available to guide the selection of a value for the smoothing constant.
Vectorial data
We identify a point on the surface of the unit sphere as $(\theta, \phi)$, where $\theta$ and $\phi$ denote colatitude and longitude, respectively. [See Mardia (1972, p. 214) or Fisher, Lewis, and Embleton (1986, Section 2.2) for relationships between polar coordinates and other coordinate systems.] A set of vectorial data then can be represented as points $(\theta_i, \phi_i)$, $i = 1, \ldots, n$. Now, let $F(\theta, \phi; \theta_0, \phi_0, c)$ denote a Fisherian density (Fisher, 1953) with mode at $(\theta_0, \phi_0)$ and concentration parameter $c$. Our kernel estimator of the density at the point $(\theta, \phi)$ is

$$\hat f(\theta, \phi) = n^{-1} \sum_{i=1}^{n} F(\theta, \phi; \theta_i, \phi_i, c). \qquad (1)$$

In Equation (1), if $c$ is large, each Fisherian density in the summation is concentrated around its corresponding mode $(\theta_i, \phi_i)$, and a contour map of $\hat f(\theta, \phi)$ is tantamount to a raw data plot. As $c$ decreases, $\hat f(\theta, \phi)$ becomes progressively smoother until, when $c = 0$, $\hat f(\theta, \phi) = (4\pi)^{-1}$ for all $(\theta, \phi)$; thus, a larger value of $c$ implies less smoothing and vice versa. Relevant factors in the determination of a suitable value for $c$ include the sample size and the degree of concentration of the data. Ideally, we should select $c$ to optimize an appropriate performance criterion. One approach is the following. Define the mean integrated squared error (MISE) as

$$\mathrm{MISE}(c) = E\left[\int_0^{2\pi}\!\!\int_0^{\pi} \{\hat f(\theta, \phi) - f(\theta, \phi)\}^2 \sin\theta \, d\theta \, d\phi\right],$$

where $f(\theta, \phi)$ is the true probability density function. Now suppose that $f(\theta, \phi)$ is a Fisherian density with concentration parameter $\kappa$ (i.e. $F(\theta, \phi; \theta_0, \phi_0, \kappa)$). For large $\kappa$, this is approximated closely by the bivariate normal density centered on $(\theta_0, \phi_0)$ with variances $\sigma^2 = \kappa^{-1}$ and zero correlation; the approximation is reasonable even for $\kappa$ as small as 3 or 4. An adaptation of a result in Cacoullos (1966) then gives the value of $c$ which approximately minimizes MISE$(c)$ as

$$c = \kappa n^{1/3}. \qquad (2)$$

Replacing $\kappa$ in Equation (2) by an estimate $\hat\kappa$ gives an objective method for selecting $c$ which gives good results for unimodal, approximately symmetric densities $f(\theta, \phi)$; we use a robust estimator for $\kappa$ to protect against distortion by outlying observations.

For densities with two or more modes, Equation (2) may lead to gross over-smoothing, particularly if the modes are widely separated. As an extreme example, if about one-half of the observations are concentrated around each of two antipodal modes, $\hat\kappa$ will be close to zero and $\hat f(\theta, \phi)$ will be approximately constant, whereas a better value of $c$ could be obtained in principle by applying Equation (2) to each half-set of data separately and averaging the results. In practice, it is seldom possible to assign each data point unambiguously to a particular modal group. However, Watson (1983, p. 38-39) has described an objective method which is not tied to any particular distributional family. This consists of selecting $c$ to maximize a cross-validated log-likelihood,

$$L(c) = \sum_{i=1}^{n} \log\{\hat f_i(\theta_i, \phi_i)\}, \qquad (3)$$

where, for any given value of $c$, $\hat f_i(\theta, \phi)$ denotes the kernel estimator of Equation (1) calculated from all the data points except $(\theta_i, \phi_i)$. A limitation of Equation (3) is that an outlying data point will contribute a large, negative value to $L(c)$ unless $c$ is small; thus, the method leads to oversmoothing of data sets with one or more outliers. In view of this, and of the remarks in the previous paragraph, a sensible and conservative policy is to use the larger of the two values of $c$ implied by Equations (2) and (3), respectively. Schaeben (1982, Eq. (23)) seems to be suggesting that $c$ should be proportional to $n$ rather than $n^{1/3}$ (cf. Eq. (2)) for Fisherian densities. We are unable to reconcile these two recommendations; our experience has been that our selection produces reasonable contours with synthetic Fisherian data, and for such data yields a value of $c$ similar to the cross-validation estimate.

Axial data
A set of axial data again can be represented as points $(\theta_i, \phi_i)$, $i = 1, \ldots, n$, on the surface of the unit sphere, except that now each $(\theta_i, \phi_i)$ is indistinguishable from its antipode. It follows that the estimate of density, $\hat f(\theta, \phi)$, should incorporate antipodal symmetry. To achieve this, we consider a bipolar Watson density $S(\theta, \phi; \theta_0, \phi_0, c)$ which has a polar axis through $(\theta_0, \phi_0)$ and concentration parameter $c > 0$. [This distribution has a variety of names; for further details and properties see Fisher, Lewis, and Embleton (1986, Chapter 1 and Section 4.4.4).] We then define an estimate

$$\hat f(\theta, \phi) = n^{-1} \sum_{i=1}^{n} S(\theta, \phi; \theta_i, \phi_i, c), \qquad (4)$$

by direct analogy with Equation (1). There remains the question of selecting the value of $c$ in Equation (4). Suppose that the true density is itself of the bipolar Watson type, with concentration parameter $\nu$.
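The estimators and selection rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPHERE FORTRAN code: all function names are hypothetical; points are handled as direction cosines; the Fisherian kernel $c\,e^{c\,\mathbf{x}\cdot\boldsymbol{\mu}}/(4\pi\sinh c)$ is rewritten as $c\,e^{c(\mathbf{x}\cdot\boldsymbol{\mu}-1)}/\{2\pi(1-e^{-2c})\}$ so that large $c$ cannot overflow; the commonly used simple estimate $\hat\kappa = (n-1)/(n-R)$ stands in for the robust estimator mentioned under Equation (2), which the text does not specify; and the bipolar Watson kernel is assumed to be $e^{c(\mathbf{x}\cdot\boldsymbol{\mu})^2}/\{4\pi M(\tfrac12,\tfrac32,c)\}$, with Kummer's function $M$ evaluated through a Poisson-weighted series.

```python
import math


def to_xyz(theta, phi):
    """Direction cosines of the point with colatitude theta, longitude phi (radians)."""
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def log_sum_exp(values):
    """log(sum(exp(v) for v in values)), computed without overflow or underflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_fisher(x, mode, c):
    """Log of the Fisherian density c exp(c x.mode) / (4 pi sinh c), for c > 0.

    Written as c exp(c (x.mode - 1)) / (2 pi (1 - exp(-2c))) so that large c
    (Table 2 allows up to 1500) cannot overflow.
    """
    return (math.log(c) - math.log(2 * math.pi) - math.log(-math.expm1(-2 * c))
            + c * (dot(x, mode) - 1.0))


def kernel_estimate(x, data, c):
    """Equation (1): mean of Fisherian kernels centred on the data points."""
    if c == 0:
        return 1.0 / (4 * math.pi)  # the limiting uniform density
    return math.exp(log_sum_exp([log_fisher(x, p, c) for p in data])) / len(data)


def rule_of_thumb_c(data):
    """Equation (2), c = kappa_hat * n**(1/3).

    ASSUMPTION: kappa_hat = (n - 1)/(n - R) is a simple, non-robust estimate
    used here in place of the robust estimator referred to in the text.
    """
    n = len(data)
    r = math.sqrt(sum(sum(p[k] for p in data) ** 2 for k in range(3)))
    return (n - 1) / (n - r) * n ** (1 / 3)


def cv_log_likelihood(data, c):
    """Equation (3): sum of log f_i(x_i), where f_i leaves out data point i (c > 0)."""
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data):
        logs = [log_fisher(xi, xj, c) for j, xj in enumerate(data) if j != i]
        total += log_sum_exp(logs) - math.log(n - 1)
    return total


def cv_select_c(data, grid):
    """Return the value of c in `grid` (all > 0) that maximises L(c)."""
    return max(grid, key=lambda c: cv_log_likelihood(data, c))


def log_watson_norm(c):
    """log(exp(-c) * M(1/2, 3/2, c)) for c >= 0, M being Kummer's function.

    exp(-c) M(1/2, 3/2, c) = sum_j Poisson(j; c) / (2j + 1), which stays finite for large c.
    """
    if c == 0:
        return 0.0
    jmax = int(c + 12 * math.sqrt(c) + 30)
    s = sum(math.exp(j * math.log(c) - math.lgamma(j + 1) - c) / (2 * j + 1)
            for j in range(jmax + 1))
    return math.log(s)


def axial_kernel_estimate(x, data, c):
    """Equation (4): mean of bipolar Watson kernels exp(c (x.p)^2) / (4 pi M(1/2, 3/2, c))."""
    log_norm = math.log(4 * math.pi) + log_watson_norm(c)
    logs = [c * (dot(x, p) ** 2 - 1.0) - log_norm for p in data]
    return math.exp(log_sum_exp(logs)) / len(data)
```

For example, `cv_select_c(data, [2.0, 5.0, 10.0, 20.0, 40.0])` scans a coarse grid of smoothing constants; following the conservative policy described in the text, one would then take the larger of that value and `rule_of_thumb_c(data)`. The leave-one-out loop costs $O(n^2)$ kernel evaluations per trial value of $c$.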
Table 1. Sample interactive analysis for data of Example 3 (file-name nclat.dat)

enter file-name for data
nclat.dat
enter format (in parentheses) for data, e.g. (2f10.4)
(2f10.4)
enter number of data-points (i4)
63
enter file-name for plot-file
nclat.p1
enter 1 if function values are to be written to a named file, else 0 (i1)
0
enter 0 if data are vectors, 1 if axes, 2 if planes (i1)
0
indicate coordinate system of data as follows (i1)
enter 0 for (colatitude,longitude)
      1 for (declination,inclination)
      2 for (latitude,longitude)
      3 for (dip,dip direction) of plane or (plunge,plunge azimuth)
2
enter 1 to reverse some directions, else 0 (i1)
0
0
enter 1 to reanalyze all or part of this data set, else 0 (i1)
summary statistics for 63 data-points
mean direction = -80.85 97.77
(direction cosines) = -0.0215 0.1576 -0.9873
resultant length = 49.1
eigenvalues  eigenvectors as direction cosines  and as coordinates of end-points
               l        m        n        theta1   phi1     theta2   phi2
0.6818       0.0083  -0.1918   0.9814     78.93   272.48   -78.93    92.48
0.2214       0.3102   0.9335   0.1798     10.36    71.62   -10.36   251.62
0.0968       0.9506  -0.3029  -0.0672     -3.86   342.32     3.86   162.32
enter 1 if plot of raw data is required, else 0 (i1)
enter 1 if contour plot is required, else 0 (i1)
1
data set contains a total of 63 observations
enter first and last observation numbers for block of data to be reanalyzed (2i4)
1 63
enter file-name for plot-file
nclat.p2
enter 1 to reverse some directions, else 0 (i1)
0
enter 1 for rotation, else 0 (i1)
0
enter 0 for equal-angle (i.e. Wulff or stereo) projection
      1 for equal-area (i.e. Schmidt) projection (i1)
1
enter 1 to obtain graduations on circle, else 0 (i1)
1
enter 1 for labeling of axes, else 0 (i1)
1
1
1
enter 1 for guidance on value of smoothing constant, else 0 (i1)
1
estimate of optimal smoothing constant from Fisherian model 18.12
cross-validation method 14.36
enter value of cappa for smoothing (f8.2)
value of smoothing constant is cappa= 18.12
enter 0 for automatic selection of contour heights, else 1 (i1)
0
enter number of contours, maximum 20 (i2)
6
enter 0 for contours equally spaced by function value
      1 for contours equally spaced by probability
      2 for contours equally spaced in log(prob.) (i1)
enter 1 for test of uniformity, else 0 (i1)
1
test of uniformity statistic= 36.418 p-value 0

As with Fisher's distribution in the vectorial situation, this can be approximated by the product of two Normal densities, but now with common variance $\sigma^2 = (2\nu)^{-1}$. The analog of Equation (2) is

$$c = 2\nu n^{1/3}, \qquad (5)$$

and the unknown value of $\nu$ can be replaced by the usual estimate (Mardia, 1972, p. 253-254). A negative value of $\nu$ is legitimate, and determines a girdle density which is concentrated around a great circle. In this situation, there is no convincing analog of Equation (2), essentially because the density is concentrated about a line rather than symmetrically about a unique point. The cross-validated log-likelihood approach is available as an alternative to Equation (5), and is applicable equally to bipolar, girdle, or more complicated distributional forms. However, it remains susceptible to outliers. Thus, for bipolar data sets (for which only one eigenvalue of $T$ in Equation (6) is larger than 1/3), we recommend using the larger of the two values of $c$ obtainable from Equations (3) and (5); for girdle data sets, we recommend selecting $c$ by cross-validation, but with subjective adjustment if there are outliers in the data.

Figure 1. Triangulation of disc used as basis of contouring. Each equilateral triangle (a) is subdivided into four triangles; each scalene triangle (b) is subdivided into three.

PROGRAM OPERATION

Graphical subroutines
Subroutines which generate information for transmission to a plotting device are necessarily specific to particular graphical software systems and must be supplied by the user. Their names, syntax, and purpose are as follows:

(a) SUBROUTINE PLOPEN (XMIN, XMAX, YMIN, YMAX, NAME)
    REAL*4 XMIN, XMAX, YMIN, YMAX
    CHARACTER*15 NAME
initiates list of plotting instructions to be
Table 2. Information input to program SPHERE

Input                                       Options
file-name for data                          maximum 15 characters
format for data                             format statement in parentheses, e.g. (2f10.4)
number of data-points                       maximum 1000
file-name for plot-file                     maximum 15 characters
projections of selected points (θ, φ) and   yes/no
  corresponding values of f̂(θ, φ) written
  to named file
nature of data                              vectors, axes or planes
coordinate system                           colatitude and longitude, declination and
                                            inclination, latitude and longitude, dip and
                                            dip direction
reversal of some directions                 yes/no
directions to be reversed                   first and last indices of block of data to be
                                            reversed
test of uniformity                          yes/no
rotation of data                            yes/no
direction of rotation                       to principal axes, or to specified new polar
                                            direction
projection                                  equal-angle (Wulff) or equal-area (Schmidt)
graduations marked on circles               yes/no
axes labeled                                yes/no
raw data plot                               yes/no
contour plot                                yes/no
guidance on value of smoothing constant     yes/no
value of smoothing constant                 positive real, maximum 1500.0
contour heights                             specified by user, or selected automatically
method of automatic selection of contour    equally spaced by function values, by
  heights                                   probability or by log(probability) (see text)
number of contours                          maximum 20
reanalysis of same data set                 yes/no
Figure 2A. 50 observations simulated from F((90, 0), 10) distribution (see Example 1)
Figure 2B. Equal-area projection of contours of estimated density
placed in file NAME; defines rectangular plotting window XMIN ≤ x ≤ XMAX, YMIN ≤ y ≤ YMAX.

(b) SUBROUTINE PLCLOS
ends list of plotting instructions (may be redundant in some software systems).

(c) SUBROUTINE POINTS (X, Y, N)
    REAL*4 X(N), Y(N)
    INTEGER N
plots coordinates (X(I), Y(I)), I = 1, ..., N, using whatever plotting symbol is preferred.

(d) SUBROUTINE LINES (X, Y, N, LTYPE)
    REAL*4 X(N), Y(N)
    INTEGER N, LTYPE
draws line segments to connect points (X(I), Y(I)), (X(I+1), Y(I+1)), I = 1, ..., N-1; called with LTYPE = 0, 1, or 2 to correspond to light, intermediate, and heavy lines if desired.
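The X and Y arrays handed to POINTS and LINES are coordinates on the projection disc. The sketch below shows one way to produce them for the two projections offered by the program, equal-area (Schmidt) and equal-angle (Wulff). It is an illustration only: the function names are hypothetical, and the conventions are assumptions rather than SPHERE's documented behaviour. The disc is assumed to be centred on the pole $\theta = 0$ with the equator mapped to the unit circle, and longitude is assumed to be measured anticlockwise from the positive $x$-axis. Points below the equator are replaced by their antipodes, which suits axes; for vectors, the returned flag can select a separate disc or plotting symbol.

```python
import math


def project(theta, phi, equal_area=True):
    """Map a point (colatitude theta, longitude phi, radians) to disc coordinates.

    Assumed conventions: the disc is centred on the pole theta = 0, the equator
    maps to the unit circle, and longitude is measured anticlockwise from the
    positive x-axis. Points with theta > pi/2 are replaced by their antipodes;
    the returned flag records this so vectors can be drawn on a separate disc
    or with a different plotting symbol.
    """
    flipped = theta > math.pi / 2
    if flipped:
        theta, phi = math.pi - theta, phi + math.pi  # antipode of (theta, phi)
    if equal_area:
        r = math.sqrt(2.0) * math.sin(theta / 2)  # Schmidt (equal-area)
    else:
        r = math.tan(theta / 2)  # Wulff (equal-angle, stereographic)
    return r * math.cos(phi), r * math.sin(phi), flipped


def project_all(points, equal_area=True):
    """Return the X and Y lists that a POINTS-style routine would be given."""
    xs, ys = [], []
    for theta, phi in points:
        x, y, _ = project(theta, phi, equal_area)
        xs.append(x)
        ys.append(y)
    return xs, ys
```

Both radii equal 1 on the equator, but only the Schmidt projection preserves relative areas: the disc area $\pi r^2 = 2\pi\sin^2(\theta/2)$ inside a projected cap is exactly half the spherical cap area $2\pi(1-\cos\theta)$, whatever the value of $\theta$.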
Figure 2C. Rotated contours and data

Summary statistics
A basic summary statistic for vectorial data is the mean direction, defined as follows. Let $(x_i, y_i, z_i)$ denote the direction cosines corresponding to the point $(\theta_i, \phi_i)$, $i = 1, \ldots, n$. Then, the mean direction is the unit vector in three-dimensional space joining the origin to the point $(\sum x_i/R, \sum y_i/R, \sum z_i/R)$, where $R^2 = (\sum x_i)^2 + (\sum y_i)^2 + (\sum z_i)^2$. The mean resultant length, $\bar R = R/n$, is a measure of the degree of concentration about the mean direction. Note that $0 \le \bar R \le 1$. Further clues as to the shape of the underlying distribution are provided by an eigenanalysis of the symmetric matrix

$$T = n^{-1} \sum_{i=1}^{n} \begin{pmatrix} x_i^2 & x_i y_i & x_i z_i \\ x_i y_i & y_i^2 & y_i z_i \\ x_i z_i & y_i z_i & z_i^2 \end{pmatrix}. \qquad (6)$$
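The summary statistics can be computed directly from the direction cosines. This sketch assumes $T = n^{-1}\sum_i \mathbf{x}_i\mathbf{x}_i^{\mathsf T}$, a normalization consistent with the eigenvalues printed in Table 1 summing to 1. It uses cyclic Jacobi rotations, a standard method for small symmetric matrices that is not necessarily what SPHERE itself uses, for the eigenanalysis. The function names are hypothetical. The latitude/longitude conversion corresponds to coordinate option 2 in the program's prompts.

```python
import math


def direction_cosines(lat_deg, lon_deg):
    """(x, y, z) for a point given by latitude and longitude in degrees."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def mean_direction(points):
    """Unit mean direction and mean resultant length R/n for a list of unit vectors."""
    n = len(points)
    sx, sy, sz = (sum(p[k] for p in points) for k in range(3))
    r = math.sqrt(sx * sx + sy * sy + sz * sz)
    return (sx / r, sy / r, sz / r), r / n


def orientation_matrix(points):
    """T = (1/n) * sum of outer products x_i x_i^T (assumed normalisation)."""
    n = len(points)
    return [[sum(p[a] * p[b] for p in points) / n for b in range(3)] for a in range(3)]


def jacobi_eigen(matrix, sweeps=50):
    """Eigen-decomposition of a symmetric 3x3 matrix by cyclic Jacobi rotations.

    Returns [(eigenvalue, eigenvector), ...] sorted by decreasing eigenvalue.
    """
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(3) for j in range(3) if i != j) < 1e-30:
            break
        for p in range(2):
            for q in range(p + 1, 3):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(3):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(3):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(3):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    pairs = [(a[i][i], [v[k][i] for k in range(3)]) for i in range(3)]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)
```

For instance, `direction_cosines(-80.85, 97.77)` returns approximately (-0.0215, 0.1576, -0.9873), matching the mean direction and its direction cosines reported in Table 1. Eigenvectors are determined only up to sign, which is why each is also reported as a pair of antipodal end-points.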
Figure 3A. Mixture of data from Example 1 with 50 observations simulated from F((0, 0), 10) distribution (see Example 2)
Figure 3B. Equal-area projection of contours of estimated density
Figure 3C. Rotated contours and data

Table 1 shows the form in which summary statistics are output by SPHERE. For axial data, only the eigenanalysis is relevant, and calculations associated with the resultant vector are omitted.

A test of uniformity
There is no point in attempting to contour data that do not deviate significantly from a uniform distribution on the sphere or hemisphere. Accordingly, SPHERE incorporates an optional test of uniformity and outputs its result. Diggle, Fisher, and Lee (1985) investigate a number of such tests, and recommend a statistic due to Giné (1975) which is consistent against all alternatives to uniformity. In the vectorial situation, the test statistic is

$$\frac{3n}{2} - \frac{4}{n\pi} \sum\sum_{i<j} (\psi_{ij} + \sin\psi_{ij}),$$