SPHERE: a contouring program for spherical data - Science Direct

Computers & Geosciences Vol. 11, No. 6, pp. 725-766, 1985

0098-3004/85 $3.00 + .00 © 1985 Pergamon Press Ltd.

Printed in the U.S.A.

SPHERE: A CONTOURING PROGRAM FOR SPHERICAL DATA

PETER J. DIGGLE
CSIRO Division of Mathematics and Statistics, P.O. Box 1965, Canberra, ACT 2601, Australia

and

NICHOLAS I. FISHER
CSIRO Division of Mathematics and Statistics, P.O. Box 218, Lindfield, NSW 2070, Australia

(Received 17 May 1984; accepted 12 July 1985)

Abstract. This paper describes a method for displaying a sample of spherical data by computing an "optimally" smoothed estimate of the underlying distribution and making a stereographic projection of the contours of this estimate. An interactive FORTRAN program which applies this method is supplied and described, and examples of its use are given.

Key Words: Axial data, Cross-validation, Density estimation, Nonparametric method, Spherical data, Vectorial data

INTRODUCTION

In several branches of the Earth sciences, data arise as lines in three-dimensional space. We refer to such data as vectors or axes according as each line is directed (i.e. has a sense) or undirected. These two sorts of data can be represented as points on the surface of a unit-radius sphere or hemisphere, respectively. The usual method of displaying the data is to project each hemisphere onto a disc and to plot the projections of the individual data points. For vectorial data, the two hemispheres are distinguished either by using separate discs or by using different plotting symbols. A useful supplement to this raw data plot, particularly for abundant data, is a contour map giving an estimate of the probability density at each point on the surface of the sphere.

In this paper, we describe a nonparametric probability density estimator for spherical data and present a FORTRAN program for its implementation. The estimator is of the kernel type (Rosenblatt, 1956), in which an equal probability mass is centered on each data point but smeared over the surface of the sphere to an extent determined by the sample size and the statistical properties of the data. Van Alstine (1980) uses essentially the same estimator but does not provide an objective method of determining the amount of smearing, or smoothing. In particular, he does not indicate that this should depend on the sample size: large samples need less smoothing. Schaeben (1982) has derived a kernel depending on the sample size, but his recommendation seems to differ substantially from ours (this is discussed further in the next section). Watson (1983, p. 37-39) suggests an objective method based on cross-validation. The kernel estimator is a refinement of the widely used method of counting the number of data points in a moving spherical cap of fixed size. A basic assumption in this work is that the data arise as a set of independent measurements from some common underlying distribution.

Our program calculates summary statistics, tests for significant departure from a uniform distribution on the sphere or hemisphere, and produces a raw data plot or contour map as requested by the user. It prompts the user to supply values for various options that control the manner in which the results are displayed. There is provision for the user to supply further code in the form of subroutines, for example to compute Fisherian statistics for a unimodal sample of vectors. Also, estimates of density can be written to a file if the user wishes to create a shade plot.

The next section gives the technical details of our implementation of the kernel method of density estimation. The section on Program Operation describes the various options in the program. Another section presents a number of illustrative examples using simulated and real data sets. The program listing is given in the Appendix.

A KERNEL ESTIMATOR FOR SPHERICAL DATA

General considerations. The problem of nonparametric probability density estimation for linear data has generated an extensive literature. Currently available methods include kernel (Rosenblatt, 1956), k-nearest neighbor (Loftsgaarden and Quesenberry, 1965), orthogonal series (Kronmal and Tarter, 1968) and penalized maximum likelihood (Good and Gaskins, 1971) estimators. Fryer (1977) reviews developments up to that date. Silverman (1981) mentions some more recent contributions and, in particular, discusses bivariate density estimation. It seems that, for each method, the most critical factor in ensuring a successful implementation is the selection of a numerical value for a constant which determines the extent to which the raw data are smoothed. With this important proviso, each method can give good results. At least in principle, all of the methods of density estimation developed for linear data can be adapted to the spherical situation. In practice, only the kernel estimator seemingly has been so adapted (Van Alstine, 1980; Watson, 1983). We have selected the kernel estimator for two reasons: it is computationally convenient, and, to a limited extent, theoretical results are available to guide the selection of a value for the smoothing constant.
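To make the role of the smoothing constant concrete, here is a minimal Python sketch (not a transcription of the authors' FORTRAN) of the Fisher-kernel estimator of Equation (1) and of the two rules for choosing $c$ discussed in the Vectorial data subsection that follows: Equation (2), and maximization of the cross-validated log-likelihood of Equation (3), combined by taking the larger value. It assumes the standard form of the Fisher density, $F = c\,e^{c\cos\psi}/(4\pi\sinh c)$ with $\psi$ the angle to the mode, and for Equation (2) it uses the simple estimate $\hat\kappa = (n-1)/(n-R)$ in place of the robust estimator that SPHERE uses. The cross-validation search runs over a user-supplied grid of positive values, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def unit_vector(colat_deg: float, lon_deg: float) -> Vec:
    """Direction cosines of the point (theta, phi) = (colatitude, longitude), in degrees."""
    t, p = math.radians(colat_deg), math.radians(lon_deg)
    return (math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t))


def log_fisher(x: Vec, mode: Vec, c: float) -> float:
    """Log of the Fisher density c*exp(c*cos psi)/(4*pi*sinh c), for c > 0.

    Rewritten as c*exp(c*(cos psi - 1))/(2*pi*(1 - exp(-2c))) so that
    large values of c do not overflow.
    """
    cos_psi = sum(a * b for a, b in zip(x, mode))
    return (math.log(c) - math.log(2.0 * math.pi)
            - math.log(-math.expm1(-2.0 * c)) + c * (cos_psi - 1.0))


def kernel_estimate(x: Vec, data: Sequence[Vec], c: float) -> float:
    """Equation (1): average of Fisher kernels centred on the observations."""
    if c == 0.0:
        return 1.0 / (4.0 * math.pi)  # limiting case: uniform density on the sphere
    return sum(math.exp(log_fisher(x, d, c)) for d in data) / len(data)


def cv_log_likelihood(data: Sequence[Vec], c: float) -> float:
    """Equation (3): sum over i of log f_i(x_i), where f_i omits observation i (c > 0)."""
    total = 0.0
    for i, xi in enumerate(data):
        logs = [log_fisher(xi, xj, c) for j, xj in enumerate(data) if j != i]
        top = max(logs)  # log-sum-exp keeps tiny kernel values from underflowing
        total += top + math.log(sum(math.exp(v - top) for v in logs)) - math.log(len(logs))
    return total


def c_from_fisher_model(data: Sequence[Vec]) -> float:
    """Equation (2), c = kappa_hat * n**(1/3), with the simple (non-robust)
    estimate kappa_hat = (n - 1)/(n - R); assumes the data are not all identical."""
    n = len(data)
    r = math.sqrt(sum(sum(d[k] for d in data) ** 2 for k in range(3)))
    return (n - 1) / (n - r) * n ** (1.0 / 3.0)


def choose_c(data: Sequence[Vec], grid: Sequence[float]) -> float:
    """Larger of the Equation (2) value and the grid maximiser of Equation (3)."""
    c_cv = max(grid, key=lambda c: cv_log_likelihood(data, c))
    return max(c_from_fisher_model(data), c_cv)


if __name__ == "__main__":
    sample = [unit_vector(t, p) for t, p in
              [(5, 0), (10, 90), (8, 180), (12, 270), (3, 45), (15, 135)]]
    print("c from Equation (2):", round(c_from_fisher_model(sample), 2))
    print("chosen c:", choose_c(sample, [1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]))
```

Taking the larger of the two values mirrors the conservative policy recommended in the text: cross-validation on its own tends to oversmooth when the sample contains outliers.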

Vectorial data. We identify a point on the surface of the unit sphere as $(\theta, \phi)$, where $\theta$ and $\phi$ denote colatitude and longitude, respectively. [See Mardia (1972, p. 214) or Fisher, Lewis, and Embleton (1986, Section 2.2) for relationships between polar coordinates and other coordinate systems.] A set of vectorial data then can be represented as points $(\theta_i, \phi_i)$, $i = 1, \ldots, n$. Now, let $F(\theta, \phi; \theta_0, \phi_0, c)$ denote a Fisherian density (Fisher, 1953) with mode at $(\theta_0, \phi_0)$ and concentration parameter $c$. Our kernel estimator of the density at the point $(\theta, \phi)$ is

$$\hat f(\theta, \phi) = n^{-1} \sum_{i=1}^{n} F(\theta, \phi; \theta_i, \phi_i, c). \qquad (1)$$

In Equation (1), if $c$ is large, each Fisherian density in the summation is concentrated around its corresponding mode $(\theta_i, \phi_i)$, and a contour map of $\hat f(\theta, \phi)$ is tantamount to a raw data plot. As $c$ decreases, $\hat f(\theta, \phi)$ becomes progressively smoother until, when $c = 0$, $\hat f(\theta, \phi) = (4\pi)^{-1}$ for all $(\theta, \phi)$. Thus, a larger value of $c$ implies less smoothing, and vice versa. Relevant factors in the determination of a suitable value for $c$ include the sample size and the degree of concentration of the data. Ideally, we should select $c$ to optimize an appropriate performance criterion. One approach is the following. Define the mean integrated squared error (MISE) as

$$\mathrm{MISE}(c) = E\left[\iint \{\hat f(\theta, \phi) - f(\theta, \phi)\}^2 \sin\theta \, d\theta \, d\phi\right],$$

where $f(\theta, \phi)$ is the true probability density function. Now suppose that $f(\theta, \phi)$ is a Fisherian density with concentration parameter $\kappa$, i.e. $F(\theta, \phi; \theta_0, \phi_0, \kappa)$. For large $\kappa$, this is approximated closely by the bivariate normal centered on $(\theta_0, \phi_0)$ with variances $\sigma^2 = \kappa^{-1}$ and zero correlation; the approximation is reasonable even for $\kappa$ as small as 3 or 4. An adaptation of a result in Cacoullos (1966) then gives the value of $c$ which approximately minimizes $\mathrm{MISE}(c)$ as

$$c = \kappa n^{1/3}. \qquad (2)$$

Replacing $\kappa$ in Equation (2) by an estimate $\hat\kappa$ gives an objective method for selecting $c$ which gives good results for unimodal, approximately symmetric densities $f(\theta, \phi)$; we use a robust estimator for $\kappa$ to protect against distortion by outlying observations. For densities with two or more modes, Equation (2) may lead to gross over-smoothing, particularly if the modes are widely separated. As an extreme example, if about one-half of the observations are concentrated around each of two antipodal modes, $\hat\kappa$ will be close to zero and $\hat f(\theta, \phi)$ will be approximately constant, whereas a better value of $c$ could in principle be obtained by applying Equation (2) to each half-set of data separately and averaging the results. In practice, it is seldom possible to assign each data point unambiguously to a particular modal group. However, Watson (1983, p. 38-39) has described an objective method which is not tied to any particular distributional family. This consists of selecting $c$ to maximize a cross-validated log-likelihood,

$$L(c) = \sum_{i=1}^{n} \log\{\hat f_i(\theta_i, \phi_i)\}, \qquad (3)$$

where, for any given value of $c$, $\hat f_i(\theta, \phi)$ denotes the kernel estimator of Equation (1) calculated from all the data points except $(\theta_i, \phi_i)$. A limitation of Equation (3) is that an outlying data point will contribute a large negative value to $L(c)$ unless $c$ is small; thus, the method leads to oversmoothing of data sets with one or more outliers. In view of this, and of the remarks in the previous paragraph, a sensible and conservative policy is to use the larger of the two values of $c$ implied by Equations (2) and (3), respectively.

Schaeben (1982, Eq. (23)) seems to be suggesting that $c$ should be proportional to $n$ rather than to $n^{1/3}$ (cf. Eq. (2)) for Fisherian densities. We are unable to reconcile these two recommendations; our experience has been that our selection produces reasonable contours with synthetic Fisherian data, and for such data yields a value of $c$ similar to the cross-validation estimate.

Axial data. A set of axial data again can be represented as points $(\theta_i, \phi_i)$, $i = 1, \ldots, n$, on the surface of the unit sphere, except that now each $(\theta_i, \phi_i)$ is indistinguishable from its antipode. It follows that the estimate of density, $\hat f(\theta, \phi)$, should incorporate antipodal symmetry. To achieve this, we consider a bipolar Watson density $S(\theta, \phi; \theta_0, \phi_0, c)$, which has a polar axis through $(\theta_0, \phi_0)$ and concentration parameter $c > 0$ [this distribution has a variety of names; for further details and properties see Fisher, Lewis, and Embleton (1986, Chapter 1 and Section 4.4.4)]. We then define an estimate

$$\hat f(\theta, \phi) = n^{-1} \sum_{i=1}^{n} S(\theta, \phi; \theta_i, \phi_i, c), \qquad (4)$$

by direct analogy with Equation (1). There remains the question of selecting the value of $c$ in Equation (4). Suppose that the true density is itself of the bipolar Watson type, with concentration parameter $\nu$.

As with Fisher's distribution in the vectorial situation, this can be approximated by the product of two normal densities, but now with common variance $\sigma^2 = (2\nu)^{-1}$. The analog of Equation (2) is

$$c = 2\nu n^{1/3}, \qquad (5)$$

and the unknown value of $\nu$ can be replaced by the usual estimate (Mardia, 1972, p. 253-254). A negative value of $\nu$ is legitimate, and determines a girdle density which is concentrated around a great circle. In this situation, there is no convincing analog of Equation (2), essentially because the density is concentrated about a line rather than symmetrically about a unique point. The cross-validated log-likelihood approach is available as an alternative to Equation (5), and is applicable equally to bipolar, girdle, or more complicated distributional forms. However, it remains susceptible to outliers. Thus, for bipolar data sets (for which only one eigenvalue of $T$ in Equation (6) is larger than 1/3), we recommend using the larger of the two values of $c$ obtainable from Equations (3) and (5); for girdle data sets, we recommend selecting $c$ by cross-validation, but with subjective adjustment if there are outliers in the data.

Table 1. Sample interactive analysis for data of Example 3 (file-name nclat.dat)

enter file-name for data
    nclat.dat
enter format (in parentheses) for data, e.g. (2f10.4)
    (2f10.4)
enter number of data-points (i4)
    63
enter file-name for plot-file
    nclat.p1
enter 1 if function values are to be written to a named file, else 0 (i1)
    0
enter 0 if data are vectors, 1 if axes, 2 if planes (i1)
    0
indicate coordinate system of data as follows (i1)
enter 0 for (colatitude,longitude)
      1 for (declination,inclination)
      2 for (latitude,longitude)
      3 for (dip,dip direction) of plane or (plunge,plunge azimuth)
    2
enter 1 to reverse some directions, else 0 (i1)
    0
enter 1 for test of uniformity, else 0 (i1)
    1
test of uniformity statistic= 36.418 p-value 0
summary statistics for 63 data-points
mean direction = -80.85 97.77
(direction cosines) = -0.0215 0.1576 -0.9873
resultant length = 49.1
eigenvalues 0.6818 0.2214 0.0968
eigenvectors as direction cosines
       l        m        n
    0.0083  -0.1918   0.9814
    0.3102   0.9335   0.1798
    0.9506  -0.3029  -0.0672
and as coordinates of end-points
    theta1   phi1     theta2   phi2
    78.93   272.48   -78.93    92.48
    10.36    71.62   -10.36   251.62
    -3.86   342.32     3.86   162.32
enter 1 if plot of raw data is required, else 0 (i1)
enter 1 if contour plot is required, else 0 (i1)
    1
enter 1 for guidance on value of smoothing constant, else 0 (i1)
    1
estimate of optimal smoothing constant
    from Fisherian model       18.12
    cross-validation method    14.36
enter value of cappa for smoothing (f8.2)
value of smoothing constant is cappa= 18.12
enter 0 for automatic selection of contour heights, else 1 (i1)
    0
enter number of contours, maximum 20 (i2)
    6
enter 0 for contours equally spaced by function value
      1 for contours equally spaced by probability
      2 for contours equally spaced in log(prob.) (i1)
    0
enter 1 to reanalyze all or part of this data set, else 0 (i1)
    1
data set contains a total of 63 observations
enter first and last observation numbers for block of data to be reanalyzed (2i4)
    1 63
enter file-name for plot-file
    nclat.p2
enter 1 to reverse some directions, else 0 (i1)
    0
enter 1 for rotation, else 0 (i1)
    0
enter 0 for equal-angle (i.e. Wulff or stereo) projection
      1 for equal-area (i.e. Schmidt) projection (i1)
    1
enter 1 to obtain graduations on circle, else 0 (i1)
    1
enter 1 for labeling of axes, else 0 (i1)
    1

Figure 1. Triangulation of disc used as basis of contouring. Each equilateral triangle (a) is subdivided into four triangles; each scalene triangle (b) is subdivided into three.

PROGRAM OPERATION

Graphical subroutines

Subroutines which generate information for transmission to a plotting device are necessarily specific to particular graphical software systems and must be supplied by the user. Their names, syntax, and purpose are as follows.

(a) SUBROUTINE PLOPEN (XMIN, XMAX, YMIN, YMAX, NAME)
    REAL*4 XMIN, XMAX, YMIN, YMAX
    CHARACTER*15 NAME
initiates the list of plotting instructions to be placed in file NAME, and defines the rectangular plotting window XMIN ≤ x ≤ XMAX, YMIN ≤ y ≤ YMAX.

(b) SUBROUTINE PLCLOS
ends the list of plotting instructions (may be redundant in some software systems).

(c) SUBROUTINE POINTS (X, Y, N)
    REAL*4 X(N), Y(N)
    INTEGER N
plots the coordinates (X(I), Y(I)), I = 1, ..., N, using whatever plotting symbol is preferred.

(d) SUBROUTINE LINES (X, Y, N, LTYPE)
    REAL*4 X(N), Y(N)
    INTEGER N, LTYPE
draws line segments to connect the points (X(I), Y(I)), (X(I+1), Y(I+1)), I = 1, ..., N-1; it is called with LTYPE = 0, 1, or 2 to correspond to light, intermediate, and heavy lines, if desired.

Table 2. Information input to program SPHERE

Input: Options
file-name for data: maximum 15 characters
format for data: format statement in parentheses, e.g. (2f10.4)
number of data-points: maximum 1000
file-name for plot-file: maximum 15 characters
projections of selected points $(\theta, \phi)$ and corresponding values of $\hat f(\theta, \phi)$ written to named file: yes/no
nature of data: vectors, axes or planes
coordinate system: colatitude and longitude, declination and inclination, latitude and longitude, dip and dip direction
reversal of some directions: yes/no
directions to be reversed: first and last indices of block of data to be reversed
test of uniformity: yes/no
rotation of data: yes/no
direction of rotation: to principal axes, or to specified new polar direction
projection: equal-angle (Wulff) or equal-area (Schmidt)
graduations marked on circles: yes/no
axes labeled: yes/no
raw data plot: yes/no
contour plot: yes/no
guidance on value of smoothing constant: yes/no
value of smoothing constant: positive real, maximum 1500.0
contour heights: specified by user, or selected automatically
method of automatic selection of contour heights: equally spaced by function values, by probability or by log(probability) (see text)
number of contours: maximum 20
reanalysis of same data set: yes/no

Figure 2A. 50 observations simulated from $F((90, 0), 10)$ distribution (see Example 1).

Figure 2B. Equal-area projection of contours of estimated density.
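Both projection options listed in Table 2 map a hemisphere onto a disc. The following Python sketch uses the standard formulas for a unit-radius disc, equal-area (Schmidt) $r = \sqrt{2}\sin(\theta/2)$ and equal-angle (Wulff) $r = \tan(\theta/2)$, with the pole at the centre. The choice of hemisphere, the axis orientation and the function names are assumptions for illustration rather than details of SPHERE; coordinates of this kind, rescaled to the window set by PLOPEN, are the sort of values a user-supplied POINTS or LINES routine could be given.

```python
import math
from typing import Tuple


def to_upper_hemisphere(colat_deg: float, lon_deg: float) -> Tuple[float, float]:
    """Replace a point with colatitude > 90 degrees by its antipode.

    Appropriate for axial data, where a point and its antipode are equivalent;
    vectorial data would instead need a separate disc or plotting symbol.
    """
    if colat_deg > 90.0:
        return 180.0 - colat_deg, (lon_deg + 180.0) % 360.0
    return colat_deg, lon_deg


def project(colat_deg: float, lon_deg: float, equal_area: bool = True) -> Tuple[float, float]:
    """Map a point with colatitude in [0, 90] degrees onto the unit disc.

    Equal-area (Schmidt): r = sqrt(2) * sin(theta / 2)
    Equal-angle (Wulff):  r = tan(theta / 2)
    Both scalings give r = 1 on the bounding circle theta = 90 degrees.
    """
    if not 0.0 <= colat_deg <= 90.0:
        raise ValueError("colatitude must lie in [0, 90] degrees")
    half = math.radians(colat_deg) / 2.0
    r = math.sqrt(2.0) * math.sin(half) if equal_area else math.tan(half)
    phi = math.radians(lon_deg)
    # Assumed convention: longitude measured clockwise from the +y ("north") axis.
    return r * math.sin(phi), r * math.cos(phi)


if __name__ == "__main__":
    for colat, lon in [(120.0, 10.0), (45.0, 300.0)]:
        theta, phi = to_upper_hemisphere(colat, lon)
        print(project(theta, phi), project(theta, phi, equal_area=False))
```

At the same colatitude the equal-area radius is the larger of the two away from the centre (for example about 0.707 against 0.577 at 60 degrees), which is why the same contours look more spread out near the rim of a Schmidt net.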

Figure 2C. Rotated contours and data.

Summary statistics. A basic summary statistic for vectorial data is the mean direction, defined as follows. Let $(x_i, y_i, z_i)$ denote the direction cosines corresponding to the point $(\theta_i, \phi_i)$, $i = 1, \ldots, n$. Then, the mean direction is the unit vector in three-dimensional space joining the origin to the point $(\sum x_i/R, \sum y_i/R, \sum z_i/R)$, where $R^2 = (\sum x_i)^2 + (\sum y_i)^2 + (\sum z_i)^2$. The mean resultant length, $\bar R = R/n$, is a measure of the degree of concentration about the mean direction. Note that $0 \le \bar R \le 1$. Further clues as to the shape of the underlying distribution are provided by an eigenanalysis of the symmetric matrix $T$ of Equation (6).

Figure 3A. Mixture of data from Example 1 with 50 observations simulated from $F((0, 0), 10)$ distribution (see Example 2).

Figure 3B. Equal-area projection of contours of estimated density.
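The summary statistics described above, and the eigenvalues and end-point coordinates printed in Table 1, can be reproduced in outline by the following Python sketch. It assumes that $T$ is the orientation matrix $T = n^{-1}\sum_i \mathbf{x}_i\mathbf{x}_i^{\mathsf T}$ (the eigenvalues 0.6818, 0.2214 and 0.0968 in Table 1 sum to one, which fits that normalization), uses the (latitude, longitude) convention of the Table 1 session, and does the eigenanalysis with plain cyclic Jacobi rotations so that no numerical library is needed. None of the function names come from SPHERE.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Matrix = List[List[float]]


def latlon_to_cosines(lat_deg: float, lon_deg: float) -> Vec:
    """Direction cosines (x, y, z) of a point given as (latitude, longitude) in degrees."""
    la, lo = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(la) * math.cos(lo), math.cos(la) * math.sin(lo), math.sin(la))


def cosines_to_latlon(v: Vec) -> Tuple[float, float]:
    """Inverse of latlon_to_cosines, with longitude reported in [0, 360)."""
    x, y, z = v
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return lat, math.degrees(math.atan2(y, x)) % 360.0


def mean_direction(data: Sequence[Vec]) -> Tuple[Vec, float]:
    """Mean direction (unit vector) and mean resultant length R/n."""
    s = [sum(d[k] for d in data) for k in range(3)]
    r = math.sqrt(sum(v * v for v in s))
    return (s[0] / r, s[1] / r, s[2] / r), r / len(data)


def orientation_matrix(data: Sequence[Vec]) -> Matrix:
    """Assumed form of T: (1/n) times the sum of outer products x_i x_i^T."""
    n = len(data)
    return [[sum(d[i] * d[j] for d in data) / n for j in range(3)] for i in range(3)]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def jacobi_eigen(t: Matrix, sweeps: int = 50) -> List[Tuple[float, Vec]]:
    """Eigenpairs of a symmetric 3x3 matrix by cyclic Jacobi rotations,
    sorted by decreasing eigenvalue; eigenvectors are unit columns of V."""
    a = [row[:] for row in t]
    v = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(3) for j in range(3) if i != j) < 1e-30:
            break
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if a[p][q] == 0.0:
                continue
            angle = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])  # zeroes a[p][q]
            c, s = math.cos(angle), math.sin(angle)
            rot = [[float(i == j) for j in range(3)] for i in range(3)]
            rot[p][p], rot[q][q], rot[p][q], rot[q][p] = c, c, s, -s
            rot_t = [list(col) for col in zip(*rot)]
            a = _matmul(rot_t, _matmul(a, rot))  # A <- R^T A R
            v = _matmul(v, rot)                  # accumulate eigenvectors
    pairs = [(a[k][k], (v[0][k], v[1][k], v[2][k])) for k in range(3)]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)
```

As a check against Table 1, latlon_to_cosines(-80.85, 97.77) returns approximately (-0.0215, 0.1576, -0.9873), the direction cosines printed for the mean direction, and cosines_to_latlon applied to the first eigenvector (0.0083, -0.1918, 0.9814) gives roughly (78.93, 272.48), the first pair of end-point coordinates; the second end-point is the antipode of the first.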

Table 1 shows the form in which summary statistics are output by SPHERE. For axial data, only the eigenanalysis is relevant, and calculations associated with the resultant vector are omitted.

A test of uniformity

Figure 3C. Rotated contours and data.

There is no point in attempting to contour data that do not deviate significantly from a uniform distribution on the sphere or hemisphere. Accordingly, SPHERE incorporates an optional test of uniformity and outputs its result. Diggle, Fisher, and Lee (1985) investigate a number of such tests, and recommend a statistic due to Giné (1975) which is consistent against all alternatives to uniformity. In the vectorial situation, the test statistic is

$$F_n = \frac{3n}{2} - \frac{4}{n\pi} \sum_{i<j} (\psi_{ij} + \sin\psi_{ij}),$$
