On the Robust Estimation of Power Spectra ... - Semantic Scholar

8 downloads 0 Views 1MB Size Report
Jan 10, 1987 - DAVID J. THOMSON. AT& T Bell Laboratories, Murray Hill, ... Markov theorem [e.g., Kendall and Stuart, 1977, chapter. 19]. For example, linear ...
JOURNAL

OF GEOPHYSICAL

RESEARCH,

VOL.

92, NO.

B1, PAGES

633-648,

JANUARY

10, 1987

On the Robust Estimation of Power Spectra, Coherences, and Transfer Functions ALAN D. CHAVE1 Earth and SpaceSciencesDivision, Los Alamos National Laboratory, Los Alamos, New Mexico

DAVID

J. THOMSON

AT& T Bell Laboratories,Murray Hill, New Jersey

MARK E. ANDER Earth and SpaceSciencesDivision, Los Alamos National Laboratory, Los Alamos, New Mexico

Robust estimation of power spectra, coherences,and transfer functions is investigatedin the context of geophysicaldata processing.The methodsdescribedare frequency-domainextensionsof current techniquesfrom the statisticalliterature and are applicablein caseswhere section-averaging methods would be used with data that are contaminated by local nonstationarity or isolated outliers. The paper begins with a review of robust estimation theory, emphasizingstatisticalprinciplesand the maximum likelihood or M-estimators. These are combined with section-averagingspectral techniquesto obtain robust estimates of power spectra, coherences,and transfer functions in an automatic, data-adaptive fashion. Because robust methods implicitly identify abnormal data, methods for monitoring the statisticalbehavior of the estimation processusing quantile-quantile plots are also discussed. The results are illustrated using a variety of examples from electromagnetic geophysics.

INTRODUCTION

Reliable estimation of power spectra for single data sequencesor of transfer functions and coherencesbetween multiple time seriesis of central importance in many areas of geophysicsand engineering. While the effects of the underlying Gaussian distributional assumptions on such estimatesare generally understood, the ability of a small fraction of non-Gaussiannoise or localized nonstationarity to affect them is not. These phenomena can destroy conventional estimates, often in a manner that is difficult to detect.

residuals are drawn from a multivariate normal probability distribution, then the least squares result is also a maximum likelihood, fully efficient, minimum variance estimate. In practice,the regressionmodel is rarely an accurate description due to departures of the data from the model requirements. Most data contain a small fraction of unusual

model

observations

distribution

or

"outliers"

that

do

or share the characteristics

not

fit

the

of the bulk

of the sample. These can often be described by a probability distribution which has a nearly Gaussian shape in the

center

and

tails

which

are

heavier

than

would

be

expected for a normal one, or by mixtures of Gaussian

Problemswith conventional(i.e., nonrobust)time series distributions with different variances.

procedures arise because they are essentially copies of classicalstatisticalproceduresparameterized by frequency. Once Fourier transforms are taken, estimating a spectrum is the same processas computing a variance, and estimating a transfer function is a similar procedure to linear regression. Because these methods are based on the least squares or Gaussian maximum likelihood approachesto statistical inference, their advantages include simplicity and the optimality properties established by the Gauss-

Two forms of data outliers are common: point defects and local nonstationarity. Point defects are isolated outliers that exist independent of the structure of the process under study. Typical examples include dropped bits in digital data, transient instrument failures, and spike

noisedue to natural phenomena(e.g., lightning). Local nonstationarity means a departure from a stationary base state that is of finite duration

and must be differentiated

from complete nonstationarity, in which the concept of a

Markov theorem [e.g., Kendall and Stuart, 1977, chapter spectrummust be reformulated[e.g., Priestley,1965; Mar19]. For example, linear regressionyields the best linear tin and Flandrin, 1985]. A geophysicalexample of local unbiased estimate when the errors are uncorrelated and share a common variance; this holds independent of any distributional assumptionsabout them. If, in addition, the

nonstationarity is seen in observationsof the time-varying geomagneticfield: most of the time the data statisticsare approximately constant, but this stationary processis inter-

1OnleavefromInstituteof Geophysics andPlanetary Physics, rupted sporadicallyby .brief but intensedisturbancessuch University of California, San Diego, La Jolla; now at AT&T Bell Laboratories, Murray Hill, New Jersey. Copyright 1987 by the American GeophysicalUnion. Paper number 5B5911 0148-0227/87/005B-5911505.00 633

as magnetic storms with markedly different characteristics. In some studies these events are regarded as contaminating noise, and they must be removed to study the underlying process.The influence of these types of outliers on regressionproblems can be complicated,as aberrant data in the dependent and independentvariables produce quite

634

CHAVE ET AL.: ROBUSTSPECTRALESTIMATION

different changesin the output, and correlationsbetween denotedby F (x), where apparent data outliers in both variables can still yield reasonable regressionparameters. In addition, new classesof outliers can occur in linear regression. In any case, it is a serious statisticalerror to blindly acceptmixture situations To make the correspondencebetween theoretical and samof these types and analyze the combination as a unit. ple entitiesclearer, write dF (x) for dxf (x) and indicate With such data, conventional least squares-basedtech-

F(x) - • du f (u)

(1)

--oo

niques will give inefficientand often seriouslymisleading estimates. This breakdown of the least squares model, while sometimes spectacularin form, is more typically insidious in that a seemingly reasonable answer is obtained, and considerableeffort has gone into devising

theempirical or sample cdfbydP(x). For a setof N data samples {x• }, thisis givenby

(x)=

1 2v

(x- )dr

(2)

diagnosticsto detectthis sort of problem [e.g., Belsleyet where 15(x) is the Dirac delta function. Substitutionof (2) into (1) yieldsthe usualempiricalcdf in which eachdata al., 1980; Cookand Weisberg,1982]. Because the Fourier transform of even moderately point correspondsto a stepin probabilityof 1/N. A set of N real samples{x•} may be sortedinto the long-tailed data tends to be Gaussianas the length of the ascending order seriesincreases(essentiallyby the centrallimit theorem,

but seeBrillinger[1981, chapter5] for details),it is some-

x(•) •< x(2)•< .... •< X(2v)

times claimed that outliers in time series are not a serious

wherex V) is calledthe j th order statistic. Note that the power spectrumexamplesof Thomson[1977b], Kleineret probability distribution of the ordered data is different onx(,__ l) al., [1979], and Martin and Thomson[1982]. Further- fromthat of the originaldata;clearlyx(i) depends problem. However, this is often not true, as shown by the

more,coherences and transferfunctionsare substantiallyandx(,+•)evenwhenthe {x•} areindependent.Assuming more sensitive to the presenceof outli•ers,since multiple time series and ratios of spectra are involved. It is frequently argued that a careful analyst will examine a data set and use ad hoc remedies

to avoid outlier

difficulties.

these samplesto be independentand characterizedby the

pdff (x), the corresponding theoreticalentitiesare the set of N quantilesQ;. Theseare foundfrom the inversecdf or quantilefunctionF -1 (a), definedfor 0 •< a •< 1 as the

While this may work for obvious discordanciesin small samples, it is impractical for large data sets or when, as often happens in time series, the outliers have a scale comparable to or smaller than that of the processunder study. It is preferable to use statisticalproceduresthat are robust, in the sense of being relatively insensitive to the presence of a moderate amount of bad data or to inadequaciesin the model, and that react gradually rather than abruptly to perturbations of either. Such methods have been developed over the past two decades and are

solution

reviewedby Huber [1981], Hoaglinet al. [1983], andHampel et al. [1986].

points.

of

--oo

adx f(x) =a

(3)

witha = (j- l/5)/N for j = 1,2,..., N. The Q• dividethe area under the distributioninto N+ 1 probabilityintervals, with the first and last pointsassignedcumulativeprobabilities of 1/2N and 1-1/2N, respectively,and with a probabilitystep of 1/N occurringat each of the intermediate In the following, it will usuallybe assumedthat the data

or at leastuncorrelated.While this In this paper the principles of robust statistics are {x•.}are independent, adapted to univariate and multivariate spectral analysis mayappearstrangein a time seriessetting,the ix,} should within a geophysicalcontext. The treatment beginswith a be thought of as Fourier transforms of a windowed section

review of some critical statistical concepts, especially of data at some frequency,not as the originaldata in the time domain. Explicitly, consider N segments of data robustness in the estimation of location and scale. This is followed by the introduction of the maximum likelihood from a nominallystationaryprocessy (t), with each segor M-estimators for computing robust averagesand solv- ment consistingof T discretesamplesspacedA time units ing robust regressionproblems. After consideringnumeri- apart and offset by an amount b from the precedingone, cal implementation of the M-estimators, some diagnostic and compute plotting methods to help elucidate the extent of outlier contamination or nonstationarity in data are discussed. These tools are then combinedwith the section-averaging method of spectral analysis and applied to the estimation of power spectra, transfer functions, and coherences. The results are illustrated with a variety of examples from natural source electromagnetic geophysics. The paper closeswith a discussionof distributionalaspectsand some suggestionsfor further work. STATISTICAL PARAMETERS AND ROBUSTNESS

Given a continuous probability density function (pdf) f(x), the cumulative distribution function (calf) is

T--1

Xk(co)= •

vty(tA+ (k- 1)b ) etø'tA k= 1,...,N

t=0

where vt is the data window. The quantity Xk(•o) is the windowedFourier transform of the k th data segmentat radian frequency•o. Even if closelyspacedsamplesof the

originalprocess y (t) are highlycorrelated,the {Xk}will be uncorrelated at a given frequencyfor reasonablevalues of the base offset b. Similarly, information in a single data section at frequencies spaced at least the window bandwidth apart is uncorrelated. One of the most useful proceduresin statisticsis the summarycharacterizationof a distributionor sampleusing

CHAVE ET AL.: ROBUST SPECTRAL ESTIMATION

various types of averages. Of these, the most common

one is location,specifying(in a loosesense)the centeror peak value of a distribution or sample; examples include the mean and median. It is less commonly realized that such averagesare the result of minimizing various norms. For example, the distribution mean /x and the sample mean, or average,• are obtainedby minimizing the L2 or least squares norms of the residuals about the distribu-

tions[f ix-

IaF]1/2 and[f Jx--• 12dP(x) I1/2 with

respectto/x and •, respectively.Performingthe minimization gives the familiar expressions

635

= Ix-

Ix-l

(9)

for the L1 norm. A second class of scale estimate is obtained by reapplying the same estimator used for location to its absolute residuals. The median absolute deviation (MAD) is obtained by taking the median value of the absolute residuals about the L• location estimate

SMA D= median{ I Xl- 5: l}

(10)

The expected value of the MAD is the solution O'MADof

F (• + o'MAD) -- F (• - o'MAD) = 1/2

(11)

The interquartile distance is a related scale estimate given by

iF(x)

SIQ= X(3N/4)-X(N/4 )

The sample median 5: is obtained by minimizing the L1

(12)

and is the spacingbetween the 75% and 25% points of the

normof theresiduals fix-5:l ciF(x). Thisis thesame sample distribution, or the center range containing half of asminimizingthe summedabsolute deviations Z I Xl- • I t=l

the probability. The correspondingtheoretical value is

O'IQ=Q3 A- Q1A

and reduces to the middle order statistic for N odd,

(13)

5:=X(lN/21+l), where[]denotes theintegerpart. Thesam- using (3), and is just twice the MAD for symmetricdistriple median is ambiguous for N even but is typically butions. Both the average absolute deviation • and the

chosenas (X(tN/21)+X(lN/21+l))/2. This is the simplest standard deviation s example of an order statistic. The population median is

•, = Ql/2from (3). The L• and L2 norms have the advantagesof being well known and, for the L2 norm in particular, of resulting in algebraicallysimple equations. However, there are few good reasons to consider them exclusively or to believe that they are the best choices except in restricted circumstances.

Much

of

the

work

in

robustness

has

effectively been on finding more appropriate norms for actual data, rather than applying what J. W. Tukey describesas "over-utopian" assumptions. The second most common characterization

of a distribu-

are sensitive to outliers, and • is also inefficient. Further information on the MAD, interquartile distance, and other robust estimates of scale is avail-

able from Mostellerand Tukey[1977]. The simplest extension of these ideas to the general linear regressionmodel is obtained by minimizing a norm of the residuals in p

Xl = E UljfiJ + rl

i= 1,...,N

(14)

j=l

where{Xl}is an N vectorof dataor observations, {Ul•}is an N x p matrixof knowncoefficients, {fi;} is a p vector of model parameters,and {r•} is an N vectorof residuals.

tion or sample is a measure of its width, dispersion, The solution of (14) dependson the norm that is chosen, spread,variability, or standarddeviation, which is included and will not be the same even for the L• and L2 cases, as under the generic term scale. There are numerous avail-

able estimatesof scale;Gross[1976] compares25 variants of 12 distinct forms, and many others have been sug-

gested. Among these are the minimum value of the norms achieved around the corresponding location estimates.

This class includes the theoretical

standard devia-

tion

f --c•o

12 dr(x)

(6)

Ix- l dp(x) =

(7) n=l

for the L2 norm, and the averageabsolutedeviation

&=f Ix-• IdF(x) __•

or its sample counterpart

medianand the scaleestimatesSlQand SMADto outliersis also well-established, and there are many applications where L• norm methods yield dramatic improvements over their L2 norm counterparts. This has led to suggesabsolute deviation

for the

least squares method for many geophysical problems N

__•

It is well-known that least squares estimates are notoriously lacking in robustness, and both the sample mean and standard deviation may be strongly influenced by a singlediscordantdata point. The resistanceof the sample

tions to substitute the minimum

and the sample standard deviation

s=

discussed by ClaerboutandMuir [1973].

(8)

[Claerboutand Muir, 1973]. There are at least three reasons why this course of action is imprudent. First, the Gauss-Markov theorem [e.g., Kendall and Stuart, 1977, chapter 19] establishesthe optimality of the L2 estimator under general conditionson the structureof the regression residuals. No comparableresult is known for the L1 estimator, and it is not difficult to show, for example, that with Gaussiandata, var(•)=vr/2var(•). This reduction in efficiency for the L1 estimator means that about 60% more data is required to achieve equivalent parameter

636

CHAVEETAL.:ROBUST SPECTRAL ESTIMATION oo

uncertaintiesto the L2 estimator. Second,the natural probabilitydistributionfor L• estimatesis the Laplaceor double exponential type, whereas L2 estimatesinvolve the familiar normal distribution.

This makes statistical infer-

ences based on L• results more complexand less intuitively appealing.Finally,for mostdatathe residualsfrom a least squaresprocedureare in large part Gaussian,with

N

M(O) =f p(x-O )dP(x) =•p(r,) --•o

i=1

(16)

Clearly, if p---log f, minimizingM yields the ordinary maximum likelihood result (15). Furthermore, the estimate is identical to that obtained by the norm minimization methods discussed earlier. This equivalence is the

statistics.This suggeststhat some method for treatingthe

basisfor identifyingL• and L2 as the natural norms for the Laplaceand Gaussiandistributions,respectively. Perform-

contamination within the framework of a Gaussian model,

ingthe minimizationof (16) givesthe equation

the addition of a small fraction of outliers having different

rather than outright abandonmentof that model, is the

N

52 q,(r,)= 0

logicalcourseof action[e.g.,Tukey,1975;Mallows,1983]. One obviousway to achieverobustnessis by the rejection of outliers on the basisof some type of statisticaltest, followedby the use of leastsquareson the remainingand presumablygood data. While a substantialamount of

i=1

where ½(x)=Oxp(X) is calledan influencefunction. Equations(16) and (17) reduce to the least squaresor least absolute deviationsforms when p(x)=x2/2+c,

efforthasgoneinto the development of outliertests[Bar- ½(x) = x or p (x) = Ix I+c, ½(x) - sgn (x), respectively, nett and Lewis, 1978; Hawkins, 1980; Barnett, 1983; Beck- yielding the sample mean or the sample median as soluman and Cook, 1983], the bulk of the resultsapplyonly to tions. The formulation in (16) or (17) is equallyvalid for single,isolatedoutliers,usuallyin a parentnormalpopula- morecomplicated distributions, andthesolution • iscalled tion. The dual phenomenaof maskingand swamping,in an M-estimate. which one bad value may hide the presenceof others, has If enoughdata are available,the samplingpdf or its logbeen widely documentedfor the more commonmultiple arithm can be estimated directly from it, resulting in a outlier situation [Barnett, 1983]. Outlier detection data adaptive, maximum likelihood estimate of location. becomeseven more complicatedin time seriesbecauseof In practice,it is rarely feasibleto characterizethe distribucorrelations between the data [Fox, 1972; Abraham and tion tails with the available data, and the loss function is Box, 1979]. As a consequence, methodsare soughtthat usually chosen on theoretical grounds to retain high accommodate outliers and minimize their influence in a efficiencyover a family of expecteddistributionsin prefersemi-automatic

fashion.

ence to yielding the maximum likelihood estimate for a singledistribution. Sincemost data yield largelyGaussian residuals,with a small outlier fraction, it is customaryto

ROBUST ESTIMATION

In addition to heuristic methods, two major classesof robust proceduresare the L-estimators,basedon combinations of the order statisticsand the M-estimators, a variant of maximum likelihood. L-estimates are especially useful for location problemswith nonsymmetricdistributions. In a time seriescontext, Thomson[1977a] applieda maximum likelihood L-estimator proposed by Mehrotra and Nanda [1974] for censored,exponentiallydistributed populationsto get robust, section-averaged power spectra of contaminated data. While simpler than an M-estimator,

this approachsuffersfrom a lossof efficiencybecausethe truncation point must be fixed in advance, so that the result is not data adaptive. L-estimatorsdo not generalize

readily to regressionproblems and are not considered further in this paper.

To motivate the concept of an M-estimator, consider again the problem of determiningthe locationparameter0

use lossfunctionsgivinghigh efficiency(say, > 95%) for normal data but which still provide reasonable protection for contaminated data. This slight loss in nominal efficiencyis the inevitablepenaltythat must be paid to get a stable estimate in the general case. Since typical data contain 1-20% outliers, the increase in sample size needed to compensatefor this rejection is small compared to the extra •60% needed for the L• estimator. Some commonly used loss functions are describedby Holland

and Welsch[1977] and Hampe!et al. [1986]. There is an additional complication that must be con-

sideredwith M-estimators:the solution• of (16) or (17) will not be scale invariant in the sensethat multiplying the

data{xi} by a constantwill not necessarily resultin a simi-

lar changein b. To correctfor this, it is necessary to replace(16) and (17) by N

f (x-0)

ri

min • p(-•)

givenindependent samples {xi} drawnfroma commonpdf

i=1

(18)

by maximum likelihood. The logarithmof the

likelihood function L (0) is obtained in the usual way by

insertingthe data into the samplingpdf, yielding

N

N

logL(0)-- • logf (r,)

and

(15)

ri

• •(-• ) = 0 i=l

(19)

where d is a robust estimate of scale. While joint optimi-

where r, =x•-0

is the i th residual. The strict maximum zation with respectto both location and scaleis described

likelihoodsolutionb for 0 comesfrom maximizingL (0),

by Huber [1981, chapter6] and Hampelet al. [1986,

or its logarithm, and obviouslydependson knowingthe

chapter4], two practicalbut lower efficiencychoicesare

pdf f (x) exactly. A generalization of (15) basedon a quantityp (x), whichis calleda lossfunction,canbe written

dl= SMAD O'MA D

(20)

CHAVE ET AL.: ROBUSTSPECTRALESTIMATION

637

and

(26)

III

d2'-SIQ

_.• 2k k2 Ixl>k

(21)

O'lQ

whereSMAD and $IQare the sampleMAD and interquartile where a value of k--1.5 gives better than 95% efficiency

distance(10) and (12), and CrMAD and O'iQare their for outlier-free normal data. The corresponding Huber function½(x) = x for Ix I < k andk sgn(x) oththeoretical counterparts for the appropriate standard pdf influence

from (11) and (13). Becauseof the extreme sensitivityof the L2 estimate to outliers, use of the standard deviation (6) and (7)is not recommended. The concept of an M-estimator may be extended to the

general linear regressionmodel (14) by identifyingthe error r, as the regressionresidual[Andrews,1974]. While (18) is not changedin form, (19) must be rewrittenas N

ß =½(-•) x•'= 0 j=1,...,p

(22)

where the superscript asterisk denotes the complex conjugate.

A variety of numerical proceduresto solve the generally

nonlinear forms of (18), (19), and (22) exist, but it is easiest to rewrite the problem as a weighted least squares one and iterate to linearize it. This allows the use of fast, accurate,and stable matrix algorithms. The weighted least

squaresforms of (18) and (22) are, respectively, N

min•2 •'i2r,2

(23)

erwise has weights which never descendto zero. Because it has a discontinuousderivative, the Huber function may introduce slight distortions in time series work if used

exclusively. Sincethe weights(26) fall off slowlyfor large residuals, they provide inadequate protection against

severe outliers. However, (25) is a convex function, so that convergence to a local, as opposed to a global, minimum cannot occur in the iterative weighted least squares solution. Use of the Huber weights gives a good starting point for the application of more severe types of weight functions. Another

class of influence functions

Mostellerand Tukey [1977, chapter 10] is a typicalredescending influence function. However, for time series work it has too much curvature near the origin and can

introduceseriousdistortion[Thomson,1977b]. To reduce this effect, Thomson [1977a] proposeda new weightfunction

i--1

w(x) = exp{-e•(Ixl--•)}

where •v2--p (r/d)/r2and from

N

•, wirix•*= 0

(24)

i=1

where w--½(r/d)/r.

Note that this proceduregivestwo

different weights: the •v• from treating the loss function formulation as an equivalent weighted least squares problem, and w• from its equivalencewith the influence function. While distinct, they are not always clearly separated in the literature. In addition, w or ½ are often chosena priori, implicitly defining •v and p. To linearize the weighted least squares problem, an initial solution is

obtainedusing ordinary (unweighted)least squares,and both the residuals and a scale estimate like (20) or (2!) are computed. The weights are calculatedfrom these, and

the solution to (23) or (24) is found. This procedureis repeated using the residuals and scale estimate from the previous iteration at each stage until convergence is achieved. Note that the weights are data adaptive; the robust formulation ensures that data corresponding to residuals which are large compared to the scale will be downweighted.

There are several forms of the weights in (23) that work well in spectral problems. The first of these was

introducedby Huber [1964] on theoreticalgroundsand is based on a density function with a Gaussian center and Laplacian tails, yielding

p(x)--

and a weight function

Ixl k

2

- klxl

is called redescend-

ing because the influence function and weights approach zero in the presence of large outliers. The biweight of

k2 2

Ixl > k

(25)

a heuristic

extension

of the extreme

(27) value

distribu-

tion. The parameter fi determines the scale at which downweighting begins. While strictly empirical, the N th quantile of the appropriate probability distribution from

(3) is an excellentchoicefor fl. This form hasthe advantages of being smooth, close to unity near the origin, and including an implicit dependencyon the number of data in the weighting procedure; as the number of samples rises, extreme values become more common, and fi must increase to avoid affecting valid data. For example,

fi -- Q•v corresponding to N= 103, 104, 105, and 106 for a normal distribution are 3.09, 3.72, 4.26, and 4.75 standard deviations, respectively. However, as with other rede-

scendinginfluencefunctions,the solution of (23) or (24) with (27) is not unique, so this weight should only be used after a good starting value has been found. NUMERICAL

CONSIDERATIONS

Equations(18) and (23) are generalizations of the usual linear regressionstatement (14), while (19), (22), and (24) are generalizations of the familiar normal equations. Since the numerical ill-conditioning of the normal equations is well-known, it is best to solve the matrix form

(23) by a numericallystablemethodsuchas QR or singular value decomposition[e.g., Lawsonand Hanson, 1974]. The QR decompositionmethod of solving (23) is used throughout this work. Since spectralanalysisinvolves the Fourier transform, the matrices in (23) are complex. Complex least squares proceduresare reviewed by K. S.

Miller [1974] and are availablein standardpackages(e.g., CQRDC and CQRSL in LINPACK [Dongarra et al., 1979]). Alternately, the problem may be broken down

638

CHAVE ET AL.: ROBUSTSPECTRALESTIMATION

into real and imaginary parts and solved using standard 1982], a simpleand effectivemethod is basedon the order real algorithms. To avoid problems with numerical statistics.The quantile-quantile(q-q) plot servesto simulround-off, it is recommended that 64-bit arithmetic be taneously yield indications of goodnessof fit, sketch the utilized. extent of outlier contamination, and provide the key locaAdditional modes of failure in the solution of regression tion and scale parameters. An important property of the equations from collinearity of the coefficient matrix and N orderstatisticsof a data sample{x•} is that they divide similar effects are possible[Belsleyet al., 1980], and a the area under a pdf into N+ 1 parts of not necessarily method for checking for these based on the condition equal size. The q-q plot is obtained by comparing the

numberof uo is givenbyLamerot# etal. [1986].To illus- quantilesof a specifieddistribution,Q;, whichdo divide trate, consider a geomagnetic example where x, is a the area under the pdf into equal-sizedpieces, to the order Fourier-transformed electric field from the i th data seg- statistics x V). If the latter are drawnfrom the assumed ment. As independent variables in (14), take the distribution, they will be similar to the quantiles, and the correspondingtransforms of the p- 3 magneticfield vari- q-q plot will be an approximately straight line. Systematic ablesuil=/-/•, ui2=Di, andui3=Z, andsolvefor the{/•;}, departures from a straight line show that the model is or impedances. If H, D, and Z are linearly independent, inconsistent and can be used to guide the search for a a solution will exist, and collinearity is not a problem. better one. Outliers usually appear as deviations from a However, in the presenceof complex geologicstructureor line at the extreme quarttiles. Finally, the slope and intersource field inhomogeneity, Z may depend on H and D cept of the line give the scaleand location parametersfor the experimental distribution. In the present work, q-q through the tipper functions Ta and To defined by plots of the residuals from a robust procedure are examZ = Tall + ToD ined as a function of frequency. However, instrumentain which case the coefficient matrix would be theoretically tion problems, such as stuck bits, often appearas plateaus of rank 2 (or collinear), and the numerical solution of or staircasesin q-q plots of the raw data. For a lucid dis(14) would be unstable. This problem is flagged by large cussionof q-q plots and their use, see Kleiner and Graedel values of the condition number, and a better solution is [1980] or Lewis and Fisher [1982]. Other regressiondiagnostics,such as plots of the residobtained by taking p-2 and, as is the usual practice, analyzing the vertical magnetic field separately. Note also ual against the input or output power, become unmanagethat the presence of outliers in data can induce collinearity able if applied directly at all frequenciesbut can be tried at into an otherwisestable systemof equations[Masonand specificfrequenciesof interest. Another worthwhile check Gunst, 1985]. For methodsto help decide when (and requiring only one additional plot is investigationof the how) to truncate near-collinear matrix systems,see Law- condition number of the coefficient matrix as a function of frequency. The condition number is just the ratio of the sonandHanson[1974, chapter26] or Vogel[1986]. The matrix elements in the normal equations in a time largestto smallestsingularvalues at each frequencyand is series setting may be identified as the auto-spectra and availabletrivially if (23) is solvedwith the SVD method. cross-spectraof the data and coefficient variables. This Estimates of the condition number are also available from may lead to the temptation to estimate these quantities QR solutions. One important aspect of robust techniques is their abilindependently and robustly, followed by solution of the normal equations for the transfer functions, rather than ity to identify unusual data. Typically, anomalous effects treating the entire problem as a unit using a common set are either localized in frequency or in time and may occaof weights as describedin the last section. While such a sionally be localized simultaneously in both. A common procedure will result in good estimates of the individual type of outlier is local nonstationarity and may be detected matrix elements, the resulting matrix structure may be by the stationaritytest given by Thomson[1977a,b]. This seriously in error because the normal equations encoun- is Bartlett's M-test (no relation to M-estimates) for varitered in spectral applicationsare often marginally condi- ancehomogeneity[Bartlett,1937] computedas a function tioned even with ideal data, and small changes in the of frequency matrix elements induced by the individual robust proN 1 N cedures can easily result in nonphysicalsolutions. Such an M (ro) = N v log v• log• (6o)(28) j=l j=l approachis discouragedeven more than use of the ordinary normal equations.

where•. (w) is thespectral estimate forthej th datasec-

DIAGNOSTICS

It is useful to develop diagnostic procedures that disclose the extent of outlier contamination and give a visual indication of goodnessof fit to a specifiedprobability distribution. These find application both in the early stages of the analysis,when decisionsabout the suitability of particular statistical models must be made, and as the final step in robust estimation, when the efficacyof the method in reducing outlier influence must be assessed. While a myriad of techniques based on residual plotting have been

proposed [e.g., Belsleyet al., 1980; Cook and Weisberg,

tion at radian frequency •o and there are N individual estimates, each having v degreesof freedom, typically, 2 for raw spectra. If the series is stationary, M will be distri-

buted approximatelyas X•2__•. Excessive variability between sets appears as large values of M, while narrowband processes produce values that are smaller than expected. Unusual sections may be identified by observ-

ingthechange in M whena given•k (w) is deleted. In this case, it may prove simpler to examine the variance or total

powerin the raw spectralestimates(i.e., the integralover frequency of the spectrum) against the subset index to find anomalous

sections.

CHAVEET AL.:ROBUSTSPECTRAL ESTIMATION

Another useful procedure to detect sections of a data series which are different is based on the innovations vari-

639

throughout this paper; this gives over 100 dB of bias protection outside an inner domain of full width 8/(T/X ),

where T is the number of samples in the data section and the difference between an observation of a process and a /x is their spacing, but shows partial correlation of the linear predictionof it basedon all of the earlier data. The result inside that band. The correlation is the value of the one-step prediction variance is treated in detail by Priestley equivalent lag window at the subset offset; see Thomson

ance. This is just the variance or power associatedwith

[1981, chapter10]. In the presentcontext,an estimateof

[1977a, section3.3] for details. The raw spectralestimates

the innovations

obtained in this way serve as the input to an M-estimator,

variance is

as described in the next two sections.

o-•,k = exp

&o1og•k (•o)

Cases where only a few disjoint sections of data are

(29) available,or where the whole time seriesis short (in the

--toN .

sense that the frequencies of interest are comparable to

where •oN-is the Nyquist frequency. Higher than normal values for the innovations variance suggest a change in the structure of the underlying processbecause it cannot be predicted adequatelyfrom earlier data. In addition, the ratio of the innovations and ordinary variances is useful as a measure of the relative predictability of the processand is also scale invariant.

the reciprocal series length), have traditionally been treated by band averaging with variable width smoothers and are not amenable to the robust procedures of this paper. In any case, band-averagedspectrahave a low variance efficiencyif a low-bias data window is employed, and

the multiple prolate window method of Thomson[1982] offers vastly superior performance without a significant loss of bias protection. Robustification of the latter is feasible under some circumstances

SPECTRUM ESTIMATION

and will be treated in a

future paper.

A major problem in time seriesanalysisis the choice of an algorithmthat yieldsa spectralestimate,given a finite observation of the processof interest, such that the result is not badly biased, yet remains statisticallyconsistent and efficient. That these requirementsare usually in conflict is attested to by the plethora of techniquesin the literature. For the purposesof this paper, only nonparametric estimates will be considered, eliminating those in which a

Nonparametric spectral estimates formally characterize only purely nondeterministic, stochastic processes. In many practical cases, additional deterministic signals, or componentswith a bandwidthcomparableto the reciprocal series length, and so apparentlydeterministic, are present in a time series and can complicate the spectrum estimation problem considerably. Special methods are required to handle these even for contamination-free data [Thom-

specificfunctionalform for the spectrumis assumed(e.g., son, 1977a,b, 1982], and attentionto the presenceof line maximum entropy). In addition, only direct estimates terms (periodic signals) is required when using robust based on the discrete Fourier interest.

transform

are of immediate

techniquesto avoid treating them incorrectly as outliers.

Computation of a direct estimate consistsof the follow-

ROBUST ESTIMATION OF POWER SPECTRA

ing steps:(1) taperinga data sequenceor a subsetof a data sequence(either the raw data or the residualsfrom a Robust computationof power spectrais a specialcaseof prewhiteningoperation) with a data window, (2) taking the simple location parameter problem discussedearlier. the discreteFourier transform,(3) convertingthe resultto At each frequency a set of independent raw power spectra a cross-spectrumor auto-spectrum by a suitable multiplica- are to be averaged together using an M-estimator; for tion, (4) smoothingthe result to achieve statisticalcon- slowly varying spectra, additional band averaging can be sistency,and (5) correctingfor any prewhitening. The incorporated by combining several adjacent frequencies smoothing operation may be done by some combination of

convolution with a second type of data window (bandaveraging)and combininga set of independentraw estimates computed from a longer data sequence (sectionaveraging). It is usually assumedthat the data window primarily controls bias, while the smoothing operation primarily controls variance, but interactions between the two procedures do exist.

In applying robust M-estimation to spectral problems, only the section-averaging approach will be used. The data are assumed to consist of long sequencesof contiguous values that may be subdivided into smaller pieces of equal length. A data window with good bias characteristics is then applied to the subsets with enough overlap between them to yield high efficiency,yet ensure approximate independence of the raw spectra. For this purpose, the superiority of the prolate spheroidal sequencesas data

from each raw spectrum. It should be remembered that outlier contamination of some of the spectral subsets can only result in a power spectrum that is biased upwards since it is not possible to subtract from the power in the backgroundprocess. This means that only individual estimates deviating in a positive sense from the current robust average are downweightedduring the iterative processing. The robust algorithm employed for univariate power spectrais as follows: given a set of spectralsectionsto be averaged on a frequency-by-frequencybasis, an initial robust solution is obtained from the sample median and used

to find

both

the

residuals

and

a scale estimate.

Either the MAD or interquartiledistancescaling(20) or (21) yields satisfactoryresults, althoughthe latter offers a slight computational advantage. An iterative solution to

the weightedlocationproblem (23) is then soughtusing windows is well-documented[Thomson,1977a; Slepian, the weight function (27) modified to affect only data 1978]. A prolatedata window with a time-bandwidthpro- whose scaledresidualsri/d exceedthe robust averageby a duct of 4 and 70% overlap between estimates is used

critical amount (i.e., the absolute value operation in the

640

CHAVEETAL.:ROBUST SPECTRAL ESTIMATION

new weight parameter may be taken directly from the ordinate of the q-q plot as the point where the distribution tails begin to be evident.

10 4 10 2

10 0 10-2 I

I

IIIIIII

I

10-1

I

IIIIIII

I

10 0

I

IIIIIII

I

I

1

101

Frequency (cph) Fig. 1. Conventional (top curve) and robust (bottom curve) power spectrafor the time variations of the north magneticfield at Victoria, British Columbia, Canada, during July 1982. The nonrobust spectrum is the arithmetic average of 42 independent pieces of data, with additional band-averagingyielding 84 estimates over 0.1--0.5 cph and 168 estimates over 0.5--30 cph. Becauseof severe nonstationarityit actually possesses only 10--20 degreesof freedom. The bottom line is the robust averageof the same raw spectra, such that the high-amplitude magnetic storms are eliminated, yielding a much higher equivalentdegreesof freedom (see text).

The robust power spectral method is best illustrated with some examples. Figure 1 compares robust and conventional spectra of the north magnetic field time variations at Victoria Observatory for the month of July 1982. The data are typical of the mid- to high-latitude geomagnetic field, consisting of a quiet background component interspersed with violent, short-duration storm activity. The latter comprise only 10-20% of the data but exhibit power levels that are easily a decade above the back-

ground. The spectra in Figure 1 were computed by averaging,both arithmetically and robustly, raw estimates of 2 daysduration. The conventional,arithmetic-averaged spectrum is about a factor of 10 larger than the robust spectrum and displaysmuch greater point-to-point variability. Owing to the data adaptive weighting, the nominal number of estimates per frequency for the robust spectrum is variable but lower by a factor of about 0.9 compared to the conventional type. An argument ignoring robustness would imply that the equivalent degrees of

freedom per frequency (about 90 in this example) is

higher for the straight arithmetic-averagedspectrum, but this is contradictedby the higher variability seen in Figure answer does not change substantially, typically after 3--5 1. In fact, the conventionalestimate is dominatedby only iterations. a fraction of the data, so that it has only 10-20 degreesof If outliers have been eliminated, power spectra are the freedom at most, accounting for the larger uncertainty. sums of squares of almost normally distributed variates, The robust spectrum is a much better measure of the and hencedistributedas chi-square(X2). It is crucialto time-averagedbehaviorof the geomagneticfield, while the the correct operation of the nonlinear M-estimator that conventional spectrum is little different from that given by the proper distribution be used in getting the theoretical analysisof only the storm time data. MAD and interquartiledistancein (20) and (21). Using The q-q plots in Figure 2 illustrate the statisticalnature the discrete Fourier transform, each raw power estimate of the robust averaging procedure. For conveniencein possesses2 degreesof freedom at each frequency, neglect- plotting,the raw and weightedestimatesin eachfrequency ing the DC and Nyquist components, which have only 1 bin have been scaledso that their sum of squaresis 8, the

exponentis removed). Convergenceis achievedwhen the

degreeof freedom. The X2 distributionwith 2 degreesof value expectedfor the secondmomentof a X22variate. freedom(X•) is equivalentto the exponential distribution The top plot showthe original(unweighted)q-q plot for a and has a pdf given by the standardform [Johnsonand few frequenciesin the range 0.3-0.4 cph which are typical Kotz, 1970, chapter18] of the entire spectrum. The infrequent but intense storm activity gives a typicallong-taileddistribution. The shape f (x) = •,5e--x/2 (x >• 0) (30) of the weight function causesthe bottom q-q plot of the for which the median • is 21og2, the MAD itI,MAD is final, weightedpower estimatesto be slightlyshort-tailed. 2sinh--•(•h)• 0.9624, and the interquartile distance is Increasingthe weight parameter/3 would bring this closer

O-iQ---21og3• 2.1972. The quantilesof the exponential to a true X• result. However,the short-tailednatureof distribution are given by

the result does not appreciably alter the robust spectrum

of Figure 1, especiallyon the logarithmicscaleover which N significantpower changesoccur. j= ],...,N (31) log N-jq-•A The stationaritytest (28) was computed using the 2day-long raw sectionestimatesof Figure 1. The value of Using the Nth quantileof the X• distributionas the M was about 130 from the lowest frequency to about 0.2 weightparameter/3in (27), robustaveragingbecomesan cph, decreased slowly to a value of about 70 at the adaptive procedure that operates automatically in all save Nyquist frequency, and did not display any structure that exceptional cases. Detection of such exceptional cases is was localized in frequency. The expected value of M is facilitated by examination of the q-q plot of the final, 64.3, indicating strong nonstationarityat low frequencies. weighted data. If the weighting has been performed The reduction of M at high frequenciesis causedby a rise correctly, then the final q-q plot of the weighted spectral in the noise level as the spectrum decreases;instrument estimates(after discarding valueswith zero weight)against noise is generallyquite stationary. Further examinationof

the X22quantiles(31) shouldapproximatea straightline.

the series shows that

the detailed

form

of the nonsta-

If it remains long-tailed, then the outlier fraction exceeds tionarity is complicated. The ordinary variance in each a typical value of 10-20%, and the weight parameter/3 section of data normalized to its average over all of the must be reduced. Since the scale parameters (20) and sectionsis plotted againsttime in Figure 3 (top). The (21) are chosento be consistent with a X22distribution,the subset variance exceeds twice the average value at five

CHAVE ET AL.: ROBUSTSPECTRALESTIMATION

641

Figure 4 shows both conventional and robust spectra of

the electricfield variationsin the frequencyband 10--4 to

16

1 Hz collected on Adams Mesa, Arizona, in 1979. The differences between the robust and nonrobust

12 ll

8 4 0

0

4

8

X2 Quanfile 16 u

ROBUST ESTIMATION OF TRANSFER FUNCTIONS AND COHERENCES

12

m

8

ß

4

o

0

results are

more substantial than for Figure 1, especially at the highest frequencies, where the conventional spectrum is dominated by outliers. Note in particular the oscillatory nature of the conventionalestimate, rememberingthat it is plotted on a logarithmic frequency scale. This behavior is typical of a few large outliers in a singlesubset. Figure 5 shows original and final q-q plots for two frequency intervals, 0.010-0.013 Hz and 0.200-0.204 Hz. The former band shows typical long-tailed behavior caused by a small fraction of outliers that is easily correctedby the robust method. The higher-frequency interval also shows the effects of a few extremely large outliers; these are again removed by the robust averaging procedure with a dramatic effect on the spectrum.



I

I

I

I

The robust computation of transfer functions is the frequency-domain equivalent of multivariate robust linear regression, while the coherencesare similar to the correlation coefficientsof statistical inference theory, and are derived from the output and residual powers obtained during the M-estimation procedure. The spectral problem differs from more conventional

X2 Quanfile Fig. 2. Quantile-quantileplots for the conventional(top) and robust (bottom) spectra of Figure 1 in the frequency range

ones because the data are

complex rather than real numbers, and there are at least two frameworks

within which determination

of the robust

scale and iterative reweighting may be performed. In the

0.3--0.4 cph. The plots show the N ranked raw power estimates

in a given frequencybin and scaledso that their sums of squares

is 8 against the N quantiles of theX22distribution; eachplotted symbol correspondsto a single data section, and different symbols correspondto different frequency bands. The nonrobust q-q plot

c• 6

at thetopshowsa largely X22population, withan additional longtailed component that is attributed to brief but intense magnetic storms. The q-q plot at the bottom shows the effect of adaptive weighting of the data, eliminating the storms and yielding a

slightly short-tailed X22result. separate and isolated places; these are associated with small magnetic storms. The innovations variance from

(29) normalizedby its averageover all of the datasections is shown against time in Figure 3 (bottom). This was much higher than the mean for extended periods after the

five events of Figure 3 (top), suggestingcomplex and long-term changesin the underlyingprocess. The innovations variance is also higher between days 14 and 20 without any obvious associationwith the power, implying a different type of change in the structure. The ratio of the innovations and ordinary variances stayed near its median value during the high power event at day 3, imply-

O 2.0

, o

ing that the structureof the processdid not changemuch even though the ordinary variance increased dramatically. However, the processwas altered after this event, as evidenced by an increase in the ratio, and returned slowly to normal with time. The largest overall change in the structure of the processoccursduring a 4-day interval near the center of the record where the relative prediction variance decreasesby a factor of 4 from its median value; this is

alsoapparentin Figure 3 (bottom).

lo

20

30

TIME(doys) Fig. 3. The top panel shows the total power or variance in each windowed data subset used in obtaining Figure 1 as a function of the center time of the subset.

These values have been normalized

by the average power for the entire data series. The bottom panel shows the innovations variance for the different subsets against the center time of the subset.

The innovations

variance has been

normalized by its mean over all of the subsets. See text for details.

642

CHAVE ET AL.: ROBUSTSPECTRALESTIMATION

-r •

ence reinforces this: in severely contaminated data, treating the complex magnitude, rather than the independent

10 2

real and imaginaryparts, yields more consistenttransfer

0

E 10 •,--10-2

functions and better convergence. Only the Rayleigh method is used in this paper.



putation is similar to the standard procedures described earlier. The data input to the M-estimator are a set of k raw section Fourier transforms for a single output data

The robust M-estimator

10-4

10-6 [ I IIIIII

I

I

IIIIIII

10-5

I

I

10-2

I IIIIII

I

I

used for transfer function com-

series{Xk}andsimilarsetsfor p inputdataseries

I II

both are also parameterizedby frequency. The solution is initialized by computing an unweighted least squaresresult

10-1

Frequency (Hz)

using QR decompositionon (14), from which residuals

Fig. 4. Conventional(top curve) and robust (bottom curve) power spectraof the north electricfield collectedon Adams Mesa, Arizona. The nonrobust spectrumis the arithmetic averageof 43 independent raw estimates, with additional band-averaging increasingthis to 86 estimatesover 0.01--0.1 Hz and 172 estimates over 0.1--0.5 Hz. The robust spectrumhas •70 degreesof freedom over 0--0.01 Hz, •130 degrees of freedom over 0.01--0.1 Hz, and •250 degrees of freedom over 0.1--0.5 Hz. The severe effect of outliers is apparent on comparing the two results, especiallyat the highestfrequencies.

and a scale estimate (20) or (21) are obtained. The Huber

weights(26) computedusingthe scaledresidualsare then appliedto the rowsof the matrix regression problem(23). The weighted regressionproblem is solved, and the entire processis repeated until the total residual power does not change below a threshold value. This proceduregives a final scale estimate and an initial set of residuals for use in

the last part of the algorithm. This is also based on iteratively reweightedleast squaresbut uses the weight function (27) and a fixed scale estimate derived from the final first instance,the data (i.e., the raw Fourier transforms) Huber iteration, again terminating when the residual may be regardedas having independentGaussianreal and power does not vary. A modification of this procedure imaginary parts, so that separate weights are applied to which substitutes an L• simplex programming algorithm them. In the second case, only the magnitudes of the for the iterativi• Huber one has also been used successcomplex numbers are considered,and identical weights are fully. The residualsfrom an L• solutionare found and applied to the real and imaginary parts of the data. Zeger their MAD is used to get a final scaleestimate. Iteratively [1985] has shown that the latter choice,which has a Ray- reweightedleastsquaresusingthe weightfunction (27) is leigh distribution, is preferable becauseit is rotationally then used until the residual power does not change sub(phase) invariant. It is also more conservativein that an stantially. The standardprobabilitydistribution for a variate which outlier in either the real or the imaginary part of the data will result in downweighting. Extensive practical experi- is the square root of the sum of the squares of two

16

o15



12

10

8

ß

5

4

o

0

0

I

o

4

I

I

8

X2 Ouoni'ile

X2 Ouonfile o

•-

20

:.r_16

:• 15

212

• 10

m 8

g 4

5 0

o o

4

8

X2 Ouanfile

0

0

4

8

12

X2 Ouanfile

Fig. 5. Quantile-quantile plotsfor the nonrobust(top panels)and robust(bottompanels)powerspectraof Figure4. The left-hand side shows the results in the 0.010--0.013 Hz band, while the right-hand side covers the range

0.200--0.204 Hz. In bothcasesthe originaldistribution is a long-tailed X:22 typeandis changed to a slightly shorttailed one by the robust averaging. The effect of only a few severe outliers on the spectrumis seen at the higher frequenciesand by examining Figure 4.

CHAVE ET AL.: ROBUSTSPECTRALESTIMATION

643

where Sx• and Srr are the weighted output and residual

1.0

powers



08

N

E I Sx•= •=•N • w,2

i!

--c 06'

0.4

i--1

N

0.2 0.0

E w,2lr,I• Srr = i=lN • w,2

_

I

I 1111111

I

10 -1

I IIIllll

I

I IIIIIII

10 ¸

I

I I

101

and (34) is understoodto be zero if Srr> Sx•. The weight w, is computedusing (27) and the scaledresidualsfrom the last iteration in the weighted linear regression, and setting wi- 1 yields the conventional, nonrobust coherence. Note that (34) reduces to the amplitude of the ordinary magnitude-squaredcoherencefunction when there is only

lO

ß08/

i•

iI

Ii

one input data sequence (p=l). The weighted power spectrumin (35) is not the same as the robust result that

o

C.) ,•,,,'

,

• 04

is found using the methods of the last section becausethe weights are derived on the basisof regressionrather than

=02

10-1

(36)

I----1

Frequency (cph) ß

(35)

10 ¸

location residuals. The processesproducing large regression outliers may be different from those that generate anomalous power spectra; in particular, the local nonstationarity seen in Figure 1 may not lead to regressionproblems if the data and coefficient variables change in similar

101

Frequency (cph)

ways. In addition, new classesof regressionoutliers can occur that do not affect power spectra, and the origin of

Fi•. 6. Conventional (dashed lines) and robust (solid hn½s) these outliers can be difficult to determine. squared mulfip|½coherenceplots for the north magnetic fi½|d at Figure 6 compares the robust and nonrobust squared two seafloorstationso• Vancouver Island, site A (bottom) and site B (top), with al| three magneticfi½|dcomponentsat ¾JctorJa multiple coherence(34) betweenthe north magneticfield Observatory. The data have been contaminated by an instrument fault, accounfin• for the low conventional coherence between ! and $ cph. Robust r½•r½ssJon completelyremoves this ½•½ct. See text for details.

independentGaussian variablesis Rayleigh, a transformed

at two seafloor

sites off Vancouver

Island

and all three

magnetic field componentsfrom the standard observatory at Victoria. The seafloor data were contaminated to vary-

ing degreesby a nearly sinusoidalcomponent of about 1hour period that was later attributed to magnetized tape cassettes in

the

instrument

data

recorders.

Both

the

versionof the X22distribution.The pdf is givenby [John- period and the amplitude of the offending signal decreased sonandKotz, 1970, p. 197] continuously over the 1-month-long record, making it a

f (x)----xe--x2/2(x>•0)

(32)

Using (32), the median and interquartile distance are

•=•/21og2 andO-•Q=•/21og4--•/21og4/3. TheMAD is a solution of the transcendentalequation

2sinh • O'MA D)= e(•rMAD)2/2 yielding a value of •0.44845.

Q; =

(33)

As for the other robust methods, the N th quantile from

(33) servesas the weightparameter/•in (27). The robust squared multiple coherence of the output with the p input time series is computed on a frequencyby-frequency basisfrom

,,2 Sxx-Srr

3'-

$x•

ministic component. Site B (top) was more heavily affectedthan site A (bottom), and the noise component was readily visible as a large amplitude sinusoid in the former case but only produced a slight fuzziness in the site A record. In either case, the conventional coherence (dashedlines) is reduced substantiallyby the contamina-

tion, while the robustalgorithmhas readily'eliminatedits

The quantiles are

2log N-½-•!/2 j --1,..., N

frequency-localized outlier, rather than simply a deter-

effects. The conventional coherence possessesabout 100 nominal degrees of freedom per frequency, while the robust coherence has 85-90 degrees of freedom outside the zone of contamination

and somewhat fewer inside it.

Note also the presence of harmonics of the • 1 cph fundamental. Because of instrument drift, a reduced signal level, and the electromagneticeffectsof internal waves the coherence falls off at low and high frequencies in both examples: the robust method does not artificially increase the coherence when the underlying process is itself incoherent. This example illustrates the ability of the

(34) robust algorithmto handle both gross(site B) and subtle (site A) outliers in an automatic fashion.

644

CHAVE ET AL.: ROBUSTSPECTRALESTIMATION

•-

•- 2

0.0

1.5

3.0

,

o0

0.0

Rayleigh Quanfile

1.5

3.0

,

Rayleigh Quanfile

•- 2 0

I

0.0

1.5

o 0 0.0

3.0

Rayleigh Quantile

1.5

3.0

Rayleigh Quantile

Fig. 7. Unweighted(top) and weighted(bottom)residualq-q plotsfor the resultsin Figure5 computedusingthe Rayleighresidualmethod describedin the text. The left-handside showsthe initial long-tailedand final, slightly short-tailed distributionsfor the site A data over 0.97--1.14 cph. The right-hand side showsequivalent values over 1.2--1.4 cph for the more severely contaminated site B data.

Figure 7 shows sample Rayleigh q-q plots for the site B Figure 10 comparesthe original and final q-q plots for data over 1.2--1.4 cph (right panels) and for the site A two frequencybands obtained using the Rayleigh residual data over 0.97-1.14 cph (left panels). For conveniencein method to get Figures 8 and 9. At the lower frequency plotting, the residualsin each frequency band have been (0.0013-0.0017 Hz) the contamination consists of a few scaled so that the sum of squaresis 2, as expected for the second moment of a Rayleigh-distributed variate. In both cases the contamination causes the original, unweighted q-q plots to be extremely long-tailed; this is especiallyevident for the site B data. The final, weighted q-q plot shows a slightly short-tailed Rayleigh distribution for the residuals. There is a suggestionthat a smaller value of the weight parameter ,6 would improve the result at site Bg this is corroborated by the dips in the robust coherence

large outliers, and the conventional coherence is degraded only a little. The final q-q plot shows a typical short-tailed Rayleigh result. In the higher-frequency band of 0.0063-0.0067

Hz

the

nonrobust

coherence

is substan-

tially lower and the outlier contamination much worse, but the robust algorithm is still successfulin removing them with a concomitant improvement in the coherence and transfer

functions.

between 1 and 3 cph in Figure 6 (top). Such tuning is normally not required, but this is an exceptional case becausethe interfering signal is both strongand persistent. Figure 8 shows robust and conventional coherences between

the north

electric

field

collected

on the bed of

Lake Washingtonnear Seattlein 1981 (A. Schultz, private communication,1985) and two horizontalmagneticcomponents from a nearby land site computed using 223 raw estimates. The roll-off at low and high frequencies is caused by the filters applied during data collection, but outlier effects seriously degrade the conventional coherence

estimate

between

0.001

and

0.03 Hz.

The

robust

method yields a coherence of better than 0.99 over the same frequency band. Figure 9 compares the nonrobust and robust

transfer

functions

for the same north

and east magnetic components; this is just an element of the magnetotelluric response function. Outlier noise in the electric field causes the conventional

FREQUENCY (Hz)

electric

transfer function

to be excessively variable and consistently biased downward. By contrast, the robust transfer function is much smoother over the frequency interval of Figure 8 where the coherenceis high.

Fig. 8. The conventional(dashedline) and robust (solid line) squaredmultiple coherencebetween the north electric field at the bed of Lake Washington and the two horizontal magnetic field componentsfrom a nearby land site. There are 223 raw estimates used in both cases,but the robust coherencehas fewer than 446 degrees of freedom because outliers are downweighted. The robust squared coherence exceeds 0.99 over most of the range 0.0005--0.02

Hz.

CHAVE ETAL.'ROBUST SPECTRAL ESTIMATION

645

DISTRIBUTIONS AND RELIABILITYOF ESTIMATES 0.4

A detailed discussionof the distributions for estimates

of power spectra,coherences,and transfer functions is beyondthe scopeof this paper,andthe subjectis covered

0.2

in detail by Brillinger [1981]. Traditionally,tests of hypotheses or the placement of confidence intervals on

spectralparametershave requireddistributionalassumptions, with the complexityof the exactdistributionsneces-

sitatingfurther simplification and the use of asymptotic forms to give a tractableresult. However,the standard

-0.2 Ill

assumption of independence and identical statistics for

eitherthe dataor the regression residualsimpliesfreedom from outliers. This suggests that the robustspectralestimateswill matchthe statistical modelmore correctly,so that more realistichypothesistestingcan be performed usingthem. For the robustestimators of thispaper,reasonableapproximations are obtainedusingGaussian-based

0.4

0.2

statistics if

0

N

Neff(to) = • wi2(to) i=1

-0.2

is usedas the effective or nominalnumberof estimates,

[11

wherewi(to) is the robustweightfor the univariatecase. 10-4 10-3 In the multivariatecasethisnumberis averaged overthe FREQUENCY (Hz) p inputs,and Neff is reducedby a factorof p-1. This result is usuallynot as reliableas the jackknifedvalue Fig. 9. The real (top) and imaginary(bottom)partsof the con- mentionedbelow. In either case, variancecalculations ventional(dashedlines)and robust(solidlines)transferfunctions causedby the overlapof the between the northelectric andeastmagnetic fielddataof Figure mustincludethe correlation 8. The nonrobust androbustcurveshavebeenoffsetby 0.4 for clarity. The robust result is smootherthan its conventionalcoun-

terpart,whilethe latteris alsobiaseddownwardby outliernoise.

o

windowed subsections, about25% for the prolatetaper

with a time-bandwidthproductof 4.

In keepingwith the nonparametric philosophy adopted

0 6 6 4

0

1.5 3.0 RAYLEIGHQUANTILE

0

1.5 3.0 RAYLEIGHQUANTILE

Fig.10.Unweighted (top)andweighted (bottom) residual q-qplots fortheresults inFigures 8and9 using theRay-

leighresidual method described in thetext.Twofrequency bands areshown, 0.0013--0.0017 Hz (leftpanels) and 0.0063--0.0067 Hz (rightpanels). Theinitial result istypically long-tailed, andtheoutlier fraction islarger atthe higher frequency. Theadaptive weighting makes thefinalresiduals slightly short-tailed.

646

CHAVE ET AL.: ROBUSTSPECTRALESTIMATION

in this work, it is possibleto compute error and bias esti- nonzero expected value and the distribution becomes nonmatesby use of the jackknife method [R. G. Miller, 1974; centralX2 [Johnson andKotz, 1970,chapter28], and locaElton, 1982]. The jackknife is basedon N repeatedana- tion and scale estimates must be made separately. Similar lysesof the data using N-1 subsetseach time, in addition to the original estimate using all N subsets. Appropriate combinations of these N+ 1 estimates give values for both the bias and the variance that are valid under a wide range of parent distributions and estimation procedures,and are generally considered to be more reliable than error esti-

arguments apply to the transfer functions and coherences, although their distributions are substantiallymore complicated in the noncentral case. In other instances, deter-

ministic componentsmay be known to be present (e.g., tidal signals) and are removed by a preliminary least

squares procedure. Such approaches must be used with caution: first, by using a robust fitting procedure; second, clearly involves more computing than that for a single by being careful to remove the right frequencies. estimate, but the amount is not excessive, as it is not The effect of noise in the data channels in introducing necessaryto recomputethe Fourier transforms. bias and erratic behavior into computed magnetotelluric responsefunctions is well-known, and many techniques to circumvent this problem have been proposed. Numerical DISCUSSION approachesinclude the selection or weighting of different The methods for computing robust power spectra, data sectionsusing coherence [Kao and Rankin, 1977; coherences, and transfer functions presented in this paper Jonesand Joaticke,1984] or iterative refinement of the operate in an automatic, data adaptive fashion that breaks responsefunction [Larsen,1975, 1980]. The most widely down only under unusual circumstances. Detection of used and generally effective method is employment of a the uncorrelated noise their failure may be obvious on inspection and is eased by remote reference to minimize q-q diagnosticplotting. In addition, there are a few tools betweenseveraldistincttime series [Gambleet al., 1978]. that can prove useful either in preventing problems or in These techniques are aimed at the removal of outliers correcting them when they appear. As has been noted, from the data (especiallythe electricfield), since it is a outliers in power spectra are usually completely different few discordant data values rather than persistent backfrom those in the transfer functions and coherences. The ground noise componentsthat producemost of the erratic former are generally associated with nonstationarity or responsefunction behavior. The effectivenessof a robust obviouscontaminationof the data (e.g., spike noise) and estimation approach in substantially reducing bias and are not difficult to detect or eliminate. Problems with the variability is seen in Figure 9 and has been verified on robust regressionmethods used to find transfer functions many other magnetotellurictime series. The use of robust and coherences are more complicated and may be due to spectral analysis to reduce bias and related problems in failures of either the data or the model. The following magnetotelluricprocessingappearsto be quite promising. The methods presented in this paper are based on conideas are aimed primarily at improving the robust transfer function and coherencecomputations. ventional M-estimators and provide good protection Under some physical conditions there may be a correla- againstoutliersin the data {xi} of (14). They are less tion of the rate of occurrence or the size of outliers with effective when outliers occur only in the coefficientvarithe power in either the input or output data sequences. ables {u0} andcannotcopewithgrossly aberrant valuesin One obvious example of this is an instrument problem them. This leads to the concept of a breakdown point E* where clipping, saturation, or distortion at high signal lev- which is the smallest percentageof contaminated data that els on one or more data channels will result in poor corre- will cause the estimator to yield incorrect estimates. The lations with other, clean, time series. There is also some breakdown point is zero for ordinary least squares in the evidence for an increase in the frequency of outlier presenceof outliers in either the data vector or coefficient appearance during large magnetic storms at mid- to high- matrix and is also zero for outliers in the coefficient latitudes that can affect electromagnetic induction data. matrix with L• or M-estimators. M-estimators are preThese problems may be detected by plotting the power in ferred over L1 types on statisticaland efficiencygrounds, the unweighted regression residuals against the power in not becauseof improved outlier resistance. This observation has led to the introduction of different the individual data series and looking for correlations. While the standard robust algorithms usually work with robustification schemes for linear regression. Generalized such data, a substantial improvement may result if the M or G-M estimates were proposed with the goal of rows of (23) are weightedby the inverse of the power in bounding the influence of outlying coefficient variables the pertinent section of the offending time series at all and are discussedby Mallows [1975]. The breakdown stages in the iterative procedure. Weighting that is pro- point for G-M estimatesis at most 1/(p+ 1), or about 30% portional to the data power is appropriate if the outliers for simplelinear regression.Rousseeuw [1984] suggested occur during intervals of low activity. an alternatemethod basedon replacingthe sum in (14) In both the robust power spectrum and transfer func- with the median operator, yielding a breakdown point tion algorithms the residualsand a scale estimate are com- which approaches 50%. Larger valuesfor E* are meaningputed independently using the solution from an earlier less, since discriminationof good from bad data becomesa iteration. At first glance this seems to be unnecessary matter of semantics. Neither of these approaches to since, for example, a Gaussian model for the data implies robust regression have been applied to time series, but a scaledX2 distributionfor the powerspectrum,yielding both are promising candidatesto further improve robust coupled estimates of location (i.e., the residuals) and spectral analysis. scale. However, if periodic or other deterministic backWhile the examples of this paper have been drawn from ground signals are present the Fourier transform has a electromagnetic geophysics, the applications of robust mates based on Gaussian statistics. Such a procedure

CHAVE ET AL.: ROBUSTSPECTRALESTIMATION

spectral estimation are by no means limited to that field and will find uses in many branches of geophysicsand oceanography. Since spectralanalysisis fundamentally a statistical tool, emphasis has been placed on the importance of getting consistencybetween data and model, as well as on verifying the underlying statisticalassumptions. The techniques presented here are capable of achieving this by giving substantial, automatic, and data adaptive protection from at least two classesof outliers and should be used whenever the section-averagingspectral methods would be appropriate. Use of robust spectrum estimates can also substantially reduce data editing chores; within reason, the effects of isolated, bad data are virtually eliminated without user intervention. In conclusion, three points should be reemphasized. First, both spectrum estimation and robust statisticsare fields of active research, so that the methods presentedhere are unlikely to be the last word on the subject. Second, while these methods can help to identify subtle outliers, this does not mean that they should always be summarily rejected: in some instances, the outliers may be the important part of the data. Finally, while there are many arguments about which robust methods are best, there is general agreement among statisticiansthat almost any robust method is better than none at all.

647

symmetricdistributions,d. Am. Stat. Assoc.,71, 409-416, 1976. Hampel, F. R., E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel, RobustStatistics,John Wiley, New York, 1986. Hawkins, D. M., Identificationof Outliers,Chapmanand Hall, London, 1980. Hoaglin, D.C., F. Mosteller, and J. W.Tukey, Understanding Robust and ExploratoryData Analysis, John Wiley, New York, 1983.

Holland, P. W., and R. E. Welsch, Robust regressionusing iteratively reweighted least squares, Commun. Stat., A6, 813--827, 1977.

Huber, P. J., Robust estimation of a location parameter, Ann. Math. Stat., 35, 73-101, 1964. Huber, P. J., RobustStatistics,John Wiley, New York, 1981. Johnson, N. L., and S. Kotz, ContinuousUnivariateDistributions-l, Houghton Mifflin, Boston, 1970. Jones, A. G., and H. Jodicke, Magnetotelluric transfer function improvement by a coherence-basedrejection technique, paper presented at 1984 Society of Exploration Geophysicsmeeting, Atlanta, Ga., 1984. Kao, D., and D. Rankin, Enhancement of signal-to-noiseratio in magnetotelluric data, Geophysics, 42, 103-110, 1977. Kendall, M., and A. Stuart, The AdvancedTheoryof Statistics,2 vols., MacMillan, New York, 1977. Kleiner, B., and T. E. Graedel, Exploratory data analysis in the geophysicalsciences,Rev. Geophys.,18, 699--717, 1980. Kleiner, B., R. D. Martin, and D. J. Thomson, Robust estimation of power spectra,J. R. Stat. Soc. Set. B, 41, 313- 351, 1979. Lanzerotti, L. J., D. J. Thomson, A. Meloni, L. V. Medford, and C. G. Maclennan, Electromagnetic study of the Atlantic continental margin using a section of a transatlanticcable, J. Geophys.Res., 91, 7417-7428, 1986.

Larsen,J. C., Low frequency(0.1-6.0 cpd) electromagnetic study

Acknowledgments.This work was supportedby National Science of deep mantle electrical conductivity beneath the Hawaiian Foundation grants OCE83-08595 and OCE84-20578 and by fundIslands, Geophys.J. R. Astron.Soc., 43, 17- 46, 1975. ing from the Institute of Geophysicsand PlanetaryPhysicsbranch Larsen, J. C., Electromagnetic response functions from interat Los Alamos National Laboratory (LANL). It was completed rupted and noisy data, J. Geomagn. Geoelectr., 32, Supp. I, during A.D.C.'s tenure as J. Robert Oppenheimer Fellow at SI89-SI103, 1980. LANL. We would like to thank L. K. Law for permission to use Lawson, C. L., and R. J. Hanson, SolvingLeast SquaresProblems, the seafloordata for Figures6 and 7 and A. Schultzfor providing Prentice-Hall, Englewood Cliffs, N.J., 1974. the data for Figures 8-10. Lewis, T., and N. I. Fisher, Graphical methods for investigating the fit of a Fisher distribution to sphericaldata, Geophys.J. R. Astron.Soc., 69, 1-13, 1982. Mallows, C. L., On some topics in robustness, technical REFERENCES memorandum, Bell Telephone Lab., Murray Hill, N.J., 1975. Abraham, B., and G. E. P. Box, Bayesiananalysisof some outlier Mallows, C. L., Robust methods, Proc. Syrup. Appl. Math, 28, problemsin time series,Biometrika,66, 229-236, 1979. 49- 74, 1983. Andrews, D. F., A robust method for multiple linear regression, Martin, R. D., and D. J. Thomson, Robust-resistant spectrum Technometrics, 16, 523--531, 1974. estimation, Proc. IEEE, 70, 1097-- 1115, 1982. Barnett, V., Principlesand methods for handling outliers in data Martin, W., and P. Flandrin, Wigher-Ville spectral analysis of sets, in StatisticalMethodsand the Improvementof Data Quality, nonstationary processes,IEEE Trans. Acoust. SpeechSignal Process.,ASSP-33, 1461-1470, 1985. edited by T. Wright, pp. 131-166, Academic, Orlando, Fla., 1983. Mason, R. L., and R. F. Gunst, Outlier-induced collinearities, Technometrics, 2 7, 401-407, 1985. Barnett, V., and T. Lewis, Outliersin StatisticalData, John Wiley, New York, 1978. Mehrotra, K. G., and P. Nanda, Unbiased estimation of parameBartlett, M. S., Propertiesof sufficiencyand statisticaltests, Proc. ters by order statisticsin the case of censoredsamples,BiomeR. Soc. London,Set. A, 160, 268--282, 1937. trika, 61, 601-606, 1974. Beckman, R. J., and R. D. Cook, Outlier....s, Technometrics, 25, Miller, K. S., ComplexStochastic Processes, Addison-Wesley, Read119-163, 1983. ing, Mass., 1974. Belsley, D. A., E. Kuh, and R. E. Welsch, RegressionDiagnostics: Miller, R. G., The jackknife-a review, Biometrika,61, 1-15, 1974. IdentiJ)/ingInfluential Data and Sources of Collinearity, John Mosteller, F., and J. W. Tukey, Data analysisand linear regression, Wiley, New York, 1980. Addison-Wesley, Reading, Mass., 1977. Brillinger, D. R., Time Series:Data Analysisand Theory,Holden- Priestley, M. B., Evolutionary spectra and non-stationary Day, San Francisco,Calif., 1981. processes,J. R. Stat. Soc. Set. B, 27, 204-229, 1965. Claerbout, J. F., and F. Muir, Robust modeling with erratic data, Priestley, M. B., Spectral Analysis and Time Series, Academic, Geophysics, 38, 826-844, 1973. Orlando, Fla., 1981. Cook, R. D., and S. Weisberg, Residualsand Influencein Regres- Rousseeuw,P. J., Least median of squaresregression,J. Am. Stat. sion, Chapman and Hall, New York, 1982. Assoc., 79, 871- 880, 1984. Dongarra, J. J., C. B. Moler, J. R. Bonch, and G. W. Stewart, Slepian, D., Prolate spheroidal wavefunctions, Fourier analysis, LINPACK User'sGuide,SIAM, Philadelphia,Pa., 1979. and uncertainty, V, The discrete case, Bell Syst. Tech. J., 57, 1371-1429, 1978. Efron, B., TheJackknife,theBootstrap, and OtherResampling Plans, SIAM, Philadelphia, Pa., 1982. Thomson, D. J., Spectrum estimation techniquesfor characterizaFox, A. J., Outliers in time series, J. R. Stat. Soc. Set. B, 34, tion and development of WT4 waveguide, I, Bell Syst. Tech.J., 340-363, 1972. 56, 1769-1815, 1977a. Gamble, T. D., W. M. Goubau, and J. Clarke, Magnetotellurics Thomson, D. J., Spectrum estimation techniquesfor characterizawith a remote reference, Geophysics, 44, 53-68, 1978. tion and development of WT4 waveguide, II, Bell Syst. Tech.J., 56, 1983--2005, 1977b. Gross, A.M., Confidence interval robustnesswith long-tailed

648

CHAVEET AL.: ROBUSTSPECTRAL ESTIMATION

Thomson, D. J., Spectrum estimation and harmonic analysis, M. E. Ander, Earth and Space SciencesDivision, Los Alamos Proc. IEEE, 70, 1055--1096, 1982. National Laboratory, Los Alamos, NM 87545. Tukey, J. W., Instead of Gauss-Markov, what?, in AppliedStatisA.D. Chave, AT&T Bell Laboratories, 600 Mountain Ave., tics, edited by R.P. Gupta, pp. 351--372, Amsterdam, NorthMurray Hill, NJ 07974. Holland, 1975. D. J. Thomson, AT&T Bell Laboratories, 600 Mountain Ave., Vogel, C. R., Optimal choice of a truncation level for the trun- Murray Hill, NJ 07974. cated SVD solution of linear first kind integral equationswhen the data are noisy, SIAMJ. Numer. Anal., 23, 109--117, 1986. (ReceivedDecember26, 1985; revised September 30, 1986; Zeger, S. L., Exploring an ozone spatial time series in the freacceptedOctober6, 1986.) quency domain, J. Am. Stat. Assoc.,80, 323--331, 1985.

Suggest Documents