JOURNAL OF GEOPHYSICAL RESEARCH, VOL. 92, NO. B1, PAGES 633-648, JANUARY 10, 1987
On the Robust Estimation of Power Spectra, Coherences, and Transfer Functions

ALAN D. CHAVE¹
Earth and Space Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico

DAVID J. THOMSON
AT&T Bell Laboratories, Murray Hill, New Jersey

MARK E. ANDER
Earth and Space Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico
Robust estimation of power spectra, coherences, and transfer functions is investigated in the context of geophysical data processing. The methods described are frequency-domain extensions of current techniques from the statistical literature and are applicable in cases where section-averaging methods would be used with data that are contaminated by local nonstationarity or isolated outliers. The paper begins with a review of robust estimation theory, emphasizing statistical principles and the maximum likelihood or M-estimators. These are combined with section-averaging spectral techniques to obtain robust estimates of power spectra, coherences, and transfer functions in an automatic, data-adaptive fashion. Because robust methods implicitly identify abnormal data, methods for monitoring the statistical behavior of the estimation process using quantile-quantile plots are also discussed. The results are illustrated using a variety of examples from electromagnetic geophysics.
INTRODUCTION
Reliable estimation of power spectra for single data sequences or of transfer functions and coherences between multiple time series is of central importance in many areas of geophysics and engineering. While the effects of the underlying Gaussian distributional assumptions on such estimates are generally understood, the ability of a small fraction of non-Gaussian noise or localized nonstationarity to affect them is not. These phenomena can destroy conventional estimates, often in a manner that is difficult to detect.
¹On leave from Institute of Geophysics and Planetary Physics, University of California, San Diego, La Jolla; now at AT&T Bell Laboratories, Murray Hill, New Jersey.

Copyright 1987 by the American Geophysical Union. Paper number 5B5911. 0148-0227/87/005B-5911$05.00

Problems with conventional (i.e., nonrobust) time series procedures arise because they are essentially copies of classical statistical procedures parameterized by frequency. Once Fourier transforms are taken, estimating a spectrum is the same process as computing a variance, and estimating a transfer function is a similar procedure to linear regression. Because these methods are based on the least squares or Gaussian maximum likelihood approaches to statistical inference, their advantages include simplicity and the optimality properties established by the Gauss-Markov theorem [e.g., Kendall and Stuart, 1977, chapter 19]. For example, linear regression yields the best linear unbiased estimate when the errors are uncorrelated and share a common variance; this holds independent of any distributional assumptions about them. If, in addition, the residuals are drawn from a multivariate normal probability distribution, then the least squares result is also a maximum likelihood, fully efficient, minimum variance estimate. In practice, the regression model is rarely an accurate description due to departures of the data from the model requirements. Most data contain a small fraction of unusual observations or "outliers" that do not fit the model distribution or share the characteristics of the bulk of the sample. These can often be described by a probability distribution which has a nearly Gaussian shape in the center and tails which are heavier than would be expected for a normal one, or by mixtures of Gaussian distributions with different variances.

Two forms of data outliers are common: point defects and local nonstationarity. Point defects are isolated outliers that exist independent of the structure of the process under study. Typical examples include dropped bits in digital data, transient instrument failures, and spike noise due to natural phenomena (e.g., lightning). Local nonstationarity means a departure from a stationary base state that is of finite duration and must be differentiated from complete nonstationarity, in which the concept of a spectrum must be reformulated [e.g., Priestley, 1965; Martin and Flandrin, 1985]. A geophysical example of local nonstationarity is seen in observations of the time-varying geomagnetic field: most of the time the data statistics are approximately constant, but this stationary process is interrupted sporadically by brief but intense disturbances such as magnetic storms with markedly different characteristics. In some studies these events are regarded as contaminating noise, and they must be removed to study the underlying process. The influence of these types of outliers on regression problems can be complicated, as aberrant data in the dependent and independent variables produce quite
different changes in the output, and correlations between apparent data outliers in both variables can still yield reasonable regression parameters. In addition, new classes of outliers can occur in linear regression. In any case, it is a serious statistical error to blindly accept mixture situations of these types and analyze the combination as a unit.

With such data, conventional least squares-based techniques will give inefficient and often seriously misleading estimates. This breakdown of the least squares model, while sometimes spectacular in form, is more typically insidious in that a seemingly reasonable answer is obtained, and considerable effort has gone into devising diagnostics to detect this sort of problem [e.g., Belsley et al., 1980; Cook and Weisberg, 1982].

Because the Fourier transform of even moderately long-tailed data tends to be Gaussian as the length of the series increases (essentially by the central limit theorem, but see Brillinger [1981, chapter 5] for details), it is sometimes claimed that outliers in time series are not a serious problem. However, this is often not true, as shown by the power spectrum examples of Thomson [1977b], Kleiner et al. [1979], and Martin and Thomson [1982]. Furthermore, coherences and transfer functions are substantially more sensitive to the presence of outliers, since multiple time series and ratios of spectra are involved.

It is frequently argued that a careful analyst will examine a data set and use ad hoc remedies to avoid outlier difficulties. While this may work for obvious discordancies in small samples, it is impractical for large data sets or when, as often happens in time series, the outliers have a scale comparable to or smaller than that of the process under study. It is preferable to use statistical procedures that are robust, in the sense of being relatively insensitive to the presence of a moderate amount of bad data or to inadequacies in the model, and that react gradually rather than abruptly to perturbations of either. Such methods have been developed over the past two decades and are reviewed by Huber [1981], Hoaglin et al. [1983], and Hampel et al. [1986].

In this paper the principles of robust statistics are adapted to univariate and multivariate spectral analysis within a geophysical context. The treatment begins with a review of some critical statistical concepts, especially robustness in the estimation of location and scale. This is followed by the introduction of the maximum likelihood or M-estimators for computing robust averages and solving robust regression problems. After considering numerical implementation of the M-estimators, some diagnostic plotting methods to help elucidate the extent of outlier contamination or nonstationarity in data are discussed. These tools are then combined with the section-averaging method of spectral analysis and applied to the estimation of power spectra, transfer functions, and coherences. The results are illustrated with a variety of examples from natural source electromagnetic geophysics. The paper closes with a discussion of distributional aspects and some suggestions for further work.

STATISTICAL PARAMETERS AND ROBUSTNESS

Given a continuous probability density function (pdf) $f(x)$, the cumulative distribution function (cdf) is denoted by $F(x)$, where

$$F(x) = \int_{-\infty}^{x} du\, f(u) \tag{1}$$

To make the correspondence between theoretical and sample entities clearer, write $dF(x)$ for $dx\, f(x)$ and indicate the empirical or sample cdf by $d\hat{F}(x)$. For a set of $N$ data samples $\{x_i\}$, this is given by

$$d\hat{F}(x) = \frac{1}{N} \sum_{i=1}^{N} \delta(x - x_i)\, dx \tag{2}$$

where $\delta(x)$ is the Dirac delta function. Substitution of (2) into (1) yields the usual empirical cdf in which each data point corresponds to a step in probability of $1/N$.

A set of $N$ real samples $\{x_i\}$ may be sorted into the ascending order

$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(N)}$$

where $x_{(j)}$ is called the $j$th order statistic. Note that the probability distribution of the ordered data is different from that of the original data; clearly $x_{(i)}$ depends on $x_{(i-1)}$ and $x_{(i+1)}$ even when the $\{x_i\}$ are independent. Assuming these samples to be independent and characterized by the pdf $f(x)$, the corresponding theoretical entities are the set of $N$ quantiles $Q_j$. These are found from the inverse cdf or quantile function $F^{-1}(\alpha)$, defined for $0 \le \alpha \le 1$ as the solution of

$$\int_{-\infty}^{F^{-1}(\alpha)} dx\, f(x) = \alpha \tag{3}$$

with $\alpha = (j - 1/2)/N$ for $j = 1, 2, \ldots, N$. The $Q_j$ divide the area under the distribution into $N+1$ probability intervals, with the first and last points assigned cumulative probabilities of $1/2N$ and $1 - 1/2N$, respectively, and with a probability step of $1/N$ occurring at each of the intermediate points.

In the following, it will usually be assumed that the data $\{x_i\}$ are independent, or at least uncorrelated. While this may appear strange in a time series setting, the $\{x_i\}$ should be thought of as Fourier transforms of a windowed section of data at some frequency, not as the original data in the time domain. Explicitly, consider $N$ segments of data from a nominally stationary process $y(t)$, with each segment consisting of $T$ discrete samples spaced $\Delta$ time units apart and offset by an amount $b$ from the preceding one, and compute

$$X_k(\omega) = \sum_{t=0}^{T-1} v_t\, y(t\Delta + (k-1)b)\, e^{i\omega t \Delta}, \qquad k = 1, \ldots, N$$

where $v_t$ is the data window. The quantity $X_k(\omega)$ is the windowed Fourier transform of the $k$th data segment at radian frequency $\omega$. Even if closely spaced samples of the original process $y(t)$ are highly correlated, the $\{X_k\}$ will be uncorrelated at a given frequency for reasonable values of the base offset $b$. Similarly, information in a single data section at frequencies spaced at least the window bandwidth apart is uncorrelated.

One of the most useful procedures in statistics is the summary characterization of a distribution or sample using
various types of averages. Of these, the most common one is location, specifying (in a loose sense) the center or peak value of a distribution or sample; examples include the mean and median. It is less commonly realized that such averages are the result of minimizing various norms. For example, the distribution mean $\mu$ and the sample mean, or average, $\bar{x}$ are obtained by minimizing the $L_2$ or least squares norms of the residuals about the distributions $[\int |x-\mu|^2\, dF(x)]^{1/2}$ and $[\int |x-\bar{x}|^2\, d\hat{F}(x)]^{1/2}$ with respect to $\mu$ and $\bar{x}$, respectively. Performing the minimization gives the familiar expressions

$$\mu = \int_{-\infty}^{\infty} x\, dF(x)$$

and

$$\bar{x} = \int_{-\infty}^{\infty} x\, d\hat{F}(x) = \frac{1}{N} \sum_{i=1}^{N} x_i$$

The sample median $\tilde{x}$ is obtained by minimizing the $L_1$ norm of the residuals $\int |x-\tilde{x}|\, d\hat{F}(x)$. This is the same as minimizing the summed absolute deviations $\sum_{i=1}^{N} |x_i - \tilde{x}|$ and reduces to the middle order statistic for $N$ odd, $\tilde{x} = x_{([N/2]+1)}$, where $[\,]$ denotes the integer part. The sample median is ambiguous for $N$ even but is typically chosen as $(x_{([N/2])} + x_{([N/2]+1)})/2$. This is the simplest example of an order statistic. The population median is $\tilde{\mu} = Q_{1/2}$ from (3).

The $L_1$ and $L_2$ norms have the advantages of being well known and, for the $L_2$ norm in particular, of resulting in algebraically simple equations. However, there are few good reasons to consider them exclusively or to believe that they are the best choices except in restricted circumstances. Much of the work in robustness has effectively been on finding more appropriate norms for actual data, rather than applying what J. W. Tukey describes as "over-utopian" assumptions.

The second most common characterization of a distribution or sample is a measure of its width, dispersion, spread, variability, or standard deviation, which is included under the generic term scale. There are numerous available estimates of scale; Gross [1976] compares 25 variants of 12 distinct forms, and many others have been suggested. Among these are the minimum value of the norms achieved around the corresponding location estimates. This class includes the theoretical standard deviation

$$\sigma = \left[ \int_{-\infty}^{\infty} |x-\mu|^2\, dF(x) \right]^{1/2} \tag{6}$$

and the sample standard deviation

$$s = \left[ \int_{-\infty}^{\infty} |x-\bar{x}|^2\, d\hat{F}(x) \right]^{1/2} = \left[ \frac{1}{N} \sum_{n=1}^{N} |x_n-\bar{x}|^2 \right]^{1/2} \tag{7}$$

for the $L_2$ norm, and the average absolute deviation

$$\tilde{\sigma} = \int_{-\infty}^{\infty} |x-\tilde{\mu}|\, dF(x) \tag{8}$$

or its sample counterpart

$$\tilde{s} = \int_{-\infty}^{\infty} |x-\tilde{x}|\, d\hat{F}(x) = \frac{1}{N} \sum_{i=1}^{N} |x_i-\tilde{x}| \tag{9}$$

for the $L_1$ norm.

A second class of scale estimate is obtained by reapplying the same estimator used for location to its absolute residuals. The median absolute deviation (MAD) is obtained by taking the median value of the absolute residuals about the $L_1$ location estimate

$$s_{MAD} = \mathrm{median}\{ |x_i - \tilde{x}| \} \tag{10}$$

The expected value of the MAD is the solution $\sigma_{MAD}$ of

$$F(\tilde{\mu} + \sigma_{MAD}) - F(\tilde{\mu} - \sigma_{MAD}) = 1/2 \tag{11}$$

The interquartile distance is a related scale estimate given by

$$s_{IQ} = x_{(3N/4)} - x_{(N/4)} \tag{12}$$

and is the spacing between the 75% and 25% points of the sample distribution, or the center range containing half of the probability. The corresponding theoretical value is

$$\sigma_{IQ} = Q_{3/4} - Q_{1/4} \tag{13}$$

using (3), and is just twice the MAD for symmetric distributions. Both the average absolute deviation $\tilde{s}$ and the standard deviation $s$ are sensitive to outliers, and $\tilde{s}$ is also inefficient. Further information on the MAD, interquartile distance, and other robust estimates of scale is available from Mosteller and Tukey [1977].

The simplest extension of these ideas to the general linear regression model is obtained by minimizing a norm of the residuals in

$$x_i = \sum_{j=1}^{p} u_{ij} \beta_j + r_i, \qquad i = 1, \ldots, N \tag{14}$$

where $\{x_i\}$ is an $N$ vector of data or observations, $\{u_{ij}\}$ is an $N \times p$ matrix of known coefficients, $\{\beta_j\}$ is a $p$ vector of model parameters, and $\{r_i\}$ is an $N$ vector of residuals. The solution of (14) depends on the norm that is chosen, and will not be the same even for the $L_1$ and $L_2$ cases, as discussed by Claerbout and Muir [1973].

It is well-known that least squares estimates are notoriously lacking in robustness, and both the sample mean and standard deviation may be strongly influenced by a single discordant data point. The resistance of the sample median and the scale estimates $s_{IQ}$ and $s_{MAD}$ to outliers is also well-established, and there are many applications where $L_1$ norm methods yield dramatic improvements over their $L_2$ norm counterparts. This has led to suggestions to substitute the minimum absolute deviation for the least squares method for many geophysical problems [Claerbout and Muir, 1973]. There are at least three reasons why this course of action is imprudent. First, the Gauss-Markov theorem [e.g., Kendall and Stuart, 1977, chapter 19] establishes the optimality of the $L_2$ estimator under general conditions on the structure of the regression residuals. No comparable result is known for the $L_1$ estimator, and it is not difficult to show, for example, that with Gaussian data, $\mathrm{var}(\tilde{x}) = (\pi/2)\, \mathrm{var}(\bar{x})$. This reduction in efficiency for the $L_1$ estimator means that about 60% more data is required to achieve equivalent parameter
uncertainties to the $L_2$ estimator. Second, the natural probability distribution for $L_1$ estimates is the Laplace or double exponential type, whereas $L_2$ estimates involve the familiar normal distribution. This makes statistical inferences based on $L_1$ results more complex and less intuitively appealing. Finally, for most data the residuals from a least squares procedure are in large part Gaussian, with the addition of a small fraction of outliers having different statistics. This suggests that some method for treating the contamination within the framework of a Gaussian model, rather than outright abandonment of that model, is the logical course of action [e.g., Tukey, 1975; Mallows, 1983].

One obvious way to achieve robustness is by the rejection of outliers on the basis of some type of statistical test, followed by the use of least squares on the remaining and presumably good data. While a substantial amount of effort has gone into the development of outlier tests [Barnett and Lewis, 1978; Hawkins, 1980; Barnett, 1983; Beckman and Cook, 1983], the bulk of the results apply only to single, isolated outliers, usually in a parent normal population. The dual phenomena of masking and swamping, in which one bad value may hide the presence of others, have been widely documented for the more common multiple outlier situation [Barnett, 1983]. Outlier detection becomes even more complicated in time series because of correlations between the data [Fox, 1972; Abraham and Box, 1979]. As a consequence, methods are sought that accommodate outliers and minimize their influence in a semi-automatic fashion.

ROBUST ESTIMATION

In addition to heuristic methods, two major classes of robust procedures are the L-estimators, based on combinations of the order statistics, and the M-estimators, a variant of maximum likelihood. L-estimates are especially useful for location problems with nonsymmetric distributions. In a time series context, Thomson [1977a] applied a maximum likelihood L-estimator proposed by Mehrotra and Nanda [1974] for censored, exponentially distributed populations to get robust, section-averaged power spectra of contaminated data. While simpler than an M-estimator, this approach suffers from a loss of efficiency because the truncation point must be fixed in advance, so that the result is not data adaptive. L-estimators do not generalize readily to regression problems and are not considered further in this paper.

To motivate the concept of an M-estimator, consider again the problem of determining the location parameter $\theta$ given independent samples $\{x_i\}$ drawn from a common pdf $f(x-\theta)$ by maximum likelihood. The logarithm of the likelihood function $L(\theta)$ is obtained in the usual way by inserting the data into the sampling pdf, yielding

$$\log L(\theta) = \sum_{i=1}^{N} \log f(r_i) \tag{15}$$

where $r_i = x_i - \theta$ is the $i$th residual. The strict maximum likelihood solution $\hat{\theta}$ for $\theta$ comes from maximizing $L(\theta)$, or its logarithm, and obviously depends on knowing the pdf $f(x)$ exactly. A generalization of (15) based on a quantity $\rho(x)$, which is called a loss function, can be written

$$M(\theta) = \int_{-\infty}^{\infty} \rho(x-\theta)\, d\hat{F}(x) = \frac{1}{N} \sum_{i=1}^{N} \rho(r_i) \tag{16}$$

Clearly, if $\rho = -\log f$, minimizing $M$ yields the ordinary maximum likelihood result (15). Furthermore, the estimate is identical to that obtained by the norm minimization methods discussed earlier. This equivalence is the basis for identifying $L_1$ and $L_2$ as the natural norms for the Laplace and Gaussian distributions, respectively. Performing the minimization of (16) gives the equation

$$\sum_{i=1}^{N} \psi(r_i) = 0 \tag{17}$$

where $\psi(x) = \partial_x \rho(x)$ is called an influence function. Equations (16) and (17) reduce to the least squares or least absolute deviations forms when $\rho(x) = x^2/2 + c$, $\psi(x) = x$ or $\rho(x) = |x| + c$, $\psi(x) = \mathrm{sgn}(x)$, respectively, yielding the sample mean or the sample median as solutions. The formulation in (16) or (17) is equally valid for more complicated distributions, and the solution $\hat{\theta}$ is called an M-estimate.

If enough data are available, the sampling pdf or its logarithm can be estimated directly from them, resulting in a data adaptive, maximum likelihood estimate of location. In practice, it is rarely feasible to characterize the distribution tails with the available data, and the loss function is usually chosen on theoretical grounds to retain high efficiency over a family of expected distributions in preference to yielding the maximum likelihood estimate for a single distribution. Since most data yield largely Gaussian residuals, with a small outlier fraction, it is customary to use loss functions giving high efficiency (say, > 95%) for normal data but which still provide reasonable protection for contaminated data. This slight loss in nominal efficiency is the inevitable penalty that must be paid to get a stable estimate in the general case. Since typical data contain 1-20% outliers, the increase in sample size needed to compensate for this rejection is small compared to the extra 60% needed for the $L_1$ estimator. Some commonly used loss functions are described by Holland and Welsch [1977] and Hampel et al. [1986].

There is an additional complication that must be considered with M-estimators: the solution $\hat{\theta}$ of (16) or (17) will not be scale invariant in the sense that multiplying the data $\{x_i\}$ by a constant will not necessarily result in a similar change in $\hat{\theta}$. To correct for this, it is necessary to replace (16) and (17) by

$$\min_{\theta} \sum_{i=1}^{N} \rho\!\left(\frac{r_i}{d}\right) \tag{18}$$

and

$$\sum_{i=1}^{N} \psi\!\left(\frac{r_i}{d}\right) = 0 \tag{19}$$

where $d$ is a robust estimate of scale. While joint optimization with respect to both location and scale is described by Huber [1981, chapter 6] and Hampel et al. [1986, chapter 4], two practical but lower efficiency choices are

$$d_1 = \frac{s_{MAD}}{\sigma_{MAD}} \tag{20}$$
and

$$d_2 = \frac{s_{IQ}}{\sigma_{IQ}} \tag{21}$$

where $s_{MAD}$ and $s_{IQ}$ are the sample MAD and interquartile distance (10) and (12), and $\sigma_{MAD}$ and $\sigma_{IQ}$ are their theoretical counterparts for the appropriate standard pdf from (11) and (13). Because of the extreme sensitivity of the $L_2$ estimate to outliers, use of the standard deviation (6) and (7) is not recommended.

The concept of an M-estimator may be extended to the general linear regression model (14) by identifying the error $r_i$ as the regression residual [Andrews, 1974]. While (18) is not changed in form, (19) must be rewritten as

$$\sum_{i=1}^{N} \psi\!\left(\frac{r_i}{d}\right) u_{ij}^{*} = 0, \qquad j = 1, \ldots, p \tag{22}$$

where the superscript asterisk denotes the complex conjugate.

A variety of numerical procedures to solve the generally nonlinear forms of (18), (19), and (22) exist, but it is easiest to rewrite the problem as a weighted least squares one and iterate to linearize it. This allows the use of fast, accurate, and stable matrix algorithms. The weighted least squares forms of (18) and (22) are, respectively,

$$\min \sum_{i=1}^{N} \hat{w}_i^2\, r_i^2 \tag{23}$$

where $\hat{w}^2 = \rho(r/d)/r^2$, and

$$\sum_{i=1}^{N} w_i\, r_i\, u_{ij}^{*} = 0 \tag{24}$$

where $w = \psi(r/d)/r$. Note that this procedure gives two different weights: the $\hat{w}_i$ from treating the loss function formulation as an equivalent weighted least squares problem, and $w_i$ from its equivalence with the influence function. While distinct, they are not always clearly separated in the literature. In addition, $w$ or $\psi$ are often chosen a priori, implicitly defining $\hat{w}$ and $\rho$. To linearize the weighted least squares problem, an initial solution is obtained using ordinary (unweighted) least squares, and both the residuals and a scale estimate like (20) or (21) are computed. The weights are calculated from these, and the solution to (23) or (24) is found. This procedure is repeated using the residuals and scale estimate from the previous iteration at each stage until convergence is achieved. Note that the weights are data adaptive; the robust formulation ensures that data corresponding to residuals which are large compared to the scale will be downweighted.

There are several forms of the weights in (23) that work well in spectral problems. The first of these was introduced by Huber [1964] on theoretical grounds and is based on a density function with a Gaussian center and Laplacian tails, yielding

$$\rho(x) = \begin{cases} \dfrac{x^2}{2}, & |x| \le k \\[2mm] k|x| - \dfrac{k^2}{2}, & |x| > k \end{cases} \tag{25}$$

and, from $\hat{w}^2 = \rho(x)/x^2$, a weight function

$$\hat{w}^2(x) = \begin{cases} \dfrac{1}{2}, & |x| \le k \\[2mm] \dfrac{2k|x| - k^2}{2x^2}, & |x| > k \end{cases} \tag{26}$$

where a value of $k = 1.5$ gives better than 95% efficiency for outlier-free normal data. The corresponding Huber influence function, $\psi(x) = x$ for $|x| < k$ and $k\, \mathrm{sgn}(x)$ otherwise, has weights which never descend to zero. Because it has a discontinuous derivative, the Huber function may introduce slight distortions in time series work if used exclusively. Since the weights (26) fall off slowly for large residuals, they provide inadequate protection against severe outliers. However, (25) is a convex function, so that convergence to a local, as opposed to a global, minimum cannot occur in the iterative weighted least squares solution. Use of the Huber weights gives a good starting point for the application of more severe types of weight functions.

Another class of influence functions is called redescending because the influence function and weights approach zero in the presence of large outliers. The biweight of Mosteller and Tukey [1977, chapter 10] is a typical redescending influence function. However, for time series work it has too much curvature near the origin and can introduce serious distortion [Thomson, 1977b]. To reduce this effect, Thomson [1977a] proposed a new weight function

$$w(x) = \exp\{ -e^{\beta(|x| - \beta)} \} \tag{27}$$

a heuristic extension of the extreme value distribution. The parameter $\beta$ determines the scale at which downweighting begins. While strictly empirical, the $N$th quantile of the appropriate probability distribution from (3) is an excellent choice for $\beta$. This form has the advantages of being smooth, close to unity near the origin, and including an implicit dependency on the number of data in the weighting procedure; as the number of samples rises, extreme values become more common, and $\beta$ must increase to avoid affecting valid data. For example, $\beta = Q_N$ corresponding to $N = 10^3$, $10^4$, $10^5$, and $10^6$ for a normal distribution are 3.09, 3.72, 4.26, and 4.75 standard deviations, respectively. However, as with other redescending influence functions, the solution of (23) or (24) with (27) is not unique, so this weight should only be used after a good starting value has been found.

NUMERICAL CONSIDERATIONS
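As a concrete companion to the iterative weighted least squares scheme of the previous section, the following is a minimal sketch rather than the authors' implementation. It solves the influence-function form (24) by repeated weighted least squares, first with the Huber weights $w = \psi(x)/x$ and then with the Thomson weight (27) started from the Huber solution. The function names (`huber_weight`, `thomson_weight`, `robust_scale`, `irls`, `robust_fit`), the default $\beta = 3$, and the use of `numpy.linalg.lstsq` (an SVD-based solver) in place of a QR routine are all assumptions made for illustration. The scale is taken as the median absolute residual divided by 0.6745, its value for a unit normal, which assumes real residuals centered on zero; complex residuals enter only through their magnitudes.

```python
import numpy as np

def huber_weight(x, k=1.5):
    """Weight w = psi(x)/x for the Huber influence function:
    1 for |x| <= k and k/|x| otherwise (it never reaches zero)."""
    ax = np.abs(x)
    return np.where(ax <= k, 1.0, k / np.maximum(ax, k))

def thomson_weight(x, beta):
    """Weight (27): exp(-exp(beta * (|x| - beta))).
    The exponent is capped only to avoid floating-point overflow."""
    z = np.minimum(beta * (np.abs(x) - beta), 700.0)
    return np.exp(-np.exp(z))

def robust_scale(r):
    """ASSUMPTION: scale d = median|r| / 0.6745, a MAD-type estimate about
    zero normalized by its unit-normal value, in the spirit of (20)."""
    return np.median(np.abs(r)) / 0.6745

def irls(U, x, weight, coef0=None, max_iter=50, tol=1e-8):
    """Iterate the weighted least squares form (24).
    Starts from ordinary least squares unless coef0 is supplied."""
    if coef0 is None:
        coef = np.linalg.lstsq(U, x, rcond=None)[0]
    else:
        coef = np.asarray(coef0)
    for _ in range(max_iter):
        r = x - U @ coef                      # residuals of (14)
        d = robust_scale(r)
        if d == 0.0:
            break
        sw = np.sqrt(weight(r / d))           # row scaling by sqrt(w_i)
        new = np.linalg.lstsq(sw[:, None] * U, sw * x, rcond=None)[0]
        done = np.linalg.norm(new - coef) <= tol * (1.0 + np.linalg.norm(coef))
        coef = new
        if done:
            break
    return coef

def robust_fit(U, x, k=1.5, beta=3.0):
    """Huber weights first (convex problem), then the redescending weight (27)."""
    start = irls(U, x, lambda z: huber_weight(z, k))
    return irls(U, x, lambda z: thomson_weight(z, beta), coef0=start)
```

Scaling each row of the system by $\sqrt{w_i}$ and solving the resulting least squares problem makes the solution satisfy $\sum_i w_i r_i u_{ij}^{*} = 0$, which is (24), without ever forming the normal equations. Running the Huber stage first follows the remark in the text that a redescending weight should only be applied after a good starting value has been found.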
Equations (18) and (23) are generalizations of the usual linear regression statement (14), while (19), (22), and (24) are generalizations of the familiar normal equations. Since the numerical ill-conditioning of the normal equations is well-known, it is best to solve the matrix form (23) by a numerically stable method such as QR or singular value decomposition [e.g., Lawson and Hanson, 1974]. The QR decomposition method of solving (23) is used throughout this work. Since spectral analysis involves the Fourier transform, the matrices in (23) are complex. Complex least squares procedures are reviewed by K. S. Miller [1974] and are available in standard packages (e.g., CQRDC and CQRSL in LINPACK [Dongarra et al., 1979]). Alternately, the problem may be broken down
638
CHAVE ET AL.: ROBUSTSPECTRALESTIMATION
into real and imaginary parts and solved using standard 1982], a simpleand effectivemethod is basedon the order real algorithms. To avoid problems with numerical statistics.The quantile-quantile(q-q) plot servesto simulround-off, it is recommended that 64-bit arithmetic be taneously yield indications of goodnessof fit, sketch the utilized. extent of outlier contamination, and provide the key locaAdditional modes of failure in the solution of regression tion and scale parameters. An important property of the equations from collinearity of the coefficient matrix and N orderstatisticsof a data sample{x•} is that they divide similar effects are possible[Belsleyet al., 1980], and a the area under a pdf into N+ 1 parts of not necessarily method for checking for these based on the condition equal size. The q-q plot is obtained by comparing the
numberof uo is givenbyLamerot# etal. [1986].To illus- quantilesof a specifieddistribution,Q;, whichdo divide trate, consider a geomagnetic example where x, is a the area under the pdf into equal-sizedpieces, to the order Fourier-transformed electric field from the i th data seg- statistics x V). If the latter are drawnfrom the assumed ment. As independent variables in (14), take the distribution, they will be similar to the quantiles, and the correspondingtransforms of the p- 3 magneticfield vari- q-q plot will be an approximately straight line. Systematic ablesuil=/-/•, ui2=Di, andui3=Z, andsolvefor the{/•;}, departures from a straight line show that the model is or impedances. If H, D, and Z are linearly independent, inconsistent and can be used to guide the search for a a solution will exist, and collinearity is not a problem. better one. Outliers usually appear as deviations from a However, in the presenceof complex geologicstructureor line at the extreme quarttiles. Finally, the slope and intersource field inhomogeneity, Z may depend on H and D cept of the line give the scaleand location parametersfor the experimental distribution. In the present work, q-q through the tipper functions Ta and To defined by plots of the residuals from a robust procedure are examZ = Tall + ToD ined as a function of frequency. However, instrumentain which case the coefficient matrix would be theoretically tion problems, such as stuck bits, often appearas plateaus of rank 2 (or collinear), and the numerical solution of or staircasesin q-q plots of the raw data. For a lucid dis(14) would be unstable. This problem is flagged by large cussionof q-q plots and their use, see Kleiner and Graedel values of the condition number, and a better solution is [1980] or Lewis and Fisher [1982]. Other regressiondiagnostics,such as plots of the residobtained by taking p-2 and, as is the usual practice, analyzing the vertical magnetic field separately. 
Note also that the presence of outliers in data can induce collinearity into an otherwise stable system of equations [Mason and Gunst, 1985]. For methods to help decide when (and how) to truncate near-collinear matrix systems, see Lawson and Hanson [1974, chapter 26] or Vogel [1986].
The matrix elements in the normal equations in a time series setting may be identified as the auto-spectra and cross-spectra of the data and coefficient variables. This may lead to the temptation to estimate these quantities independently and robustly, followed by solution of the normal equations for the transfer functions, rather than treating the entire problem as a unit using a common set of weights as described in the last section. While such a procedure will result in good estimates of the individual matrix elements, the resulting matrix structure may be seriously in error because the normal equations encountered in spectral applications are often marginally conditioned even with ideal data, and small changes in the matrix elements induced by the individual robust procedures can easily result in nonphysical solutions. Such an approach is discouraged even more than use of the ordinary normal equations.
DIAGNOSTICS
It is useful to develop diagnostic procedures that disclose the extent of outlier contamination and give a visual indication of goodness of fit to a specified probability distribution. These find application both in the early stages of the analysis, when decisions about the suitability of particular statistical models must be made, and as the final step in robust estimation, when the efficacy of the method in reducing outlier influence must be assessed. While a myriad of techniques based on residual plotting have been proposed [e.g., Belsley et al., 1980; Cook and Weisberg, ...] ... residual against the input or output power, become unmanageable if applied directly at all frequencies but can be tried at specific frequencies of interest. Another worthwhile check requiring only one additional plot is investigation of the condition number of the coefficient matrix as a function of frequency. The condition number is just the ratio of the largest to smallest singular values at each frequency and is available trivially if (23) is solved with the SVD method. Estimates of the condition number are also available from QR solutions.
One important aspect of robust techniques is their ability to identify unusual data. Typically, anomalous effects are either localized in frequency or in time and may occasionally be localized simultaneously in both. A common type of outlier is local nonstationarity and may be detected by the stationarity test given by Thomson [1977a,b]. This is Bartlett's M-test (no relation to M-estimates) for variance homogeneity [Bartlett, 1937] computed as a function of frequency
$$M(\omega) = N\nu \log\left[\frac{1}{N}\sum_{j=1}^{N}\hat{S}_j(\omega)\right] - \nu\sum_{j=1}^{N}\log\hat{S}_j(\omega) \qquad (28)$$
where $\hat{S}_j(\omega)$ is the spectral estimate for the $j$th data section at radian frequency $\omega$ and there are $N$ individual estimates, each having $\nu$ degrees of freedom, typically 2 for raw spectra. If the series is stationary, $M$ will be distributed approximately as $\chi^2_{N-1}$. Excessive variability between sets appears as large values of $M$, while narrowband processes produce values that are smaller than expected. Unusual sections may be identified by observing the change in $M$ when a given $\hat{S}_k(\omega)$ is deleted. In this case, it may prove simpler to examine the variance or total power in the raw spectral estimates (i.e., the integral over frequency of the spectrum) against the subset index to find anomalous sections.
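The stationarity statistic (28) and the delete-one check described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names `bartlett_m` and `leave_one_out_change` and the array layout (one row per data section, one column per frequency) are assumptions made for the example.

```python
import numpy as np

def bartlett_m(raw_spectra, nu=2.0):
    """Bartlett's M statistic of (28), evaluated at every frequency.

    raw_spectra : array of shape (N sections, F frequencies), strictly positive.
    nu          : degrees of freedom of each raw estimate (2 for raw spectra).
    """
    s = np.asarray(raw_spectra, dtype=float)
    n = s.shape[0]
    log_of_mean = np.log(s.mean(axis=0))   # log of the section-averaged spectrum
    sum_of_logs = np.log(s).sum(axis=0)    # sum over sections of the log spectra
    return n * nu * log_of_mean - nu * sum_of_logs

def leave_one_out_change(raw_spectra, nu=2.0):
    """Drop in M when section k is deleted; row k corresponds to section k."""
    s = np.asarray(raw_spectra, dtype=float)
    full = bartlett_m(s, nu)
    return np.array([full - bartlett_m(np.delete(s, k, axis=0), nu)
                     for k in range(s.shape[0])])

# Hypothetical example: ten identical sections except one high-power section.
spectra = np.ones((10, 4))
spectra[3] = 100.0
suspect = leave_one_out_change(spectra).sum(axis=1).argmax()  # index of the odd section
```

Summing the change over frequency, as in the last line, is one simple way to rank sections; examining it frequency by frequency would retain the localization in frequency that the text mentions.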
Another useful procedure to detect sections of a data series which are different is based on the innovations variance. This is just the variance or power associated with the difference between an observation of a process and a linear prediction of it based on all of the earlier data. The one-step prediction variance is treated in detail by Priestley [1981, chapter 10]. In the present context, an estimate of the innovations variance is
$$\sigma^2_{I,k} = \exp\left[\frac{1}{2\omega_N}\int_{-\omega_N}^{\omega_N} d\omega\, \log\hat{S}_k(\omega)\right] \qquad (29)$$
where $\omega_N$ is the Nyquist frequency. Higher than normal values for the innovations variance suggest a change in the structure of the underlying process because it cannot be predicted adequately from earlier data. In addition, the ratio of the innovations and ordinary variances is useful as a measure of the relative predictability of the process and is also scale invariant.
SPECTRUM ESTIMATION
A major problem in time series analysis is the choice of an algorithm that yields a spectral estimate, given a finite observation of the process of interest, such that the result is not badly biased, yet remains statistically consistent and efficient. That these requirements are usually in conflict is attested to by the plethora of techniques in the literature. For the purposes of this paper, only nonparametric estimates will be considered, eliminating those in which a specific functional form for the spectrum is assumed (e.g., maximum entropy). In addition, only direct estimates based on the discrete Fourier transform are of immediate interest.
Computation of a direct estimate consists of the following steps: (1) tapering a data sequence or a subset of a data sequence (either the raw data or the residuals from a prewhitening operation) with a data window, (2) taking the discrete Fourier transform, (3) converting the result to a cross-spectrum or auto-spectrum by a suitable multiplication, (4) smoothing the result to achieve statistical consistency, and (5) correcting for any prewhitening. The smoothing operation may be done by some combination of convolution with a second type of data window (band averaging) and combining a set of independent raw estimates computed from a longer data sequence (section averaging). It is usually assumed that the data window primarily controls bias, while the smoothing operation primarily controls variance, but interactions between the two procedures do exist.
In applying robust M-estimation to spectral problems, only the section-averaging approach will be used. The data are assumed to consist of long sequences of contiguous values that may be subdivided into smaller pieces of equal length. A data window with good bias characteristics is then applied to the subsets with enough overlap between them to yield high efficiency, yet ensure approximate independence of the raw spectra. For this purpose, the superiority of the prolate spheroidal sequences as data windows is well-documented [Thomson, 1977a; Slepian, 1978]. A prolate data window with a time-bandwidth product of 4 and 70% overlap between estimates is used throughout this paper; this gives over 100 dB of bias protection outside an inner domain of full width $8/(T\Delta)$, where $T$ is the number of samples in the data section and $\Delta$ is their spacing, but shows partial correlation of the result inside that band. The correlation is the value of the equivalent lag window at the subset offset; see Thomson [1977a, section 3.3] for details. The raw spectral estimates obtained in this way serve as the input to an M-estimator, as described in the next two sections.
Cases where only a few disjoint sections of data are available, or where the whole time series is short (in the sense that the frequencies of interest are comparable to the reciprocal series length), have traditionally been treated by band averaging with variable width smoothers and are not amenable to the robust procedures of this paper. In any case, band-averaged spectra have a low variance efficiency if a low-bias data window is employed, and the multiple prolate window method of Thomson [1982] offers vastly superior performance without a significant loss of bias protection. Robustification of the latter is feasible under some circumstances and will be treated in a future paper.
Nonparametric spectral estimates formally characterize only purely nondeterministic, stochastic processes. In many practical cases, additional deterministic signals, or components with a bandwidth comparable to the reciprocal series length, and so apparently deterministic, are present in a time series and can complicate the spectrum estimation problem considerably. Special methods are required to handle these even for contamination-free data [Thomson, 1977a,b, 1982], and attention to the presence of line terms (periodic signals) is required when using robust techniques to avoid treating them incorrectly as outliers.
ROBUST ESTIMATION OF POWER SPECTRA
Robust computation of power spectra is a special case of the simple location parameter problem discussed earlier. At each frequency a set of independent raw power spectra are to be averaged together using an M-estimator; for slowly varying spectra, additional band averaging can be incorporated by combining several adjacent frequencies from each raw spectrum. It should be remembered that outlier contamination of some of the spectral subsets can only result in a power spectrum that is biased upwards since it is not possible to subtract from the power in the background process. This means that only individual estimates deviating in a positive sense from the current robust average are downweighted during the iterative processing.
The robust algorithm employed for univariate power spectra is as follows: given a set of spectral sections to be averaged on a frequency-by-frequency basis, an initial robust solution is obtained from the sample median and used to find both the residuals and a scale estimate. Either the MAD or interquartile distance scaling (20) or (21) yields satisfactory results, although the latter offers a slight computational advantage. An iterative solution to the weighted location problem (23) is then sought using the weight function (27) modified to affect only data whose scaled residuals $r_i/d$ exceed the robust average by a critical amount (i.e., the absolute value operation in the
exponent is removed). Convergence is achieved when the answer does not change substantially, typically after 3-5 iterations.
If outliers have been eliminated, power spectra are the sums of squares of almost normally distributed variates, and hence distributed as chi-square ($\chi^2$). It is crucial to the correct operation of the nonlinear M-estimator that the proper distribution be used in getting the theoretical MAD and interquartile distance in (20) and (21). Using the discrete Fourier transform, each raw power estimate possesses 2 degrees of freedom at each frequency, neglecting the DC and Nyquist components, which have only 1 degree of freedom. The $\chi^2$ distribution with 2 degrees of freedom ($\chi^2_2$) is equivalent to the exponential distribution and has a pdf given by the standard form [Johnson and Kotz, 1970, chapter 18]
$$f(x) = \tfrac{1}{2} e^{-x/2} \quad (x \ge 0) \qquad (30)$$
for which the median $\tilde{\mu}$ is $2\log 2$, the MAD $\sigma_{MAD}$ is $2\sinh^{-1}(\tfrac{1}{2}) \approx 0.9624$, and the interquartile distance is $\sigma_{IQ} = 2\log 3 \approx 2.1972$. The quantiles of the exponential distribution are given by
$$Q_j = 2\log\left[\frac{N}{N - j + \tfrac{1}{2}}\right], \quad j = 1, \ldots, N \qquad (31)$$
Using the $N$th quantile of the $\chi^2_2$ distribution as the weight parameter $\beta$ in (27), robust averaging becomes an adaptive procedure that operates automatically in all save exceptional cases. Detection of such exceptional cases is facilitated by examination of the q-q plot of the final, weighted data. If the weighting has been performed correctly, then the final q-q plot of the weighted spectral estimates (after discarding values with zero weight) against the $\chi^2_2$ quantiles (31) should approximate a straight line. If it remains long-tailed, then the outlier fraction exceeds a typical value of 10-20%, and the weight parameter $\beta$ must be reduced. Since the scale parameters (20) and (21) are chosen to be consistent with a $\chi^2_2$ distribution, the new weight parameter may be taken directly from the ordinate of the q-q plot as the point where the distribution tails begin to be evident.
The robust power spectral method is best illustrated with some examples. Figure 1 compares robust and conventional spectra of the north magnetic field time variations at Victoria Observatory for the month of July 1982. The data are typical of the mid- to high-latitude geomagnetic field, consisting of a quiet background component interspersed with violent, short-duration storm activity. The latter comprise only 10-20% of the data but exhibit power levels that are easily a decade above the background. The spectra in Figure 1 were computed by averaging, both arithmetically and robustly, raw estimates of 2 days duration. The conventional, arithmetic-averaged spectrum is about a factor of 10 larger than the robust spectrum and displays much greater point-to-point variability. Owing to the data adaptive weighting, the nominal number of estimates per frequency for the robust spectrum is variable but lower by a factor of about 0.9 compared to the conventional type. An argument ignoring robustness would imply that the equivalent degrees of freedom per frequency (about 90 in this example) is higher for the straight arithmetic-averaged spectrum, but this is contradicted by the higher variability seen in Figure 1. In fact, the conventional estimate is dominated by only a fraction of the data, so that it has only 10-20 degrees of freedom at most, accounting for the larger uncertainty. The robust spectrum is a much better measure of the time-averaged behavior of the geomagnetic field, while the conventional spectrum is little different from that given by analysis of only the storm time data.
Fig. 1. Conventional (top curve) and robust (bottom curve) power spectra for the time variations of the north magnetic field at Victoria, British Columbia, Canada, during July 1982, plotted against frequency (cph). The nonrobust spectrum is the arithmetic average of 42 independent pieces of data, with additional band-averaging yielding 84 estimates over 0.1-0.5 cph and 168 estimates over 0.5-30 cph. Because of severe nonstationarity it actually possesses only 10-20 degrees of freedom. The bottom line is the robust average of the same raw spectra, such that the high-amplitude magnetic storms are eliminated, yielding a much higher equivalent degrees of freedom (see text).
The q-q plots in Figure 2 illustrate the statistical nature of the robust averaging procedure. For convenience in plotting, the raw and weighted estimates in each frequency bin have been scaled so that their sum of squares is 8, the value expected for the second moment of a $\chi^2_2$ variate. The top plot shows the original (unweighted) q-q plot for a few frequencies in the range 0.3-0.4 cph which are typical of the entire spectrum. The infrequent but intense storm activity gives a typical long-tailed distribution. The shape of the weight function causes the bottom q-q plot of the final, weighted power estimates to be slightly short-tailed. Increasing the weight parameter $\beta$ would bring this closer to a true $\chi^2_2$ result. However, the short-tailed nature of the result does not appreciably alter the robust spectrum of Figure 1, especially on the logarithmic scale over which significant power changes occur.
The stationarity test (28) was computed using the 2-day-long raw section estimates of Figure 1. The value of $M$ was about 130 from the lowest frequency to about 0.2 cph, decreased slowly to a value of about 70 at the Nyquist frequency, and did not display any structure that was localized in frequency. The expected value of $M$ is 64.3, indicating strong nonstationarity at low frequencies. The reduction of $M$ at high frequencies is caused by a rise in the noise level as the spectrum decreases; instrument noise is generally quite stationary. Further examination of the series shows that the detailed form of the nonstationarity is complicated. The ordinary variance in each section of data normalized to its average over all of the sections is plotted against time in Figure 3 (top). The subset variance exceeds twice the average value at five
separate and isolated places; these are associated with small magnetic storms. The innovations variance from (29) normalized by its average over all of the data sections is shown against time in Figure 3 (bottom). This was much higher than the mean for extended periods after the five events of Figure 3 (top), suggesting complex and long-term changes in the underlying process. The innovations variance is also higher between days 14 and 20 without any obvious association with the power, implying a different type of change in the structure. The ratio of the innovations and ordinary variances stayed near its median value during the high power event at day 3, implying that the structure of the process did not change much even though the ordinary variance increased dramatically. However, the process was altered after this event, as evidenced by an increase in the ratio, and returned slowly to normal with time. The largest overall change in the structure of the process occurs during a 4-day interval near the center of the record where the relative prediction variance decreases by a factor of 4 from its median value; this is also apparent in Figure 3 (bottom).
Fig. 2. Quantile-quantile plots for the conventional (top) and robust (bottom) spectra of Figure 1 in the frequency range 0.3-0.4 cph. The plots show the $N$ ranked raw power estimates in a given frequency bin and scaled so that their sums of squares is 8 against the $N$ quantiles of the $\chi^2_2$ distribution; each plotted symbol corresponds to a single data section, and different symbols correspond to different frequency bands. The nonrobust q-q plot at the top shows a largely $\chi^2_2$ population, with an additional long-tailed component that is attributed to brief but intense magnetic storms. The q-q plot at the bottom shows the effect of adaptive weighting of the data, eliminating the storms and yielding a slightly short-tailed $\chi^2_2$ result.
Fig. 3. The top panel shows the total power or variance in each windowed data subset used in obtaining Figure 1 as a function of the center time of the subset (in days). These values have been normalized by the average power for the entire data series. The bottom panel shows the innovations variance for the different subsets against the center time of the subset. The innovations variance has been normalized by its mean over all of the subsets. See text for details.
Figure 4 shows both conventional and robust spectra of the electric field variations in the frequency band $10^{-4}$ to 1 Hz collected on Adams Mesa, Arizona, in 1979. The differences between the robust and nonrobust results are more substantial than for Figure 1, especially at the highest frequencies, where the conventional spectrum is dominated by outliers. Note in particular the oscillatory nature of the conventional estimate, remembering that it is plotted on a logarithmic frequency scale. This behavior is typical of a few large outliers in a single subset. Figure 5 shows original and final q-q plots for two frequency intervals, 0.010-0.013 Hz and 0.200-0.204 Hz. The former band shows typical long-tailed behavior caused by a small fraction of outliers that is easily corrected by the robust method. The higher-frequency interval also shows the effects of a few extremely large outliers; these are again removed by the robust averaging procedure with a dramatic effect on the spectrum.
ROBUST ESTIMATION OF TRANSFER FUNCTIONS AND COHERENCES
The robust computation of transfer functions is the frequency-domain equivalent of multivariate robust linear regression, while the coherences are similar to the correlation coefficients of statistical inference theory, and are derived from the output and residual powers obtained during the M-estimation procedure. The spectral problem differs from more conventional ones because the data are complex rather than real numbers, and there are at least two frameworks within which determination of the robust scale and iterative reweighting may be performed. In the
first instance, the data (i.e., the raw Fourier transforms) may be regarded as having independent Gaussian real and imaginary parts, so that separate weights are applied to them. In the second case, only the magnitudes of the complex numbers are considered, and identical weights are applied to the real and imaginary parts of the data. Zeger [1985] has shown that the latter choice, which has a Rayleigh distribution, is preferable because it is rotationally (phase) invariant. It is also more conservative in that an outlier in either the real or the imaginary part of the data will result in downweighting. Extensive practical experience reinforces this: in severely contaminated data, treating the complex magnitude, rather than the independent real and imaginary parts, yields more consistent transfer functions and better convergence. Only the Rayleigh method is used in this paper.
Fig. 4. Conventional (top curve) and robust (bottom curve) power spectra of the north electric field collected on Adams Mesa, Arizona, plotted against frequency (Hz). The nonrobust spectrum is the arithmetic average of 43 independent raw estimates, with additional band-averaging increasing this to 86 estimates over 0.01-0.1 Hz and 172 estimates over 0.1-0.5 Hz. The robust spectrum has approximately 70 degrees of freedom over 0-0.01 Hz, approximately 130 degrees of freedom over 0.01-0.1 Hz, and approximately 250 degrees of freedom over 0.1-0.5 Hz. The severe effect of outliers is apparent on comparing the two results, especially at the highest frequencies.
Fig. 5. Quantile-quantile plots for the nonrobust (top panels) and robust (bottom panels) power spectra of Figure 4. The left-hand side shows the results in the 0.010-0.013 Hz band, while the right-hand side covers the range 0.200-0.204 Hz. In both cases the original distribution is a long-tailed $\chi^2_2$ type and is changed to a slightly short-tailed one by the robust averaging. The effect of only a few severe outliers on the spectrum is seen at the higher frequencies and by examining Figure 4.
The robust M-estimator used for transfer function computation is similar to the standard procedures described earlier. The data input to the M-estimator are a set of $k$ raw section Fourier transforms for a single output data series $\{X_k\}$ and similar sets for $p$ input data series; both are also parameterized by frequency. The solution is initialized by computing an unweighted least squares result using QR decomposition on (14), from which residuals and a scale estimate (20) or (21) are obtained. The Huber weights (26) computed using the scaled residuals are then applied to the rows of the matrix regression problem (23). The weighted regression problem is solved, and the entire process is repeated until the change in the total residual power falls below a threshold value. This procedure gives a final scale estimate and an initial set of residuals for use in the last part of the algorithm. This is also based on iteratively reweighted least squares but uses the weight function (27) and a fixed scale estimate derived from the final Huber iteration, again terminating when the residual power does not vary. A modification of this procedure which substitutes an $L_1$ simplex programming algorithm for the iterative Huber one has also been used successfully. The residuals from an $L_1$ solution are found and their MAD is used to get a final scale estimate. Iteratively reweighted least squares using the weight function (27) is then used until the residual power does not change substantially.
The standard probability distribution for a variate which is the square root of the sum of the squares of two
independent Gaussian variables is Rayleigh, a transformed version of the $\chi^2_2$ distribution. The pdf is given by [Johnson and Kotz, 1970, p. 197]
$$f(x) = x e^{-x^2/2} \quad (x \ge 0) \qquad (32)$$
Using (32), the median and interquartile distance are $\tilde{\mu} = \sqrt{2\log 2}$ and $\sigma_{IQ} = \sqrt{2\log 4} - \sqrt{2\log(4/3)}$. The MAD is a solution of the transcendental equation
$$2\sinh(\tilde{\mu}\,\sigma_{MAD}) = e^{\sigma_{MAD}^2/2}$$
yielding a value of approximately 0.44845. The quantiles are
$$Q_j = \left\{2\log\left[\frac{N}{N - j + \tfrac{1}{2}}\right]\right\}^{1/2}, \quad j = 1, \ldots, N \qquad (33)$$
As for the other robust methods, the $N$th quantile from (33) serves as the weight parameter $\beta$ in (27).
The robust squared multiple coherence of the output with the $p$ input time series is computed on a frequency-by-frequency basis from
$$\hat{\gamma}^2 = \frac{S_{xx} - S_{rr}}{S_{xx}} \qquad (34)$$
where $S_{xx}$ and $S_{rr}$ are the weighted output and residual powers
$$S_{xx} = \frac{\sum_{i=1}^{N} w_i^2 |x_i|^2}{\sum_{i=1}^{N} w_i^2} \qquad (35)$$
$$S_{rr} = \frac{\sum_{i=1}^{N} w_i^2 |r_i|^2}{\sum_{i=1}^{N} w_i^2} \qquad (36)$$
and (34) is understood to be zero if $S_{rr} > S_{xx}$. The weight $w_i$ is computed using (27) and the scaled residuals from the last iteration in the weighted linear regression, and setting $w_i = 1$ yields the conventional, nonrobust coherence. Note that (34) reduces to the amplitude of the ordinary magnitude-squared coherence function when there is only one input data sequence ($p = 1$). The weighted power spectrum in (35) is not the same as the robust result that is found using the methods of the last section because the weights are derived on the basis of regression rather than location residuals. The processes producing large regression outliers may be different from those that generate anomalous power spectra; in particular, the local nonstationarity seen in Figure 1 may not lead to regression problems if the data and coefficient variables change in similar ways. In addition, new classes of regression outliers can occur that do not affect power spectra, and the origin of these outliers can be difficult to determine.
Fig. 6. Conventional (dashed lines) and robust (solid lines) squared multiple coherence plots, against frequency (cph), for the north magnetic field at two seafloor stations off Vancouver Island, site A (bottom) and site B (top), with all three magnetic field components at Victoria Observatory. The data have been contaminated by an instrument fault, accounting for the low conventional coherence between 1 and [illegible] cph. Robust regression completely removes this effect. See text for details.
Figure 6 compares the robust and nonrobust squared multiple coherence (34) between the north magnetic field at two seafloor sites off Vancouver Island and all three magnetic field components from the standard observatory at Victoria. The seafloor data were contaminated to varying degrees by a nearly sinusoidal component of about 1-hour period that was later attributed to magnetized tape cassettes in the instrument data recorders. Both the period and the amplitude of the offending signal decreased continuously over the 1-month-long record, making it a frequency-localized outlier, rather than simply a deterministic component. Site B (top) was more heavily affected than site A (bottom), and the noise component was readily visible as a large amplitude sinusoid in the former case but only produced a slight fuzziness in the site A record. In either case, the conventional coherence (dashed lines) is reduced substantially by the contamination, while the robust algorithm has readily eliminated its effects. The conventional coherence possesses about 100 nominal degrees of freedom per frequency, while the robust coherence has 85-90 degrees of freedom outside the zone of contamination and somewhat fewer inside it. Note also the presence of harmonics of the approximately 1 cph fundamental. Because of instrument drift, a reduced signal level, and the electromagnetic effects of internal waves the coherence falls off at low and high frequencies in both examples: the robust method does not artificially increase the coherence when the underlying process is itself incoherent. This example illustrates the ability of the robust algorithm to handle both gross (site B) and subtle (site A) outliers in an automatic fashion.
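The weighted coherence of (34)-(36) is simple enough to sketch directly. The Python fragment below is a minimal illustration at a single frequency, not the authors' implementation: the function name `robust_squared_coherence` is hypothetical, the weights are taken as given (the weight function (27) is not reproduced here), and the guard for a zero output power is an added assumption.

```python
import numpy as np

def robust_squared_coherence(x, r, w):
    """Squared multiple coherence (34) from the weighted powers (35) and (36).

    x : complex output Fourier coefficients, one per data section
    r : complex regression residuals from the final weighted fit
    w : nonnegative weights, one per section (all ones gives the nonrobust value)
    """
    x = np.asarray(x)
    r = np.asarray(r)
    w2 = np.asarray(w, dtype=float) ** 2
    sxx = np.sum(w2 * np.abs(x) ** 2) / np.sum(w2)   # weighted output power (35)
    srr = np.sum(w2 * np.abs(r) ** 2) / np.sum(w2)   # weighted residual power (36)
    # (34) is taken as zero when S_rr > S_xx; the S_xx <= 0 guard is an assumption.
    if sxx <= 0 or srr > sxx:
        return 0.0
    return (sxx - srr) / sxx

# Hypothetical single-input example (p = 1) with unit weights.
rng = np.random.default_rng(0)
e = rng.normal(size=50) + 1j * rng.normal(size=50)            # input series
x = (2 - 1j) * e + 0.5 * (rng.normal(size=50) + 1j * rng.normal(size=50))
z = np.vdot(e, x) / np.vdot(e, e)                             # least squares transfer function
coh = robust_squared_coherence(x, x - z * e, np.ones(50))
```

With unit weights and one input, the value returned agrees with the ordinary magnitude-squared coherence, which is the reduction noted in the text.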
Fig. 7. Unweighted (top) and weighted (bottom) residual q-q plots, against Rayleigh quantiles, for the results in Figure 6 computed using the Rayleigh residual method described in the text. The left-hand side shows the initial long-tailed and final, slightly short-tailed distributions for the site A data over 0.97-1.14 cph. The right-hand side shows equivalent values over 1.2-1.4 cph for the more severely contaminated site B data.
Figure 7 shows sample Rayleigh q-q plots for the site B data over 1.2-1.4 cph (right panels) and for the site A data over 0.97-1.14 cph (left panels). For convenience in plotting, the residuals in each frequency band have been scaled so that the sum of squares is 2, as expected for the second moment of a Rayleigh-distributed variate. In both cases the contamination causes the original, unweighted q-q plots to be extremely long-tailed; this is especially evident for the site B data. The final, weighted q-q plot shows a slightly short-tailed Rayleigh distribution for the residuals. There is a suggestion that a smaller value of the weight parameter $\beta$ would improve the result at site B; this is corroborated by the dips in the robust coherence between 1 and 3 cph in Figure 6 (top). Such tuning is normally not required, but this is an exceptional case because the interfering signal is both strong and persistent.
Figure 8 shows robust and conventional coherences between the north electric field collected on the bed of Lake Washington near Seattle in 1981 (A. Schultz, private communication, 1985) and two horizontal magnetic components from a nearby land site computed using 223 raw estimates. The roll-off at low and high frequencies is caused by the filters applied during data collection, but outlier effects seriously degrade the conventional coherence estimate between 0.001 and 0.03 Hz. The robust method yields a coherence of better than 0.99 over the same frequency band. Figure 9 compares the nonrobust and robust transfer functions for the same north electric and east magnetic components; this is just an element of the magnetotelluric response function. Outlier noise in the electric field causes the conventional transfer function to be excessively variable and consistently biased downward. By contrast, the robust transfer function is much smoother over the frequency interval of Figure 8 where the coherence is high.
Fig. 8. The conventional (dashed line) and robust (solid line) squared multiple coherence, against frequency (Hz), between the north electric field at the bed of Lake Washington and the two horizontal magnetic field components from a nearby land site. There are 223 raw estimates used in both cases, but the robust coherence has fewer than 446 degrees of freedom because outliers are downweighted. The robust squared coherence exceeds 0.99 over most of the range 0.0005-0.02 Hz.
Figure 10 compares the original and final q-q plots for two frequency bands obtained using the Rayleigh residual method to get Figures 8 and 9. At the lower frequency (0.0013-0.0017 Hz) the contamination consists of a few large outliers, and the conventional coherence is degraded only a little. The final q-q plot shows a typical short-tailed Rayleigh result. In the higher-frequency band of 0.0063-0.0067 Hz the nonrobust coherence is substantially lower and the outlier contamination much worse, but the robust algorithm is still successful in removing them with a concomitant improvement in the coherence and transfer functions.
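The q-q diagnostics used throughout can be assembled from the quantile formulas (31) and (33). The Python sketch below is illustrative only: the function names are hypothetical, and the scaling step interprets "sum of squares is 8" (or 2 for Rayleigh residuals) as fixing the mean square of the ranked values to the theoretical second moment, which is an assumption about the normalization.

```python
import numpy as np

def chi2_2_quantiles(n):
    """Quantiles (31) of the chi-square distribution with 2 degrees of freedom."""
    j = np.arange(1, n + 1)
    return 2.0 * np.log(n / (n - j + 0.5))

def rayleigh_quantiles(n):
    """Quantiles (33) of the Rayleigh distribution (square root of (31))."""
    return np.sqrt(chi2_2_quantiles(n))

def qq_points(values, kind="rayleigh"):
    """Return (theoretical quantiles, ranked and scaled values) for a q-q plot.

    kind="rayleigh" is for residual magnitudes, kind="chi2" for raw power estimates.
    """
    v = np.sort(np.asarray(values, dtype=float))
    if kind == "rayleigh":
        q, second_moment = rayleigh_quantiles(v.size), 2.0
    else:
        q, second_moment = chi2_2_quantiles(v.size), 8.0
    # Assumed normalization: mean square of the data equals the second moment.
    v = v * np.sqrt(second_moment / np.mean(v ** 2))
    return q, v

# Hypothetical usage: residual magnitudes from one frequency band.
q, v = qq_points([3.0, 1.0, 2.0], kind="rayleigh")
```

Plotting `v` against `q` should give an approximately straight line for clean data, while a long upper tail indicates remaining outliers. The largest quantile, `q[-1]`, is the $N$th quantile that the text uses as the weight parameter $\beta$.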
Fig. 9. The real (top) and imaginary (bottom) parts of the conventional (dashed lines) and robust (solid lines) transfer functions, against frequency (Hz), between the north electric and east magnetic field data of Figure 8. The nonrobust and robust curves have been offset by 0.4 for clarity. The robust result is smoother than its conventional counterpart, while the latter is also biased downward by outlier noise.
Fig. 10. Unweighted (top) and weighted (bottom) residual q-q plots, against Rayleigh quantiles, for the results in Figures 8 and 9 using the Rayleigh residual method described in the text. Two frequency bands are shown, 0.0013-0.0017 Hz (left panels) and 0.0063-0.0067 Hz (right panels). The initial result is typically long-tailed, and the outlier fraction is larger at the higher frequency. The adaptive weighting makes the final residuals slightly short-tailed.
DISTRIBUTIONS AND RELIABILITY OF ESTIMATES
A detailed discussion of the distributions for estimates of power spectra, coherences, and transfer functions is beyond the scope of this paper, and the subject is covered in detail by Brillinger [1981]. Traditionally, tests of hypotheses or the placement of confidence intervals on spectral parameters have required distributional assumptions, with the complexity of the exact distributions necessitating further simplification and the use of asymptotic forms to give a tractable result. However, the standard assumption of independence and identical statistics for either the data or the regression residuals implies freedom from outliers. This suggests that the robust spectral estimates will match the statistical model more correctly, so that more realistic hypothesis testing can be performed using them. For the robust estimators of this paper, reasonable approximations are obtained using Gaussian-based statistics if
$$N_{eff}(\omega) = \sum_{i=1}^{N} w_i^2(\omega)$$
is used as the effective or nominal number of estimates, where $w_i(\omega)$ is the robust weight for the univariate case. In the multivariate case this number is averaged over the $p$ inputs, and $N_{eff}$ is reduced by a factor of $p-1$. This result is usually not as reliable as the jackknifed value mentioned below. In either case, variance calculations must include the correlation caused by the overlap of the windowed subsections, about 25% for the prolate taper with a time-bandwidth product of 4.
In keeping with the nonparametric philosophy adopted
646
CHAVE ET AL.: ROBUSTSPECTRALESTIMATION
in this work, it is possible to compute error and bias estimates by use of the jackknife method [R. G. Miller, 1974; Efron, 1982]. The jackknife is based on N repeated analyses of the data using N-1 subsets each time, in addition to the original estimate using all N subsets. Appropriate combinations of these N+1 estimates give values for both the bias and the variance that are valid under a wide range of parent distributions and estimation procedures, and are generally considered to be more reliable than error estimates based on Gaussian statistics. Such a procedure clearly involves more computing than that for a single estimate, but the amount is not excessive, as it is not necessary to recompute the Fourier transforms.

DISCUSSION

The methods for computing robust power spectra, coherences, and transfer functions presented in this paper operate in an automatic, data adaptive fashion that breaks down only under unusual circumstances. Detection of their failure may be obvious on inspection and is eased by q-q diagnostic plotting. In addition, there are a few tools that can prove useful either in preventing problems or in correcting them when they appear. As has been noted, outliers in power spectra are usually completely different from those in the transfer functions and coherences. The former are generally associated with nonstationarity or obvious contamination of the data (e.g., spike noise) and are not difficult to detect or eliminate. Problems with the robust regression methods used to find transfer functions and coherences are more complicated and may be due to failures of either the data or the model. The following ideas are aimed primarily at improving the robust transfer function and coherence computations.

Under some physical conditions there may be a correlation of the rate of occurrence or the size of outliers with the power in either the input or output data sequences. One obvious example of this is an instrument problem where clipping, saturation, or distortion at high signal levels on one or more data channels will result in poor correlations with other, clean, time series. There is also some evidence for an increase in the frequency of outlier appearance during large magnetic storms at mid- to high-latitudes that can affect electromagnetic induction data. These problems may be detected by plotting the power in the unweighted regression residuals against the power in the individual data series and looking for correlations. While the standard robust algorithms usually work with such data, a substantial improvement may result if the rows of (23) are weighted by the inverse of the power in the pertinent section of the offending time series at all stages in the iterative procedure. Weighting that is proportional to the data power is appropriate if the outliers occur during intervals of low activity.

In both the robust power spectrum and transfer function algorithms the residuals and a scale estimate are computed independently using the solution from an earlier iteration. At first glance this seems to be unnecessary since, for example, a Gaussian model for the data implies a scaled χ² distribution for the power spectrum, yielding coupled estimates of location (i.e., the residuals) and scale. However, if periodic or other deterministic background signals are present, the Fourier transform has a nonzero expected value and the distribution becomes noncentral χ² [Johnson and Kotz, 1970, chapter 28], and location and scale estimates must be made separately. Similar arguments apply to the transfer functions and coherences, although their distributions are substantially more complicated in the noncentral case. In other instances, deterministic components may be known to be present (e.g., tidal signals) and are removed by a preliminary least squares procedure. Such approaches must be used with caution: first, by using a robust fitting procedure; second, by being careful to remove the right frequencies.

The effect of noise in the data channels in introducing bias and erratic behavior into computed magnetotelluric response functions is well-known, and many techniques to circumvent this problem have been proposed. Numerical approaches include the selection or weighting of different data sections using coherence [Kao and Rankin, 1977; Jones and Jodicke, 1984] or iterative refinement of the response function [Larsen, 1975, 1980]. The most widely used and generally effective method is employment of a remote reference to minimize the uncorrelated noise between several distinct time series [Gamble et al., 1978]. These techniques are aimed at the removal of outliers from the data (especially the electric field), since it is a few discordant data values rather than persistent background noise components that produce most of the erratic response function behavior. The effectiveness of a robust estimation approach in substantially reducing bias and variability is seen in Figure 9 and has been verified on many other magnetotelluric time series. The use of robust spectral analysis to reduce bias and related problems in magnetotelluric processing appears to be quite promising.

The methods presented in this paper are based on conventional M-estimators and provide good protection against outliers in the data {x_i} of (14). They are less effective when outliers occur only in the coefficient variables {u_ij} and cannot cope with grossly aberrant values in them. This leads to the concept of a breakdown point ε*, which is the smallest percentage of contaminated data that will cause the estimator to yield incorrect estimates. The breakdown point is zero for ordinary least squares in the presence of outliers in either the data vector or coefficient matrix and is also zero for outliers in the coefficient matrix with L1 or M-estimators. M-estimators are preferred over L1 types on statistical and efficiency grounds, not because of improved outlier resistance.

This observation has led to the introduction of different robustification schemes for linear regression. Generalized M or G-M estimates were proposed with the goal of bounding the influence of outlying coefficient variables and are discussed by Mallows [1975]. The breakdown point for G-M estimates is at most 1/(p+1), or about 30% for simple linear regression. Rousseeuw [1984] suggested an alternate method based on replacing the sum in (14) with the median operator, yielding a breakdown point which approaches 50%. Larger values for ε* are meaningless, since discrimination of good from bad data becomes a matter of semantics. Neither of these approaches to robust regression has been applied to time series, but both are promising candidates to further improve robust spectral analysis.

While the examples of this paper have been drawn from electromagnetic geophysics, the applications of robust spectral estimation are by no means limited to that field and will find uses in many branches of geophysics and oceanography. Since spectral analysis is fundamentally a statistical tool, emphasis has been placed on the importance of getting consistency between data and model, as well as on verifying the underlying statistical assumptions. The techniques presented here are capable of achieving this by giving substantial, automatic, and data adaptive protection from at least two classes of outliers and should be used whenever the section-averaging spectral methods would be appropriate. Use of robust spectrum estimates can also substantially reduce data editing chores; within reason, the effects of isolated, bad data are virtually eliminated without user intervention.

In conclusion, three points should be reemphasized. First, both spectrum estimation and robust statistics are fields of active research, so that the methods presented here are unlikely to be the last word on the subject. Second, while these methods can help to identify subtle outliers, this does not mean that they should always be summarily rejected: in some instances, the outliers may be the important part of the data. Finally, while there are many arguments about which robust methods are best, there is general agreement among statisticians that almost any robust method is better than none at all.
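The delete-one jackknife used for the error and bias estimates can be sketched in a few lines. The following is a minimal illustration, not the authors' code. It assumes that one value per data section is already available (for example, the raw power at a single frequency from each windowed subsection), so the Fourier transforms are not recomputed. The function name `jackknife`, its arguments, and the sample values are hypothetical.

```python
from statistics import mean


def jackknife(sections, estimator):
    """Delete-one jackknife for `estimator` applied to per-section values.

    `sections` is a list with one value per data section (hypothetical input),
    and `estimator` maps such a list to a single number.
    Returns (full estimate, bias-corrected estimate, variance estimate).
    """
    n = len(sections)
    if n < 2:
        raise ValueError("the jackknife needs at least two sections")
    # Estimate from all N sections.
    full = estimator(sections)
    # N delete-one estimates, each computed from N-1 sections.
    loo = [estimator(sections[:i] + sections[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    # Standard jackknife bias and variance formulas.
    bias = (n - 1) * (loo_mean - full)
    variance = (n - 1) / n * sum((v - loo_mean) ** 2 for v in loo)
    return full, full - bias, variance


# Illustrative (made-up) section powers; any estimator taking a list works.
powers = [1.0, 2.0, 3.0, 4.0, 5.0]
full, corrected, var = jackknife(powers, mean)
print(full, corrected, var)
```

When the estimator is the ordinary mean, the jackknife variance reduces to the familiar sample variance divided by N, which is a convenient check. A robust estimator can be passed in place of `mean` without changing the function.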
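The breakdown point ε* discussed above can be illustrated in the simplest possible setting, a univariate location estimate. The sketch below is an analogy only and does not implement the G-M or least median of squares regression estimators. It replaces part of a made-up clean sample with a gross outlier and compares the sample mean, whose breakdown point is zero, with the median, whose breakdown point approaches 50%. The helper `contaminate` and all data values are assumptions chosen for illustration.

```python
from statistics import mean, median


def contaminate(clean, n_bad, bad_value=1.0e6):
    """Return a copy of `clean` with its first `n_bad` values replaced by a gross outlier."""
    return [bad_value] * n_bad + list(clean[n_bad:])


# Hypothetical clean sample: 20 values spaced 0.1 apart, starting at 9.0.
clean = [9.0 + 0.1 * k for k in range(20)]

# Increase the number of bad points and watch each estimator.
for n_bad in (0, 1, 5, 9, 11):
    data = contaminate(clean, n_bad)
    print(f"bad={n_bad:2d}  mean={mean(data):12.2f}  median={median(data):12.2f}")
```

A single bad point is enough to move the mean far from the clean data, while the median stays within the range of the clean values until more than half of the sample is contaminated.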
Acknowledgments. This work was supported by National Science Foundation grants OCE83-08595 and OCE84-20578 and by funding from the Institute of Geophysics and Planetary Physics branch at Los Alamos National Laboratory (LANL). It was completed during A.D.C.'s tenure as J. Robert Oppenheimer Fellow at LANL. We would like to thank L. K. Law for permission to use the seafloor data for Figures 6 and 7 and A. Schultz for providing the data for Figures 8-10.

REFERENCES

Abraham, B., and G. E. P. Box, Bayesian analysis of some outlier problems in time series, Biometrika, 66, 229-236, 1979.
Andrews, D. F., A robust method for multiple linear regression, Technometrics, 16, 523-531, 1974.
Barnett, V., Principles and methods for handling outliers in data sets, in Statistical Methods and the Improvement of Data Quality, edited by T. Wright, pp. 131-166, Academic, Orlando, Fla., 1983.
Barnett, V., and T. Lewis, Outliers in Statistical Data, John Wiley, New York, 1978.
Bartlett, M. S., Properties of sufficiency and statistical tests, Proc. R. Soc. London, Ser. A, 160, 268-282, 1937.
Beckman, R. J., and R. D. Cook, Outlier....s, Technometrics, 25, 119-163, 1983.
Belsley, D. A., E. Kuh, and R. E. Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, John Wiley, New York, 1980.
Brillinger, D. R., Time Series: Data Analysis and Theory, Holden-Day, San Francisco, Calif., 1981.
Claerbout, J. F., and F. Muir, Robust modeling with erratic data, Geophysics, 38, 826-844, 1973.
Cook, R. D., and S. Weisberg, Residuals and Influence in Regression, Chapman and Hall, New York, 1982.
Dongarra, J. J., C. B. Moler, J. R. Bunch, and G. W. Stewart, LINPACK User's Guide, SIAM, Philadelphia, Pa., 1979.
Efron, B., The Jackknife, the Bootstrap, and Other Resampling Plans, SIAM, Philadelphia, Pa., 1982.
Fox, A. J., Outliers in time series, J. R. Stat. Soc. Ser. B, 34, 340-363, 1972.
Gamble, T. D., W. M. Goubau, and J. Clarke, Magnetotellurics with a remote reference, Geophysics, 44, 53-68, 1978.
Gross, A. M., Confidence interval robustness with long-tailed symmetric distributions, J. Am. Stat. Assoc., 71, 409-416, 1976.
Hampel, F. R., E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel, Robust Statistics, John Wiley, New York, 1986.
Hawkins, D. M., Identification of Outliers, Chapman and Hall, London, 1980.
Hoaglin, D. C., F. Mosteller, and J. W. Tukey, Understanding Robust and Exploratory Data Analysis, John Wiley, New York, 1983.
Holland, P. W., and R. E. Welsch, Robust regression using iteratively reweighted least squares, Commun. Stat., A6, 813-827, 1977.
Huber, P. J., Robust estimation of a location parameter, Ann. Math. Stat., 35, 73-101, 1964.
Huber, P. J., Robust Statistics, John Wiley, New York, 1981.
Johnson, N. L., and S. Kotz, Continuous Univariate Distributions-1, Houghton Mifflin, Boston, 1970.
Jones, A. G., and H. Jodicke, Magnetotelluric transfer function improvement by a coherence-based rejection technique, paper presented at 1984 Society of Exploration Geophysics meeting, Atlanta, Ga., 1984.
Kao, D., and D. Rankin, Enhancement of signal-to-noise ratio in magnetotelluric data, Geophysics, 42, 103-110, 1977.
Kendall, M., and A. Stuart, The Advanced Theory of Statistics, 2 vols., MacMillan, New York, 1977.
Kleiner, B., and T. E. Graedel, Exploratory data analysis in the geophysical sciences, Rev. Geophys., 18, 699-717, 1980.
Kleiner, B., R. D. Martin, and D. J. Thomson, Robust estimation of power spectra, J. R. Stat. Soc. Ser. B, 41, 313-351, 1979.
Lanzerotti, L. J., D. J. Thomson, A. Meloni, L. V. Medford, and C. G. Maclennan, Electromagnetic study of the Atlantic continental margin using a section of a transatlantic cable, J. Geophys. Res., 91, 7417-7428, 1986.
Larsen, J. C., Low frequency (0.1-6.0 cpd) electromagnetic study of deep mantle electrical conductivity beneath the Hawaiian Islands, Geophys. J. R. Astron. Soc., 43, 17-46, 1975.
Larsen, J. C., Electromagnetic response functions from interrupted and noisy data, J. Geomagn. Geoelectr., 32, Supp. I, SI89-SI103, 1980.
Lawson, C. L., and R. J. Hanson, Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs, N.J., 1974.
Lewis, T., and N. I. Fisher, Graphical methods for investigating the fit of a Fisher distribution to spherical data, Geophys. J. R. Astron. Soc., 69, 1-13, 1982.
Mallows, C. L., On some topics in robustness, technical memorandum, Bell Telephone Lab., Murray Hill, N.J., 1975.
Mallows, C. L., Robust methods, Proc. Symp. Appl. Math, 28, 49-74, 1983.
Martin, R. D., and D. J. Thomson, Robust-resistant spectrum estimation, Proc. IEEE, 70, 1097-1115, 1982.
Martin, W., and P. Flandrin, Wigner-Ville spectral analysis of nonstationary processes, IEEE Trans. Acoust. Speech Signal Process., ASSP-33, 1461-1470, 1985.
Mason, R. L., and R. F. Gunst, Outlier-induced collinearities, Technometrics, 27, 401-407, 1985.
Mehrotra, K. G., and P. Nanda, Unbiased estimation of parameters by order statistics in the case of censored samples, Biometrika, 61, 601-606, 1974.
Miller, K. S., Complex Stochastic Processes, Addison-Wesley, Reading, Mass., 1974.
Miller, R. G., The jackknife - a review, Biometrika, 61, 1-15, 1974.
Mosteller, F., and J. W. Tukey, Data analysis and linear regression, Addison-Wesley, Reading, Mass., 1977.
Priestley, M. B., Evolutionary spectra and non-stationary processes, J. R. Stat. Soc. Ser. B, 27, 204-229, 1965.
Priestley, M. B., Spectral Analysis and Time Series, Academic, Orlando, Fla., 1981.
Rousseeuw, P. J., Least median of squares regression, J. Am. Stat. Assoc., 79, 871-880, 1984.
Slepian, D., Prolate spheroidal wavefunctions, Fourier analysis, and uncertainty, V, The discrete case, Bell Syst. Tech. J., 57, 1371-1429, 1978.
Thomson, D. J., Spectrum estimation techniques for characterization and development of WT4 waveguide, I, Bell Syst. Tech. J., 56, 1769-1815, 1977a.
Thomson, D. J., Spectrum estimation techniques for characterization and development of WT4 waveguide, II, Bell Syst. Tech. J., 56, 1983-2005, 1977b.
Thomson, D. J., Spectrum estimation and harmonic analysis, Proc. IEEE, 70, 1055-1096, 1982.
Tukey, J. W., Instead of Gauss-Markov, what?, in Applied Statistics, edited by R. P. Gupta, pp. 351-372, Amsterdam, North-Holland, 1975.
Vogel, C. R., Optimal choice of a truncation level for the truncated SVD solution of linear first kind integral equations when the data are noisy, SIAM J. Numer. Anal., 23, 109-117, 1986.
Zeger, S. L., Exploring an ozone spatial time series in the frequency domain, J. Am. Stat. Assoc., 80, 323-331, 1985.

M. E. Ander, Earth and Space Sciences Division, Los Alamos National Laboratory, Los Alamos, NM 87545.
A. D. Chave, AT&T Bell Laboratories, 600 Mountain Ave., Murray Hill, NJ 07974.
D. J. Thomson, AT&T Bell Laboratories, 600 Mountain Ave., Murray Hill, NJ 07974.

(Received December 26, 1985; revised September 30, 1986; accepted October 6, 1986.)