Stage-Wise Outlier Detection in Hierarchical Bayesian Repeated Measures Models

Mario Peruggia, Yu-Yun Ho, and Thomas J. Santner

Authors' Information and Acknowledgments

Mario Peruggia is Associate Professor and Thomas J. Santner is Professor, Department of Statistics, The Ohio State University. Yu-Yun Ho is Member of the Technical Staff, Software Technology Integration Department, Bellcore. The authors would like to thank Luke Tierney for helpful and insightful comments. They also acknowledge the Hospital for Special Surgery, New York, NY, for providing the cortical bone data analyzed in Section 4. Please mail all correspondence to: Mario Peruggia, Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, OH 43210-1247, USA (E-Mail: [email protected]).

Abstract

We propose numerical and graphical methods for outlier detection in hierarchical Bayes analyses of repeated measures regression data. We consider a model that allows observations on the same subject (typically a curve of measurements taken at equidistant time points or locations) to have autoregressive errors of a prespecified order. First-stage regression vectors for different subjects are "tied together" in a second-stage modeling step, possibly involving additional regression variables. Outlier detection is accomplished by embedding the null model into a larger parametric model that can accommodate unusual observations. As a first diagnostic, we propose the examination, for each subject's curve, of the posterior probability of a first-stage, second-stage, or neither-stage outlier relative to the modeling assumptions. These three posterior probabilities are computed for each subject and displayed in a barycentric coordinate plot, a useful device for assessing within-subject and between-subject model violations. If there is evidence of a model violation at either stage, we propose a second set of numerical and graphical diagnostics. For first-stage violations, the diagnostics identify the specific measurements that are anomalous within a curve. For second-stage violations, the diagnostics identify the curve parameters that are unusual relative to the pattern of parameter values for the majority of the curves. We give two examples to illustrate the diagnostics and how they can be computed by the Gibbs sampler.

Keywords: Autoregressive errors; Gibbs sampler; Graphical diagnostics; Model-based diagnostics; Multinomial distribution; Three-stage model.

1 Introduction

This paper proposes outlier detection methods for a class of hierarchical Bayesian models that are used to analyze data consisting of repeated measurements on each of a set of subjects. Usually, the measurements are taken at ordered time points or locations and each subject's data can be regarded as a curve. For each curve, the goal of our diagnostics is either to indicate that there is no evidence of model violations at any of the hierarchical stages or to identify the stage where model assumptions are violated. When a curve is determined to display model violations at a given stage, we also identify specific causes for the model violation.

The analysis of repeated measures data occurs frequently in psychology, medicine, and many other disciplines. One popular frequentist method of analysis of such data is based on the specification of a two-stage random effects model (Laird and Ware, 1982). Several variations on this basic theme are presented in Crowder and Hand (1990), Lindstrom and Bates (1990), and Lindsey (1993). The Bayesian analysis of repeated measures data is typically based on hierarchical generalizations of mixed effects models. Wakefield et al. (1994) present numerous models and data analysis examples based on the hierarchical approach.

Two important Bayesian approaches to outlier detection with respect to a given model are as follows. One approach identifies as outliers those observations whose realized errors with respect to that model have high posterior probabilities of being "large." The fundamental idea is contained in Zellner (1975) and is applied to the linear model and more general hierarchical models in Chaloner and Brant (1988) and Chaloner (1994), respectively. Weiss (1995) extends this approach to repeated measures data. An alternative approach is to embed the null model into a larger parametric model that can accommodate unusual observations. Outlier detection then consists of parametric inference based on the extended model. For example, using variance inflation and/or location-shift extensions of normal linear models, this approach is applied in Pettit and Smith (1985), Sharples (1990), and Verdinelli and Wasserman (1991). Recently, Carota et al. (1996) provide a comprehensive account of related model elaboration methodology.

Model extension techniques follow Principle D2 of Weisberg (1983) by turning a problem of null model criticism into one of parametric inference. However, because they use more complex models, Bayesian outlier diagnostics are, in general, more computationally intensive than their frequentist counterparts. Thus Bayesian diagnostics need not display the computational simplicity of Weisberg's Principle D3.

Our paper uses the second approach, extension of a null model, to analyze a three-stage hierarchical Bayesian model for repeated measures data. Stage I of the model (the likelihood) specifies a regression with curve-specific coefficients and allows autoregressive measurement errors of a known order to account for dependencies among the observations on the same curve. Stage II of the model describes the variability in the (random) curve-specific regression coefficients using a second set of (Stage II) regressors and unexplained random variation. Lastly, Stage III of the model specifies hyperpriors for the hyperparameters in Stage II.

We introduce both numerical and graphical diagnostics. For each curve, our goal is to determine whether that curve contains a measurement error outlier (a Stage I violation), or whether the regression coefficient vector for that curve is, after accounting for regressor variables, inconsistent with the regression vectors of the remaining curves (a Stage II violation), or whether neither type of violation is indicated. We also provide tools to identify specific sources of Stage I and II violations. This is accomplished by introducing two location-shift outlier indicators for each curve, one at each stage. The first-stage outlier indicator equals unity when a measurement error is present; the second-stage outlier indicator equals unity when a regression coefficient error is present; both indicators are equal to zero when the basic model is adequate. Then, as an example, one useful diagnostic is the vector of the posterior probabilities that these indicator variables of model stage violation are unity; these posterior probabilities are calculated for each curve given the observations on all curves. An initial version of this approach, using a model with a different number of stages, is given in Ho et al. (1995), where it is applied to an artificial example similar to the introductory example of Section 4.

Our null model is more general than those studied by either Sharples (1990) or Verdinelli and Wasserman (1991). In addition, we follow Principle D4 of Weisberg (1983), emphasizing the use of graphical summaries of our diagnostics. These summaries are a contribution to the difficult problem of determining and describing what does or does not constitute a representative curve from a set of longitudinal data (Jones and Rice, 1992; Segal, 1994).

The remainder of this paper is organized as follows. Section 2 states the hierarchical null model analyzed in this paper while Section 3 introduces the extended model and uses it to define diagnostics. Section 4 first provides a simple example illustrating the diagnostics in an ideal set-up and then applies the technique to a set of bone data that was analyzed in Peruggia et al. (1994). Some extensions are discussed in Section 6.

2 The Basic Hierarchical Bayes Model

We use a hierarchical regression model with random coefficients to describe data in which each of a collection of subjects contributes multiple responses ("curves") in time or space. The random coefficients of the different subjects are related, in part, by their regression on subject- and coefficient-specific covariates.

To formally describe the model, let $I$ denote the number of subjects, $0_n$ denote the vector of zeroes of length $n$, $1_n$ denote the vector of ones of length $n$, $I_n$ denote the identity matrix of order $n$, and $\mathrm{diag}(v)$ denote a diagonal matrix with elements on the main diagonal given by the vector $v = (v_1, \ldots, v_m)$. Also let $N_n(\mu, \Sigma)$ denote the multivariate normal distribution of dimension $n$, and $IG(a, b)$ denote the inverse gamma distribution (with shape parameter $a$ and scale parameter $b$) whose density is given by $p(w \mid a, b) = (\Gamma(a)\, b^a\, w^{a+1})^{-1} \exp(-1/(wb))$, for $w > 0$.

Stage I: For $i = 1, \ldots, I$,
$$Y_i = X_i \beta_i + \eta_i, \qquad (2.1)$$
where $Y_i = (Y_{i,1}, \ldots, Y_{i,J_i})^\top$ is the vector of measurements for the $i$th curve, $X_i$ is a $J_i \times L$ design matrix, $\beta_i$ is an $L \times 1$ vector of parameters, and $\eta_i$ is a $J_i \times 1$ vector of measurement errors having the autoregressive structure
$$\eta_{i,j} = \phi_1 \eta_{i,j-1} + \phi_2 \eta_{i,j-2} + \cdots + \phi_P \eta_{i,j-P} + \varepsilon_{i,j}, \quad \text{for } j = P+1, \ldots, J_i,$$
with the $\varepsilon_{i,j}$ independent and identically distributed $N(0, \sigma_\varepsilon^2)$, for $i = 1, \ldots, I$ and $j = 1, \ldots, J_i$.

Stage II: Let $\gamma = (\gamma_1^\top, \ldots, \gamma_L^\top)^\top$, $\tau^2 = (\tau^2[1], \ldots, \tau^2[L])^\top$, and $\phi = (\phi_1, \ldots, \phi_P)^\top$. Then
$$\beta_{il} \mid \gamma, \tau^2 \;\stackrel{\mathrm{ind}}{\sim}\; N\!\left(z_{il}^\top \gamma_l, \; \tau^2[l]\right), \quad \text{for } i = 1, \ldots, I \text{ and } l = 1, \ldots, L,$$
where $z_{il}$ is a $K_l \times 1$ vector of covariates specific to the $l$th component of the Stage I regression vector for curve $i$ and $\gamma_l$ is the corresponding $K_l \times 1$ hierarchical regression coefficient vector,
$$\phi \sim N_P\!\left(0_P, \; \Sigma_\phi = \mathrm{diag}\!\left(\sigma_\phi^2[1], \ldots, \sigma_\phi^2[P]\right)\right), \qquad \sigma_\varepsilon^2 \sim IG(a_\varepsilon, b_\varepsilon). \qquad (2.2)$$
The hyperparameters for the Stage I model parameters, $\Sigma_\phi$, $a_\varepsilon$, and $b_\varepsilon$, are assumed to be known.

Stage III:
$$\gamma_l \stackrel{\mathrm{ind}}{\sim} N_{K_l}\!\left(0_{K_l}, \; \mathrm{diag}\!\left(\sigma_\gamma^2[l,1], \ldots, \sigma_\gamma^2[l,K_l]\right)\right), \quad \text{for } l = 1, \ldots, L,$$
$$\tau^2[l] \stackrel{\mathrm{ind}}{\sim} IG(a_l, b_l), \quad \text{for } l = 1, \ldots, L. \qquad (2.3)$$
The hyperparameters $\sigma_\gamma^2[l,k]$, $a_l$, and $b_l$, for $l = 1, \ldots, L$ and $k = 1, \ldots, K_l$, are assumed to be known.

Implicit in this model specification is the assumption that the regression coefficients $\beta_{il}$, while curve-specific, are nevertheless related to one another. This is reflected in two ways. First, the means of the normal prior distributions for the $\beta_{il}$ depend on curve-specific covariates, $z_{il}$, through a Stage II regression structure; the corresponding regression parameters, $\gamma_l$, exhibit a stochastic stability across subjects in that they follow a common normal distribution. Second, additional dependence is introduced between the parameters $\beta_{il}$ by assuming that their variances, $\tau^2[l]$, have the same inverse gamma distribution for each curve.
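Under the inverse gamma parameterization given above, if $W \sim IG(a, b)$ then $1/W$ has a gamma distribution with shape $a$ and scale $b$, which gives a simple way to draw the variance parameters. The following sketch is ours, not code from the paper; the parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def rinvgamma(a, b, size, rng):
    """Draw from IG(a, b) with density proportional to w^(-(a+1)) exp(-1/(w b)).

    Under this convention, if W ~ IG(a, b) then 1/W ~ Gamma(shape=a, scale=b).
    """
    return 1.0 / rng.gamma(shape=a, scale=b, size=size)

# Stage III variance parameters: tau^2[l] ~ IG(a_l, b_l), here with a_l = b_l = 1
tau2 = rinvgamma(1.0, 1.0, size=3, rng=rng)

# Sanity check of the parameterization: for a > 1, E[W] = 1/(b (a - 1)),
# e.g., IG(3, 1) has mean 1/2
draws = rinvgamma(3.0, 1.0, size=200_000, rng=rng)
```

Note that for $a \le 1$ (the setting used in the examples of Section 4) neither the mean nor the variance of $IG(a, b)$ exists.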

Model (2.1)–(2.3) simplifies considerably when the measurement errors are independent and there is no other curve-specific covariate information. Setting the order of the autoregressive filter, $P$, to zero in Stage I and then $K_l = 1$, $z_{il} = 1$, $z_{il}^\top \gamma_l = \gamma_{0,l}$ in Stage II, the model reduces to:

Stage I: For $i = 1, \ldots, I$,
$$Y_i = X_i \beta_i + \varepsilon_i, \qquad (2.4)$$
where $\varepsilon_i \stackrel{\mathrm{ind}}{\sim} N_{J_i}(0_{J_i}, \sigma_\varepsilon^2 I_{J_i})$.

Stage II:
$$\beta_i \mid \gamma_0, \tau^2 \stackrel{\mathrm{iid}}{\sim} N_L\!\left(\gamma_0, \; \Sigma_\tau = \mathrm{diag}\!\left(\tau^2[1], \ldots, \tau^2[L]\right)\right), \quad \text{for } i = 1, \ldots, I, \qquad \sigma_\varepsilon^2 \sim IG(a_\varepsilon, b_\varepsilon). \qquad (2.5)$$
The hyperparameters $a_\varepsilon$ and $b_\varepsilon$ are assumed to be known.

Stage III:
$$\gamma_0 \sim N_L\!\left(0_L, \; \Sigma_{\gamma_0} = \mathrm{diag}\!\left(\sigma_{\gamma_0}^2[1], \ldots, \sigma_{\gamma_0}^2[L]\right)\right), \qquad \tau^2[l] \stackrel{\mathrm{ind}}{\sim} IG(a_l, b_l), \quad \text{for } l = 1, \ldots, L. \qquad (2.6)$$
The hyperparameters $\Sigma_{\gamma_0}$, $a_l$, and $b_l$, for $l = 1, \ldots, L$, are assumed to be known.
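The generative structure of the simplified model (2.4)–(2.6) can be sketched as a forward simulation. This is an illustration of ours, not code from the paper; the variable names, the fixed error variance, and the hyperparameter values are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

I, J, L = 20, 10, 2          # curves, measurements per curve, Stage I regressors
a_l = b_l = 1.0              # illustrative IG parameters (our choice)
sigma2_eps = 0.1             # measurement error variance, held fixed for simplicity

# Stage III: second-stage mean gamma_0 and variances tau^2[l]
gamma0 = rng.normal(0.0, np.sqrt(np.array([25.0, 4.0])))   # gamma_0 ~ N_L(0_L, Sigma_gamma0)
tau2 = 1.0 / rng.gamma(shape=a_l, scale=b_l, size=L)       # tau^2[l] ~ IG(a_l, b_l)

# Stage II: curve-specific coefficients beta_i ~ N_L(gamma_0, Sigma_tau)
beta = rng.normal(gamma0, np.sqrt(tau2), size=(I, L))

# Stage I: Y_i = X_i beta_i + eps_i with independent errors (P = 0)
X = np.column_stack([np.ones(J), np.arange(1, J + 1) / 2.0])  # common design matrix
Y = beta @ X.T + rng.normal(0.0, np.sqrt(sigma2_eps), size=(I, J))
```

A common design matrix $X$ is used for every curve here purely for compactness; the model allows $X_i$ to differ across curves.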

3 Location-Shift Outlier Detection

We propose model-based diagnostics for detecting location-shift outliers. The extended version of Model (2.1)–(2.3) contains two outlier indicators for each curve, one at each of Stages I and II, and two corresponding location shifts. Our diagnostics are based on the posterior distributions of the outlier indicators and the location shifts. The extended model is as follows.

Stage I: For $i = 1, \ldots, I$,
$$Y_i = X_i \beta_i + \delta_i^y c_i + \eta_i, \qquad (3.7)$$
where $\delta_i^y$ takes value 0 or 1, $c_i$ is a $J_i \times 1$ measurement-shift vector, and
$$\eta_{i,j} = \phi_1 \eta_{i,j-1} + \phi_2 \eta_{i,j-2} + \cdots + \phi_P \eta_{i,j-P} + \varepsilon_{i,j}, \quad \text{for } j = P+1, \ldots, J_i,$$
with the $\varepsilon_{i,j}$ independent and identically distributed $N(0, \sigma_\varepsilon^2)$, for $i = 1, \ldots, I$ and $j = 1, \ldots, J_i$.

Stage II: Let $d = (d_{11}^\top, \ldots, d_{1L}^\top, \ldots, d_{I1}^\top, \ldots, d_{IL}^\top)^\top$. Then
$$\beta_{il} \mid \gamma, d, \delta_i^\beta, \tau^2 \stackrel{\mathrm{ind}}{\sim} N\!\left(z_{il}^\top(\gamma_l + \delta_i^\beta d_{il}), \; \tau^2[l]\right), \quad \text{for } i = 1, \ldots, I \text{ and } l = 1, \ldots, L,$$
where $\delta_i^\beta$ takes value 0 or 1 and $d_{il}$ is a $K_l \times 1$ regression-shift vector,
$$c_i \stackrel{\mathrm{ind}}{\sim} N_{J_i}\!\left(\mu_c 1_{J_i}, \; \sigma_c^2 I_{J_i}\right), \quad \text{for } i = 1, \ldots, I,$$
$$\phi \sim N_P\!\left(0_P, \; \Sigma_\phi = \mathrm{diag}\!\left(\sigma_\phi^2[1], \ldots, \sigma_\phi^2[P]\right)\right), \qquad \sigma_\varepsilon^2 \sim IG(a_\varepsilon, b_\varepsilon). \qquad (3.8)$$
The hyperparameters $\mu_c$, $\sigma_c^2$, $\sigma_\phi^2[p]$, for $p = 1, \ldots, P$, $a_\varepsilon$, and $b_\varepsilon$ are assumed to be known.

Stage III:
$$\gamma_l \stackrel{\mathrm{ind}}{\sim} N_{K_l}\!\left(0_{K_l}, \; \mathrm{diag}\!\left(\sigma_\gamma^2[l,1], \ldots, \sigma_\gamma^2[l,K_l]\right)\right), \quad \text{for } l = 1, \ldots, L,$$
$$\tau^2[l] \stackrel{\mathrm{ind}}{\sim} IG(a_l, b_l), \quad \text{for } l = 1, \ldots, L,$$
$$d_{il} \stackrel{\mathrm{ind}}{\sim} N_{K_l}\!\left(\mu_d[l], \; \mathrm{diag}\!\left(\sigma_d^2[l,1], \ldots, \sigma_d^2[l,K_l]\right)\right), \quad \text{for } i = 1, \ldots, I \text{ and } l = 1, \ldots, L,$$
where $\mu_d[l]$ is a $K_l \times 1$ vector, and
$$(\delta_i^y, \delta_i^\beta, 1 - \delta_i^y - \delta_i^\beta) \mid p_y, p_\beta \stackrel{\mathrm{iid}}{\sim} \mathrm{Multinomial}\!\left(1; \; p_y, p_\beta, 1 - p_y - p_\beta\right), \quad \text{for } i = 1, \ldots, I. \qquad (3.9)$$
The hyperparameters $\sigma_\gamma^2[l,k]$, $a_l$, $b_l$, $\mu_d[l]$, and $\sigma_d^2[l,k]$, for $l = 1, \ldots, L$ and $k = 1, \ldots, K_l$, are assumed to be known.

Stage IV:
$$(p_y, p_\beta, 1 - p_y - p_\beta) \sim \mathrm{Dirichlet}(\alpha q_1, \alpha q_2, \alpha q_3), \qquad (3.10)$$
where the positive hyperparameters $\alpha$, $q_1$, $q_2$, and $q_3$, with $q_1 + q_2 + q_3 = 1$, are assumed to be known. Here we take the usual parameterization for the $\mathrm{Dirichlet}(s_1, s_2, s_3)$ density, viz.,
$$\Gamma\!\Big(\sum_{i=1}^{3} s_i\Big) \prod_{i=1}^{3} \left(w_i^{s_i - 1} / \Gamma(s_i)\right),$$
for non-negative $w_i$ satisfying $w_1 + w_2 + w_3 = 1$.

Some key features of Model (3.7)–(3.10) are the following.

a. A Stage I mean shift can occur for each curve with an unknown probability $p_y$; $\delta_i^y = 1$ indicates that one or more components of the $i$th curve exhibit an anomaly. The magnitude of the shift, $c_i$, can vary from curve to curve and across components within a curve.

b. A shift can occur in the Stage II regression parameters of each curve with an unknown probability $p_\beta$; $\delta_i^\beta = 1$ indicates that one or more of the regression parameters of the $i$th curve exhibit an anomaly. The magnitude of the shift, $d_{il}$, is both curve and regression component specific.

c. For convenience, each curve is assumed to have either a Stage I deviation or a Stage II deviation or neither. Accordingly, the deviation indicator for the $i$th curve, $(\delta_i^y, \delta_i^\beta, 1 - \delta_i^y - \delta_i^\beta)$, is given a single-trial multinomial distribution with cell probabilities that specify the prior probabilities of (Stage I deviation, Stage II deviation, no deviation). It is equally possible to allow simultaneous Stage I and Stage II deviations by adding a cell representing this outcome to the multinomial distribution.
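The prior structure in items a–c can be sketched as a two-step simulation. This is an illustration of ours, not code from the paper; the values $\alpha = 10$ and $q = (0.15, 0.15, 0.70)$ anticipate the settings used in Section 4.1:

```python
import numpy as np

rng = np.random.default_rng(1)

I = 20                                    # number of curves
alpha = 10.0                              # Dirichlet concentration (Section 4.1 value)
q = np.array([0.15, 0.15, 0.70])          # prior mean cell probabilities

# Stage IV: (p_y, p_beta, 1 - p_y - p_beta) ~ Dirichlet(alpha * q)
p = rng.dirichlet(alpha * q)

# Stage III: one single-trial multinomial deviation indicator per curve;
# columns are (Stage I deviation, Stage II deviation, no deviation)
indicators = rng.multinomial(1, p, size=I)
delta_y, delta_beta = indicators[:, 0], indicators[:, 1]
```

Because each indicator triple is a single multinomial trial, no curve can be flagged with both a Stage I and a Stage II deviation, which is exactly the restriction described in item c.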



As a first diagnostic, we suggest the examination of the posterior probabilities
$$\left(p_i(I), \; p_i(II), \; p_i(\emptyset)\right) = \left(P(\delta_i^y = 1, \delta_i^\beta = 0 \mid y), \; P(\delta_i^y = 0, \delta_i^\beta = 1 \mid y), \; P(\delta_i^y = 0, \delta_i^\beta = 0 \mid y)\right), \qquad (3.11)$$
for each curve $i$. These probabilities can be conveniently displayed using barycentric coordinates. This display provides a global assessment of the presence of outliers relative to Model (2.1)–(2.3) in light of the data. This follows since $p_i(I)$ is the probability that curve $i$ has one or more components that contain a measurement error, $p_i(II)$ is the probability that curve $i$ has one or more anomalous Stage II regression coefficients, and $p_i(\emptyset)$ is the probability that curve $i$ exhibits neither Stage I nor Stage II model inadequacies. Thus, curve $i$ is judged to contain a Stage I outlier when the posterior probability of $[\delta_i^y = 1]$ is high; curve $i$ is judged to contain a Stage II outlier when the posterior probability of $[\delta_i^\beta = 1]$ is high; there is no evidence of lack of fit for curve $i$ when both indicators are equal to zero with high posterior probability.

Once there is evidence of a Stage I outlier in the $i$th curve, our second diagnostic proposal seeks to determine the component(s) of $Y_i$ where the model violation(s) occur. We do this by examining the posterior distribution of the shift vector
$$\delta_i^y c_i. \qquad (3.12)$$
Similarly, when there is evidence that a Stage II outlier occurs in curve $i$, the posterior distribution of the regression coefficient shift vector
$$\delta_i^\beta d_{il} \qquad (3.13)$$
can be examined to gain further insight about the nature of the deviation. Examples will be given in Sections 4.1 and 4.2.

Given $M$ independent draws from the posterior distributions of the parameters in Model (3.7)–(3.10), i.e., $\{\beta_i^{(m)}, c_i^{(m)}, d_i^{(m)}\}_{i=1}^{I}$ together with $\phi^{(m)}$, $\sigma_\varepsilon^{2(m)}$, $\gamma^{(m)}$, $\tau^{2(m)}$, $p_\beta^{(m)}$, and $p_y^{(m)}$, for $m = 1, \ldots, M$, Rao-Blackwellized estimators of the posterior probabilities (3.11) and of the means of the shift vectors (3.12) and (3.13) are given in the Appendix.
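The barycentric display used for (3.11) maps each probability triple to a point in a reference triangle, with each vertex corresponding to one of the three categories. A minimal sketch of the conversion (our illustration; the particular triangle vertices are an arbitrary choice):

```python
import numpy as np

# Vertices of the reference triangle for (Stage I outlier, Stage II outlier, no outlier)
P_I = np.array([0.0, 0.0])
P_II = np.array([1.0, 0.0])
P_none = np.array([0.5, np.sqrt(3.0) / 2.0])

def barycentric_to_xy(p):
    """Map a probability triple (p_I, p_II, p_none) to 2-D plotting coordinates."""
    p = np.asarray(p, dtype=float)
    return p[0] * P_I + p[1] * P_II + p[2] * P_none

# A curve with posterior probabilities (1, 0, 0) plots exactly at the Stage I vertex
xy = barycentric_to_xy([1.0, 0.0, 0.0])
```

Points near a vertex indicate high posterior probability for that category; points in the interior indicate that the data do not sharply discriminate among the three possibilities.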

4 Worked Examples

This section illustrates the use of the proposed diagnostics in two examples. The first example is constructed to evaluate the performance of the diagnostics in a case that can be easily understood graphically. The second example reconsiders a set of cortical bone thickness data previously analyzed in Peruggia et al. (1994). In both these examples the Gibbs sampler is used to estimate posterior quantities.

4.1 Simple Linear Regression with Repeated Measures

The data set consists of 20 curves, each having 10 measurements. For $j = 1, \ldots, 10$, the $i$th curve, $Y_i$, was generated to have $j$th component
$$Y_{i,j} = \beta_{i1} + \beta_{i2} \times (j/2) + \varepsilon_{i,j}, \qquad (4.14)$$
where the $\varepsilon_{i,j}$ are independent $N(0, 0.1)$. The intercepts, $\beta_{i1}$, were generated independently as $N(0, 10.01)$ and the slopes, $\beta_{i2}$, were generated independently as $N(0.4, 10.01)$.
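The construction of this data set, including the three perturbations described in the next paragraph, can be sketched as follows; the random seed and variable names are our own:

```python
import numpy as np

rng = np.random.default_rng(42)

I, J = 20, 10
j = np.arange(1, J + 1)

# Curve-specific intercepts and slopes, as in Equation (4.14)
b1 = rng.normal(0.0, np.sqrt(10.01), size=I)    # intercepts ~ N(0, 10.01)
b2 = rng.normal(0.4, np.sqrt(10.01), size=I)    # slopes ~ N(0.4, 10.01)

# Perturbations introduced in the example
b2[18] = 2.0                                    # slope of curve 19
b1[19] = 3.0                                    # intercept of curve 20

Y = b1[:, None] + b2[:, None] * (j / 2.0) + rng.normal(0.0, np.sqrt(0.1), size=(I, J))
Y[0, 0] += 6.0                                  # measurement error in component 1 of curve 1
```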

We introduced an error in curve $Y_1$ by adding 6.0 to its 1st component. We set the slope of curve $Y_{19}$, $\beta_{19,2}$, equal to 2.0 and the intercept of curve $Y_{20}$, $\beta_{20,1}$, equal to 3.0. Both of these values are large relative to the corresponding ones for the other curves. Figure 1 shows a simultaneous plot of $Y_1$, $Y_{19}$, $Y_{20}$, and a selection of four other "typical" curves.

We based our analysis on the independent measurement error Model (2.4)–(2.6). The values of the hyperparameters in the expanded version of Model (2.4)–(2.6) were set as follows. In Stage IV, the prior mean probabilities of a Stage I error and a Stage II error were both set equal to 0.15 ($q_1 = 0.15 = q_2$); thus the prior mean probability of a curve satisfying both the Stage I and II assumptions is 0.70 ($q_3 = 0.70$). We set $\alpha = 10$ in Equation (3.10); this value guarantees that the event [the proportion of Stage I outliers exceeds 30%] has prior probability 0.1. The prior probability of the event [the proportion of Stage II outliers exceeds 30%] is automatically also 0.1. Each component of a Stage I shift vector, $c_i$, was given prior mean zero, a null value representing no shift. The same was assumed for the Stage II shift vector $d_i$; its prior mean was taken to be $\mu_d = (\mu_d[1], \mu_d[2])^\top = (0, 0)^\top$. The remaining prior variances and inverse gamma parameters were set to produce an informative, albeit diffuse, prior for the location shifts $c_i$ and $d_i$, for the mean regression parameter $\gamma_0$, for the measurement error variance $\sigma_\varepsilon^2$, and for the variances of the regression parameters $\tau^2[l]$. The shape and scale parameters of all inverse gamma distributions appearing in Stages II and III were set equal to 1.0; for these values the mass of the inverse gamma distribution is shifted to the "right" to such an extent that neither the mean nor the variance exists. In Stage II the prior variance of the shifts was taken to be $\sigma_c^2 = 1.0$; in Stage III we took $\sigma_{\gamma_0}^2[1] = 25.0$, $\sigma_{\gamma_0}^2[2] = 4.0$, $\sigma_d^2[1] = 4.0$, and $\sigma_d^2[2] = 1.0$.

Table 1: Estimated Posterior Probabilities $(\hat p_i(I), \hat p_i(II), \hat p_i(\emptyset))$, $i = 1(1)20$, for the Example of Section 4.1.

    i       p_i(I)                      p_i(II)                     p_i(empty)
    1       1.0                         0.0                         0.0
    2-18    min = 0.000, max = 0.000    min = 0.018, max = 0.033    min = 0.967, max = 0.982
    19      0.0                         0.730                       0.270
    20      0.0                         0.998                       0.002

We used the Gibbs sampler to obtain draws from the posterior distribution. Two decisions have to be made to implement this algorithm: the length of the warm-up (transient) phase of each run, and the method of analysis of the draws generated by the sampler. Based on standard diagnostics, we established that the first 5,000 iterations formed a conservative warm-up phase and that a subsampling interval of length 100 cycles gave approximately independent draws from the posterior. We based all subsequent analyses on 500 sets of values, 100 cycles apart.

For each curve, Table 1 lists, and Figure 2 plots in barycentric coordinates, the estimates (A.22)–(A.24) of the posterior probabilities $(p_i(I), p_i(II), p_i(\emptyset))$. Roughly, a curve is declared a measurement error outlier (regression coefficient outlier) if the point $(\hat p_i(I), \hat p_i(II), \hat p_i(\emptyset))$ is close to vertex $P(I)$ ($P(II)$), and the curve shows no evidence of Stage I or Stage II violation if this estimate is close to vertex $P(\emptyset)$. Thus, curve $Y_1$ is strongly indicated to contain one or more components with measurement errors. Similarly, curves $Y_{19}$ and $Y_{20}$ are singled out as regression coefficient outliers. There is no evidence of model inadequacy for any other curves.

To provide more detailed information about the nature of the violations identified in Figure 2, we estimated the means of the posterior distributions of the Stage I and Stage II shift components (Formulas (A.22) and (A.23)). Figure 3 demonstrates one straightforward method to display the estimated Stage I mean shifts, $E\{\delta_i^y c_i \mid y\}$, for this example using an index plot in which the ten estimated values for each curve are connected. Clearly, $Y_1$ is designated as a Stage I outlier because of its first component, while $Y_2, \ldots, Y_{20}$ show no evidence of outlyingness.

Similarly, we estimated the means of the posterior distributions of the Stage II shifts in the intercept and slope coefficients $\beta_{i1}$ and $\beta_{i2}$, which are $E\{\delta_i^\beta d_{i1} \mid y\}$ and $E\{\delta_i^\beta d_{i2} \mid y\}$, respectively. While the pairs of these values can be displayed in a scatterplot, we save space by merely listing the expected (intercept, slope) shifts: $(0.0, 0.0)$ for curves 1–18, $(0.22, 0.93)$ for $Y_{19}$, and $(2.78, -0.09)$ for $Y_{20}$. These values clearly show that $Y_{19}$ has a non-conforming slope while $Y_{20}$ has an aberrant intercept. In addition, the 0.22 posterior mean of the intercept shift for $Y_{19}$ highlights the fact, clearly visible in Figure 1, that the intercept of $Y_{19}$ is further away from the body of intercepts for all curves not having Stage I anomalies except $Y_{20}$.

4.2 Measuring Bone Strength

The data for this example come from an observational study of patients admitted for evaluation and possible hip replacement surgery at the Hospital for Special Surgery in New York City. As part of the evaluation process, a series of CAT scans was made of the cross section of each patient's femur in the area near the lesser trochanter (a section of the femur close to the hip joint). Figure 4 displays a typical bone cross-section with two clearly delineated regions: a central area of honeycomb-like trabecular bone and a surrounding area of denser cortical bone.

Peruggia et al. (1994) analyzed these data to identify the factors associated with the distribution of the thickness of cortical bone and bone strength. For simplicity, we consider the identification of outliers for only one of the responses analyzed by Peruggia et al., the thickness of the cortical bone. The thickness was measured counterclockwise starting from an anatomical landmark along a series of 72 equally spaced rays emanating from the centroid of each section. Figure 4 shows, for a typical section, the landmark (denoted by a small circle), the centroid, and the rays.

To explain differences in cross-sections, the risk factors available were the subject's (centered and scaled) age (AGE), gender (G), and diagnosis (DX). Here, AGE was centered so that AGE = 0 corresponded to a 45 year-old subject and was scaled to range over the interval $(-1, 1)$; the latter facilitated the prior modeling of the AGE-related regression coefficients. There were three DX categories: osteoarthritis (OA), juvenile rheumatoid arthritis (JRA), and other (OTH). OA is the most common cause of hip replacement surgery in older patients; its etiology is the breakdown of cartilage in the joint, typically due to injury or wear. JRA is a congenital disease which manifests itself early in life and affects the growth of the hip bones and connective tissue. Patients in the OTH category tend to have acute problems, such as injuries or bone cancer; prior to their hospitalization they can be regarded as nearly normal.

The circular character of the cortical shell thickness suggested the following form for Stage I of the model. The $i$th curve was summarized by the curve-specific coefficients $(\mu_i, \alpha_{i1}, \alpha_{i2}, \alpha_{i3}, \beta_{i1}, \beta_{i2}, \beta_{i3})$ from the Fourier fit
$$Y_{i,j} = \mu_i + \sum_{f=1}^{3} \left[\alpha_{if} \cos(2\pi f j/72) + \beta_{if} \sin(2\pi f j/72)\right] + \eta_{i,j}, \qquad (4.15)$$
for $i = 1, \ldots, 50$ and $j = 1, \ldots, 72$. Independence of the deviations from the Fourier series model was not tenable, and we based the within-curve errors on the autoregressive model
$$\eta_{i,j} = \phi_1 \eta_{i,j-1} + \phi_2 \eta_{i,j-2} + \varepsilon_{i,j}. \qquad (4.16)$$

In Stage II, the vector of overall means for the 50 curves, $(\mu_1, \ldots, \mu_{50})$, was regressed on
$$\gamma_{\mu,1} + \gamma_{\mu,2}\, I[\mathrm{G} = \mathrm{MALE}] + \gamma_{\mu,3}\, I[\mathrm{DX} = \mathrm{OA}] + \gamma_{\mu,4}\, I[\mathrm{DX} = \mathrm{JRA}] + \gamma_{\mu,5}\, \mathrm{AGE} + \gamma_{\mu,6}\, \mathrm{AGE}^2, \qquad (4.17)$$

where $I[E]$ is the indicator function of the event $E$. For each patient, Equation (4.17) is a quadratic regression model in AGE whose intercept is gender and diagnostic-group specific (additive in these two variables). In particular, $\gamma_{\mu,1}$ is the intercept for a 45 year-old female subject in the OTH diagnosis group while $\gamma_{\mu,1} + \gamma_{\mu,3}$ is the intercept for a 45 year-old female with OA. The same regression variables (with harmonic-specific regression coefficients) were also used to summarize the variability in the $50 \times 1$ vectors corresponding to each of the six remaining harmonic coefficients. Peruggia et al. (1994) contains additional detail.

We are interested in determining outliers relative to the null Model (4.15)–(4.17). In implementing the Gibbs sampler, we derived the posterior distributions of each parameter given the remaining parameters and the data using the conditional likelihood approximation to the full likelihood given the first two observations (Box et al. 1994). The resulting distributions have analytically computable conjugate forms. See McCulloch and Tsay (1994) for another use of this approximation.

The hyperparameters that complete the specification of the model were set to induce a rather diffuse prior. In Stage II, the components of the location shift $c_i$ were assigned common mean $\mu_c = 0$ with relatively large variance $\sigma_c^2 = 4.0$. The prior variances of the autoregressive parameters were set to be $\sigma_\phi^2[1] = 1 = \sigma_\phi^2[2]$ and, for the reasons stated in Section 4.1, the parameters of all inverse gamma distributions were set equal to unity. In Stage III, the prior variances of the Stage II regression coefficients for the overall means $(\mu_1, \ldots, \mu_{50})$, $\gamma_\mu$, were set as $(\sigma_\gamma^2[\mu,1], \ldots, \sigma_\gamma^2[\mu,6]) = (25.0, 1.0, 1.0, 1.0, 1.0, 1.0)$. The relatively large variance for the intercept of the regression of $(\mu_1, \ldots, \mu_{50})$, which is the mean thickness for a 45 year-old female with OTH, allowed adequate support for values far from zero. The prior support of the other regression coefficients was restricted to smaller values. Each vector of prior variances for the regression coefficients of the other six harmonics, $\{(\sigma_\gamma^2[l,1], \ldots, \sigma_\gamma^2[l,6]) : l \in \{\alpha_1, \ldots, \beta_3\}\}$, was set equal to $(4.0, 1.0, 1.0, 1.0, 1.0, 1.0)$. The means of all Stage II deviation magnitudes, $d_{il}[k]$, were set equal to zero ($\mu_d[l] = 0_6$) and their prior variances were set as follows: $(\sigma_d^2[\mu,1], \ldots, \sigma_d^2[\mu,6]) = (25.0, 1.0, 1.0, 1.0, 1.0, 1.0)$, and $(\sigma_d^2[l,1], \ldots, \sigma_d^2[l,6]) = (4.0, 1.0, 1.0, 1.0, 1.0, 1.0)$, for $l \in \{\alpha_1, \ldots, \beta_3\}$. The variances of the intercept terms were set differently than the variances of the coefficients of the remaining regressors for the same reasons that suggested a similar assumption about the $\sigma_\gamma^2[l,k]$. In Stage IV, the prior means of a measurement error and of a regression coefficient error were set equal to 0.15 ($q_1 = 0.15 = q_2$) and $\alpha$ was selected to be 10 as in Subsection 4.1 (see also Peruggia et al. 1994).

A FORTRAN program was written to implement Gibbs sampling from the conditional distributions of each parameter given the data and the other parameters based on Model (4.15)–(4.17). The length of the warm-up phase of the process and the spacing of the subsample were determined using standard diagnostic methods. The first 6,000 iterations were used as a warm-up for the algorithm. The estimates stated below were based on subsequent cycles of the Gibbs sampler spaced 200 apart, for a total of 100 samples.

Figure 5 is the analogue, for the bone data, of the barycentric coordinate plot in Figure 2. It displays estimates of the posterior probabilities $(p_i(I), p_i(II), p_i(\emptyset))$, for $i = 1, \ldots, 50$. The estimated probabilities of measurement error, $\hat p_i(I)$, are virtually zero for all the bone sections. This result means that no curves are indicated as containing Stage I (measurement error) outliers, and thus the 7-term Fourier series expansion in Equation (4.15) approximates the thickness curve well for all the bone sections. Our diagnostic identifies six bone sections (2, 6, 7, 9, 21, 48) as Stage II outliers. Table 2 lists the estimated posterior probabilities $(\hat p_i(I), \hat p_i(II), \hat p_i(\emptyset))$ for the anomalous bone sections identified in Figure 5 together with the "nearest" of the remaining bone sections, Number 15. We expect these bone sections to have one or more Fourier coefficients that differ from the pattern of the corresponding coefficients for sections having similar covariates.

We discuss several methods that provide more detailed information about the seven bone sections identified as Stage II outliers. First, we assess how the presence of a Stage II violation

Table 2: Estimated Posterior Probabilities $\hat p_i(I)$, $\hat p_i(II)$, and $\hat p_i(\emptyset)$ for Bone Sections 2, 6, 7, 9, 15, 21, 48.

    i           6      7      2      9      48     21     15
    p_i(I)      0.00   0.00   0.00   0.00   0.00   0.00   0.00
    p_i(II)     1.00   1.00   0.97   0.95   0.89   0.79   0.14
    p_i(empty)  0.00   0.00   0.03   0.05   0.11   0.21   0.86

is reflected in the Stage I coefficients of the individual curves. We do this by contrasting two estimates of each Stage I regression coefficient based on the Gibbs sampler draws. For example, for the intercept the first estimate is
$$\hat\beta_{il}^{[1]} = z_{il}^\top\big(\hat\theta_l + \widehat{\delta_i d_{il}}\big), \quad\text{where}\quad \hat\theta_l = \frac{1}{100}\sum_{m=1}^{100}\theta_l^{(m)}, \qquad \widehat{\delta_i d_{il}} = \frac{1}{100}\sum_{m=1}^{100}\delta_i^{(m)} d_{il}^{(m)},$$
and $\big\{\theta_l^{(m)}, \{\delta_i^{(m)}\}_{i=1}^I, \{d_{il}^{(m)}\}_{i=1}^I\big\}_{m=1}^{100}$ are the 100 independent draws from the Gibbs sampler for those coefficients. The second estimate, $\hat\beta_{il}^{[2]} = z_{il}^\top\hat\theta_l$, ignores the (possible) location shift correction. Similar pairs of estimates can be constructed for each harmonic coefficient $\alpha_{if}$ and $\beta_{if}$.

Figure 6 displays an index plot, over the 50 bone sections, of the difference between the $[1]$ and the $[2]$ estimates for each of the seven Stage I Fourier coefficients. We anticipate the difference between these two estimates to be large in the presence of a Stage II violation since the second estimate omits the location shift term. For example, Figure 6 shows that the Fourier intercept, $\mu$, is ill-fit by the Stage II regression for bone sections 6, 7, and 48. Similarly, all of the cosine harmonic coefficients are poorly fit for bone section 48.

Figure 7 gives an overall view of how the presence of deviations in the Stage II regression model affects the fit of the data. For each bone section identified as an outlier, we display index plots of the raw thickness curve (solid line) overlaid with two estimates
$$\hat\mu_i^{[k]} + \sum_{f=1}^{3}\Big[\hat\alpha_{if}^{[k]}\cos(2\pi f j/72) + \hat\beta_{if}^{[k]}\sin(2\pi f j/72)\Big], \qquad\text{for } k = 1, 2, \tag{4.18}$$
of the mean curve. The estimates, for $k = 1, 2$, are based on the posterior estimates of the Fourier coefficients defined in the previous paragraph. For comparison, the same display is also presented for a non-outlier, bone section 17. The lack of fit of the Fourier model subject to the Stage II regression without shift adjustment (dashed line) is most pronounced for bone sections 6 and 7. The same lack of fit is also very clear for bone sections 2, 9, 21, and 48 (in the region of maximum thickness). Notice that these bone sections are precisely those whose barycenters are closest to the $p_{(II)}$ vertex in Figure 5.

Lastly, we investigate specific causes for lack-of-fit in the individual curves identified as Stage II outliers. Recall that Equation (4.17) is a quadratic in AGE with gender- and diagnostic-group-specific intercept. We used Equation (A.23) to estimate the shifts in the intercept, the AGE, and the AGE$^2$ coefficients for every patient. For example, consider a female patient with OA; she has intercept, linear, and quadratic regression coefficients $\theta_{l1} + \theta_{l3}$, $\theta_{l5}$, and $\theta_{l6}$, respectively. We estimated the posterior means of these three Stage II regression coefficient shifts from $\widehat{\delta_i d_{i1}} + \widehat{\delta_i d_{i3}}$ (= intercept shift), $\widehat{\delta_i d_{i5}}$ (= AGE shift), and $\widehat{\delta_i d_{i6}}$ (= AGE$^2$ shift).

For the six primary outliers identified in Figure 5, Figure 8 displays simultaneous boxplots of the estimated posterior expected shifts of the intercept and of the linear and quadratic coefficients of AGE. The plots for the Stage I Fourier intercept, $\mu$, show that bone sections 6 and 7 fit the Stage II subject-specific intercept and coefficients of AGE and AGE$^2$ poorly. The fit for these bone sections is also poor with respect to several of the Stage I Fourier coefficients for all three Stage II predictors. Bone sections 2, 9, 21, and 48 also exhibit extreme expected shift values for various combinations of Stage I Fourier coefficients and Stage II predictors. Ultimately, of course, direct examination of the cortical thicknesses of the subjects is also crucial, but we omit such a discussion here because of space considerations.
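The fitted mean curve in Equation (4.18) is a seven-term Fourier series evaluated at the 72 measurement positions around a bone section. As a minimal sketch (the coefficient values below are hypothetical, not estimates from the paper), the evaluation is:

```python
import math

def mean_curve(mu, alphas, betas, n_pos=72):
    """Evaluate mu + sum_f [alpha_f cos(2 pi f j / n_pos) + beta_f sin(2 pi f j / n_pos)]
    at positions j = 1, ..., n_pos, as in Equation (4.18) with three harmonics."""
    curve = []
    for j in range(1, n_pos + 1):
        val = mu
        for f, (a, b) in enumerate(zip(alphas, betas), start=1):
            angle = 2.0 * math.pi * f * j / n_pos
            val += a * math.cos(angle) + b * math.sin(angle)
        curve.append(val)
    return curve

# Hypothetical coefficients: intercept 5.0, three cosine and three sine harmonics.
thickness = mean_curve(5.0, [1.2, -0.4, 0.1], [0.8, 0.3, -0.2])
assert len(thickness) == 72
```

Plotting two such curves, one built from the $[1]$ estimates and one from the $[2]$ estimates, over the raw thicknesses reproduces the kind of overlay shown in Figure 7.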

5 Discussion

We mention but a few of the many methods to extend the basic model (2.1)–(2.3). If the within-curve measurements are not equally spaced, then an autoregressive error structure could still be specified as in Jones and Boadi-Boateng (1991). Given the relevant prior information, one could assume a more complicated (and possibly random) covariance structure for the regression parameters $\beta_{il}$. A discussion of various possible specifications that would also apply to our setting is given in Section 2.3 of George and McCulloch (1993). Another useful extension, mentioned earlier, is to allow an observation to be simultaneously a Stage I and a Stage II outlier.

Lastly, we note that the basic model can be expanded to allow outlier detection in $n$-stage hierarchical Bayesian models with $n > 3$. Arguably, the most practical such extension is to $n$-stage hierarchical models where the parameters are subject-specific only up to Stage II. For such models, our diagnostic method can be extended to identify simultaneously Stage I and Stage II outliers and check for assumption violations at Stage III or higher. The identification of Stage I and Stage II outliers can be carried out in a manner similar to that of Section 3. To check for assumption violations at Stage III or higher, deviation indicators and deviation magnitudes might be added to appropriate parameters of the prior distributions for which it is desired to detect deviations. This extension would introduce a tremendous amount of complexity to the problem if it were to be solved analytically. However, the development of a Gibbs algorithm to generate draws from such a model would be feasible. All that is needed to add an extra stage to the model is a corresponding extra subroutine; no changes have to be made to previously developed code.
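The modular structure described above, one full-conditional update routine per stage, can be sketched as follows (the stage names and the update bodies are illustrative placeholders, not the authors' FORTRAN code):

```python
import random

# Each stage of the hierarchy contributes one full-conditional update routine;
# adding a stage to the model means appending one routine to the cycle,
# leaving previously written routines untouched.

def update_stage_one(state):
    # Placeholder full-conditional draw for a Stage I parameter.
    state["beta"] = random.gauss(state["theta"], 1.0)

def update_stage_two(state):
    # Placeholder full-conditional draw for a Stage II parameter.
    state["theta"] = random.gauss(state["beta"], 1.0)

def gibbs(updates, state, n_iter):
    """Cycle through the per-stage update subroutines n_iter times."""
    for _ in range(n_iter):
        for update in updates:
            update(state)
    return state

random.seed(0)
final = gibbs([update_stage_one, update_stage_two], {"beta": 0.0, "theta": 0.0}, 100)
assert set(final) == {"beta", "theta"}
```

Extending to a third stage would amount to appending a third routine to the `updates` list.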

REFERENCES

Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. (1994), Time Series Analysis: Forecasting and Control (3rd edition), Englewood Cliffs, NJ: Prentice-Hall.

Carota, C., Parmigiani, G., and Polson, N. G. (1996), "Diagnostic Measures for Model Criticism," Journal of the American Statistical Association, 91, 753–762.

Chaloner, K. (1994), "Residual analysis and outliers in Bayesian hierarchical models," in Aspects of Uncertainty: A Tribute to D. V. Lindley, 149–157, eds. Freeman, P. R., and Smith, A. F. M., Chichester, UK: Wiley.

Chaloner, K., and Brant, R. (1988), "A Bayesian Approach to Outlier Detection and Residual Analysis," Biometrika, 75, 651–659.

Crowder, M. J., and Hand, D. J. (1990), Analysis of Repeated Measures, New York: Chapman & Hall.

George, E. I., and McCulloch, R. E. (1993), "Variable Selection Via Gibbs Sampling," Journal of the American Statistical Association, 88, 881–889.

Ho, Y.-Y., Peruggia, M., and Santner, T. J. (1995), "Diagnostics for Hierarchical Bayesian Repeated Measures Models," in Proceedings of the 27th Symposium on the Interface: Computing Science and Statistics.

Jones, R. H., and Boadi-Boateng, F. (1991), "Unequally spaced longitudinal data with AR(1) serial correlation," Biometrics, 47, 161–175.

Jones, M. C., and Rice, J. A. (1992), "Displaying the important features of large collections of similar curves," The American Statistician, 46, 140–145.

Laird, N. M., and Ware, J. H. (1982), "Random-effects models for longitudinal data," Biometrics, 38, 963–974.

Lindsey, J. K. (1993), Models for Repeated Measurements, Oxford, UK: Clarendon Press.

Lindstrom, M. J., and Bates, D. M. (1990), "Nonlinear mixed effects models for repeated measures data," Biometrics, 46, 673–687.

McCulloch, R. E., and Tsay, R. S. (1994), "Bayesian Analysis of Autoregressive Time Series via the Gibbs Sampler," Journal of Time Series Analysis, 15, 235–250.

Peruggia, M., Santner, T. J., Ho, Y. Y., and Macmillan, N. J. (1994), "A Hierarchical Bayesian Analysis of Circular Data With Autoregressive Errors: Modeling the Mechanical Properties of Cortical Bone," in Statistical Decision Theory and Related Topics V, 201–220, eds. Gupta, S. S., and Berger, J. O., New York: Springer-Verlag.

Pettit, L. I., and Smith, A. F. M. (1985), "Outliers and Influential Observations in Linear Models," in Bayesian Statistics 2, 473–494, eds. Bernardo, J. M., DeGroot, M. H., Lindley, D. V., and Smith, A. F. M., Amsterdam: Elsevier Science Publishers B.V. (North-Holland).

Segal, M. R. (1994), "Representative curves for longitudinal data via regression trees," Journal of Computational and Graphical Statistics, 3, 214–233.

Sharples, L. D. (1990), "Identification and accommodation of outliers in general hierarchical models," Biometrika, 77, 445–453.

Verdinelli, I., and Wasserman, L. (1991), "Bayesian Analysis of Outlier Problems Using the Gibbs Sampler," Statistics and Computing, 1, 105–117.

Wakefield, J. C., Smith, A. F. M., Racine-Poon, A., and Gelfand, A. E. (1994), "Bayesian Analysis of Linear and Non-linear Population Models by using the Gibbs Sampler," Applied Statistics, 43, 201–221.

Weisberg, S. (1983), Comment on "Developments in linear regression methodology: 1959–1982," Technometrics, 25, 240–244.

Weiss, R. E. (1995), "Residuals and Outliers in Repeated Measures Random Effects Models," Technical Report, UCLA School of Public Health, Department of Biostatistics.

Zellner, A. (1975), "Bayesian analysis of regression error terms," Journal of the American Statistical Association, 70, 138–144.

A Appendix: Statement of Estimators

Given $M$ independent draws $\big\{\beta_i^{(m)}, c_i^{(m)}, \phi^{(m)}, \sigma_\varepsilon^{2(m)}, \theta^{(m)}, \tau^{2(m)}, d_i^{(m)}, p_\delta^{(m)}, p_\gamma^{(m)};\ i = 1, \ldots, I\big\}_{m=1}^{M}$ from the posterior of (3.7)–(3.10), we use the following Rao-Blackwell estimators for the outlier indicator probabilities:

$$\hat p_{i(I)} = \hat P(\gamma_i = 1, \delta_i = 0 \mid y) = \frac{1}{M}\sum_{m=1}^{M} \frac{p_1\big(\beta_i^{(m)}, c_i^{(m)}, \phi^{(m)}, \sigma_\varepsilon^{2(m)}, \theta^{(m)}, \tau^{2(m)}, p_\gamma^{(m)}\big)}{P_i^{(m)}}, \tag{A.19}$$

$$\hat p_{i(II)} = \hat P(\gamma_i = 0, \delta_i = 1 \mid y) = \frac{1}{M}\sum_{m=1}^{M} \frac{p_2\big(\beta_i^{(m)}, \phi^{(m)}, \sigma_\varepsilon^{2(m)}, \theta^{(m)}, \tau^{2(m)}, d_i^{(m)}, p_\delta^{(m)}\big)}{P_i^{(m)}}, \tag{A.20}$$

$$\hat p_{i(\emptyset)} = \hat P(\gamma_i = 0, \delta_i = 0 \mid y) = \frac{1}{M}\sum_{m=1}^{M} \frac{p_3\big(\beta_i^{(m)}, \phi^{(m)}, \sigma_\varepsilon^{2(m)}, \theta^{(m)}, \tau^{2(m)}, p_\gamma^{(m)}, p_\delta^{(m)}\big)}{P_i^{(m)}}, \tag{A.21}$$

where

$$p_1(\beta_i, c_i, \phi, \sigma_\varepsilon^2, \theta, \tau^2, p_\gamma) = p_\gamma \exp\{-T_1 - T_2\},$$
$$p_2(\beta_i, \phi, \sigma_\varepsilon^2, \theta, \tau^2, d_i, p_\delta) = p_\delta \exp\{-T_3 - T_4\},$$
$$p_3(\beta_i, \phi, \sigma_\varepsilon^2, \theta, \tau^2, p_\gamma, p_\delta) = (1 - p_\gamma - p_\delta) \exp\{-T_3 - T_2\},$$
$$P_i = p_1(\cdot) + p_2(\cdot) + p_3(\cdot),$$

and

$$T_1 = \frac{1}{2\sigma_\varepsilon^2}\sum_{j=P+1}^{J}\Big(y_{i,j} - x_i[j,\cdot]\beta_i - c_i[j] - \sum_{p=1}^{P}\phi_p\big(y_{i,j-p} - x_i[j-p,\cdot]\beta_i - c_i[j-p]\big)\Big)^2,$$

$$T_2 = \frac{1}{2}\sum_{l=1}^{L}\frac{\big(\beta_{il} - z_{il}^\top\theta_l\big)^2}{\tau^2[l]},$$

$$T_3 = \frac{1}{2\sigma_\varepsilon^2}\sum_{j=P+1}^{J}\Big(y_{i,j} - x_i[j,\cdot]\beta_i - \sum_{p=1}^{P}\phi_p\big(y_{i,j-p} - x_i[j-p,\cdot]\beta_i\big)\Big)^2,$$

$$T_4 = \frac{1}{2}\sum_{l=1}^{L}\frac{\big(\beta_{il} - z_{il}^\top\theta_l - z_{il}^\top d_{il}\big)^2}{\tau^2[l]}.$$

The Rao-Blackwell estimator of a mean Stage I shift, $E[\gamma_i c_i[j] \mid y]$, is given by

$$\frac{\sum_{m=1}^{M} E\big[c_i[j] \mid \psi^{(m)}, y\big]\,\gamma_i^{(m)}}{\sum_{m=1}^{M}\gamma_i^{(m)}}\ \hat p_{i(I)}, \tag{A.22}$$

where $\psi^{(m)}$ denotes the $m$th Gibbs draw of the remaining model parameters,

$$E\big[c_i[j] \mid \psi^{(m)}, y\big] = \frac{\sigma_c^2}{\sigma_c^2 + \sigma_\varepsilon^{2(m)}}\, r_y^{(m)}[j],$$

and

$$r_y^{(m)}[j] = y_{i,j} - x_i[j,\cdot]\beta_i^{(m)} - \sum_{p=1}^{P}\phi_p^{(m)}\big(y_{i,j-p} - x_i[j-p,\cdot]\beta_i^{(m)} - c_i^{(m)}[j-p]\big).$$

Similarly, the estimator of a mean Stage II shift, $E[\delta_i d_{il} \mid y]$, is given by

$$\frac{\sum_{m=1}^{M} E\big[d_{il}[k] \mid \psi^{(m)}, y\big]\,\delta_i^{(m)}}{\sum_{m=1}^{M}\delta_i^{(m)}}\ \hat p_{i(II)}, \tag{A.23}$$

where

$$E\big[d_{il}[k] \mid \psi^{(m)}, y\big] = \frac{\sigma_d^2[l,k]\,(z_{il}[k])^2}{\sigma_d^2[l,k]\,(z_{il}[k])^2 + \tau^{2(m)}[l]}\, r_{i,l}^{(m)}[k],$$

with $r_{i,l}^{(m)}[k] = \big(\beta_{il}^{(m)} - z_{il}^\top\theta_l^{(m)} - z_{il[-k]}^\top d_{il}^{(m)}[-k]\big)\big/ z_{il}[k]$.
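Computationally, (A.19)–(A.21) are Monte Carlo averages of normalized per-draw kernel values. A schematic sketch (the `(p1, p2, p3)` inputs stand for the kernel values $p_1(\cdot)$, $p_2(\cdot)$, $p_3(\cdot)$ evaluated at each retained draw; the numbers below are made up for illustration):

```python
def rao_blackwell_probs(kernel_values):
    """Given the unnormalized kernel values (p1, p2, p3) of (A.19)-(A.21)
    for each retained Gibbs draw, return the Monte Carlo averages
    (p_hat_I, p_hat_II, p_hat_empty) of the normalized triples."""
    M = len(kernel_values)
    sums = [0.0, 0.0, 0.0]
    for p1, p2, p3 in kernel_values:
        P_i = p1 + p2 + p3  # the normalizer P_i^(m)
        sums[0] += p1 / P_i
        sums[1] += p2 / P_i
        sums[2] += p3 / P_i
    return tuple(s / M for s in sums)

# Two made-up draws; the three estimates always sum to one.
probs = rao_blackwell_probs([(0.2, 0.6, 0.2), (0.1, 0.8, 0.1)])
assert abs(sum(probs) - 1.0) < 1e-12
```

By construction each estimated triple sums to one, which is what makes the barycentric displays of Figures 2 and 5 possible.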

Figure 1: Plot of $\{(j/2, Y_{i,j}) : j = 1, \ldots, 10\}$ for $i = 1, 19, 20$ and a Selection of Four Other Curves. The first component of $Y_1$ is unusually large; relative to the majority of the curves, $Y_{19}$ has a large slope and $Y_{20}$ has a large intercept.

Figure 2: Barycentric Plot of the Estimated Posterior Probabilities $(p_{i(I)}, p_{i(II)}, p_{i(\emptyset)})$ for $i = 1, \ldots, 20$. Curve $Y_1$ is identified as a measurement error outlier; curves $Y_{19}$ and $Y_{20}$ are regression coefficient outliers.

Figure 3: Index Plot of Estimated Expected Stage I Shifts $\gamma_i c_i$ for $i = 1, \ldots, 20$. The line connecting the 10 expected shifts for $Y_1$ is labeled "1." The lines connecting the expected shifts for curves $Y_2$–$Y_{20}$ are indistinguishable and essentially lie on the horizontal line through zero; the (common) line for these cases is labeled "2–20."

Figure 4: A Digitized Image of the Cross Section of Human Bone Through a Slice Taken at the Lesser Trochanter.

Figure 5: Barycentric Plot of the Estimated Posterior Probabilities $(\hat p_{i(I)}, \hat p_{i(II)}, \hat p_{i(\emptyset)})$ for $i = 1, \ldots, 50$. No bone sections are indicated as containing components with measurement errors. Sections 2, 6, 7, 9, 21, and 48 are strongly indicated as having regression coefficient outliers. Section 15 may also contain regression coefficient outliers.
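Because each estimated triple sums to one, it can be plotted as a single point in a triangle, as in Figures 2 and 5. A minimal sketch of the coordinate conversion (the vertex placement, with $P(I)$ at the origin, $P(II)$ at $(1, 0)$, and $P(\emptyset)$ at the apex, is our own convention):

```python
import math

def barycentric_xy(p_I, p_II, p_empty):
    """Map a probability triple summing to one to 2-D plotting coordinates
    in an equilateral triangle with vertices (0,0), (1,0), (0.5, sqrt(3)/2)."""
    assert abs(p_I + p_II + p_empty - 1.0) < 1e-9
    x = p_II + 0.5 * p_empty
    y = (math.sqrt(3) / 2.0) * p_empty
    return x, y

# Bone section 6 (Table 2) sits essentially at the P(II) vertex:
assert barycentric_xy(0.00, 1.00, 0.00) == (1.0, 0.0)
```

Scattering these points for all subjects, with extreme points labeled by subject number, reproduces the layout of the barycentric displays.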

Figure 6: Index Plots of the Differences between the Estimated Fourier Coefficients and their Fitted Regression Points, $\{\hat\beta_i^{[1]} - \hat\beta_i^{[2]}\}$, for the intercept and the other six harmonic coefficients in (4.15). (Panels, each plotted against Subject: constant, 1st cos harmonic, 1st sin harmonic, 2nd cos harmonic, 2nd sin harmonic, 3rd cos harmonic, 3rd sin harmonic.)

Figure 7: Plots of the Raw Thickness $Y_i$ (solid line) and the Two Estimates (4.18) of the Posterior Predictive Mean Curves Based on the Stage II Regression Model in (3.8). The dotted and dashed lines use estimated regression coefficients with and without Stage II location shift, respectively. Plots are provided for six bone sections strongly indicated as outliers (2, 6, 7, 9, 21, and 48), the marginally indicated outlier (15), and, for comparison, a bone section showing no indication of being an outlier (17).

Figure 8: Boxplots of Estimated Expected Stage II Shifts for Bone Sections 2, 6, 7, 9, 15, 21, and 48. Estimates are displayed for all subject-specific intercepts (Panel (a)) and coefficients of AGE (Panel (b)) and AGE$^2$ (Panel (c)) corresponding to all Fourier coefficients.