
STATISTICS IN MEDICINE Statist. Med. 2006; 25:771–786 Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/sim.2401

Cluster detection diagnostics for small area health data: With reference to evaluation of local likelihood models

Md. Monir Hossain∗,† and Andrew B. Lawson‡

Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, U.S.A.

SUMMARY

The focus of this paper is the development of a range of cluster detection diagnostics that can be used to assess the degree to which a clustering method recovers the true clustering behaviour of small area data. The diagnostics proposed range from individual region-specific diagnostics to neighbourhood diagnostics, and either take individual region risk as their focus or concern areas of maps defined to be clustered and the ability of methods to recover them. A simulation-based comparison is made between a small set of count data models: local likelihood, BYM, and Lawson and Clark. It is found that local likelihood has good performance across a range of criteria when a CAR prior is assumed for the lasso parameter. Copyright © 2006 John Wiley & Sons, Ltd.

KEY WORDS: cluster; diagnostic; spatial; small area health; count data; local likelihood; lasso

∗ Correspondence to: Md. Monir Hossain, Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, U.S.A.
† E-mail: [email protected]
‡ E-mail: [email protected]

Contract/grant sponsor: NIH; contract/grant number: 1R03CA11314-01

Received 24 August 2005
Accepted 1 September 2005

1. INTRODUCTION

The local likelihood method employs a local relative risk model based on the relation of a data item to other data within a lasso distance. The local likelihood (LL) model was proposed by Hossain and Lawson [1] (denoted HL hereafter), who carried out a limited simulation study to check how well the LL model recovers the true relative risk under different priors for the lasso parameter. The LL approach estimates, for each region, a lasso distance within which regions are considered to be clustered. The method has several advantages: for example, its implementation does not require a complicated MCMC algorithm, and extending the model to incorporate covariates is straightforward.

This paper develops a range of cluster detection diagnostics that can be used to assess the degree to which a clustering method recovers the true clustering behaviour of small area data in relation to LL models. Evaluating the performance of spatial clustering algorithms is not as straightforward as evaluating relative risk estimation, where goodness-of-fit measures can be employed within simulations (see e.g. Reference [2] for a recent evaluation). Evaluation of cluster recovery capability depends on the underlying model for clustering, and there is currently no theoretical criterion that matches all the goals of recovery. For example, it could be important to recover general clustering in cancer cases (that is, the overall clustering behaviour of the map) as well as localized excesses of risk. General clustering can be monitored via global parameters of models, whereas localized behaviour may require localized measures as well.

The diagnostics proposed in this paper are based on fitting a smoothing model to data and then using the residuals and/or posterior summary statistics to check the ability of each model to recover the true clustering behaviour. Residuals are used in cluster diagnostics because they retain the information on extreme risks that smoothing removes. In adopting this strategy we assume that areas of extremely low or high risk form clusters (see References [3, 4] for details). It is also assumed that two random effects, spatially correlated and uncorrelated heterogeneity, capture normal variation in risk well, since all the models considered in this paper include spatially correlated and uncorrelated random effects to explain the otherwise unexplained variation in disease risk.

A range of models is considered for relative comparison in terms of cluster detection performance and goodness-of-fit. These include random effect models [5] and mixture models [6]. While these models were not designed to detect clusters per se, they have been proposed as clustering models and can be evaluated as such. A recent study by Best et al. [7] showed the merits of BYM (Besag, York and Mollié) models in comparison with other contemporary spatial models, including a multivariate normal geostatistical model with exponential covariance, a spatial mixture model [8], a partition model [9] and a gamma moving average model [10]. Besides BYM, we include the Lawson and Clark [6] model in the comparison because in earlier work (HL) we observed some encouraging results for it in relation to the BYM model.

The paper is organized as follows. Section 2 gives a brief description of the LL models and the two other models used in the comparison, followed by a description of the eastern Germany lip cancer data and the simulation algorithms used. Section 3 introduces all the criteria developed for diagnosis. The results and discussion are given in Sections 4 and 5.

2. MODEL DEVELOPMENT

In this section, we describe the LL models, the BYM model and the Lawson and Clark model as defined for count data, together with the data example.

2.1. LL model

In data-dependent clustering, the relative risk is defined to be a function of the 'local' concentration of cases. The extent of this local concentration is defined by a parameter (a lasso). Instead of specifying cluster centers, the local relative risk estimator is based simply on the relation of a data item to other data within the lasso distance ($\delta$: a parameter).

In the case of count data observed within small areas, a likelihood can be constructed in which the probability of a count is related to a locally defined relative risk that depends on a lasso parameter. This could be interpreted as a Poisson likelihood but, because the relative risks are dependent when they are based on overlapping lassos, it is regarded as a local likelihood. While the relative risk is specified to depend on the local density of cases, it is still possible to allow for this dependence through correlated prior distributions for the parameters within the likelihood. Data-dependent models are then formed by an LL for the data and the inclusion of correlated prior distributions at the next level of the hierarchy within a Bayesian hierarchical modelling setting. This approach avoids a number of problems that arise in other approaches.

We assume that counts of observed and expected cases are available within $n$ arbitrary regions and are denoted by $\{y_i\}$ and $\{e_i\}$, $i = 1, \ldots, n$. Let $y_{\delta_i}$ and $e_{\delta_i}$ be the observed and expected total counts, respectively, within the lasso $\delta_i$. The totals of observed counts within the lassos, $y_{\delta_i}$, are modelled as independent Poisson variates, conditional on $e_{\delta_i}$ and the model parameter $\theta_i$. This is the local likelihood assumption. Symbolically,

$$y_{\delta_i} \mid e_{\delta_i}, \theta_i \sim \mathrm{Poisson}(e_{\delta_i}\theta_i)$$

The log relative risk, $\eta_i$, is assumed to be a linear function of two independent random components $\delta_i$ and $\varepsilon_i$, defined as

$$\eta_i = \ln(\theta_i) = \delta_i + \varepsilon_i \qquad (1)$$

The $\delta_i$ surface is assumed to be continuous, and values on the surface will be correlated if locations are close. It is therefore natural to consider a probability model that allows positive spatial correlation. Because we have assumed an LL model at the first level, it is possible to consider a prior distribution that includes spatial correlation. The prior distribution of the $\delta_i$ surface could be specified in various ways. One approach is to consider the separation distance ($d_{ij}$, say) between any two arbitrary locations $(i, j)$ and the intersection of the two balls with radii $\delta_i$ and $\delta_j$. It is then possible to consider a parametric covariance function dependent on lasso sizes. However, it is simpler to adopt an approach in which the smoothness of the $\delta_i$ surface is controlled by an intrinsic Gaussian prior distribution and the strength of spatial correlation in the surface is controlled by a parameter $\tau$. This is a singular distribution that is simple to implement. The CAR prior for the parameter $\delta_i$ is given as

$$[\{\delta_i\} \mid \tau] \propto \tau^{-n/2} \exp\left\{ -\frac{1}{2\tau} \sum_{i<j,\; j \in \partial_i} (\delta_i - \delta_j)^2 \right\}$$

The conditional distribution has the form

$$\delta_i \mid \{\delta_{-i}\}, d_i, \tau \sim N\left(\bar{\delta}_i, \frac{\tau}{d_i}\right)$$

where $d_i = \sum_{j=1}^{n} I(j \in \partial_i)$ is the number of neighbours of region $i$ and $\bar{\delta}_i = \sum_{j \in \partial_i} \delta_j / d_i$. We denote this model as the LL-car model. The usual CAR prior distribution, which imparts smooth dependence between the parameters of different regions, may be thought to over-smooth the cluster sizes. We have also examined a prior that induces less smoothing in the lasso
parameters: an absolute difference prior distribution that allows jumps in risk (LL-abs; see e.g. Reference [6]). The absolute difference prior is defined as

$$[\{\delta_i\} \mid \tau] \propto \tau^{-n/2} \exp\left\{ -\frac{1}{\tau} \sum_{j \in \partial_i} |\delta_i - \delta_j| \right\}$$

The other random component in (1), $\varepsilon_i$, known as the unstructured component, can be modelled as exchangeable with an $N(0, \sigma^2)$ prior distribution.

Various Bayesian models have been proposed for relative risk estimation, but few focus on cluster identification. The distinction between relative risk estimation and clustering is often not clearly made in the literature, and so it was felt useful to include relative risk models in the comparison with the LL models. The models included in the comparison are the BYM model and the Lawson and Clark model. A brief description of these models follows.

The BYM [5] model takes a hierarchical Bayesian approach to account for spatially structured extra-Poisson variability in small area studies. At the first level of the hierarchy, the within-area variability is modelled as

$$y_i \mid e_i, \theta_i \sim \mathrm{Poisson}(e_i \theta_i)$$

At the second level, the variation between areas is modelled by a log-linear mixed model:

$$\log \theta_i = v_i + u_i$$

The random effects $v_i$ (structured heterogeneity) and $u_i$ (unstructured heterogeneity) are assumed to have CAR and normal prior distributions, respectively, as in (1).

The Lawson and Clark [6] model decomposes the log relative risks at the second level of the hierarchy as

$$\log \theta_i = p_i v_i + (1 - p_i) w_i + u_i$$

The random effect $w_i$ (structured heterogeneity) is assumed to have an absolute difference prior distribution. The weighting function $p_i$ controls the mixing of the two spatial components, one with a quadratic loss CAR prior distribution and one with an absolute difference loss prior distribution. Note that when $p_i = 1$ for all $i$, the Lawson and Clark model reduces to the BYM model. An uninformative beta distribution is assumed as the prior distribution for the weighting function $p_i$.

2.2. Real data and the simulation data

To check the performance of LL models in cluster recovery, we have used eastern Germany (formerly known as East Germany) lip cancer data for the period 1980–1989. The data consist of observed counts of mortality, expected mortality and the percentage of the population involved in agriculture, fisheries and forestry within 219 landkreise. The expected mortality was computed from nine age groups and two gender groups based on the national East Germany rates for that period.

Besides the real data, we have conducted a simulation study to examine how the LL models perform in comparison with other models which have been proposed and which were publicly available. We have used the lip cancer example from eastern Germany with the
observed and expected counts and the landkreise geographies as given. Within this study region we have simulated a range of true models for the relative risks $\{\theta_i\}$. The true risk model considers different areas and types of risk. There are three main areas of elevated risk: the northeast with a circular excess (max: 2.5), the southeast with an excess (max: 4) and the southwest with a cross-structured excess with two isolated maxima, 3.5 and 3.2 (Figure 5 displays the map of these risks). Given the true relative risks defined under the model, we simulated 100 sets of observed counts for each region from a Poisson distribution:

$$y_{ij} \sim \mathrm{Poisson}(e_i \theta_i^t), \qquad i = 1, \ldots, 219 \ \text{and} \ j = 1, \ldots, 100$$

where $\theta_i^t$ is assumed to be the true value of $\theta_i$.

3. METHODS

We develop a number of criteria to assess the ability of each model to recover clusters. The criteria are developed separately for real data and simulated data.

3.1. Real data

In classical inference for Poisson likelihood models, a set of fitted values is compared with the observed values by computing the residuals, defined as $r_i = y_i - e_i \hat{\theta}_i$, where $\hat{\theta}_i$ is an estimate of $\theta_i$ from a specific model. (Here and afterwards we drop the subscript $\delta$ to avoid notational complexity.) These crude residuals contain information about how the model deviates from the null model. As in Monte Carlo testing, it is possible to generate a set of $m$ simulated counts $\{y_{ij}^*\}$, $j = 1, \ldots, m$, from a specific model with the fitted values $\{e_i \hat{\theta}_i\}$ and hence to compute simulated residuals $\{r_{ij}^* = y_{ij}^* - e_i \hat{\theta}_i\}$. For clarity, we call the former a data residual and the latter a simulated residual. From these residuals, it is possible to report the rank of the data residual amongst the simulated residuals and then compute a pointwise p-value for the data point considered. This idea is an example of a parametric bootstrap and was first illustrated by Kelsall and Diggle [11] in a disease mapping context. Algebraically, the ith p-value is given by

$$p_i = \frac{1}{m + 1} \sum_{j=1}^{m} I(|r_{ij}^*| > |r_i|)$$

where $I(\cdot)$ is an indicator function that takes the value 1 if the condition within parentheses is satisfied and 0 otherwise, $|\cdot|$ denotes the absolute value and $r_{ij}^*$ is the simulated residual for the jth data set. These rank residual p-values could be extremely high or low. We examine extremely low values (less than 0.05), where the models do not predict well. The significance of this idea rests on the assumption that only the poorly fitted areas, which support the alternative hypothesis, form clusters. We extend this measure to incorporate first-order neighbourhood information as

$$L_i = \frac{\sum_{j=0}^{n_i} I(p_{ij} < 0.05)}{n_i + 1}$$
where $\{p_{ij};\ j = 0, 1, \ldots, n_i\}$ is the set of p-values of the ith region and its first-order neighbours, $p_{i0}$ is the p-value of the ith region itself and $n_i$ is the total number of neighbours. The measure $L_i$ is the proportion of p-values less than 0.05 among the region and its first-order neighbours. In this way, a surface of $L_i$ can be derived for each model.

The measure $L_i$ shows the grouping of extreme residuals. These are the regions where the model is not well fitted, i.e. the regions of lack-of-fit. In this sense, the lack-of-fit measure can be used as a measure of clustering. The connection between model fit and clusters rests on the implicit assumption that regions of extremely low or high risk, where the model is likely to fit poorly, form clusters. The measure $L_i$ is designed to exploit this connection, and the residuals from a reasonably well-fitted model can be used for this purpose.

3.2. Simulated data

The criteria proposed here relate to posterior probability assessment of the value of the relative risk from the simulated data ($\theta_{is}$, $i = 1, \ldots, n$ and $s = 1, \ldots, S$). From converged posterior samples of MCMC algorithms, it is possible to estimate exceedance probabilities for the relative risks and hence to compute an estimate $q_{is} = \Pr(\hat{\theta}_{is} > c)$, where $\hat{\theta}_{is}$ is an estimate of $\theta_{is}$ from a specific model for the ith region and sth simulated data set and $c$ is a threshold value, generally taken to be 1. The posterior exceedance probability, $q_{is}$, gives a measure of the 'significance' of excess risk. The average posterior exceedance probability (APEP) of the ith region is defined as $\bar{q}_i = (1/S) \sum_s q_{is}$. An extension of $\bar{q}_i$ that incorporates first-order neighbourhood information is $\bar{\bar{q}}_i = \sum_{j=0}^{n_i} \bar{q}_{ij} / (n_i + 1)$, where $\{\bar{q}_{ij};\ j = 0, 1, \ldots, n_i\}$ is the set of $\bar{q}$ values of the ith region and its first-order neighbours and $\bar{q}_{i0}$ is the $\bar{q}$ value of the ith region itself.
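The diagnostics defined so far can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy four-region map, the Knuth-style Poisson sampler and the estimation of $q_{is}$ as the fraction of posterior draws above $c$ are all hypothetical choices made for the example. Only the formulas for $p_i$, $L_i$ and the APEP, the value $m = 499$ and the 0.05 threshold come from the text.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method (fine for modest means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def rank_residual_pvalues(y, fitted, m=499, seed=0):
    """p_i = (1 / (m + 1)) * #{j : |r*_ij| > |r_i|}, with r_i = y_i - fitted_i."""
    rng = random.Random(seed)
    pvalues = []
    for y_i, mu_i in zip(y, fitted):
        data_resid = abs(y_i - mu_i)
        # Parametric bootstrap: simulate m counts from the fitted value e_i * theta_hat_i.
        exceed = sum(
            abs(poisson_draw(rng, mu_i) - mu_i) > data_resid for _ in range(m)
        )
        pvalues.append(exceed / (m + 1))
    return pvalues


def neighbourhood_proportion(values, neighbours, is_extreme):
    """Proportion of region i and its first-order neighbours flagged by is_extreme."""
    out = []
    for i, nbrs in enumerate(neighbours):
        group = [values[i]] + [values[j] for j in nbrs]
        out.append(sum(is_extreme(v) for v in group) / len(group))
    return out


def exceedance_probability(samples, c=1.0):
    """Assumed estimator of q_is: fraction of posterior draws of theta above c."""
    return sum(s > c for s in samples) / len(samples)


def apep(exceedance):
    """Average posterior exceedance probability; exceedance[s][i] holds q_is."""
    n_sets = len(exceedance)
    return [sum(row[i] for row in exceedance) / n_sets for i in range(len(exceedance[0]))]


# Hypothetical toy map: four regions on a line (0-1-2-3), region 1 badly fitted.
neighbours = [[1], [0, 2], [1, 3], [2]]
y = [2, 30, 4, 5]
fitted = [3.0, 10.0, 4.5, 5.0]

p = rank_residual_pvalues(y, fitted)
l_values = neighbourhood_proportion(p, neighbours, lambda v: v < 0.05)
```

In this toy map only region 1 has an extreme data residual, so it raises $L_i$ for itself and for its two neighbours while the far end of the line stays at zero, which is the "grouping of extreme residuals" that the $L_i$ surface is meant to display.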
There are a number of measures available in Bayesian inference to check the closeness of a posterior estimate to the true value. Among them, a simple measure of fit is the posterior absolute error, defined as $d_{is} = |\hat{\theta}_{is} - \theta_i^t|$. An average over the simulations can be defined as $\bar{d}_i = (1/S) \sum_s d_{is}$ and is called the average posterior error (APE). Another commonly used measure of fit when the true values are known is the mean square error (MSE), defined as $\mathrm{MSE}_i = (1/S) \sum_s (\hat{\theta}_{is} - \theta_i^t)^2$. Plotting these three global measures, $\bar{q}_i$, $\bar{d}_i$ and $\mathrm{MSE}_i$, against $\theta_i^t$ will depict the closeness of fit of each model. Plots of APE and MSE will show the same patterns with different magnitudes. Note that $\theta_i^t$ is assumed to be fixed, or to represent an average risk.

A local measure to check possible clustering can be visualized by plotting $\{\bar{q}_{ij};\ j = 0, 1, \ldots, n_i\}$ against $\{\theta_i^t\}$ for each region. In practice, if the number of regions (in the data example, the landkreise) in a study area is large, it is possible to implement this measure for a number of regions where it is important to ensure that the models are correctly picking up the clustering information.

A local measure similar to $L_i$ for real data can also be developed for simulated data. We define

$$R_i = \frac{\sum_{j=0}^{n_i} I(\bar{q}_{ij} > 0.95)}{n_i + 1}$$

to calculate the proportion having average posterior exceedance probability greater than 0.95 among the region and its first-order neighbours. The measure $R_i$ shows the grouping of excess risk where
the average posterior probability of excess risk, i.e. of $\hat{\theta}_{is} > c$, is greater than 0.95. In this way, a surface of $R_i$ can be derived which will give evidence of clusters of excess risk.

ROC curve: Instead of using $R_i$, a similar measure can be developed from the receiver operating characteristic (ROC) curve. For the moment we assume that the true status of a 'cluster' is defined by the average exceedance probability, $\bar{q}$, being greater than 0.95. We consider the value 0.95 because at this value the evidence of excess relative risk is substantial. Two indices, sensitivity and specificity, are used to evaluate how accurately each model discriminates between 'clusters' and 'not clusters'. We can make a binary diagnostic decision based on $\theta^t$ if we select a cut-off value and call all regions above the cut-off cluster, and all those below not cluster. For each model, the results can be divided into true clusters (TC), false clusters (FC), false not clusters (FNC) and true not clusters (TNC), with the relative frequency of each depending on where the cut-off is set. We can evaluate the reliability of each model relative to that cut-off by calculating the sensitivity and specificity. The sensitivity is equal to the true cluster rate, TC/(TC + FNC). The specificity is equal to the true not cluster rate, TNC/(TNC + FC). It is possible to find the optimum cut-off for each model by plotting sensitivity versus specificity as a function of cut-off, but for our purpose we are interested in a measure of the reliability of each model that does not depend on the cut-off value. If we plot the TC rate as a function of the FC rate (1 − specificity), we obtain an ROC curve. An ROC curve is a graphical representation of the trade-off between the TC and FC rates for every possible cut-off. By tradition, the plot shows the FC rate on the X-axis and the TC rate on the Y-axis. The accuracy of each model is measured by the area under the ROC curve (AUC). An area of 1.0 represents a perfect model, while an area of 0.5 represents a worthless model. The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the model: the TC rate is high and the FC rate is low. Statistically, more area under the curve means that the model is identifying more TC while minimizing the number (or per cent) of FC.
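The ROC construction described above can be illustrated with a short Python sketch. It is a hedged example rather than the code used in the paper: the function names and the six-region toy data are hypothetical, regions at or above a cut-off are treated as 'cluster' (an assumption about how ties are handled), and the AUC is computed with the trapezoidal rule, which is one common choice that the text does not specify.

```python
def roc_points(truth, score):
    """Return (FC rate, TC rate) pairs, one per cut-off on score, from strictest to loosest."""
    n_pos = sum(truth)              # regions whose true status is 'cluster'
    n_neg = len(truth) - n_pos      # regions whose true status is 'not cluster'
    points = [(0.0, 0.0)]
    for c in sorted(set(score), reverse=True):
        # Regions with score at or above the cut-off are called 'cluster'.
        tc = sum(1 for t, s in zip(truth, score) if s >= c and t)
        fc = sum(1 for t, s in zip(truth, score) if s >= c and not t)
        points.append((fc / n_neg, tc / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical six-region example: APEP values and true relative risks.
q_bar = [0.99, 0.97, 0.60, 0.30, 0.96, 0.10]
theta_true = [3.5, 2.5, 1.5, 1.0, 1.2, 1.0]

# True status of a 'cluster' as in the text: APEP greater than 0.95.
truth = [q > 0.95 for q in q_bar]
points = roc_points(truth, theta_true)
area = auc(points)
```

The curve starts at (0, 0) for a cut-off above every region and ends at (1, 1) once the cut-off admits all regions, so the area summarizes the trade-off over every possible cut-off rather than at a single chosen one.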

4. RESULTS

We use the freely available statistical software R (R Development Core Team [12]) for our analysis, and for mapping results we use the commercial software MapInfo (MapInfo Professional Version 7.8 [13]). To compare the performance of LL models in cluster recovery with that of their competitors, we apply almost all the proposed diagnostic criteria to four models: the LL model with CAR prior (LL-car), the LL model with absolute difference prior (LL-abs), the random effect model (BYM) and the mixture model (LC).

4.1. Real data

The observed SMR for the eastern Germany lip cancer data, defined as the ratio of observed to expected mortality, is mapped in Figure 1. High variability in the SMRs is clearly visible, with values ranging from 0 to 3.82. Clustering of high SMRs (2.0 or more) is visible in the north, and there are also a few areas with low SMRs (0.5 or less) in the south. The data residuals of each model, after removing the unobserved structured and unstructured heterogeneities, are mapped in Figure 2. The maps indicate that there are some very high and low
residual regions, not modelled explicitly. Comparing Figures 1 and 2, it is apparent that the areas with very low and very high SMRs have large residuals, indicating that these areas are not modelled well by any of the four models.

In calculating simulated residuals, we set $m = 499$ and generate $y_{ij}^*$, $j = 1, \ldots, 499$, that is, 499 counts for the ith region, by the parametric bootstrap method, using the posterior expected estimate $e_i \hat{\theta}_i$ for the observed counts. To ensure approximate independence [14] between the repeated samples, we take the first replicate and every 100th replicate thereafter from a total of 49 801 replicates for each region. The rank residual p-value surface of each model is given in Figure 3, which shows many regions with large values (greater than 0.60). However, there are a few regions with extremely low values (smaller than 0.05). Interestingly, all the rank residual p-values for the LL-abs model are greater than 0.05, indicating that at the 5 per cent level of significance the model fits well. It is important to note that not only do high SMRs produce p-values below 0.05 for each model, but a low SMR also produces such a p-value in the southwest region for the LL-car model.

The $L_i$ surface is given in Figure 4. The LL-car model identifies a cluster of low SMRs in the southwest, whereas the BYM and LC models identify clusters of high SMRs in the northeast. The LL-abs model is not able to identify any regions with excess risk at the 5 per cent level of significance. The reason is probably the better fit of the LL-abs model, as illustrated in the earlier paper of HL [1].

Figure 1. Thematic map of observed SMR. Legend classes: 2.0 or more; 1 to 2; 0.5 to 1; 0.5 or less.

Figure 2. Thematic map of data residuals. Top-left: LL-car, top-right: LL-abs, bottom-left: BYM and bottom-right: LC. Legend classes: 3.0 or more; 0.5 to 3; -0.5 to 0.5; -3 to -0.5; -3.0 or less.

Figure 3. Thematic map of p-values of residuals. Top-left: LL-car, top-right: LL-abs, bottom-left: BYM and bottom-right: LC. Legend classes: 0.6 to 1; 0.1 to 0.6; 0.05 to 0.1; 0 to 0.05.

4.2. Simulated data

The true relative risk, $\theta^t$, used in the simulations is plotted as labels in Figure 5. Three main areas of elevated risk (northeast, southeast and southwest) are visible in the map. We set the threshold value $c$ equal to 1.0 in order to calculate the posterior exceedance probability. This involves the assumption that regions are clustered if the relative risk is greater than 1. In Figure 6, APEP (left panel) and MSE (right panel) are plotted as smooth lines for each model against $\theta^t$. We omit the plot of APE against $\theta^t$ because it is almost identical to the plot of MSE against $\theta^t$. The left panel indicates that the LL models and BYM produce very similar APEP, whereas the LC model assigns a larger APEP to regions of moderate risk. In the right panel, the low MSE of the BYM and LC models in excess risk regions implies that these two models model the high-risk regions better than the LL models. Among the LL models, LL-abs performs a little better than LL-car, especially for relative risks of 3.0 or higher.
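The per-region APE and MSE summaries behind the right panel of Figure 6 are simple averages over the $S$ simulated data sets. The following Python sketch shows the arithmetic on a hypothetical two-region example; the function names and the numbers are illustrative assumptions, and only the two formulas are taken from Section 3.2.

```python
def average_posterior_error(estimates, truth):
    """APE_i = (1/S) * sum_s |theta_hat_is - theta_t_i|; estimates[s][i] is theta_hat_is."""
    n_sets = len(estimates)
    return [
        sum(abs(row[i] - t) for row in estimates) / n_sets
        for i, t in enumerate(truth)
    ]


def mean_square_error(estimates, truth):
    """MSE_i = (1/S) * sum_s (theta_hat_is - theta_t_i)^2."""
    n_sets = len(estimates)
    return [
        sum((row[i] - t) ** 2 for row in estimates) / n_sets
        for i, t in enumerate(truth)
    ]


# Hypothetical example: S = 2 simulated data sets, one background and one high-risk region.
theta_true = [1.0, 4.0]
theta_hat = [
    [1.0, 3.0],  # estimates from simulated data set 1
    [1.5, 2.0],  # estimates from simulated data set 2
]

ape = average_posterior_error(theta_hat, theta_true)
mse = mean_square_error(theta_hat, theta_true)
```

In this toy case the high-risk region is shrunk towards the background in both data sets, so both measures are larger there than in the background region, and the squared scale of the MSE amplifies the gap.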

Figure 4. Thematic map of the $L_i$ surface. Top-left: LL-car, top-right: LL-abs, bottom-left: BYM and bottom-right: LC. Legend classes: 0.7 to 1; 0.3 to 0.7; 0.1 to 0.3; 0 to 0.1.

Figure 7 displays the $R_i$ surface maps. All the models correctly recover the areas with high relative risk, with varying $R_i$ values. In accordance with the APEP in Figure 6, the LC model picks up the three clustered regions with the largest size. Interestingly, the LL-abs and BYM models produce very similar clustered regions, although the MSEs of the LL-abs model for high-risk regions are slightly higher than those of the BYM model. The cluster of high-risk regions in the southwest is actually cross-shaped, with two isolated maxima: 3.5 and 3.2 (Figure 5). Only the LL-car model displays this pattern correctly.

Figure 5. Map of $\theta_i^t$ for the east Germany landkreise regions with superimposed three areas of excess risk. Region labels give the true relative risk: 1 in most regions, with values between 1.2 and 4 in the three areas of excess risk.

Figure 6. Smooth lines of APEP (left) and MSE (right) against $\theta^t$. Lines: LL-CAR, LL-Abs, BYM and LC. Horizontal axes: $\theta^t$ from 1.0 to 4.0.

Figure 7. Thematic map of $R$. Top-left: LL-car, top-right: LL-abs, bottom-left: BYM and bottom-right: LC. Legend classes: 0.7 to 1; 0.3 to 0.7; 0.1 to 0.3; 0 to 0.1.

Table I displays the relative frequencies for three cut-offs for each model. The cut-offs are chosen arbitrarily. Clearly, as the cut-off increases, the sensitivity decreases but the specificity increases. Thus, there is a trade-off between sensitivity and specificity. Figure 8 gives the ROC curve for each model. The AUC is the area under the curve from 0 to 1 on the X-axis (i.e. the 1 − specificity axis). The AUCs of the models are very close; the largest AUC is obtained for the LL model with CAR prior and the smallest for the LL model with absolute difference prior.
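The trade-off can be checked by hand from the LL-car counts in Table I. The Python sketch below assumes the reading of the table used in Section 3.2: the true status is 'cluster' when $\bar{q}$ is at least 0.95, and a region is called 'cluster' when its true relative risk lies above the cut-off, so each cut-off yields the four counts (TC, FC, FNC, TNC). The variable and function names are hypothetical.

```python
# LL-car counts read from Table I as (TC, FC, FNC, TNC) for each cut-off on theta^t.
ll_car = {
    1: (10, 19, 0, 190),
    2: (10, 8, 0, 201),
    3: (7, 0, 3, 209),
}


def sensitivity(tc, fc, fnc, tnc):
    """True cluster rate, TC / (TC + FNC)."""
    return tc / (tc + fnc)


def specificity(tc, fc, fnc, tnc):
    """True not cluster rate, TNC / (TNC + FC)."""
    return tnc / (tnc + fc)


rates = {
    cutoff: (sensitivity(*cells), specificity(*cells))
    for cutoff, cells in ll_car.items()
}

for cutoff, (sens, spec) in rates.items():
    print(f"cut-off {cutoff}: sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Under this reading the sensitivity stays at 1 for cut-offs 1 and 2 and falls to 0.7 at cut-off 3, while the specificity rises from 190/209 to 201/209 and then to 1, which matches the direction of the trade-off stated above.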



Table I. Relative frequencies for three cut-offs of each model.

                   LL-car                LL-abs                 BYM                   LC
Cut-off      q̄ ≥ 0.95  q̄ < 0.95    q̄ ≥ 0.95  q̄ < 0.95    q̄ ≥ 0.95  q̄ < 0.95    q̄ ≥ 0.95  q̄ < 0.95
θ^t > 1         10        19          12        17          11        18          15        14
θ^t ≤ 1          0       190           0       190           0       190           0       190
θ^t ≥ 2         10         8          11         7          11         7          14         4
θ^t < 2          0       201           1       200           0       201           1       200
θ^t ≥ 3          7         0           7         0           7         0           7         0
θ^t < 3          3       209           5       207           4       208           8       204

Figure 8. ROC curve and AUC for each model. X-axis: 1-spec (1 − specificity); Y-axis: sens (sensitivity). Panels: LL-car, LL-abs, BYM, and Lawson & Clark. AUC values printed in the panels: 0.999045, 0.994762, 0.996569 and 0.998029.


5. CONCLUSIONS

We have outlined a number of cluster detection diagnostics to assess how well each method recovers the true clustering behaviour of small area data. The methods compared with local likelihood are the BYM model and the Lawson and Clark model. Performance is compared on the eastern Germany lip cancer data as well as on simulated data for the same landkreise. The simulation example contains 100 data sets generated from an assumed set of true risks. The assumption of a single set of relative risks as the true risk may be justified under the Bayesian paradigm as an average risk.

The measure based on p-values for real data, $L_i$, is defined so as to identify clusters of regions with extreme risk, in terms of high or low SMRs. It is important to note that $L_i$ is related to model lack-of-fit criteria. A perfectly fitted model (which is impossible in a real situation) will produce zero values for all the $L_i$ at the 5 per cent or 10 per cent level of significance, and hence the model will not be able to recover any clusters. By contrast, the measure developed for simulated data, $R_i$, which is based on the average posterior exceedance probability, is designed to identify clusters of high-risk regions. The threshold value in the calculation of the posterior exceedance probability can be changed according to the definition of a cluster. In the simulated example, we have taken this value to be 1. The graphical criterion, $R_i$, does not provide any clear-cut inference about the relative performance of the models, whereas the ROC curves, and hence the AUC, provide a single numerical value that allows us to judge the best performing model in terms of cluster recovery.

Applying these diagnostics to each of the competing models gives more insight into their effectiveness in recovering clusters. For this particular set of simulated data, we have observed that LL-car performs better than the LL-abs, BYM and LC models. For the eastern Germany lip cancer data, the LL-car model recovers clusters of low-risk areas, whereas the BYM and LC models recover clusters of high-risk areas.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the support of NIH grant # 1 R03CA11314-01, without which this work would not have been possible. We thank two referees for many thoughtful comments that led to additional insights in the present version.

REFERENCES

1. Hossain MM, Lawson AB. Local likelihood disease clustering: Development and evaluation. Environmental and Ecological Statistics 2005; 12:259–273.
2. Lawson AB. Cluster modelling of disease incidence via RJMCMC methods: A comparative evaluation. Statistics in Medicine 2000; 19:2361–2376.
3. Elliott P, Wakefield J. Disease clusters: should they be investigated, and, if so, when and how? Journal of the Royal Statistical Society, Series A 2001; 164:3–12.
4. Wartenberg D. Investigating disease clusters: why, when and how? Journal of the Royal Statistical Society, Series A 2001; 164:13–22.
5. Besag J, York J, Mollié A. Bayesian image restoration with two applications in spatial statistics (with discussion). Annals of the Institute of Statistical Mathematics 1991; 43:1–59.
6. Lawson AB, Clark AB. Spatial mixture relative risk models applied to disease mapping. Statistics in Medicine 2002; 21:359–370.
7. Best N, Richardson S, Thomas A. A comparison of Bayesian spatial models for disease mapping. Statistical Methods in Medical Research 2005; 14:35–59.
8. Green J, Richardson S. Hidden Markov models and disease mapping. Journal of the American Statistical Association 2002; 97:1055–1070.
9. Knorr-Held L, Raßer G. Bayesian detection of clusters and discontinuities in disease maps. Biometrics 2000; 56:13–21.
10. Best NG, Ickstadt K, Wolpert RL, Briggs DJ. Combining models of health and exposure data: the SAVIAH study. In Spatial Epidemiology: Methods and Applications, Elliott P, Wakefield JC, Best NG, Briggs DJ (eds). Oxford University Press: Oxford, 2000.
11. Kelsall J, Diggle P. Non-parametric estimation of spatial variation in relative risk. Statistics in Medicine 1995; 14:2335–2342.
12. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2004.
13. MapInfo Professional Version 7.8. MapInfo Corporation: Troy, New York, 2004.
14. Brooks SP. Markov chain Monte Carlo and its application. The Statistician 1998; 47:69–100.
