HEALTH PROMOTION INTERNATIONAL, Vol. 11, No. 3 © Oxford University Press 1996. Printed in Great Britain

Developing methodologies for evaluating community-wide health promotion

CART Project Team: ROBERT SANSON-FISHER, SALLY REDMAN, LYNNE HANCOCK, STEPHEN HALPIN, PHILIP CLARKE, MARGOT SCHOFIELD, ROBERT BURTON, MICHAEL HENSLEY, ROBERT GIBBERD, ALEXANDER REID and RAOUL WALSH, University of Newcastle, Australia

AFAF GIRGIS, LOUISE BURTON and ANN McCLINTOCK, NSW Cancer Council, Australia

ROBERT CARTER, Australian Institute of Health

ALLAN DONNER, University of Western Ontario, Canada

SYLVAN GREEN, National Cancer Institute, United States

SUMMARY

There has been growing recognition that health promotion programs which target whole communities are more likely to be effective in changing health behaviour. However, studies evaluating the impact of community-wide health promotion programs rarely use adequate methodology. Randomised controlled trials, in which multiple whole communities are randomly assigned to control and intervention groups, are optimal if evaluators hope to validly attribute changes in health behaviour to the intervention. However, such trials present a number of difficulties, including cost and feasibility limitations and the evolving nature of statistical techniques. This paper proposes applying a fairly well-accepted phased evaluation approach to the evaluation of community participation programs, using three defined phases. Phase 1 consists of small-scale studies to develop the measures and assess the acceptability and feasibility of the intervention; Phase 2 consists of studies in a small number of communities designed to trial the intervention in the real world; Phase 3 studies use an appropriate number of entire communities to provide valid evidence of efficacy of the intervention. It is suggested that criteria be resolved to identify adequate studies at each stage and that the advantages and limitations of Phase 1 and 2 studies be clearly identified. The paper describes the major design, sampling and analysis considerations for a Phase 3 study.

Key words: community; evaluation; health promotion

INTRODUCTION

Community-wide interventions attempt to implement changes which will simultaneously affect many individuals (Dixon, 1989; Edinburgh Research Unit in Health and Behavioural Change, 1989; Tones et al., 1990; Green and Kreuter, 1991). There are several reasons why such programs have potential to change health behaviour: individuals are more likely to adopt a health behaviour if there is social support for change (Pomerleau et al., 1978); action at the community level allows structural changes, such as modifying the cost or availability of resources, services or products (Rothman, 1968); concurrent delivery of the same message from several sources can increase the likelihood of consumer acceptance (Rogers, 1971; Kottke et al., 1988); and empowerment gained through the opportunity for active community engagement in the intervention process can increase the probability that change will occur (Pomerleau et al., 1978; WHO, 1986).

Despite the potential advantages of community-wide programs, debate continues about their effectiveness in changing health behaviour (Farquhar, 1978; Dixon, 1989). In part, this is attributable to a lack of consensus about the role of, and optimal strategies for, the evaluation of such programs. The aims of this paper are to describe some of the methodological difficulties in evaluating community-wide programs and to propose some strategies to improve future evaluations.


WHY ARE COMMUNITY-WIDE PROGRAMS DIFFICULT TO EVALUATE?

Any evaluation study should be designed so that the results will be believed by the scientific community. Studies should meet minimal methodological criteria, such as the use of valid and reliable outcome measures and adequate response rates. However, the most difficult methodological issue for evaluations of community-wide interventions is the establishment of adequate control groups.

For all health interventions, a randomised controlled trial is accepted as the optimum scientific design. In such trials, individuals are randomly allocated to control or treatment groups and the impact of treatment is assessed by comparing outcomes between groups. The strength of random allocation is the reduction in potential confounding due to the equal distribution of known and unknown confounders between the intervention and control groups. Any differences between the two groups at follow-up can then be attributed to the intervention.

When community-wide treatments are evaluated, communities rather than individuals must be assigned to control and treatment groups. It has long been recognised that such evaluations must assign multiple communities to each group. Study designs which involve only one or two treatment communities cannot exclude the possibility that changes in behaviour are due to factors other than the intervention (Koepsell et al., 1991). While studies like North Karelia and Stanford were pioneering and state-of-the-art at the time of their conduct, these landmark studies nevertheless failed to allow definitive conclusions about the effectiveness of community-wide strategies. This lack of conclusive evidence has been acknowledged by the designers of the studies (Farquhar et al., 1985; Puska et al., 1985).

There have recently been considerable advances in our understanding of sampling and analysis issues in studies where communities (or clusters) rather than individuals are randomly assigned. In cluster designs, two sets of sample sizes must be considered: the number of individuals to be sampled within a community and the number of communities to be included, as variation can occur at both of these levels (Donner et al., 1981). Additionally, the power to detect an intervention effect will depend more upon the number of communities studied than upon the number of individuals observed within each community (Hsieh, 1988). In practical terms, this means that, for acceptable statistical power, evaluations of community-wide programs will usually require that many communities be assigned to control and intervention groups.

There can also be difficulties in analysing data from multi-community trials if community sampling is not accounted for within the analysis. Evaluations which allocate by community should not be analysed as if individuals had been randomly assigned. If standard statistical methods are used with cluster samples, the degree of statistical significance will tend to be overestimated (Donner, 1982), and such studies may therefore report statistically significant differences in the absence of a true effect (Type 1 error). The practical consequence for evaluators of community-wide health promotion programs is that they will need to include more communities and more people.

Have evaluations of community-wide health promotion accounted for these design issues? While the methodological difficulties associated with cluster designs were recognised by the 1950s, greater awareness of these issues was stimulated by a seminal paper by Cornfield (1978). Within the health promotion field, the issues of appropriate controls for community evaluations were first debated in relation to the Stanford study in the late 1970s (Farquhar et al., 1990). Despite recognition of these threats to scientific validity, most subsequent evaluations of community-wide interventions have not met the requirements for adequate study design. A review by Donner et al. (1990) found that of 16 community intervention studies published from 1978 to 1989, 11 failed to justify the need for cluster randomisation, only three correctly accounted for between-cluster variation in discussing sample size and power, and only eight accounted for between-cluster variation in the analysis. One study failed to address any of the requirements (Black et al., 1981). Similarly, a recent review of 11 studies which evaluated community-wide cancer risk reduction programs (published between 1985 and 1991) found that only one study had used more than one pair of control and intervention communities (Clover and Redman, 1995). None of the studies had controlled for cluster randomisation in selecting their sample or in their analysis.

A rare example of an optimally designed evaluation is the COMMIT project (COMMIT Research Group, 1991). The aim of this study was to measure the impact of a community-wide smoking cessation and prevention program on smoking rates. COMMIT uses 11 matched pairs of communities, with one of each pair randomly allocated to the intervention condition. Each matched pair provides one data point for assessing smoking reduction in intervention compared to control communities. The impact of the program has been assessed by establishing whether change across communities was greater than expected by chance alone. Studies such as COMMIT are designed to produce a valid estimate of effectiveness and hence draw convincing conclusions about the impact of the intervention.




Why haven't optimum designs been used for evaluating community-wide programs?

Several factors have contributed to non-scientific evaluations of community-wide health promotion programs. First, biostatistics is a rapidly evolving field. Many of the epidemiological and biostatistical techniques for evaluating such programs have not been readily accessible to researchers and practitioners, due to their recency or continuing development (Zeger, 1988; Donner and Klar, 1994). Few practising researchers will have had the opportunity to absorb information about cluster randomisation trials in the way that they have about individual randomisation. There is therefore a need to provide researchers and practitioners with training programs and tools that enable them to incorporate new statistical technology, and to stimulate debate about analysis and design issues within health promotion forums.

Second, the assignment of multiple communities is expensive. For example, the cost of the evaluation of COMMIT was estimated to be approximately US$41 million. This level of financial commitment means that very few evaluation studies using designs like COMMIT will be undertaken. However, the lack of information about the efficacy of community-wide interventions, their growing use in health promotion and the range of potential confounders and predictors of change suggest that many evaluation studies will be needed. There is, therefore, a need for solutions to this funding conundrum: either the development of more cost-effective but still scientific evaluation strategies, changes to current funding strategies, or the development of networked research centres where the cost can be shared across sites.

A third barrier is lack of feasibility. It may not be possible to identify enough clusters or appropriate communities to take part in the program. For example, within Australia there are only four or five cities of 1 million or more people; evaluating a program designed for communities of this size will exclude the possibility of a randomised controlled trial. Likewise, if the program requires considerable input from the research team, it may not be feasible for it to be implemented in a short time period in many communities.

Fourth, within health promotion, there is little consensus about the relative importance of finding out more about how the intervention works, by collecting detailed process measures, compared with establishing whether the program is changing health behaviour. There is a perception that pre-post designs cannot adequately provide information for developing and modifying interventions. From this perspective, scientific design issues may be of less importance than the collection of sophisticated process measures which can demonstrate a link between intervention components and changes in health behaviour.

Finally, despite political pressure on practitioners to implement community-wide programs, there is insufficient recognition of the time and money required to develop and evaluate efficacious programs (National Centre for Epidemiological and Population Health, 1991). While there is community tolerance of the time delay in introducing a 'promising' new drug, this acceptance has been assisted by the disastrous early introduction of drugs with serious but undiscovered side effects, such as thalidomide (Johnson et al., 1994). In health promotion, by contrast, there are frequently demands to implement programs without such testing, and therefore pressure on evaluators to use quick and simple evaluation designs even if they are not scientifically rigorous.

It is evident that health promotion needs to find strategies for reconciling the competing demands on evaluators. Unless solutions can be found, it seems likely that inadequate methodologies will continue to be used and that unproven programs will continue to be implemented without evaluation, or with process evaluation only.





HOW SHOULD COMMUNITY-WIDE HEALTH PROMOTION PROGRAMS BE EVALUATED?

Several principles for evaluating community-wide health promotion programs can be established, which could resolve some of the competing demands facing evaluators and form the basis of guidelines for funding bodies, journal publication policies and researchers and practitioners who seek to develop their field.

Using community as the unit of analysis should be strongly justified

Using the community rather than the individual as the unit of analysis increases the difficulty of sampling, design and analysis, increases study costs and can compromise the feasibility of an evaluation. Therefore, the value of randomising communities, rather than individuals within a community, to treatment groups must be justified. Donner et al. (1990) note that of 16 studies reviewed, only four justified the need for assignment of communities rather than individuals.

There are at least three situations in which community assignment will be necessary. First, when it is perceived that community-wide interventions will produce a greater level of behaviour change than will individual strategies; for example, community action programs. Second, when studies need to use community-wide interventions to avoid treatment group contamination; for example, health information mail-outs. Finally, when it is not possible to deliver the intervention to individuals, such as programs involving policy change, economics or mass-media approaches.

However, community-wide trials are sometimes used when individual allocation may have been possible. For example, it is frequently argued that mass-media programs require community-wide evaluations. However, Robertson et al. (1974) demonstrated that it is possible to evaluate the impact of mass-media programs with individuals as the unit of analysis; in this study, two separate cable TV networks were used to provide mass-media messages to the treatment group. The outcome of interest was seatbelt use, with car license plates used to identify treatment or control households. It may be, therefore, that increased resources should be directed at developing more sophisticated designs which allocate individuals, rather than necessarily undertaking large-scale community allocation studies.

The intervention must be rigorously pre-tested

It is important that expensive multi-community evaluations not be undertaken until the proposed intervention has been rigorously pre-tested. Pre-testing studies should occur in a series of predefined stages, each building upon the last. Only when the intervention meets specific preconditions should an evaluation involving community assignment be considered.

It is possible to draw an analogy with drug trials, which follow a defined sequence. First is a pre-test stage in which the drug is trialled with animals. Then, Phase 1 involves the first measurements with humans, to explore the acceptability of drug doses, using small numbers of volunteer subjects. Phase 2 assesses whether there is any effect of the treatment by measuring whether anything good happens, with close monitoring for adverse effects; typically, 20-150 subjects participate and there may not be a control group. Finally, Phase 3 involves full-scale evaluation of efficacy, with random allocation to control and treatment groups and appropriate outcome measures; less common adverse effects are also assessed. Importantly, each phase of this testing builds on the preceding stage, and each phase is reasonable and important in its own right. The fundamental reservation is that Phase 1 and 2 studies might indicate that the drug has promise, but cannot indicate that it is efficacious.

It would seem useful to adopt a similar staged approach to evaluating community-wide interventions. While this is not a new idea, and has in fact been formalised in initiatives such as the United States National Cancer Institute's Smoking and Tobacco Control Program (STCP) (Glynn et al., 1993), it is evident that some health promotion practitioners and researchers are not yet applying these staging principles to community-wide programs (Donner et al., 1990). One reason may be that funding agency support is much more likely to be gained for Phase 3 type projects than for Phase 1 and Phase 2 studies.


Phase 1

During Phase 1, a series of small-scale studies would be implemented. The studies would usually not be community-wide and their purpose would be to develop intervention components, to test the feasibility and acceptability of proposed programs, to validate measures and to establish the parameters of intervention intensity. At this stage, the research should be planned as an iterative series in which data are gathered, the program amended and then re-trialled, perhaps over many modifications. Before embarking, there should be a decision rule about when to stop and when to proceed to the next phase.

Phase 2

In this phase, studies would address the feasibility and acceptability of the proposed intervention in real life. Research during this phase would seek to refine the intervention and establish whether any change occurs in early outcome or process measures. The issue of outcome assessment is potentially contentious. If there appears to be no change in outcome, it may be because the power of the evaluation is limited. Alternatively, if outcome measures do change, there is the danger that the intervention will be argued to be effective without adequate data to support this conclusion. Despite these reservations, most practitioners and researchers would want some evidence of potential efficacy before embarking on a large-scale study. A solution might be to establish a decision rule ahead of time, with a clear understanding that evidence of change in outcome measures will imply that the program is worth testing but not that it is efficacious. Again, a 'stop or proceed' rule would be useful. The analogy with drug trials would suggest that a Phase 2 trial would permit researchers to have a first look at changes in the outcome measure, but not to use the data to decide on the generalisability of the results. Phase 2 evaluations should continue to collect a wide variety of process measures, to refine the intervention for use in the real world.

It is worthwhile to explore Phase 2 designs other than pre-post (see Figure 1). For example, a multiple baseline approach, which uses each subject (or community) as its own control, involves the collection of repeated measures of the outcome of interest over time, possibly in a number of communities (Guyatt et al., 1986; Marascuilo and Busk, 1988). One problem with multiple baseline studies can be the limited capacity for attribution of intervention effect if secular trends cause confounding. In this case, the multiple baseline approach could be combined with the stepped wedge design, in which outcome monitoring starts in each community at the same time but the onset of intervention is staggered. Thus, some communities are in the intervention phase while others are in the baseline period; the staggered onset of intervention ensures that any extraneous events, such as an external media smoking cessation program, are controlled for. Over time, all communities receive the intervention. The impact of the intervention on the outcome of interest can be assessed using time-series analysis or analysis of variance techniques.

[Fig. 1: Alternative approaches to random allocation of communities. Stepped wedge: communities receive the intervention with varying times of onset. Factorial randomisation: two interventions (A and B) and two conditions (intervention and control) are randomised to communities. Crossover: one community is randomised to treatment and the other acts as control; with time, the intervention becomes a control and the control the intervention.]

One of the major advantages of the multiple baseline technique in Phase 2 is the potential to map the impact of successive interventions on the behaviour of interest over time, allowing the addition of new components and rapid monitoring of their impact on the outcome of interest. This type of design is potentially very useful for developing and refining the intervention as well as for assessing behaviour change in response to the program. However, multiple baseline designs have received relatively little interest from epidemiologists and biostatisticians when compared with the randomised controlled trial. There are few data about trends, between-cluster variation and the size of intervention effects for this type of design which would allow researchers to estimate the number of communities needed and the optimum frequency of measures. Additionally, behavioural scientists have given little attention to developing or validating continuous measures, such as tobacco sales, compared with that given to point measures, such as self-report of smoking with biochemical validation. There is also little information about how best to stagger intervention implementation in different communities, although these parameters are crucial to the validity of the findings. So it appears that much more research effort could be invested in Phase 2 technologies, so that advancement to Phase 3 could be delayed until proposed interventions had conclusively demonstrated effectiveness in the real world.
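As a concrete illustration of the staggered onset described above, the sketch below generates a stepped wedge assignment schedule. It is a minimal illustration rather than the authors' procedure; the number of communities, the number of measurement periods and the randomised crossover order are all hypothetical choices.

```python
# A minimal sketch of a stepped wedge schedule: outcome monitoring
# starts everywhere at period 0, but the onset of the intervention is
# staggered so that some communities are always in the baseline
# condition while others are being treated.
import random

def stepped_wedge_schedule(n_communities, n_periods, seed=0):
    """Return a dict mapping community -> list of 0/1 condition flags
    (0 = baseline/control period, 1 = intervention period)."""
    rng = random.Random(seed)
    communities = list(range(n_communities))
    rng.shuffle(communities)  # randomise the order of crossover
    # Spread onset times evenly, leaving period 0 as an all-baseline period.
    onsets = [1 + (i * (n_periods - 1)) // n_communities
              for i in range(n_communities)]
    schedule = {}
    for community, onset in zip(communities, onsets):
        schedule[community] = [0 if t < onset else 1 for t in range(n_periods)]
    return schedule

if __name__ == "__main__":
    for community, conditions in sorted(stepped_wedge_schedule(6, 7).items()):
        print(f"community {community}: {conditions}")
```

By the final period every community has crossed over, so each community eventually contributes both baseline and intervention observations.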


Phase 3

Phase 3 studies are designed to provide definitive evidence about the efficacy of the program. Currently, the most accepted technique for Phase 3 trials involves random allocation of multiple communities to treatment condition. The methodological criteria that such evaluations should meet are described below.

Randomised community trials should employ adequate design

There are several considerations that should guide the development of an adequate multi-community randomised controlled trial (RCT), as follows.

Design



The first principle is that, given a fixed number of subjects, the design will be more efficient if there are many communities each with a smaller number of individuals, rather than many individuals sampled within a smaller number of communities. The number of clusters (communities) which are necessary will depend upon the anticipated efficacy of the intervention. For a binary outcome, there will be a statistically significant result if the intervention is favoured in five towns out of five (p = 0.063), in nine out of ten (p = 0.021) or in 12 out of 15 (p = 0.035). Since communities, not individuals, are being allocated, the most important sample size consideration is the number of communities.
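The quoted p-values follow from a two-sided sign test under the null hypothesis that each community is equally likely to favour either condition. A minimal sketch reproducing them (exact values before rounding):

```python
# Two-sided sign-test p-values for "k of n communities favour the
# intervention", reproducing the figures quoted in the text.
from math import comb

def sign_test_two_sided(k, n):
    """P(X >= k) under X ~ Binomial(n, 0.5), doubled for a two-sided test."""
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)

for k, n in [(5, 5), (9, 10), (12, 15)]:
    p = sign_test_two_sided(k, n)
    print(f"{k} of {n} towns favour the intervention: p = {p:.4f}")
# Prints 0.0625, 0.0215 and 0.0352, which the text rounds to
# 0.063, 0.021 and 0.035.
```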

Second, the principle of random allocation to control or intervention group is of vital importance. Many previous evaluations of community-wide interventions have not been able to use random allocation because of political considerations (Farquhar et al., 1985; Puska et al., 1985). However, if it is accepted that an adequate Phase 3 trial will involve a large-scale project with random allocation of many communities, political pressure will need to be resisted. Thus, the same rule would need to be applied as in individual patient randomised trials: if patients wish to be considered only for the treatment arm, they are excluded from the trial. However, when political or public health ethical considerations are paramount, alternative approaches can include factorial and crossover designs (see Figure 1).

In a factorial design, two interventions are compared to a control separately and combined. A limited form of the factorial design, the reciprocal control design, does not have control or combined arms and uses intervention A as a control for intervention B and vice versa. In a community trial of an intervention designed to increase rates of screening for cervical and breast cancer using the reciprocal control design, half of the communities might be assigned to receive the program for cervical cancer and half to receive the breast cancer program. Those communities receiving the cervical cancer program would function as the controls for the breast cancer outcome and vice versa. This approach may help to overcome the political and ethical obstacles associated with randomisation, since all participating communities receive some intervention. It is essential that there is no contamination by either of the interventions, since this would reduce the apparent impact.

Third, there remains considerable debate about whether matched pairs, stratification or total randomisation is the best strategy. Matching can provide a design advantage by increasing power, but only if the matching factor is highly correlated with the outcome variable. If the matching factor is not highly related to the outcome variable, matching will reduce power because of the loss of degrees of freedom that results from using the community pair, rather than the individual communities, as the unit of analysis. In stratified randomisation, several factors such as size or geographic area are selected for stratification, and communities within each stratum are randomly allocated. If the number of communities is small, then issues of matching and stratification become more important.

The issue of full randomisation versus matched or stratified designs is particularly difficult for community-wide studies seeking to alter health behaviour. The variables which are likely to affect outcome at the community level are frequently not known. For example, while smoking rates and smoking cessation rates are known to be associated with socioeconomic status (Hill and Gray, 1984), there is little information on the relationship of factors such as community size, urban versus rural location or number of health care providers to cessation rates. If the program is attempting to mobilise community action, other factors, such as community cohesiveness, the presence of effective community

leaders, or a well-developed infrastructure, might be important. However, because of the lack of knowledge about the likely role of these factors, they could not currently be included in a matching design. An additional concern with matched designs is that multivariate methods to model treatment effects, adjusting for imbalance on individual-level covariates used in the matching, cannot then be applied. For instance, if communities are matched on socioeconomic status (SES), then SES cannot be included as a possible predictor of behaviour change. Predictive analyses are likely to be important in community-wide health behaviour programs, where they can provide useful information about the context in which the programs will or will not work, which is needed for improvements to the intervention. Finally, consumers of research may have opinions about the factors which are likely to make communities similar or dissimilar. Unless a 'matched pair' of communities somehow appears similar to those communities and to politicians, it is unlikely that the results of the research will be accepted as valid. Thus, there is a need to develop strategies for selecting appropriate matching variables, or for combining them in a systematic manner, based upon better models of community behaviour and a greater understanding of potential confounders.

Sample size

Assessing sample size is more complex for trials which randomise communities rather than individuals, and both the number of communities and the number of individuals in each should be calculated in advance (Donner, 1982; Hsieh, 1988). As mentioned before, for the same total number of individuals, the study will in general be most efficient if a large number of communities is included rather than a large number of individuals within each community. An example of this effect is shown in Table 1. The example assumes that the study is attempting to evaluate a smoking cessation program where whole communities will be randomised to intervention or control. The table illustrates an approach to the problem of how to estimate the number of communities to use and the number of people to sample within each community.

Sample size requirements for studies which allocate communities can be calculated by adapting standard formulae. The number of subjects required per treatment group should first be calculated using a standard sample size formula. The result is then multiplied by an inflation factor (IF):

IF = 1 + (m - 1)ρ

where m = average cluster size and ρ = a prior estimate of the intra-class correlation (Donner et al., 1981). The intra-class, or intra-cluster, correlation is the community-level variance; that is, the level of statistical dependence of individuals within a cluster. Reliable estimates of the intra-class correlation will rarely be available, as it is difficult to estimate the variability of health behaviours within and between communities (Donner, 1982). Thus, individual sample size requirements will often not be accurate, particularly where only a small number of communities is being used in the trial. Koepsell et al. (1991) demonstrated that estimates of variance can depend upon the method used to estimate community-level variance, and also upon the communities used to provide the estimates. They recommend that optimistic and pessimistic estimates of sample size be made.
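Where pilot data are available, ρ can be estimated with the usual one-way ANOVA estimator. The sketch below assumes clusters of roughly equal size and uses invented pilot observations; it is an illustration, not the authors' method.

```python
# A sketch of the one-way ANOVA estimator of the intra-class
# correlation from pilot data (clusters of roughly equal size).
from statistics import mean

def anova_icc(clusters):
    """clusters: list of lists of individual-level observations."""
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    m_bar = n / k                      # average cluster size
    grand = mean(x for c in clusters for x in c)
    # Between- and within-cluster mean squares.
    msb = sum(len(c) * (mean(c) - grand) ** 2 for c in clusters) / (k - 1)
    msw = sum((x - mean(c)) ** 2 for c in clusters for x in c) / (n - k)
    return (msb - msw) / (msb + (m_bar - 1) * msw)

# Invented pilot data: four communities, five binary outcomes each.
pilot = [[0, 1, 1, 0, 1], [1, 1, 1, 0, 1], [0, 0, 1, 0, 0], [0, 1, 0, 0, 1]]
print(f"estimated intra-class correlation: {anova_icc(pilot):.4f}")
```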

Table 1: Sample size required to detect a reduction in smoking from 25% to 20% with 90% power and alpha = 0.05

Number of towns per group                     5       10      20      30
Effective sample size in each town            300     150     75      50
IF                                            7       4       2.5     2
Number of people to be surveyed per town      2100    600     188     100
Number of people to be surveyed per group     10 500  6000    3750    3000
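The logic behind the table can be reproduced generically: compute the unadjusted per-group sample size from a standard two-proportion formula, then solve for the per-town sample size m that satisfies m / (1 + (m - 1)ρ) = (unadjusted n) / (towns per group). The sketch below assumes ρ = 0.003 for illustration; the published table does not state the ρ underlying each column, so these outputs will not match it exactly.

```python
# A sketch of the sample size workflow described above: an unadjusted
# per-group n from a standard two-proportion formula, then the
# per-town sample size that absorbs the inflation factor
# IF = 1 + (m - 1)*rho. rho = 0.003 is an illustrative assumption.
from math import ceil, sqrt
from statistics import NormalDist

def per_group_n(p1, p2, alpha=0.05, power=0.90):
    """Unadjusted subjects per group, two-sided two-proportion test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    return ((z_a * sqrt(2 * p_bar * (1 - p_bar))
             + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) / (p1 - p2)) ** 2

def per_town_m(base_n, towns_per_group, rho):
    """Solve m / (1 + (m - 1)*rho) = base_n / towns_per_group for m."""
    needed = base_n / towns_per_group   # effective sample size per town
    if needed * rho >= 1:
        return None                     # infeasible: recruit more towns
    return needed * (1 - rho) / (1 - needed * rho)

base = per_group_n(0.25, 0.20)          # detect a drop from 25% to 20%
for towns in (5, 10, 20, 30):
    m = per_town_m(base, towns, rho=0.003)
    if m is None:
        print(f"{towns:2d} towns per group: infeasible at this rho")
    else:
        print(f"{towns:2d} towns per group: survey ~{ceil(m)} people per town")
```

Note the feasibility check: when the required effective sample size per town exceeds 1/ρ, no amount of within-town sampling suffices and more towns must be recruited, which is another way of seeing why the number of communities dominates power.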



Analysis

There is still considerable developmental work to be done on analysis methods for trials in which the community is the unit of allocation. There are several types of analysis to be considered, as follows.

(i) Using the cluster as the unit of analysis: each community provides one data point. For example, each community might generate the percentage change in smoking from pre- to post-test. These data are submitted to a t-test or paired t-test, depending on whether the trial used simple random allocation or a matched pair design. This approach is simple, but it may lack power unless a large number of communities is included in the study. Further, it cannot be used to adjust for covariates measured at the individual level, such as differences between the communities in terms of age or sex.

(ii) Continuous outcome measures: analyses for continuous outcome measures are well developed, and include mixed-model analysis of variance and generalised least squares. These are described in detail in Koepsell et al. (1991) and Donner (1985). Standard statistical methods such as t-tests can also be adapted; for example, the test statistic can be divided by a correction factor (Donner, 1982).

(iii) Dichotomous data: methods of analysing dichotomous data are much less well developed. Available methods are described in Donner and Klar (1994). Randomised designs can be analysed using standard test statistics, such as Pearson's chi-square, adjusted by a correction factor (Donald and Donner, 1987). There are some techniques available for conducting multivariate analysis (Liang et al., 1986; Donner, 1987), but they all require a large sample size. Stratified and pair-matched designs can use an extension of the Mantel-Haenszel test to incorporate clustering (Donner, 1987, 1992). There are some methods available for multivariate analysis, although these techniques are still in the developmental phase.
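A minimal sketch of approach (i), with invented community-level changes in smoking prevalence; scipy's t-test routines stand in for whatever software the analyst prefers:

```python
# Each community contributes a single data point (percentage-point
# change in smoking prevalence from pre- to post-test), so an ordinary
# t-test on community values respects the unit of allocation.
from scipy import stats

# One invented value per community: change in smoking prevalence.
intervention_towns = [3.1, 4.0, 2.2, 5.1, 1.8, 3.6]
control_towns      = [1.2, 0.4, 2.0, 1.1, 0.3, 1.5]

# Unmatched (fully randomised) allocation: independent-samples t-test.
t_ind, p_ind = stats.ttest_ind(intervention_towns, control_towns)
print(f"unmatched design:    t = {t_ind:.2f}, p = {p_ind:.4f}")

# Matched-pair allocation (as in COMMIT): paired t-test, one pair per point.
t_rel, p_rel = stats.ttest_rel(intervention_towns, control_towns)
print(f"matched-pair design: t = {t_rel:.2f}, p = {p_rel:.4f}")
```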

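For approach (iii), the sketch below illustrates the correction-factor idea for dichotomous outcomes: a Pearson chi-square computed from pooled counts is divided by a variance inflation factor of the form 1 + (m - 1)ρ, in the spirit of the adjustments cited above (the published methods differ in detail). The counts, m and ρ here are illustrative assumptions.

```python
# A sketch of a design-effect adjustment for a pooled 2x2 table from
# a cluster-randomised trial: divide the Pearson chi-square by a
# correction factor before referring it to the usual distribution.
from scipy import stats

def adjusted_chi_square(table, m, rho):
    """table: 2x2 counts [[a, b], [c, d]] pooled over communities;
    m: average individuals per community; rho: intra-class correlation.
    Returns the corrected statistic and its p-value (1 df)."""
    chi2, _, _, _ = stats.chi2_contingency(table, correction=False)
    correction = 1 + (m - 1) * rho      # variance inflation factor
    chi2_adj = chi2 / correction
    return chi2_adj, stats.chi2.sf(chi2_adj, df=1)

# Invented pooled quitters/non-quitters, intervention vs control.
counts = [[130, 870], [95, 905]]
chi2_adj, p = adjusted_chi_square(counts, m=200, rho=0.01)
print(f"adjusted chi-square = {chi2_adj:.2f}, p = {p:.4f}")
```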

It is evident that more work is required to develop, refine and disseminate appropriate analysis methods for trials in which the community is the unit of analysis, methods which can be adopted by non-statisticians such as practitioners of health promotion. For example, user-friendly computer packages which could be used by practitioners would increase the sophistication of the analysis of such trials.

CONCLUSIONS

There continues to be a lack of consensus about the most effective strategies for evaluating community-wide health promotion programs. While the randomised controlled trial is the optimum design, it is costly and requires difficult epidemiological and biostatistical technology. In addition, it may be politically unacceptable because of the randomisation of communities to control conditions and, unless it includes sophisticated process measures, may not yield much information about the conditions under which the intervention program is effective. It is likely that the resolution of these difficulties will require input from epidemiologists, biostatisticians and behavioural scientists if research designs are to be generated which are both scientifically valid and feasible in the real world. The views of politicians and of communities themselves will also need to be sought if the results of the research are to be accepted outside the scientific community and if the need to fund large-scale, expensive studies is to be recognised. Research funding agencies have a critical role to play in improving methodologies in this area by reinforcing minimum criteria for different types of evaluation and by indicating to researchers that designs other than randomised trials may be acceptable, provided that the conclusions drawn are appropriate.

It is also evident that, as more researchers understand the need to include many communities in evaluation studies, there will need to be changes to our current approach to evaluation. Four issues in particular seem in need of attention from scientists and funding agencies. First, there is a need to develop methodological criteria for projects which are not yet ready for a large-scale randomised trial; these projects provide essential developmental data but will use methods other than randomised controlled trials, and there needs to be critical debate within the scientific community about the standards that such studies must meet to be considered suitable for funding and publication. Second, there needs to be an increased understanding of the design and analysis issues raised by cluster randomisation within the health promotion field. Third, there needs to be consideration of whether it is possible to develop alternative methodologies to randomised controlled trials for rigorously evaluating community-wide programs. Fourth, when multi-community studies are proposed, the funding commitment may be less burdensome if co-operation between research centres in developing, resourcing and conducting these trials can be achieved.


ACKNOWLEDGEMENTS

The CART project is a collaborative project jointly funded by the National Health and Medical Research Council (Australia) and the NSW Cancer Council (Australia). This paper arose from a workshop held in Newcastle, NSW in September 1992, part-funded by the Australian Cancer Society. The contributions of the workshop participants are gratefully acknowledged.

Address for correspondence:
Dr Lynne Hancock
Faculty of Medicine and Health Sciences
University of Newcastle
Locked Bag 10
Wallsend 2287
NSW, Australia

REFERENCES

Black, R. E., Dykes, A. C., Anderson, K. E., Wells, J. G., Sinclair, S. P., Gary, G. W., Hatch, M. H. and Gangarosa, E. J. (1981) Handwashing to prevent diarrhoea in day-care centres. American Journal of Epidemiology, 113, 445-451.
Clover, K. A. and Redman, S. (1995) Is community participation effective in encouraging change of behaviour? Unpublished report, University of Newcastle.
COMMIT Research Group (1991) Community Intervention Trial for Smoking Cessation (COMMIT): summary of design and intervention. Journal of the National Cancer Institute, 83, 1620-1628.
Cornfield, J. (1978) Randomization by group: a formal analysis. American Journal of Epidemiology, 108, 100-102.
Dixon, J. (1989) The limits and potential of community development for personal and social change. Community Health Studies, 13, 82-92.
Donner, A. (1982) An empirical study of cluster randomization. International Journal of Epidemiology, 11, 537-543.
Donner, A. (1985) A regression approach to the analysis of data arising from cluster randomization. International Journal of Epidemiology, 14, 322-326.
Donner, A. (1987) Statistical methodology for paired cluster designs. American Journal of Epidemiology, 126, 972-979.
Donner, A. (1992) Sample size requirements for stratified cluster randomization designs. Statistics in Medicine, 11, 743-750.
Donald, A. and Donner, A. (1987) Adjustments to the Mantel-Haenszel chi-square statistic and odds ratio variance estimator when the data are clustered. Statistics in Medicine, 6, 491-499.
Donner, A. and Klar, N. (1994) Methods for comparing event rates in intervention studies when the unit of allocation is a cluster. American Journal of Epidemiology, 140, 279-289.
Donner, A., Birkett, N. and Buck, C. (1981) Randomization by cluster: sample size requirements and analysis. American Journal of Epidemiology, 114, 906-914.
Donner, A., Brown, K. S. and Brasher, P. (1990) A methodological review of non-therapeutic intervention trials employing cluster randomization, 1979-1989. International Journal of Epidemiology, 19, 795-800.
Edinburgh Research Unit in Health and Behavioural Change, University of Edinburgh (1989) Changing the Public Health, Chapters 5 and 8. John Wiley, New York.
Farquhar, J. W. (1978) The community-based model of life style intervention trials. American Journal of Epidemiology, 108, 103-111.
Farquhar, J. W., Fortmann, S. P., Maccoby, N., Haskell, W. L., Williams, P. T., Flora, J. A., Taylor, C. B., Brown, B. W., Jr, Solomon, D. S. and Hulley, S. B. (1985) The Stanford five-city project: design and methods. American Journal of Epidemiology, 122, 323-334.
Farquhar, J. W., Fortmann, S. P., Flora, J. A., Taylor, C. B., Haskell, W. L., Williams, P. T., Maccoby, N. and Wood, P. (1990) Effects of community-wide education on cardiovascular disease risk factors: the Stanford Five-City Project. Journal of the American Medical Association, 264, 359-365.
Glynn, T. J., Manley, M. W., Mills, S. L. and Shopland, D. R. (1993) The United States National Cancer Institute and the science of tobacco control research. Cancer Detection and Prevention, 17, 507-512.
Green, L. W. and Kreuter, M. W. (1991) Health Promotion Planning: An Educational and Environmental Approach. Mayfield, Mountain View.
Guyatt, G., Sackett, D., Taylor, W., Chong, J., Roberts, R. and Pugsley, S. (1986) Determining optimal therapy: randomized trials in individual patients. New England Journal of Medicine, 314, 889-892.
Hill, D. and Gray, N. (1984) Australian patterns of tobacco smoking and related health beliefs in 1983. Community Health Studies, 8, 307-314.
Hsieh, F. Y. (1988) Sample size formulas for intervention studies with the cluster as unit of randomization. Statistics in Medicine, 7, 1195-1202.
Johnson, K. A., Ford, L. G., Kramer, B. and Greenwald, P. (1994) Overview of the National Cancer Institute (USNCI) chemoprevention research. Acta Oncologica, 33, 5-11.
Koepsell, T. D., Martin, D. C., Diehr, P. H., Psaty, B. M., Wagner, E. H., Perrin, E. B. and Cheadle, A. (1991) Data analysis and sample size issues in evaluation of community-based health promotion and disease prevention programs: a mixed-model analysis of variance approach. Journal of Clinical Epidemiology, 44, 701-713.
Kottke, T. E., Battista, R. N., DeFriese, G. H. and Brekke, M. L. (1988) Attributes of successful smoking cessation interventions in medical practice. A meta-analysis of 39 controlled trials. Journal of the American Medical Association, 259, 51-61.
Liang, K. Y., Beaty, T. H. and Cohen, B. H. (1986) Applications of odds ratio regression models for assessing familial aggregation from case-control studies. American Journal of Epidemiology, 124, 678-683.
Marascuilo, L. A. and Busk, P. L. (1988) Combining statistics for multiple baseline AB and replicated ABAB designs across subjects. Behaviour Assessment, 10, 1-28.
National Centre for Epidemiological and Population Health (1991) The Role of Primary Health Care in Health Promotion in Australia: Interim Report to the National Better Health Program. Commonwealth Department of Health, Housing and Community Services, Canberra, Australia.
Pomerleau, O., Adkins, D. M. and Pertschuk, M. (1978) Predictors of outcome and recidivism in smoking cessation treatment. Addictive Behaviours, 3, 65-70.
Puska, P., Nissinen, A., Tuomilehto, J., Salonen, J. T., Koskela, K., McAlister, A., Kottke, T. E., Maccoby, N. and Farquhar, J. W. (1985) The community-based strategy to prevent coronary heart disease: conclusions from the ten years of the North Karelia Project. Annual Review of Public Health, 6, 147-193.
Robertson, L. S., Kelley, A. B., O'Neill, B., Wixon, C. W., Eiswirth, R. S. and Haddon, W. (1974) Controlled study of the effect of television messages on safety belt use. American Journal of Public Health, 64, 1071-1080.
Rogers, E. M. (1971) Diffusion of Innovations. Free Press, New York.
Rothman, J. (1968) Three models of community organisation practice. In Social Work Practice, National Conference on Social Welfare. Columbia University Press, New York.
Tones, K., Tilford, S. and Robinson, Y. K. (1990) Health Education: Effectiveness and Efficiency. Chapman & Hall, London.
World Health Organization (1986) Ottawa Charter for Health Promotion. International Conference on Health Promotion, 17-21 November, Ottawa, Ontario, Canada.
Zeger, S. L. (1988) Discussion of papers on the analysis of repeated categorical response. Statistics in Medicine, 7, 161-168.