National Offender Management Service ‘What Works’ Briefing 3/05: Understanding research methods and findings

This Briefing aims to explain for research customers and users:
• the different research techniques used, especially for assessing impact on re-offending; and
• the different levels of robustness of findings.
Research purpose

Different types of research are used to answer different types of questions. Research can be designed to:
• explore the existence of a phenomenon to answer questions about occurrence, prevalence or incidence;
• describe or define the features and attributes of a phenomenon and how it differs from others;
• identify relationships between variables so that one can be used to predict the other;
• explain and investigate cause-effect links between variables or test between different explanations;
• assess impact; and
• present solutions to a particular problem or inform improvements to an intervention or service as the evaluation proceeds.

From the outset, it is important to identify what the research questions are in order to determine what design is best suited to produce a reliable answer. Equally, at the end, it is important to determine if the research questions have been answered and how robust the findings are.
Improving our knowledge: choosing the right design to answer the question

Different designs, methods, sample sizes and types of analysis are suited to different types of research questions. Table 1 provides some examples but is not exhaustive.
Table 1: Examples of design and methods used to answer research questions

Questions to be answered: What’s happening?
Design and methods suited to answering the question: Analysis of management information, longitudinal or cross-sectional design, cohort studies, survey or census methods, small- or large-scale sample size, descriptive quantitative data or thematic analysis of qualitative data.

Questions to be answered: What matters? / What makes a difference and to whom? / What is the relationship between different factors and re-offending? / How strongly are different factors related to re-offending? / Which factors best predict re-offending and desistance?
Design and methods suited to answering the question: Longitudinal cohort design, correlational studies, regression analysis, thematic analysis of qualitative data, moderate to large representative sample size.

Questions to be answered: What works? / Will it work for us? / What effect does it have? / Does it work? / By how much does it reduce re-offending? / With whom is it working? / What impact does it have on the outcome?
Design and methods suited to answering the question: Meta-analysis, systematic reviews. Experimental methods, i.e. random assignment, systematic variation, control over variables, quantitative data of independent and dependent variables, t-tests, analysis of covariance, multiple regression, large enough sample size to achieve sufficient power to detect small effects.

Questions to be answered: Why does it work?
Design and methods suited to answering the question: Questionnaire and interview methods, focus groups, small to medium sample sizes, qualitative and quantitative data, descriptive and inferential data analysis.

Questions to be answered: How well are we doing?
Design and methods suited to answering the question: Analysis of management information and audit information.

Questions to be answered: Is it cost-effective? / Is it worth it?
Design and methods suited to answering the question: Econometric analysis (needs good data).
Questions of effectiveness and ‘what works?’

Randomised control trials (RCTs) are the best way of answering the question, "Does the intervention work?" An RCT compares outcomes when the intervention is used with outcomes without the intervention or with an alternative (seen in the control group). In an RCT, participants are randomly allocated to an intervention or a control group, and the groups differ systematically in the type or amount of the intervention received. The groups are otherwise equivalent because all other differences are randomly distributed.

With any other approach, a comparison group similar to the intervention group has to be generated, and it is almost impossible to control for the bias that this introduces. As a result, other approaches suffer from the problem that observed outcomes may be explained by factors other than the treatment. These include differences in motivation, level of criminogenic need, previous experience and other factors (measured or not measured) that are not randomly distributed. Only with an RCT can we be confident that different outcomes can be directly attributed to the intervention. This applies to all types of outcomes, including reconviction, self-reported offending, psychometric test scores, attitudes, substance misuse and other behaviours.
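Random allocation can be sketched in a few lines. This is a minimal illustration, assuming a simple two-arm trial with equal group sizes; the function name, participant labels and seed are hypothetical and are not taken from any NOMS procedure.

```python
import random


def randomly_allocate(participants, seed=None):
    """Shuffle participants and split them into intervention and control groups.

    Hypothetical helper for illustration only: chance alone decides who
    receives the intervention, so measured and unmeasured differences are
    spread across both groups rather than concentrated in one.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Illustrative labels, not real offender records.
people = [f"participant_{i}" for i in range(1, 11)]
intervention, control = randomly_allocate(people, seed=2005)
print(len(intervention), len(control))
```

Because neither staff nor participants choose the group, factors such as motivation cannot systematically favour one arm.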
Alternatives to RCTs

RCTs cannot, however, be used in all cases; for example, they may be difficult to apply to sentencing. When an RCT is not used, it is essential to have an appropriate comparison group that adequately shows what would have happened in the absence of the intervention (or with an alternative). The quality of the match between intervention and comparison group should be clearly described, detailing both how the match has been achieved and the level of comparability obtained.

A five-point scale can be used to appraise the standard of research that evaluates the effects of interventions on re-offending and offence-related outcomes, based on the extent to which the design controls for different sources of bias.

Table 2: Scientific Methods Scale adapted for reconviction studies

Standard   Description
Level 1    A relationship between intervention and reconviction outcome. Compares difference between outcome measure before and after the intervention (intervention group with no comparison group)
Level 2    Expected reconviction ratesª (or predicted rates) compared to actual reconviction rates for intervention group (risk predictor with no comparison group)
Level 3    Comparison group present without demonstrated comparability to intervention group (unmatched comparison group)
Level 4    Comparison group matched to intervention group on theoretically relevant factors e.g. risk of reconviction (well-matched comparison group)
Level 5    Random assignment of offenders to the intervention and control conditions (Randomised Control Trial)
ªExpected reconviction rates can be generated using the Offender Group Reconviction Scale-Revised (OGRS-R), see Taylor (1999). This is a Home Office-developed risk prediction instrument that assesses the likelihood of reconviction in the absence of any intervention.
Propensity score matching is a level 4 research design that controls for bias by using the most robust method of matching individuals on all variables that affect participation in the intervention as well as all variables that affect outcome. Regression discontinuity design is a level 4 research design that allocates individuals to an intervention or control group based on a pre-defined cut-off point on a quantifiable measure. It requires rigorous application of the measure as the sole basis of allocation or selection to the intervention.
Making sense of outcomes for completers and non-completers

A common pattern in outcome research is that those who complete an intervention have better outcomes than the comparison group, than those who are referred but fail to start, and than those who start and drop out. While the results for completers can be read as evidence of effectiveness, they can also be read as a selection effect: those who stayed the course would have done better regardless of the intervention because they were more motivated, had fewer needs, were lower risk, etc., while those who did not complete would have done worse anyway. Effectiveness is therefore not best determined by examining results for completers alone. It is important to consider the ‘real world’ outcomes of both completers and non-completers combined to understand the effects of the intervention as it is used in practice. This type of evidence is needed to determine what works to reduce re-offending and to drive the most cost-effective use of NOMS interventions.
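The difference between a completers-only comparison and a ‘real world’ comparison can be sketched with a small calculation. All figures below are hypothetical and chosen only to show the arithmetic; they are not findings from any study.

```python
def reconviction_rate(reconvicted, total):
    """Return the reconviction rate as a percentage."""
    return 100.0 * reconvicted / total


# Hypothetical counts for illustration only (not real data).
completers = {"reconvicted": 30, "total": 100}
non_completers = {"reconvicted": 70, "total": 100}  # failed to start or dropped out
comparison = {"reconvicted": 100, "total": 200}

completer_rate = reconviction_rate(**completers)
comparison_rate = reconviction_rate(**comparison)

# 'Real world' outcome: everyone referred to the intervention, whether or not
# they completed it.
combined_rate = reconviction_rate(
    completers["reconvicted"] + non_completers["reconvicted"],
    completers["total"] + non_completers["total"],
)

print(f"Completers only: {completer_rate:.0f}%")
print(f"All referred:    {combined_rate:.0f}%")
print(f"Comparison:      {comparison_rate:.0f}%")
```

In this made-up case the completers look 20 percentage points better than the comparison group, yet the combined group shows no difference at all, which is the pattern a pure selection effect would produce.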
Sample size

Sample size matters because smaller samples are less likely to detect small effects. In particular, there is a danger that small studies may find no effect when the correct interpretation should be that the sample size was insufficient to detect an effect. If the sample size falls below the minimum required (see Table 3, based on Fleiss, 1987), the findings are less reliable and should therefore be treated with caution. In addition, small samples are rarely representative of larger populations, and this greatly limits the extent to which the findings and conclusions can have any wider application to other samples or populations. Small samples are best suited to qualitative studies where a representative sample is not necessarily needed.

Table 3: Minimum sample size required by expected reduction in reconvictionª

Expected percentage point       Minimum number of people in each
reduction in reconviction       intervention and control/comparison group
10                              325
7.5                             572
5                               1,273
2.5                             5,024
ªIt was assumed that the average general reconviction rate for offenders was 50 per cent within two years from the start of a community sentence or release from prison, with a 1-tailed test at 95 per cent confidence and 80 per cent power (that is, a 20 per cent chance of a type II error).
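The minimum sample sizes depend on a power calculation that the briefing does not spell out. As a rough sketch, assuming a standard normal-approximation formula for comparing two proportions with a continuity correction (the function name and the choice of formula are assumptions, not the briefing's stated method), the calculation might look like this:

```python
import math
from statistics import NormalDist


def min_group_size(baseline, reduction, alpha=0.05, power=0.80):
    """Approximate people needed per group to detect a fall in a proportion.

    Assumed method (not confirmed by the briefing): one-tailed normal
    approximation for two proportions with a continuity correction.
    baseline and reduction are proportions, e.g. 0.50 and 0.05.
    """
    p1 = baseline
    p2 = baseline - reduction
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    uncorrected = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2 / reduction ** 2
    # Continuity correction inflates the uncorrected size slightly.
    corrected = (uncorrected / 4) * (
        1 + math.sqrt(1 + 4 / (uncorrected * reduction))
    ) ** 2
    return math.ceil(corrected)


# 50 per cent baseline, as in the footnote to the table.
for points in (10, 7.5, 5, 2.5):
    print(points, min_group_size(0.50, points / 100))
```

Under these assumptions the results should land close to the published figures; small differences can arise from how the critical values are rounded. The steep rise in required numbers as the expected reduction shrinks is the point to take away.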
Are the data representative?

If quantitative data are not representative of the population or other samples, they are not useful. Some questions to ask:
• Was the sample randomly selected (or did a well-meaning person choose people to take part and so introduce bias)?
• What’s been done to examine statistically the effect of missing data or non-respondents?
Why missing data and data quality matter

• Assume a random sample of 100 offenders
• Reconviction data are available on 50 of those offenders
• According to the reconviction data 20 offenders are reconvicted.
When missing data are ignored, a reconviction rate of 40 per cent is produced:
Results from our ‘known’ data
                              Number   Percent
Offenders reconvicted             20        40
Offenders not reconvicted         30        60
Total                             50       100
When missing data are included, a reconviction rate of 20 per cent is produced:
Results from the whole sample
                              Number   Valid percent
Offenders reconvicted             20              20
Offenders not reconvicted         30              30
Missing data                      50              50
Total                            100             100
With no knowledge of the reconviction outcomes for the 50 missing cases, it would be most accurate to report a reconviction rate of at least 20 per cent, but possibly ranging to as high as 70 per cent if the missing cases were all reconvicted.
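The arithmetic behind this range can be written out as a short sketch. The function name is illustrative; the inputs are the figures from the worked example (100 sampled, 50 with known outcomes, 20 reconvicted).

```python
def reconviction_bounds(sample_size, known_cases, known_reconvicted):
    """Return (valid %, lowest possible %, highest possible %) for the sample."""
    missing = sample_size - known_cases
    valid_percent = 100.0 * known_reconvicted / known_cases
    # Lowest rate: none of the missing cases were reconvicted.
    lower = 100.0 * known_reconvicted / sample_size
    # Highest rate: every missing case was reconvicted.
    upper = 100.0 * (known_reconvicted + missing) / sample_size
    return valid_percent, lower, upper


valid, lower, upper = reconviction_bounds(
    sample_size=100, known_cases=50, known_reconvicted=20
)
print(f"Valid percent: {valid:.0f}%")  # 40%
print(f"Possible range: {lower:.0f}% to {upper:.0f}%")  # 20% to 70%
```

The valid percentage of 40 per cent sits inside the 20 to 70 per cent range, but reporting it alone hides how wide that range is.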
Small-scale studies of effectiveness and ‘what works’

Operational managers naturally wish to evaluate the impact of local schemes, but this poses particular problems. It may well be difficult to use RCTs, to generate sufficiently large sample sizes or to design adequate control groups. RDS NOMS is planning to work with probation and prison researchers to consider how best to address this important issue in more detail and to see what options and best practice can be generated. Meanwhile, the following may be of use.

RCTs may be applied to local-level studies of impact by using outcome measures other than reconviction alone, for example reliable self-reported measures of criminal attitude and offending-related behaviour. The results from small-scale studies using consistent methods can be combined and generalised through meta-analysis of effect sizes.

Both cross-sectional and longitudinal designs can be applied at a local level to examine the strength of associations between different factors and offending-related outcomes for a sample representative of the location. Additionally, analysis of management information can point to action research at a local level, if qualitative data are collected to explore why performance is poor and how it can be improved.

Different explanations of desistance and re-offending can be explored at a local level through the collection and analysis of qualitative data about how and why offenders come to re-offend or desist. These explanations can be developed into models of change to inform new interventions for future evaluation.

If you are an operational manager or researcher interested in the problems of evaluating small-scale schemes, please contact RDS NOMS at the email address below.
Summary - key things to remember

1. What’s the research question?
   – Are they the right research questions (are they important and useful) and have these been answered?

2. Will the research design answer the question?
   – If you need to know what effect it has on re-offending or the answer to the question ‘does it work?’, you need a randomised control group or appropriately matched comparison group to get a valid answer to the question.

3. Is the sample size big enough?
   – If you want to measure impacts, differences or changes (assuming five per cent impact/difference from a 50 per cent baseline) as a rule of thumb: you need 1,300 people in each of your intervention and comparison groups for the sample to be large enough.

4. Are the data representative of anything useful?
   – If not, you cannot generalise from them.

5. Are there missing data and is this a problem?
   – Has the effect of missing data been analysed? If not, can you be sure that the results are meaningful?
   – Demand to know the actual (or weighted) percentages, not the valid percentages.
For further information contact: Dr Mia Debidin, RDS NOMS
Email: [email protected]
Telephone: 020 7035 3438