EVIDENCE BASED DECISION MAKING IN HEALTHCARE Workbook and methodology handbook

Edward Purssell September 2012

Introduction

This workbook aims to guide you through the process of producing a mini-review. It is not a complete guide, and is not to be referenced as such; it simply aims to point out things that you could/should/should not do. The assessment falls into two parts:

1. The search
2. The analysis

The first of these is essentially a pass/fail affair. You either do it right or you don’t; if you do you pass, if you don’t you fail. The second part is more complex because it is about your interpretation, which is a matter of opinion. As long as what you do here makes sense, and is supported by the evidence you provide, you can write what you like. This is the part where you show your critical appraisal and analytical skills, and so there are lots of marks here.

Support

Support is available from the course leader and lecturers, and your programme leader. However, this is part of an advanced course of study, and you should be primarily self-directed. You should not expect more than about an hour of individual tutorial time in total (it will never be all in one go!).

Statistics

This is not a statistics course! Therefore you do not need to know the various tests in detail, but you do need to be able to interpret them. Part of doing that is knowing the different types of data, and having a rough idea of the right tests and how to interpret them. If you have no idea, then you will have to do some background reading. It is fine to ask someone, but remember that in the text you will be expected to demonstrate that you understand what you have been told. Sorry, there is no easy way round this!

Handbook

The detailed requirements and marking for the assessment are contained within the Course Handbook and the Post-Graduate Handbook.

Methodology

This workbook refers several times to the NICE and SIGN methodologies. This is because they are relatively straightforward, and used by the NHS. You do not have to use them; there are lots of others. The important thing is that you use a methodology that makes sense. As long as it contains all of the ‘bits’ you will be fine. If you want to know about these in detail see the NICE and SIGN websites.


Clinical question

The Scope/background

The scope is the framework within which the guidelines are developed. For NICE this includes a process of scope development and then consultation with key stakeholders before the final scope of the review is defined. For the purposes of this exercise this need not be done, but you must ensure that you write a scope which will serve as the background to the final review. Important issues you must consider are:

1. The epidemiology of the disease or condition
2. The populations to be included or excluded
3. The appropriate healthcare settings
4. The different types of interventions and treatments to be included and excluded; these might include diagnostic tests, surgical treatments, medical and psychological therapies, rehabilitation and lifestyle advice
5. Topic-specific information and support for patients and carers
6. The main outcomes that will be considered
7. Any links with other relevant guidance

Scoping search

The scoping search is undertaken in order to identify any other clinical guidelines, health technology assessment reports, key systematic reviews and economic evaluations relevant to the guideline topic. This search does not aim to be exhaustive or to address potential review questions in any detail. NICE suggest the following sources for a scoping search; you need only use the Cochrane Database of Systematic Reviews and Medline, although you may also like to use others if appropriate.

NICE suggested sources for scoping search

o Cochrane Database of Systematic Reviews – CDSR (Cochrane Reviews)
o Health Technology Assessment (HTA) Database (Technology Assessments)
o Medline/Medline In-Process
o National Guideline Clearinghouse (United States)
o National Library for Health (NLH)
o NHS Economic Evaluation Database (NHS EED) (Economic Evaluations) and the Health Economic Evaluations Database (HEED), if subscribed to
o Websites of NICE and the National Institute for Health Research (NIHR) HTA Programme for guidelines and HTAs in development
o Websites of relevant professional bodies and associations that may have produced guidelines or reports (for example, British Thoracic Society for ...)

The scope therefore serves two purposes for your review:

1. It ensures that there is not an existing review dealing with your subject, to which you cannot add anything. If there is a review, but you can add new data, interpret the data differently, or answer a slightly different question, then you can do it. What you can’t do is simply replicate an existing review.
2. It provides information for the background to the review. Remember that reviews are generally read by non-experts. You must provide sufficient information so that non-experts can understand your subject and question. The scope will also make it clear why this is an important question.


The Question

You must then define the question which the review will answer. It is very important that the question is properly structured, as you will need to identify the different facets of the question later. This will be considered later; for now you need to come up with the question. Remember it must lead on from the scope/background. You might try coming up with a few alternatives to give you some flexibility later on.

My question is:

Types of question

Although there are many different types of question, they generally fall into one of three categories:

1. Intervention
2. Diagnostic
3. Prognostic

Other types of question include aetiology, economic, patient experience and service delivery questions. It is not recommended that you try these types of question. It is important for you to be able to identify your question type, as this will inform your search strategy, and the types of study that you look for to answer the question.

What type of question are you asking?


PICO

Questions must be structured in order to be answerable. There are many ways of structuring questions; a commonly used method is the PICO. Each of the PICO elements will eventually become a facet of your question. These are:

P   Population
I   Intervention
C   Comparison/counter-intervention
O   Outcome

However, not all questions will contain all of these facets. The important issue for you is to be able to identify those that you need, to define them, and to select those that are, and are not, useful for your search.


Intervention questions

These compare different interventions for a health related problem in a particular population. They are generally structured using the PICO format.

Patients/population: Which patients or populations of patients are we interested in? How can they be best described? Are there subgroups that need to be considered?
My population is: _____________________________________________

Intervention: Which intervention, treatment or approach should be used?
My intervention is: ____________________________________________

Comparison: What is/are the main alternative/s to compare with the intervention being considered?
My comparison is: _____________________________________________

Outcome: What is really important for the patient? Which outcomes should be considered? Examples include intermediate or short-term outcomes; mortality; morbidity and quality of life; treatment complications; adverse effects; rates of relapse; late morbidity and re-admission; return to work, physical and social functioning; resource use.
My outcome is: _________________________________________________

The usual type of method used to answer these questions is:

Alternative types of methods are:


Diagnostic questions

Questions about diagnosis are concerned with the performance of a diagnostic test, typically about the diagnostic accuracy of the test or the clinical value of using the test. Most of these questions will be comparing a new test to a currently used (or ‘gold standard’) test. A slightly modified PICO format can also be used for these questions, the ‘outcome’ being the accuracy of the test.

Patients/population: To which patients or population of patients would the test be applicable? How can they be best described? Are there subgroups that need to be considered?
My population is: _____________________________________________

Intervention: The test being evaluated (the index test).
My intervention is: ____________________________________________

Comparison: The test with which the index test is being compared, usually the reference standard (the test that is considered to be the best available method to establish the presence or absence of the outcome; this may not be the one that is routinely used in practice).
My comparison is: ______________________________________________

Target condition: The disease, disease stage or subtype of disease that the index test and the reference standard are being used to establish.
My target condition is: __________________________________________

Outcome: The diagnostic accuracy of the test for detecting the target condition. This is usually reported as test parameters, such as sensitivity, specificity, predictive values, and likelihood ratios.
My outcome is: __________________________________________________

The usual type of method used to answer these questions is:

Alternative types of methods are:
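The test parameters named under Outcome can all be derived from a two-by-two table of index test result against reference standard result. The following is a minimal sketch in Python, not part of the NICE methodology; the class name and the counts are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from comparing an index test against a reference standard."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    def sensitivity(self) -> float:
        # Proportion of those with the target condition who test positive
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of those without the target condition who test negative
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Positive predictive value: proportion of positive tests that are correct
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Negative predictive value: proportion of negative tests that are correct
        return self.tn / (self.tn + self.fn)

    def lr_positive(self) -> float:
        # Positive likelihood ratio
        return self.sensitivity() / (1 - self.specificity())

    def lr_negative(self) -> float:
        # Negative likelihood ratio
        return (1 - self.sensitivity()) / self.specificity()


# Hypothetical counts, invented for illustration only.
table = TwoByTwo(tp=90, fp=30, fn=10, tn=170)
for name in ("sensitivity", "specificity", "ppv", "npv", "lr_positive", "lr_negative"):
    print(f"{name}: {getattr(table, name)():.2f}")
```

Note that the predictive values depend on how common the target condition is in the study population, whereas sensitivity and specificity do not use the column totals in this way, which is one reason papers usually report several of these parameters together.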


Prognostic questions

Prognostic questions deal with particular outcomes, such as the progression of a disease, or the survival time for a patient after the diagnosis of a disease or with a particular set of risk markers. A prognosis is based on the characteristics of the patient (‘prognostic factors’). These prognostic factors may be disease-specific (such as the presence or absence of a particular disease feature) or demographic (such as age or sex), and may also include the likely response to treatment and the presence of comorbidities. A prognostic factor does not need to be the cause of the outcome, but should be associated with (in other words, predictive of) that outcome. It is unlikely that you will have both an I and a C, so a modified PICO is needed. If you have both an I and a C, what kind of question is this most likely to be?

Patients/population: Which patients or populations of patients are we interested in? How can they be best described? Are there subgroups that need to be considered?

Intervention: What factors or characteristics of the patients or disease do we think might be prognostic? Is this a characteristic of the patient, or is it an aspect of treatment?

Outcome: What outcome are we interested in? Is it a clinical outcome, or a surrogate outcome?

The usual type of method used to answer these questions is:

Alternative types of methods are:


Other types of questions

The NICE manual also deals with questions regarding patient experience and service delivery. It is not recommended that you try these kinds of questions. The NICE manual does not deal with aetiology questions as these are beyond the remit of NICE.

PICOTT and other approaches

The PICO format is there to help you formulate an answerable question. It is not set in stone, however, and part of your decision making process is its applicability. You must be able to explain what you have done. There are also some modifications of PICO, for example PICOTT:

o Population
o Intervention
o Comparison
o Outcome
o Type of question being asked
o Type of study that can answer the question

Formulating the question and PICO

By now you should:

1. Be able to write your question containing all of the relevant facets
2. Be able to identify the facets of the question

My PICO is

Population:
Intervention:
Comparison:
Outcome:

This should be presented in the text of your assignment.

Facet analysis

Having analysed your question, the next step is to construct a facet analysis. This is where you take each of the items from your PICO (however many you used), which are the facets of your question, and think about all of the different ways that they can be expressed. You need to think about different ways of saying the same thing, different spellings, synonyms, and the use of MeSH and freetext terms (discussed later). The facet analysis will become the plan for your search, and it links the PICO to the actual search you conduct. An example is given below for the question: is ibuprofen more efficacious than paracetamol for the treatment of fever in children?


Population      Intervention       Comparison           Outcome
Children        Ibuprofen          Paracetamol          Fever
Child (MeSH)    Ibuprofen (MeSH)   Paracetamol (MeSH)   Fever (MeSH)
Child$                             Paracetamol          Fever
P?ediatric                         Acetaminophen        Temperature
Infant                                                  Febrile

The second line is the PICO for this question, and then below each facet are the different ways of describing these. Note that MeSH terms are identified; the others are freetext searches. In this particular example it was decided not to use tradenames, which would be explained in the text.

Inclusion and exclusion criteria

In addition to your search terms, there may be other criteria upon which you want to include or reject studies; these are known as inclusion and exclusion criteria. These should be clearly identified, and may include population, phenomena, study type, and study quality. For example, it is unusual to mix paediatric and adult populations, so if you are interested in adults, an exclusion criterion may be that the study includes children. If you are considering an effectiveness question, you may decide to include only experiments and quasi-experiments. The choice is yours, but you must make reference to these, even if you decide to have none. In order to make this as clear as possible it is recommended that these are put in a table.

Types of studies and the hierarchy of evidence

There are a number of evidence hierarchies, the idea being that some types of study are ‘better’ than others. NICE defines these as ‘Study types organised in order of priority, based on the reliability (or lack of potential bias) of the conclusions that can be drawn from each type’. Although there is not complete agreement about different levels of evidence, you need to think about the required level of control, randomness (sample and treatments), and method of data collection. While it is tempting to just go for RCTs, which are among the highest levels of evidence, they are not suitable for all questions and you need to demonstrate your understanding of the best methods for answering your particular question. The NICE hierarchy of evidence is discussed later in this manual.

Further reading Engberg S, Schlenk EA (2007) Asking the right question. Journal of Emergency Nursing 33 (6) 571-573


Developing your search

Having decided upon your question and broken it down into its facets, the next step is to develop a search strategy. There is no one correct way of doing this, but there are plenty of incorrect ways! The first step is to turn your PICO into a facet analysis. This is your way of deciding what terms you are going to search. Look at each of your terms, and think about the different words that can be used to describe them. There are two types of search terms: those that are found in the database’s thesaurus (its controlled vocabulary), such as MeSH terms, and those that are not.

MeSH

MeSH stands for Medical Subject Headings, and is the ‘controlled vocabulary’ used by Medline. It consists of sets of words that describe various medical terms, and under which papers are indexed by the National Library of Medicine. These are arranged in a hierarchical structure, which is to say at the ‘top’ there are very broad terms, which can be made progressively more specific. The equivalent in Embase is Emtree.

For example, the MeSH term for children is Child. This is a subset of the term Age Groups, along with Adolescent, Adult and Infant, which in turn is a subset of the term Persons. There is another related term for preschool children, which is Child, Preschool. In this case they have different meanings, so you need to make sure that you are using the correct term for the group that you want. The relationship between the terms can be seen below; note that it starts with a very general term, and becomes more specific as you go down. You will normally be expected to include a MeSH term for each facet. If there is not one, you must discuss this.

Persons
    Age Groups
        Adolescent
        Adult
        Child
        Child, Preschool
        Infant


You need to make sure that:

1. All facets contain the appropriate MeSH or equivalent term.
2. The MeSH term is the correct one for your facet.
3. You have the correct balance between sensitivity and specificity.

This forms the first line of your facet analysis.

Population   Intervention   Comparison   Outcome
MeSH term    MeSH term      MeSH term    MeSH term

Freetext

Because MeSH terms depend on indexers to correctly identify and classify papers, and because some things may not be covered by a MeSH term, you also need to search free-text terms. By default Medline will search the title, abstract and any key terms provided for free-text terms. You need to consider all of the terms that might be used for your subject, which may include synonyms, acronyms, abbreviations, differences in terminology across national boundaries, different spellings, old and new terminology, brand and generic drug names, and lay and medical terminology. You must include at least one free-text term for each facet; usually there will be many more. There are also some tools designed to help you with this, the most important being truncation and wildcards.

Truncation

Truncation uses a symbol (the $ in Medline) that truncates a word at the point of its insertion, allowing for any ending. For example Child$ will search

Child
Childs
Children
Childish

Wildcards

Wildcards (? in Medline) are symbols that can stand for any letter, or no letter. For example health?care will search

Health care
Healthcare
Health-care

There are many other useful tools, for example adjacency operators. Look in the help menu of the database you are planning to use. Be aware that the examples here apply to Medline and may not be the same in all databases.
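To see which words a truncated or wildcard term would pick up before running it in the database, the behaviour described above can be imitated with regular expressions. This is a rough sketch in Python that follows only the description in this workbook ($ allows any ending, ? stands for one character or none); it is not the database's own matching engine, and the function names are hypothetical.

```python
import re


def term_to_regex(term: str) -> re.Pattern:
    """Translate a free-text term using $ and ? into a regular expression.

    Assumes the behaviour described in the text: $ allows any word ending,
    and ? stands for a single character or for nothing at all.
    """
    parts = []
    for ch in term:
        if ch == "$":
            parts.append(r"\w*")        # any ending, including no ending
        elif ch == "?":
            parts.append(r".?")         # one character of any kind, or none
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.IGNORECASE)


def matches(term: str, word: str) -> bool:
    """True if the whole word is matched by the search term."""
    return term_to_regex(term).fullmatch(word) is not None


examples = {
    "Child$": ["Child", "Childs", "Children", "Childish", "Chill"],
    "health?care": ["Health care", "Healthcare", "Health-care", "Health"],
    "P?ediatric": ["Pediatric", "Paediatric"],
}
for term, words in examples.items():
    for word in words:
        print(f"{term!r} vs {word!r}: {matches(term, word)}")
```

Trying a list of likely spellings this way is a quick check that a term is neither too narrow (missing a variant) nor too broad (Child$ also retrieves Childish).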

When added to the PICO you will have something that looks like this.

Population       Intervention     Comparison       Outcome
MeSH term        MeSH term        MeSH term        MeSH term
Free-text term   Free-text term   Free-text term   Free-text term
Free-text term   Free-text term   Free-text term   Free-text term
Free-text term   Free-text term   Free-text term

Sensitivity versus specificity

When planning a search you need to balance sensitivity (getting lots of papers, some of which will be irrelevant) against specificity (getting fewer papers, but ones that are very focussed on your question). This is a balance, but you will probably need to do a sensitive search first in order to make sure that you do not miss important papers. While the sensitivity and specificity will result primarily from the way in which you construct your search, some databases give you tools to aid you with this, in particular explode and focus. Exploding a term will search for all articles indexed with that term plus articles indexed with related narrower terms. Focussing a term will select only those articles where the term is of major significance. In general you should explode to increase the sensitivity.

Constructing the search

Boolean operators

Having decided what search terms you are going to use, you have to construct the search. This is not just a matter of putting the search terms into the database, as they have to be combined in a particular way to give you one final list of papers. This is done using tools known as Boolean operators, and there are two that are particularly important for this purpose, these being OR and AND. They have very different meanings and effects and must not be mixed up. This is one of the few things that equal an immediate fail, as your search will be meaningless.

• The operator OR broadens the search and retrieves records containing any of the words it separates
• The operator AND narrows the search and retrieves records containing all of the words it separates

For example, paracetamol OR ibuprofen would retrieve papers with either drug, while paracetamol AND ibuprofen would only retrieve papers containing both drugs. The final search would therefore be constructed thus:


Population           Intervention         Comparison           Outcome
MeSH term            MeSH term            MeSH term            MeSH term
OR                   OR                   OR                   OR
Free-text term       Free-text term       Free-text term       Free-text term
OR                   OR                   OR                   OR
Free-text term  AND  Free-text term  AND  Free-text term  AND  Free-text term
OR                   OR                   OR
Free-text term       Free-text term       Free-text term

So the terms within each of your facets are first combined with OR, and then at the end each of these OR lines is combined with AND.

1. MeSH term for population
2. Free-text term for population
3. Free-text term for population
4. 1 OR 2 OR 3
5. MeSH term for intervention
6. Free-text term for intervention
7. Free-text term for intervention
8. 5 OR 6 OR 7
9. MeSH term for comparison
10. Free-text term for comparison
11. Free-text term for comparison
12. 9 OR 10 OR 11
13. MeSH term for outcome
14. Free-text term for outcome
15. Free-text term for outcome
16. 13 OR 14 OR 15
17. 4 AND 8 AND 12 AND 16
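The numbering pattern above (terms, then one OR line per facet, then a single AND line) is mechanical, so it can be generated from the facet analysis. The sketch below, in Python, is only an illustration of that pattern; the function name `build_search` is hypothetical and the terms are the same placeholders used in the list above, not real search terms.

```python
def build_search(facets: dict[str, list[str]]) -> list[str]:
    """Return numbered search lines: each facet's terms, an OR line per facet,
    and a final line combining the OR lines with AND."""
    lines: list[str] = []
    or_line_numbers: list[int] = []
    for terms in facets.values():
        first = len(lines) + 1                       # number of this facet's first term
        lines.extend(terms)
        term_numbers = range(first, len(lines) + 1)  # line numbers of this facet's terms
        lines.append(" OR ".join(str(n) for n in term_numbers))
        or_line_numbers.append(len(lines))           # remember where the OR line landed
    lines.append(" AND ".join(str(n) for n in or_line_numbers))
    return lines


# Placeholder terms only; replace with the terms from your own facet analysis.
facets = {
    facet: [
        f"MeSH term for {facet}",
        f"Free-text term for {facet}",
        f"Free-text term for {facet}",
    ]
    for facet in ("population", "intervention", "comparison", "outcome")
}
search_lines = build_search(facets)
for number, line in enumerate(search_lines, start=1):
    print(f"{number}. {line}")
```

Because the OR lines are built only from terms within one facet and the AND line only from OR lines, the two operators cannot be mixed up, which is the error the text warns against.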


Annotated search

One way of understanding your search is to annotate it. This looks like this:

1. MeSH term for population              Population
2. Free-text term for population
3. Free-text term for population
4. 1 OR 2 OR 3                           Boolean operator ‘OR’
5. MeSH term for intervention            Intervention
6. Free-text term for intervention
7. Free-text term for intervention
8. 5 OR 6 OR 7                           Boolean operator ‘OR’
9. MeSH term for comparison              Comparison
10. Free-text term for comparison
11. Free-text term for comparison
12. 9 OR 10 OR 11                        Boolean operator ‘OR’
13. MeSH term for outcome                Outcome
14. Free-text term for outcome
15. Free-text term for outcome
16. 13 OR 14 OR 15                       Boolean operator ‘OR’
17. 4 AND 8 AND 12 AND 16                Boolean operator ‘AND’

Limits and filters

Limits and filters may be used to make a search more specific if needed. They are not quite the same; limits are provided by the database, whereas filters are search strings that need to be added. The latter is probably only suitable for those with some experience of database searching, and if you are not confident you should either not use anything, or use the limits provided. Most databases provide a number of different limits; however they must be used with caution. In particular you must understand the effect that using limits will have on your search and be prepared to justify their use. For example, careless use of a time limit may miss important papers. The best limitation method is for you to look and make a decision. The recommendation from NICE about use of limits is:

• Date parameters. These depend on the clinical guideline topic and on when the majority of the research was published.
• Animal studies can be excluded from the search results in some databases.
• You can limit a review to studies reported in English.
• Depending on the review question, it may be appropriate to limit searches to particular study designs. NICE advise that the best way to do this is to use an appropriate search filter rather than limiting searches by the publication type field, but if you are not confident about doing this, use the limit function.
• Sometimes it may be appropriate to limit searches by age. This can be useful to identify citations relating to children, but is often not necessary for those relating to adults.
• Limiting searches by sex is not recommended.

Remember, whatever you do you must be able to justify it.

There are no hard and fast rules, but generally you will be expected to use and/or discuss the use of:

• MeSH terms (definitely)
• At least one free-text term (definitely)
• Truncation (probably)
• Wildcard (probably)
• Limits (possibly)


The new OVID interface

Log in and choose your database. If you log in through the College system you will see this screen. If you log in through your employer’s system, it may be different. Note that if you are using Medline, you must not use the database that includes ‘non-indexed citations’, as you will not be given the option to search MeSH terms if you do (they won’t have any, because they are still being indexed!).

The Medline that you get here is essentially the same as PubMed, with some extra features. For most people Medline plus one other will be your basic search. Choosing that other is part of the process that you need to go through. If you have never used Ovid before, you are strongly advised to go through the online tutorials, which are in the support and training section (find via the help menu).


Features of the OVID search page

By default, if you enter through the College system you will be put into the advanced search; if you are not, then you should manually change this. Use the help system to navigate around the system. As you type entries the search history will build up, and this will be checked to see that it matches the text. They must match! The easiest way to submit your annotated search is to print this page out when you have finished and to write or type the annotations on. If you can’t, then just retype the information.

[Screenshot of the OVID search page; the annotations on it read:]

o If you wish to save your search, set up a personal account here
o Type terms in here, making sure keyword is selected
o Tick turns MeSH headings on. If you don’t have this, you are in the wrong database
o Gives you access to limits and filters


The review

Managing your papers

Having conducted the search you will end up with a number of ‘hits’. The numbers are very variable, and may be a few, or may be several hundred. You must have at least two that you can use; you cannot do a review on fewer than that! If there are too many to manage then you may need to reconsider your search. There is no rule about what is ‘too many’; it is up to you. If you are happy with your search there is then a four step process that you need to undertake to identify the studies that will form your review.

1. Look at the titles and exclude the irrelevant.
2. Look at the abstracts of those that are left and exclude those that do not answer the question or meet your inclusion criteria.
3. Read the full text of those that are left to assess their suitability.
4. Assess those that meet your criteria for their quality.

You must be able to account for every paper. For those excluded at step 1 and 2, it is sufficient to give the numbers excluded at each stage. For those that meet your criteria but you do not include in the review (step 3 onwards), full details must be given in a table of excluded papers. You must provide the reference and the reason for exclusion. One way of presenting these data is in the form of an exclusion flow chart.

Exclusion flow chart

The most important thing of all in a review is transparency. In order to make it quite clear how you got from the total number of papers in the search to the smaller number that you review, you should produce some kind of table. Remember, all studies must be accounted for in some way. We need to know the following information:

1. What is the total volume and nature of the literature?
2. How much of this is relevant?
3. How many abstracts did you review?
4. How many papers did you read?
5. Of these how many did you reject, and why?
6. How many papers did you include?

We are not interested in the detail of papers until you get to reading the full text. By this time you have thought that they were relevant twice, firstly when you looked at the title, and secondly when you read the abstract, so what changed when you read the paper? One way of presenting these data is shown below.


Total number of papers from search: 1 000
    950 excluded on the grounds of irrelevance (don’t need to know any more about these)
Abstracts reviewed: 50
    40 excluded after review of abstracts (don’t need to know any more about these)
Papers reviewed: 10
    3 excluded after reviewing the papers (need to know why; these go in table of excluded papers)
Included in review: 7

Table of excluded papers

Reference                   Reason for exclusion
Bloggs (2004)               Cohort study
Bloggs and Bloggs (2007)    Case-control study
Bloggs et al (2009)         Uses data reported in Bloggs (2008)
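Since every paper must be accounted for, it is worth checking that the numbers in the flow chart add up and that the table of excluded papers has one row for each paper excluded at the full-text stage. A small sketch in Python using the example figures above; the function name `reconcile` and the stage labels are hypothetical.

```python
def reconcile(total: int, exclusions: list[tuple[str, int]]) -> int:
    """Subtract each stage's exclusions in turn and return the number included."""
    remaining = total
    for stage, excluded in exclusions:
        if excluded > remaining:
            raise ValueError(f"{stage}: cannot exclude {excluded} of {remaining} papers")
        remaining -= excluded
        print(f"{stage}: {excluded} excluded, {remaining} remain")
    return remaining


# Figures taken from the example flow chart.
stages = [("Titles", 950), ("Abstracts", 40), ("Full text", 3)]
included = reconcile(1000, stages)

# Papers rejected after reading the full text, each with its reason.
excluded_papers = {
    "Bloggs (2004)": "Cohort study",
    "Bloggs and Bloggs (2007)": "Case-control study",
    "Bloggs et al (2009)": "Uses data reported in Bloggs (2008)",
}

# The table of excluded papers must account for every full-text exclusion.
full_text_excluded = dict(stages)["Full text"]
if len(excluded_papers) != full_text_excluded:
    raise ValueError("Table of excluded papers does not match the flow chart")
print(f"Included in review: {included}")
```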

Quality criteria

All of the papers that you select must then be subject to some kind of quality assessment. There are a large number of tools for this, which are specific for different research methodologies, and most can be found on the Equator Network website http://www.equator-network.org/. You must ensure that you use the correct one, and be quite clear about what these are and, crucially, are not. These are guidelines for assessing the transparency and accuracy of research, in other words the quality of the reporting. They do not tell you that research is useful, worthy, important or significant, just that it is well reported; rubbish can be well reported and valuable information can be reported poorly. Also remember to look at other papers in the same journal or the Instructions for Authors, as what appears to be poor reporting may be due to the requirements of the journal.

You do not need to include the details of the assessment of each paper, but a summary table should be included to demonstrate that you have done this. The quality criteria that you choose as a minimum level for the inclusion of a paper are for you to decide, but whatever you decide to do, you must discuss it. This discussion must include your rationale and some appreciation of the effect that this decision has had on your review. You may decide to exclude papers on the grounds of reporting quality; if so they must go into your table of excluded papers.

Evidence levels

When you read papers you will find that evidence begins to fall into different categories or levels, such as that below. These are linked to the various hierarchies of evidence, and have the same benefits and limitations. Some kinds of studies are more robust than others, but you need to find the evidence appropriate for your question, and you will be marked according to your understanding of this. You should identify the level of evidence that you are providing. There are a number of different schemes for categorising evidence; these are the levels as defined by NICE and SIGN, and more complex equivalents are available. As you can see, a good systematic review is the highest level of evidence. If you find a systematic review, you must be able to add something to it, otherwise you are undertaking a pointless exercise: the evidence is there! This might be new data (if more recent studies have been done) or a meta-analysis. It is rare that level 3 evidence is appropriate, and level 4 is not appropriate for this exercise. If you are planning to use level 3 evidence, have a very, very good reason why.

Description of papers

Having selected your papers, the next step is to describe them: tell us what they are about. The details of what you say about them will differ according to the nature of the papers, but a lot of words can be saved by putting much of the information into an evidence table (these are discussed later). Remember the markers need enough information to know what you are talking about, but there are not a lot of marks for this.

Synthesising the evidence

The synthesis of the evidence is where you bring it all together, and where the big marks are to be found. So far you will generally have done things right or wrong; if you have done it right you are on for a pass, and the synthesis part will improve your mark. If you have done it wrong so far, you will fail, no matter what you do here. Let’s assume that you have done ok so far.

There are two main types of synthesis, narrative (where you just discuss the papers) and meta-analysis (where you combine the statistics from the papers to calculate a summary statistic). Whichever one you decide to do, there are a number of things that you must do. Firstly, extract the statistical data of interest from the papers. If you can’t find this, you need to carefully consider if you can use that paper. There are two types of data to be found and extracted. These are the measures of:

1. Statistical significance, the p-value
2. Effect measure, the clinical significance

You must find both, extract both and differentiate them. This is crucial, because the p-value, which everyone seems to get excited about, is not a measure of clinical significance but simply a calculation of the probability of this result, or one more extreme, occurring if the null hypothesis is true with a random sample of size n. SPSS, Minitab etc. are fine statistical programmes, but they only know about numbers; they don’t know whether a fall in blood pressure of 15 mmHg is important or not. That will come from your interpretation of the effect measure. The kind of things to look for here are means, medians, odds ratios and relative risks. You must also consider the confidence intervals associated with these.

As far as statistical tests go, you don’t need to be a statistician, but you must be able to identify the correct statistic, have some idea of what the test does and does not do, and be able to interpret it. At this level you are expected to have some knowledge of statistics, and there are lots of good introductory books in the library about this, and some excellent websites. The only statistical test we are going to teach in any detail is meta-analysis.
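To make the distinction between the p-value and the effect measure more concrete, here is a minimal sketch in Python. It is not part of the course materials: the function name `mean_difference_summary` and all of the trial numbers are hypothetical, and it uses a simple normal approximation rather than any particular test reported in a real paper.

```python
import math
from statistics import NormalDist


def mean_difference_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b, level=0.95):
    """Return (difference, confidence interval, two-sided p-value).

    Uses a normal approximation for two independent groups; this is an
    illustrative assumption, not the method of any specific paper.
    """
    diff = mean_a - mean_b
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return diff, ci, p_value


# Hypothetical data: mean fall in blood pressure (mmHg), treatment vs control.
# A small trial with a large difference, and a very large trial with a tiny one.
small_trial = mean_difference_summary(15.0, 20.0, 20, 5.0, 20.0, 20)
large_trial = mean_difference_summary(6.0, 20.0, 5000, 5.0, 20.0, 5000)

for name, (diff, ci, p) in (("small", small_trial), ("large", large_trial)):
    print(f"{name} trial: difference {diff:.1f} mmHg, "
          f"95% CI {ci[0]:.1f} to {ci[1]:.1f}, p = {p:.3f}")
```

In this made-up example the large trial is statistically significant even though the difference is only 1 mmHg, while the small trial is not significant despite a 10 mmHg difference. Deciding whether either difference matters clinically is still your job, not the software's.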

A little more about p-values and confidence intervals

The p-value, defined above, has a number of features:

1. It is a probability
2. That probability is of getting the data observed
3. Assuming that the null hypothesis is true
4. Using a random (probability) sample
5. Of a given size


It therefore defines the probability of getting the data if the null hypothesis is true. You therefore need to understand what the null hypothesis is, whether it is directional or not, and the relationship of the hypothesis (which is what you are interested in but have not tested) to the null hypothesis (which you are not really interested in, but nevertheless have tested). It is crucial to remember that apart from one relatively specialist area of statistical testing (called Bayesian statistics), you cannot test the hypothesis directly.

So what of the 95% confidence interval? The confidence interval (and it does not have to be 95%) is a measure of precision. Typically you will see a statistic given, for example a mean value, and then a range which constitutes the 95% confidence interval around that mean, for example mean BP of 110 mmHg (95% CI 100-120). Interpreting a confidence interval is not as straightforward as it may seem, however. Typically you will see it defined as a range within which you can be 95% sure the true population mean lies. The problem with this definition is that frequentist statisticians (which is what we are!) believe that all parameters have a fixed, if unknown, value. In other words, we don’t know what the true population mean is, but we know there is one, and we believe it has a fixed value.

Now look back at that definition above: it says that the 95% CI is a range within which the true mean lies with 95% certainty. Can you see a problem? The problem is that we think the mean is an unknown but fixed value. If it is fixed, it either lies within a particular calculated range or it does not, so how can we be 95% sure about it? The answer is that we can’t, and the reason for the confusion is that we are interpreting the 95% CI incorrectly. Frequentists (remember that is what we are!) are interested in long term frequency, i.e. what happens (or would happen) if we were to repeat something many times.

Thus the 95% CI says that if we were to repeat the study many times and calculate a 95% CI each time, about 95% of those intervals would contain the true population value. The problem, of course, is that we generally don’t repeat the test many times. This is interesting, but it doesn’t really help us to interpret one 95% CI from one test. A compromise might be to define a 95% CI as ‘a unique interval XX:XX [the numbers] that estimates the population parameter with 95% confidence’ (Kline, 2004). This is not quite correct, but it is not too incorrect, and suggests that the CI is actually a measure of the precision of the estimate of the parameter rather than any kind of range. For those who are interested, there is a lot of literature on this subject, much of which is quite accessible. Interestingly very little of it is from healthcare; most is from the psychological and educational literature. A good starting point:

Kline, R. B. (2004) Beyond significance testing: Reforming data analysis methods in behavioral research. American Psychological Association, Washington DC

Purssell, E., While, A., P=nothing, or why we should not teach healthcare students about statistics, Nurse Educ. Today (2010), doi:10.1016/j.nedt.2010.11.017
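The long-run frequency idea behind the confidence interval can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population mean and SD, the sample size, the number of repeats and the function name `ci_coverage` are all invented for illustration, and the interval uses a normal approximation.

```python
import math
import random
import statistics


def ci_coverage(true_mean=110.0, true_sd=15.0, n=50, repeats=2000, seed=1):
    """Proportion of simulated 95% CIs that contain the fixed true mean.

    All default values are hypothetical; the CI uses the normal
    approximation mean +/- 1.96 x standard error.
    """
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(0.975)
    hits = 0
    for _ in range(repeats):
        # Draw one "study" from the same fixed population each time
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        # The true mean never moves; it is the interval that varies
        if mean - z_crit * se <= true_mean <= mean + z_crit * se:
            hits += 1
    return hits / repeats


coverage = ci_coverage()
print(f"Proportion of intervals containing the true mean: {coverage:.3f}")
```

The proportion printed should be close to 0.95. Note what varies from run to run: the population mean stays fixed, and it is the calculated interval that moves around it from one simulated study to the next.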

What are the outcome measures?

Look for two things.

1. Some measure of effect
   a. For dichotomous outcomes this may be an odds and odds ratio, or a risk and risk ratio
   b. For continuous outcomes this may be a mean and mean difference
   c. For other non-normally distributed data this may be a median
   Remember to also find the 95% CI
2. Some measure of statistical significance

Ideally these will be the same for all studies; if they are not, why not? When you have found them, you may like to write a summary as below for each study. Remember, while it is important to be able to find the data and extract it, the most important bit of all is your interpretation: that is where you show that you understand what you have read.

My outcome: Does drug X affect blood pressure?
Effect: Mean difference in BP at 30 mins was 15 mmHg (95% CI 10-20 mmHg)
P-value: P=0.04
My interpretation: Statistically significant but of marginal clinical significance

Your outcome:
Effect:
P-value:
Your interpretation:
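For dichotomous outcomes, the risk ratio and odds ratio mentioned above can be calculated from the four cells of a 2x2 table. The sketch below uses the standard large-sample formulae on the log scale; the function name `risk_and_odds_ratio` and the counts are hypothetical, and it assumes no cell is zero.

```python
import math
from statistics import NormalDist


def risk_and_odds_ratio(events_trt, n_trt, events_ctl, n_ctl, level=0.95):
    """Risk ratio and odds ratio with confidence intervals from a 2x2 table.

    Uses log-scale normal approximations; assumes every cell is non-zero.
    """
    a, b = events_trt, n_trt - events_trt   # treatment: events, non-events
    c, d = events_ctl, n_ctl - events_ctl   # control: events, non-events
    z = NormalDist().inv_cdf(0.5 + level / 2)

    def interval(estimate, se_log):
        log_est = math.log(estimate)
        return (math.exp(log_est - z * se_log), math.exp(log_est + z * se_log))

    rr = (a / n_trt) / (c / n_ctl)
    se_log_rr = math.sqrt(1 / a - 1 / n_trt + 1 / c - 1 / n_ctl)
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return {
        "rr": rr,
        "rr_ci": interval(rr, se_log_rr),
        "or": odds_ratio,
        "or_ci": interval(odds_ratio, se_log_or),
    }


# Hypothetical counts: 15/100 events on treatment, 30/100 on control
result = risk_and_odds_ratio(15, 100, 30, 100)
print(f"RR {result['rr']:.2f}, 95% CI {result['rr_ci'][0]:.2f} to {result['rr_ci'][1]:.2f}")
print(f"OR {result['or']:.2f}, 95% CI {result['or_ci'][0]:.2f} to {result['or_ci'][1]:.2f}")
```

Notice that the two measures differ even though they come from the same table, which is one reason to check that all of your studies report the same effect measure before comparing them.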

Data tables

You must provide a summary table/s. Many people do two separate ones, the first for the descriptive data, and the second containing the outcome data for the synthesis. The recommendation from SIGN is that the table/s should contain the following elements:

1. Reference
2. Study type
3. Evidence level
4. Number of subjects
5. Patient characteristics
6. Intervention
7. Comparison
8. Length of follow-up
9. Outcome measure
10. Effect size
11. Source of funding

Remember these tables must be clear and be explained in the text. Their purpose is to help the reader understand, not to confuse them more.


Grading of evidence and strength of the body of evidence

So far we have concentrated upon the individual studies. Now, having considered the entire body of evidence, come to some kind of conclusion and perhaps made some recommendations, you have to decide what the overall quality of that body of evidence is. There are a number of methods of doing this; among the most commonly used is the system devised by the GRADE (Grading of Recommendations Assessment, Development and Evaluation) Working Group (http://www.gradeworkinggroup.org/). The GRADE rating scale uses four categories to summarise the quality of the body of evidence. They interpret these slightly differently for systematic reviews and guidelines, but you need not worry too much about this. They state that for systematic reviews, the ratings reflect the extent of the confidence that the estimates of the effect are correct, while for making recommendations, they reflect the extent of confidence that the estimates of an effect are adequate to support a particular decision or recommendation.

Quality level: Definition

High: We are very confident that the true effect lies close to that of the estimate of the effect
Moderate: We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different
Low: Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect
Very low: We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect

Significance of the four levels of evidence (Balshem (2010) J Clin Epid (in press))

The important thing to remember is that this is a rating of the overall body of evidence for each outcome. The idea is to start with the design, then look for reasons to upgrade or downgrade the initial quality rating. It is the overall quality you are interested in here, not the individual studies.


Design and initial quality of body of evidence
  Randomised trials: High
  Observational studies: Low

Lower if
  Risk of bias: -1 serious, -2 very serious
  Inconsistency: -1 serious, -2 very serious
  Indirectness: -1 serious, -2 very serious
  Imprecision: -1 serious, -2 very serious
  Publication bias: -1 serious, -2 very serious

Higher if
  Large effect: +1 large, +2 very large
  Dose response: +1 evidence of gradient
  All plausible residual confounding: +1 would reduce demonstrated effect, +1 would suggest spurious effect if no effect observed

Quality of body of evidence
  High ++++
  Moderate +++
  Low ++
  Very low +

Summary of the GRADE approach to rating quality of evidence (Balshem (2010) J Clin Epid (in press))
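The logic of the summary above (start from the design, then move down or up) can be sketched as a few lines of Python. This is a deliberately simplified illustration, not an official GRADE tool: the function name `grade_quality`, the design labels and the idea of simply adding up the levels are assumptions made here. In practice each step is a judgement that you must justify in your text.

```python
# Quality levels ordered from lowest to highest
LEVELS = ["Very low", "Low", "Moderate", "High"]

# Starting level by study design, as in the summary above
START = {"randomised": 3, "observational": 1}


def grade_quality(design, downgrades=0, upgrades=0):
    """Return the quality level for a body of evidence on one outcome.

    design     -- "randomised" or "observational" (hypothetical labels)
    downgrades -- total levels removed (risk of bias, inconsistency, etc.)
    upgrades   -- total levels added (large effect, dose response, etc.)
    """
    score = START[design] - downgrades + upgrades
    # The rating cannot go above High or below Very low
    score = max(0, min(len(LEVELS) - 1, score))
    return LEVELS[score]


# Randomised trials with serious inconsistency (-1)
print(grade_quality("randomised", downgrades=1))
# Observational studies with a very large effect (+2)
print(grade_quality("observational", upgrades=2))
```

The arithmetic is the easy part. The marks come from explaining why you judged, say, the inconsistency to be serious rather than very serious.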

Criteria for moving studies down

1. Risk of bias
A judgement about the overall limitations of the study.

2. Inconsistency
Unexplained heterogeneity in the magnitude of effect, for example different point estimates; no or minimal overlap of confidence intervals; or statistical tests of heterogeneity showing a low p-value or an I² above 75%. If you do see this, think about possible differences in populations, interventions, outcomes or study methods. If you can identify a reason for the differences in outcomes, studies should be presented separately for each stratum that explains the difference.

3. Indirectness
The evidence is not directly related to the question that is being asked, for example the population may be slightly different; the intervention or outcome might be different; or if you are interested in A v B, you may only be able to find studies comparing A v C and B v C.

4. Imprecision
How precise is the estimate of effect? The best way of assessing this is to look at the 95% confidence interval.

5. Publication bias
This may be difficult for you to assess; funnel plots are useful, but are not uncontroversial themselves.

Criteria for moving studies up

1. Large effect - strength of association
When methodologically strong studies show consistently high levels of effect or magnitude, this might strengthen your confidence in the finding. In observational studies reporting a relative risk, GRADE identifies two levels of association: strong is equivalent to an RR greater than 2 (or less than 0.5), and very strong to an RR greater than 5 (or less than 0.2).

[Worksheet for rating the quality of the body of evidence; only the following labels are recoverable]

Design: Randomised trial / Observational study
Quality: Moderate / Low / Very low
Reasons to move down: Inconsistency; Indirectness; Imprecision; Publication bias; Any other
Reasons to move up: Very strong evidence of association (e.g. RR>5); Evidence of dose-response gradient; All plausible confounding would reduce a demonstrated effect

Recommendations: Most well informed people...
  Do it: would do it
  Probably do it: would do it but some would not
  Probably don’t do it: would not do it but some would
  Don’t do it: would not do it

Meta-analysis

If you have sufficiently detailed statistics you may like to do a meta-analysis. Although these are a little controversial, they are widely conducted and reported, so it is a good idea to try it yourself if you can. The main statistical programmes such as SPSS and Minitab do not do meta-analysis, but Stats Direct, which is on the PAWS machines, does. There is also a free programme known as MetaAnalyst which can be downloaded from http://tuftscaes.org/meta_analyst/

Note that there are two models of meta-analysis, the fixed-effect and random-effects models. You must choose one and justify its use. Using both is a cop out and suggests that you don’t understand the difference. If you undertake a meta-analysis you need to include as a minimum the forest plot, the pooled estimate, its p-value, and the heterogeneity statistics Q and I². We will discuss meta-analysis in class. Even if you decide not to undertake one, you should try to gain a basic understanding of them, as they are increasingly common.

Heterogeneity

If you decide to do a meta-analysis, you will (if you have done it right) end up with heterogeneity statistics. You must interpret these (look for the Q and I² statistics). Even if you have not done a meta-analysis, you must consider and discuss the significance of heterogeneity. Are the studies all the same? Are they all showing the same outcome? If not, why not? If they are, why is this the case when they have been done in different groups and at different times?

Limitations

The ability to critique one’s own work is very important at this level, and you will be expected to show that you have done this. This is where you discuss the limitations of your review, and all reviews have them.
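Returning briefly to the meta-analysis and heterogeneity sections above: the pooled estimate, Q and I² can be calculated by hand, which may help you to understand what the software is doing. The sketch below is an assumption-laden illustration rather than the method used by Stats Direct or MetaAnalyst. It uses inverse-variance weighting for the fixed-effect model and the DerSimonian-Laird estimate of between-study variance for the random-effects model. The function name `meta_analysis` and the study data are hypothetical, and ratio measures would need to be entered on the log scale.

```python
import math
from statistics import NormalDist


def meta_analysis(effects, std_errors, model="fixed"):
    """Inverse-variance pooled estimate with Q and I-squared.

    effects    -- one effect estimate per study (log scale for ratios)
    std_errors -- the standard error of each estimate
    model      -- "fixed" or "random" (DerSimonian-Laird)
    """
    weights = [1 / se ** 2 for se in std_errors]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    tau2 = 0.0
    if model == "random":
        # Between-study variance, added to each study's own variance
        c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
        tau2 = max(0.0, (q - df) / c)
        weights = [1 / (se ** 2 + tau2) for se in std_errors]

    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    z = NormalDist().inv_cdf(0.975)
    p_value = 2 * (1 - NormalDist().cdf(abs(pooled / se_pooled)))
    return {
        "pooled": pooled,
        "ci": (pooled - z * se_pooled, pooled + z * se_pooled),
        "p": p_value,
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical mean differences (mmHg) and standard errors from three studies
effects = [12.0, 15.0, 4.0]
std_errors = [3.0, 4.0, 2.5]
for model in ("fixed", "random"):
    r = meta_analysis(effects, std_errors, model)
    print(f"{model}: pooled {r['pooled']:.1f} "
          f"(95% CI {r['ci'][0]:.1f} to {r['ci'][1]:.1f}), "
          f"Q = {r['Q']:.2f}, I2 = {r['I2']:.0f}%")
```

Both models are run here only to show how the weights and the width of the confidence interval change when heterogeneity is present. In your review you still need to choose one model and justify it. The p-value for Q is not calculated because it needs the chi-squared distribution, which is not in the Python standard library.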
Some things to consider are
• The question
• The search
• The quality of the evidence found
• Biases, such as publication, positive findings, funding biases
• Any evidence that was or might have been missed
• The reliability and validity of the data extraction and synthesis
• Your personal ability to do the review

Conclusions and recommendations

In this section you need to decide what the evidence that you have summarised and synthesised means in the context of the clinical question. You may or may not decide to use all of the data that you have found, but whatever you decide, you must discuss and justify the decision. There should be discussion of how the presence of potential biases and uncertainty in the evidence has influenced your conclusions and why. Those who do best are those who make strong clinical recommendations; NICE suggest that these have the following features.

1. Action focussed: what needs to be done
2. Include what readers need to know: does this make sense without reading the entire paper?
3. Reflect the strength of the recommendation
   a. Some things must or must not be done
   b. Some things should or should not be done
   c. Some things could or could not be done
4. Emphasise patient involvement

The most important thing is that your key thought processes and rationale for the recommendation should be clear.

Research recommendations

Unless the evidence is complete (and it won’t be) there will be areas that would benefit from further research. You will be expected to identify and comment on these. These must be specific; a general ‘more research is needed’ statement is not sufficient. You might consider what the questions are that need researching, how this research could best be undertaken, and what the barriers are to doing this. One widely used model is EPICOT:

E evidence: what is the current state of evidence?
P what populations might be studied?
I what interventions might be studied?
C what comparisons might be studied?
O what outcomes might be studied?
T time: how urgent is the need for research?

Appendices

These are not in the word limit, and so in general must not contain information that should be in the text as they are not marked. None are necessary for this assignment.

References

These must be complete and comply with the School guidelines. Missing references lay you open to accusations of plagiarism, which is an academic and potentially professional offence. It is recommended for those going on to further study, particularly dissertations, that you use bibliographic software such as Endnote or Reference Manager.

Tables

Finally a word about tables. These are very useful as you can very easily present a significant amount of data in a very small space. However, remember

1. You cannot show understanding in a table; this can only come over in discussion.
2. Tables always need to be referred to and explained in the text. They are never self-explanatory.
3. They need to be easy to read and understand; they should add to, not detract from, understanding.
4. They always need a number and a title.

Further reading

A good place to start is the book How to read a paper, by Trisha Greenhalgh, of which there are many copies in the library. Guideline manuals are available from the NICE, NHS Centre for Reviews and Dissemination and SIGN websites. The Cochrane Library also has its own manual which can be used, but it is very long and is specific to them.
