Data Quality Article

Development and validation of reporting guidelines for studies involving data linkage

Megan A. Bohensky, Damien Jolley
Centre for Research Excellence in Patient Safety, Department of Epidemiology & Preventive Medicine, Monash University, Victoria

Vijaya Sundararajan
Department of Human Services, Victoria

Sue Evans, Joseph Ibrahim, Caroline Brand
Centre for Research Excellence in Patient Safety, Department of Epidemiology & Preventive Medicine, Monash University, Victoria

Abstract

Objective: Data or record linkage is commonly used to combine existing data sets for the purpose of creating more comprehensive information to conduct research. Linked data may create additional concerns about error if cases are not linked accurately. It is important that factors compromising the quality of studies using linked data be reported in a clear and consistent way that allows readers and researchers to accurately appraise the results. The aim of this study was to develop and test reporting guidelines for evaluating the methodological quality of studies using linked data.

Method: The development process included a literature review, a Delphi process and a validation process. Participants in the process were all Australian and included biostatisticians, epidemiologists, registry administrators, academic clinicians and an editor of a peer-reviewed journal.

Results: The final guidelines included four domains and 14 reporting items. These included: data sources (six items), researcher-selected variables (four items), linkage technology and data analysis (three items), and ethics, privacy and data security (one item).

Conclusion: This study is the first to develop guidelines for appraising the quality of reported data linkage studies.

Implications: These guidelines will assist authors to report their results in a consistent, high-quality manner. They will also assist readers to interpret the quality of results derived from data linkage studies.

Key words: Data collection, medical record linkage, guideline, research design, peer review, research

Aust NZ J Public Health. 2011; 35:486-9

Data or record linkage is “a process of pairing records from two files and trying to select the pairs that belong to the same entity.”1 It is commonly used to combine existing data sets to create more comprehensive information to conduct research. Studies involving data linkage are becoming more common. The Australian Government through the National Collaborative Research Infrastructure Strategy awarded $20 million to the Population Health Research Network (PHRN) to establish national capacity for data linkage in Australia.2 The PHRN also received more than $30 million in direct and indirect support from each of the states and territories. Data linkage will be conducted within nodes operating in each individual State3-5 (see www.phrn.org.au), as in the model of Western Australia, with the Centre for Data Linkage, a national network, to co-ordinate the activities of the state-based groups. Although the linkage of data sources generates valuable information and improves data quality, it may create additional biases and methodological concerns.6 To link data sets accurately requires stable and sufficiently unique identifiers. However, unique identifiers are not always available due to ethical and privacy constraints.

Different methods are used for standardising data and linking data sets and the choice of these may influence the quality of results.7,8 A systematic review of the accuracy of probabilistic linkage processes found the sensitivity (i.e. the proportion of truly linked records detected) ranged from 74% to 98%.9 The authors noted that this variation was likely to be due to the number and quality of fields available for linkage. Where there is low sensitivity of linkage processes, differential inaccuracies in the data may result in systemic bias. Linkage rates vary by participants’ age,10 gender,11,12 ethnicity,13 health status, regional location14,15 and socio-economic status.13 These variations may affect the conclusions of research studies.16 Factors compromising the quality of studies using linked data should be reported in a systematic way to allow readers and researchers to accurately appraise different studies. Reporting guidelines have improved the quality of information in other areas of research by highlighting the shortcomings and prompting improvements in the quality of published studies.17 Currently, there is no tool available to appraise the quality of studies using data linkage.
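As an illustration of the sensitivity measure described above, the following minimal Python sketch compares a set of links produced by a linkage process against a gold-standard set of true links. The function name `linkage_quality`, the record identifiers and the example pairs are hypothetical and are not drawn from the cited review; positive predictive value is included only as a common companion measure.

```python
def linkage_quality(true_links, found_links):
    """Summarise linkage accuracy against a gold-standard set of true record pairs.

    Both arguments are sets of (record_id_in_file_A, record_id_in_file_B) tuples.
    """
    true_positives = len(true_links & found_links)    # correct links detected
    false_negatives = len(true_links - found_links)   # true links that were missed
    false_positives = len(found_links - true_links)   # pairs linked in error

    # Sensitivity: proportion of truly linked records that were detected.
    sensitivity = true_positives / len(true_links) if true_links else float("nan")
    # Positive predictive value: proportion of detected links that are correct.
    ppv = true_positives / len(found_links) if found_links else float("nan")

    return {
        "true_positives": true_positives,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        "sensitivity": sensitivity,
        "ppv": ppv,
    }


# Hypothetical example: four true pairs, of which the linkage found two,
# plus one incorrect pair.
true_links = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
found_links = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
result = linkage_quality(true_links, found_links)
print(result)
```

Calculating these measures separately for subgroups (for example by age group or region) is one way the differential linkage rates noted above could be made visible, provided a gold standard is available.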

doi: 10.1111/j.1753-6405.2011.00741.x Submitted: September 2010

Revision requested: March 2011

Accepted: April 2011

Correspondence to: Dr Megan Bohensky, Centre for Research Excellence in Patient Safety, Department of Epidemiology, Monash University, 99 Commercial Road, Level 6 Alfred Centre, Prahran, Victoria 3181; e-mail: [email protected]


AUSTRALIAN AND NEW ZEALAND JOURNAL OF PUBLIC HEALTH © 2011 The Authors. ANZJPH © 2011 Public Health Association of Australia

2011 vol. 35 no. 5


Because the different data linkage nodes in Australia are subject to different privacy legislation and organisational structures, there is the potential that they may need to utilise differing methods and identifiers to link data in each jurisdiction. The development of standardised guidelines for reporting and appraising the quality of linkages within each node is timely and can help achieve greater national consistency in linkage results, especially where jurisdictional data will be aggregated and considered at a national scale. The Centre for Data Linkage is making concerted efforts to harmonise these data for national analyses. This study aimed to develop and validate reporting guidelines for evaluating the methodological quality of studies using linked data.

Methods

A modified Delphi process was used to gain agreement about the contents of reporting guidelines.18 This method has been employed in the development of reporting guidelines for other types of research studies, including the CONSORT statement for the reporting of randomised controlled trials.19 There were three stages to the reporting guideline development process: 1) a literature review; 2) a Delphi process, incorporating an informant consultation process and two Delphi voting rounds; and 3) a tool validation process.

Literature review

The literature review6 summarised articles that identified quality issues with data linkage studies published from 1991 to 2007. Thirty-three articles met the inclusion criteria from which four domains and 26 items were identified that addressed issues of data linkage reporting quality. The reasons for and the nature of bias that arose from unlinked records were summarised and forwarded to the participants in the consultation process.

Consultation process

The key informant consultation process was conducted with 10 experts selected through purposive sampling of Australian experts in a range of fields related to data linkage. Participants were asked to review the domains summarised from the literature review and advise if additional domains and items, not previously identified, should be included. Participants suggested a fifth domain focusing solely on the variables to be used within the research study and an additional 21 items to be added to the preliminary list. The final list of domains and items was pilot tested by three independent researchers for face and content validity.

Validation

The validation process randomly selected a sample of 25 from the 75 eligible articles (impact factors ranging from 0 to 15.7 grouped into impact factor quintiles). The majority of articles were from Australia, North America, the United Kingdom and Scandinavian countries. Two researchers (MB and CB) applied the guidelines to the de-identified articles to rate how well each item was reported within the article (not applicable, poorly addressed, adequately addressed, well addressed). The median number of items rated as ‘well addressed’ by at least one reviewer in each article was six (range: 1-12). There was not strong evidence of a relationship between impact factor and the summary rating of items for each study (r = 0.20). The proportion of agreement in the validation process was 71% and the kappa score was 0.6. Domain-specific kappa scores were as follows: existing data sources κ = 0.4; researcher-selected variables and data preparation κ = 0.5; technology and analysis of linked data κ = 0.8; and ethical review κ = 0.9. Ethics approval for this study was received from the Monash University Standing Committee on Ethical Research in Humans.
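The agreement statistics reported above can be illustrated with a minimal Python sketch. It computes the proportion of agreement and an unweighted Cohen's kappa for two raters using the four rating categories; the article does not state which kappa variant was used, so the unweighted form is an assumption, and the ratings shown are invented for illustration rather than taken from the study.

```python
from collections import Counter


def proportion_agreement(ratings_a, ratings_b):
    """Share of items on which the two raters gave the same rating."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(ratings_a)
    observed = proportion_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: sum over categories of the product of marginal proportions.
    expected = sum(
        counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)
    ) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten checklist items by two reviewers.
rater_a = ["well", "well", "adequate", "poor", "na",
           "well", "adequate", "poor", "poor", "well"]
rater_b = ["well", "adequate", "adequate", "poor", "na",
           "well", "adequate", "poor", "well", "well"]

agreement = proportion_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than raw agreement because some matches would be expected by chance alone, which is why both figures are usually reported together.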

Results

Delphi process

Two Delphi voting rounds were undertaken and participants’ identities were kept anonymous. Before the Delphi voting rounds, all participants were given a background summary report of the project and literature review. The Delphi survey process included participants who had data linkage experience as researchers, technicians or users of data linkage studies. Twelve (60%) of the 20 invited experts agreed to take part in the Delphi process. The disciplinary backgrounds of participants were: 17% biostatisticians, 33% epidemiologists, 17% registry administrators, 25% clinician academics, 8% computer scientists and 8% journal editors. The ratings from the first round were summarised quantitatively and qualitatively. The median group score, range and proportion of participants rating the item at eight or above were calculated for presentation in the second round. An a priori decision was made to include items with a panel median score of eight or higher and with a high level of agreement based on the RAND/UCLA Appropriateness Method User’s Manual for strict agreement (A7R).18 Inter-percentile Range Adjusted for Symmetry (IPRAS) scores, which are a measure of score dispersion adjusted for panel symmetry, were used to determine the level of agreement for each item. After round one, 20 of the 47 items were ranked within the ‘included’ range according to Delphi criteria. In the second round, participants were asked to re-rate items taking into consideration the findings from the first round. The final list of ‘accepted’ items and ‘threshold’ items (where nine or more people rated eight or above and there was a high level of agreement) was circulated to participants for review and comment after the end of the second round. Following round two, 14 items remained.
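The per-item summary described above (median score, proportion rating eight or above, and an IPRAS-based agreement check) can be sketched in Python as follows. Several details are assumptions rather than statements from the article: a 1 to 9 rating scale, the 30th to 70th percentile range, the constants 2.35 and 1.5 as generally attributed to the RAND/UCLA manual (which should be checked against that manual), the simple percentile interpolation, and the rule that an item shows agreement when its inter-percentile range is smaller than its IPRAS. The panel scores are invented.

```python
import statistics


def percentile(values, p):
    """Linear-interpolation percentile (p between 0 and 100)."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return float(ordered[0])
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def summarise_item(scores, threshold=8, ipr_r=2.35, cfa=1.5, midpoint=5):
    """Summarise one item's panel scores (assumed 1-9 scale).

    ipr_r and cfa are assumed constants; midpoint is the centre of the scale.
    """
    median = statistics.median(scores)
    share_high = sum(s >= threshold for s in scores) / len(scores)

    p30, p70 = percentile(scores, 30), percentile(scores, 70)
    ipr = p70 - p30                                   # observed dispersion
    asymmetry = abs(midpoint - (p30 + p70) / 2)       # distance from scale centre
    ipras = ipr_r + cfa * asymmetry                   # dispersion allowed for agreement
    agreement = ipr < ipras

    return {
        "median": median,
        "share_high": share_high,
        "ipr": ipr,
        "ipras": ipras,
        "agreement": agreement,
        # Assumed inclusion rule: median of eight or higher AND agreement.
        "included": median >= threshold and agreement,
    }


# Hypothetical scores from a 12-member panel for two items.
consensus_item = summarise_item([9, 9, 8, 8, 8, 9, 7, 8, 9, 8, 9, 8])
split_item = summarise_item([1, 2, 9, 9, 1, 9, 2, 8, 9, 1, 9, 2])
print(consensus_item)
print(split_item)
```

In this sketch the first item is retained because the panel median is eight and scores are tightly clustered, whereas the second is excluded because the panel is split and the median falls below the threshold.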

The Delphi consensus process identified and validated reporting guidelines including four domains and 14 reporting items (presented in Table 1). The final list of items incorporated six (43%) items from the domain on data sources, demonstrating the importance of having high-quality existing data systems to conduct high-quality linkage research. Of the 14 items, four (29%) items were from the domain on researcher-selected variables related to the researcher’s consideration of the quality of specific variables to be used. This domain also demonstrates the importance of understanding the quality of existing data sources and specific variables to be used for linked analysis. There were three (21%) items included from the domain concerning the technology and analysis of the linked data and one (7%) item from the domain on ethics, privacy and data security was included in the final list. Review of the written comments indicated that most participants who gave a low score to items in the domain on ethics, privacy and data security concluded that if the study had the approval of an ethics committee they could assume that the other items had been addressed (e.g. security of data was maintained at all times and consent to the linkage was obtained or not required). This underscores the importance of ethical review and the governance of research using linked data, especially if participant consent was not provided for the collection or linkage of the data initially. IPRAS scores in the second round showed agreement on all of the items that met the criteria for acceptance and high agreement on low rated items.

Table 1: Reporting guidelines for studies using data linkage.

DOMAIN 1: The existing data sources to be linked (complete for each dataset to be linked in the study: Dataset 1, Dataset 2, Dataset 3, Dataset 4)
1. Purpose of the dataset was given
2. Description of the type of dataset (administrative/clinical registry/research study) was specified
3. Any standardised coding system/data dictionary used was stated
4. % population coverage
5. Data collection methods were described
6. Data quality assurance process described, including the frequency of checks

DOMAIN 2: Researcher-selected variables and data preparation
7. Were the participant inclusion criteria specified?
8. Were the variables used for linkage stated (including rates of missing data)?
9. Were changes to the coding systems reported (including changes over time or revisions to disease/risk factor definitions during the study period)?
10. Were potential sources of bias adequately described?

DOMAIN 3: Technology/linkage process
11. Was the intended precision of the linkage stated?
12. Was a description of the linkage method given (e.g. deterministic, probabilistic methods including use of blocking and phonetic coding, if used) and a justification for the use of this type of linkage provided?
13. Was a measure of the quality of the linked data sets provided (e.g. % linked records, false positive/false negative rates)?

DOMAIN 4: Ethics, privacy, data protection and access arrangements
14. Did the study receive approval from a human research ethics committee?

Discussion

This study is the first to develop guidelines for appraising quality issues in studies using data linkage. This is an important endeavour as more studies and reports using linked data are being published. It is expected that these guidelines could be utilised by authors, data linkage analysts and reviewers as a basis for understanding the quality of their data sets, the linkage process and the possible limitations of the associated findings. The guidelines are intended to serve as a general framework. It is not expected that every item will apply to each study. For example, many studies will not be able to establish the specificity of their linkage results (i.e. if a one-to-one link is not expected, it may be difficult to quantify the number of false negatives). Nonetheless, the guidelines assist in identifying where assumptions about the accuracy or quality of data have been made.

Given the systematic investigation, the expertise of the Delphi participants and broad disciplinary representation, this study offers an exploratory basis for developing an accepted list of reporting criteria for studies using data linkage. The validation findings demonstrate that the criteria considered important by the experts are not consistently reported in the literature, with a median of only six items reported in the selected studies. This highlights an important gap that should be addressed. The differences between the criteria considered important by the experts and those consistently reported in the literature may relate to the fact that many researchers utilising linked data are not familiar with the issues that can impact linkage quality, especially where data are linked by a third party, such as a data linkage centre. Having standardised guidelines will help to highlight these concerns for researchers and readers of data linkage studies.

There are several limitations of this study. First, Delphi methods that include only anonymous voting components have been criticised for not including expert group discussion. However, we modified this process by including the initial group interactions before the voting. The high levels of agreement precluded the need for further direct discussions. Although the total number of participants in this project was small, all participants had publications in the area of expertise and took part in both rounds of the process, so participant drop-out did not influence our findings.

The reliability of the application of these guidelines was moderate with a kappa statistic of 0.6.20 While the overall kappa score reflects moderate agreement, this is consistent with other critical appraisal tools tested for reliability (inter-rater agreement 0.6 to 0.7, ranging from 0.12 to 0.95).21,22 The poorest agreement was in the domain on existing data sources, which may reflect the number of items and complexity of this domain, especially if studies involved more than two data sources. The domain which considered whether ethics approval had been received had the highest level of agreement. The use of the Delphi process ensures a high degree of face validity.

A relationship was not found between impact factor and the number of items that were considered to be well addressed, but impact factor is a debated proxy measure for scientific quality.23 This study may also have been under-powered to detect such a relationship. Nevertheless, the tool is intended to serve as a general guide to authors and reviewers in relation to aspects of their study of data linkage. It is likely that other topical content will also be taken into consideration during the authorship and review of the article, and high-quality linkage methods do not necessarily imply that the research is of high scientific merit. As we restricted the process to 12 participants, this process could be expanded in the future to include participants from outside Australia and public health disciplines.
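As a sketch of how the validation summary discussed above might be tallied, the Python fragment below counts, for each article, the items rated ‘well addressed’ by at least one of two reviewers and then takes the median across articles. The domain labels and item counts follow Table 1; the function names, the short rating labels and the example ratings are hypothetical and do not reproduce the study data.

```python
from statistics import median

# Number of reporting items per domain, as listed in Table 1.
ITEMS_PER_DOMAIN = {
    "Existing data sources": 6,
    "Researcher-selected variables and data preparation": 4,
    "Technology/linkage process": 3,
    "Ethics, privacy, data protection and access arrangements": 1,
}

WELL = "well"  # shorthand for the 'well addressed' rating


def well_addressed_count(ratings_a, ratings_b):
    """Count items rated 'well addressed' by at least one of two reviewers."""
    return sum(1 for a, b in zip(ratings_a, ratings_b) if WELL in (a, b))


def median_well_addressed(articles):
    """Median of the per-article counts; articles is a list of (ratings_a, ratings_b)."""
    return median(well_addressed_count(a, b) for a, b in articles)


# Hypothetical ratings for three articles, shortened to five items each
# to keep the example readable.
articles = [
    (["well", "poor", "adequate", "well", "na"],
     ["well", "well", "adequate", "poor", "na"]),      # 3 items
    (["poor", "poor", "adequate", "poor", "well"],
     ["poor", "adequate", "adequate", "poor", "well"]),  # 1 item
    (["well", "well", "well", "well", "adequate"],
     ["well", "well", "poor", "well", "well"]),        # 5 items
]

total_items = sum(ITEMS_PER_DOMAIN.values())
summary = median_well_addressed(articles)
print(total_items, summary)
```

Counting an item when either reviewer rates it ‘well addressed’ is a generous rule; requiring both reviewers to agree would give a stricter summary.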

Conclusion

This study is the first to develop reporting guidelines for appraising the quality of studies using data linkage. It is fundamental that high-quality and systematic data linkage methods be supported to advance research. As studies relying on linked data are becoming more prevalent in the Australian research community with the development of national data linkage capacity, these guidelines may assist authors and reviewers in producing high-quality research.

Acknowledgements

The authors gratefully acknowledge the members of the Delphi panel and their colleagues Dr Cameron Willis, Sacha Höttje and Christine Moje for their helpful review and feedback on drafts of this manuscript. Megan Bohensky received funding for her PhD through an Australian Postgraduate Award.


References

1. Winglee M, Valliant R, Scheuren F. A case study in record linkage. Surv Methodol. 2005;31:3-11.
2. National Collaborative Research Infrastructure Strategy. Strategic Roadmap for Australian Research Infrastructure. Canberra (AUST): Commonwealth Department of Innovation, Industry, Science and Research; 2008.
3. Holman CDA, Bass AJ, Rouse IL, Hobbs MST. Population-based linkage of health records in Western Australia: development of a health services research linked database. Aust N Z J Public Health. 1999;23:453-9.
4. Centre for Health Record Linkage. Guide To Health Record Linkage Services. Version 1.3; Sydney (AUST): Cancer Institute NSW; undated.
5. Department of Health. Victorian Data Linkages [Internet]. Melbourne (AUST): State Government of Victoria; 2011 [cited 2011 Mar]. Available from: http://www.health.vic.gov.au/vdl/index.htm
6. Bohensky M, Jolley D, Sundararajan V, Evans S, Scott I, Brand C. Data linkage: a powerful tool with potential problems. BMC Health Serv Res. 2010;10:346.
7. Yancey WE. Evaluating string comparator performance for record linkage. In: Research Report Series (Statistics #2005-05) [Internet]. Washington (DC): U.S. Census Bureau, Statistical Research Division; 2005 [cited 2010 Nov]. Available from: http://www.census.gov/srd/papers/pdf/rrs2005-05.pdf
8. Gomatam S, Carter R, Ariet M, Mitchell G. An empirical comparison of record linkage procedures. Stat Med. 2002;21:1485-96.
9. Silveira DP, Artmann E. Accuracy of probabilistic record linkage applied to health databases: systematic review. Rev Saude Publica. 2009;43:875-82.
10. Li B, Quan H, Fong A, Lu M. Assessing record linkage between health care and Vital Statistics databases using deterministic methods. BMC Health Serv Res. 2006;6:48.
11. Bopp M, Minder CE. Mortality by education in German speaking Switzerland, 1990-1997: Results from the Swiss National Cohort. Int J Epidemiol. 2003;32:346-54.
12. Dunn KM, Jordan K, Lacey RJ, Shapley M, Jinks C. Patterns of consent in epidemiologic research: evidence from over 25,000 responders. Am J Epidemiol. 2004;159:1087-94.
13. Adams MM, Wilson HG, Casto DL, et al. Constructing Reproductive Histories by Linking Vital Records. Am J Epidemiol. 1997;145:339-48.
14. Gyllstrom ME, Jensen JL, Vaughan JN, Castellano SE, Oswald JW. Linking birth certificates with Medicaid data to enhance population health assessment: methodological issues addressed. J Public Health Manag Pract. 2002;8:38-44.
15. Hoving JL, Monaco A, MacFarlane E, et al. Methodological issues in linking study participants to Australian cancer registries using different methods: lessons from a cohort study. Aust N Z J Public Health. 2005;29:378-82.
16. O’Reilly D, Rosato M, Connolly S. Unlinked vital events in census-based longitudinal studies can bias subsequent analysis. J Clin Epidemiol. 2008;61:380-5.
17. Plint AC, Moher D, Morrison A, et al. Does the CONSORT checklist improve the quality of reports of randomised controlled trials? A systematic review. Med J Aust. 2006;185:263-7.
18. Fitch K, Bernstein SJ, Aguilar MS, et al. The RAND/UCLA Appropriateness Method User’s Manual [Internet]. Santa Monica (CA): Rand; 2001 [cited 2010 Sep]. Available from: http://www.rand.org/pubs/monograph_reports/MR1269/index.html
19. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet. 2001;357:1191-4.
20. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159-74.
21. Hartling L, Ospina M, Liang Y, et al. Risk of bias versus quality assessment of randomised controlled trials: cross sectional study. BMJ. 2009;339:b4012.
22. Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials. 1995;16:62-73.
23. Grzybowski A. The journal impact factor: how to interpret its true value and importance. Med Sci Monit. 2009;15:SR1-4.
