IJHCQA 22,6
Designing and implementing an Australian and New Zealand intensive care data audit study
Received 27 November 2007. Revised 28 February 2008. Accepted 29 April 2008.
Jacqueline Martin Department of Epidemiology & Preventive Medicine, ANZICS CORE Critical Care Resources, Carlton, Australia and Monash University, Melbourne, Australia
Peter Hicks Intensive Care Department, Wellington Hospital, Wellington, New Zealand and Australian & New Zealand Intensive Care Society (ANZICS), Research Centre for Critical Care Resources (ARCCCR), Carlton, Australia
Catherine Norrish Geelong Hospital, Geelong, Australia
Shaila Chavan and Carol George ANZICS Adult Patient Database, Carlton, Australia
Peter Stow Geelong Hospital, Geelong, Australia, and
Graeme K. Hart Department of Intensive Care, Austin Hospital, Heidelberg, Australia, and Australian & New Zealand Intensive Care Society (ANZICS), Research Centre for Critical Care Resources (ARCCCR), Carlton, Australia
Abstract
International Journal of Health Care Quality Assurance, Vol. 22 No. 6, 2009, pp. 572-581. © Emerald Group Publishing Limited, 0952-6862. DOI 10.1108/09526860910986849
Purpose – The aim of this pilot audit study is to develop and test a model to examine existing adult patient database (APD) data quality.
Design/methodology/approach – A database was created to audit 50 records per site to determine accuracy. The audited records were randomly selected from the calendar year 2004 and four sites participated in the pilot audit study. A total of 41 data elements were assessed for data quality – the elements required for the APACHE II scoring system.
Findings – Results showed that the audit was feasible; missing audit data were an unplanned problem; analysis was complicated owing to the way the APACHE calculations are performed; and 50 records per site was too time-consuming.
Originality/value – This is the first audit study of intensive care data within the ANZICS APD and it demonstrates how to determine data quality in a large database containing individual patient records.
Keywords Auditing, Patient care, Databases, Hospitals, Australia, New Zealand
Paper type Technical paper
Introduction
The establishment of large clinical databases is increasing worldwide. Data from these databases or registries are used by healthcare providers, hospital staff, clinical researchers and government staff (Peabody et al., 2004; Hogan and Wagner, 1997). The Australian and New Zealand Intensive Care Society (ANZICS) Adult Patient Database (APD) was established in 1992 and receives data from more than 131 hospitals throughout Australia and New Zealand (Stow et al., 2006). These data now represent more than 700,000 individual patient records. For data to be useful, they must be of high quality (Arts and De Keizer, 2002). A number of measures are necessary to ensure high data quality, including data dictionaries, training tools, appropriate software controls over data entry fields and rigorous data error trapping when uploading to the central database. To ensure that data submitted and subsequently reported are accurate, a data audit was required.
The APD managers developed a peer review system that allows contributing intensive care units (ICUs) to collect an agreed core dataset. A number of data quality reports are built into the software created and distributed by the APD. This software, the Australasian Outcomes Research Tool for Intensive Care (AORTIC™), is provided free of charge to Australian and New Zealand ICUs. Local analytic and reporting tools within the user software allow on-site validation. Not all sites use the ANZICS-supplied AORTIC™ database. Data submitted to the central database are further error-checked and cleaned. Peer review reports allow contributors to assess their unit's performance against bi-national, regional or similar-sized Australian and New Zealand units, using mortality prediction programs for intensive care. These include the Acute Physiology and Chronic Health Evaluation (APACHE) II (Knaus et al., 1985) and III (Knaus et al., 1991) scoring systems.
To assist good quality data collection, the APD has a data dictionary (ANZICS Adult Patient Database, 2007) that defines the core dataset, which has been incorporated into the Australian National Health Data Dictionary 2007.

Aim
The aim of our pilot audit study was to develop and test a model to examine existing APD data quality. We hoped that the pilot study would identify difficulties in the audit model design, data collection and analysis so that the audit model could be modified and used for a larger number of contributing ICUs. The larger audit study will provide information on the most common data collection or data entry problems, as well as provide a data quality index.

Method
We performed a literature search to identify existing protocols and knowledge on auditing medical registries. A pilot audit study looking at quality in the Project IMPACT database examined 45 records per site (Cook et al., 2002); the Australian and New Zealand Paediatric Intensive Care Registry (ANZPIC) audited 50 records per site (Norton and Slater, 2005); and another study examining data quality in a Dutch intensive care registry audited 20 records per site (Arts et al., 2003). Consequently, we decided that the APD pilot study would audit 50 records per ICU. Four pilot study sites were involved: the Northern Hospital and St Vincent's Public Hospital in Melbourne, and the Nepean Hospital and Westmead Hospital in Sydney. Staff in one site collected data
automatically from their electronic patient data system; staff in the other sites collected their data using paper collection forms and then entered the data manually into a database. Records of 50 adult patients admitted to each ICU between 1 January and 31 December 2004 were randomly selected from the APD. A data audit program was designed by one of the authors (PH) using Microsoft Access, based on another database (Norton and Slater, 2005). A randomly selected dataset consisting of 50 records submitted by site staff to the APD was imported into the program by an ANZICS staff member (JM). Data extracted from the APD included demographic, diagnostic and physiology information (11 demographic variables, six chronic health variables, four Glasgow Coma Score variables and 18 physiologic variables) making up the APACHE II dataset (Knaus et al., 1985) (Table I). Data were re-extracted by an independent data auditor from another participating ICU. At the end of data entry, the program compared the two data sets without showing the original data, to avoid bias. The person responsible for data collection at each site was asked to be a data auditor at another site in the pilot study. Data auditors received training on how to use the audit database and how to audit each record. The patients' medical records were retrieved from the Hospital Records Department as the primary source document. A previous study (Arts et al., 2003) describes the use of a structured interview to assist the data audit process. A structured interview was designed to identify the ICU members responsible for data collection, which in turn made it easier to correlate data collection processes with data quality. Two types of collectors were defined: dedicated and general. Dedicated collectors may be nurses, ward clerks or doctors.
These collectors have specific training in the collection process and are solely responsible for collected data. General collectors are intensive care unit personnel with minimal training in the data collection process; these were often junior medical or administrative staff. Several studies have examined the effect of operator training or expertise on the APACHE II risk of death and standardised mortality ratio (McHugh, 2004; Ledoux et al., 2005; Arts et al., 2003). These studies show a correlation between data quality variability and inadequately trained or inexperienced data collectors (Ledoux et al., 2005; Arts et al., 2003; Polderman et al., 1999).

Table I. Audit data set
Demographic and admission: Hospital admission date; Hospital admission time; ICU admission date; ICU admission time; ICU discharge date; ICU discharge time; ICU admission source; Reason for ICU admission; Age; Sex; Elective admission?
Chronic health evaluation: Immunocompromised by disease; Immunocompromised by treatment; Respiratory disease; Cardiovascular disease; Liver disease; Renal disease
Glasgow Coma Score (GCS): Verbal response score; Motor response score; Eye-opening response score; Total Glasgow Coma Score
Acute physiology parameters: Core temperature; Heart rate; Respiratory rate; Plasma bicarbonate; Plasma sodium; Plasma potassium; Plasma creatinine; Plasma urea; Acute renal failure; Haematocrit; White cell count; Mean arterial blood pressure; Systolic blood pressure; Diastolic blood pressure
Blood gases: Fraction of inspired oxygen; Partial pressure of oxygen; Partial pressure of carbon dioxide; Arterial pH; Intubated (yes/no); Ventilated (yes/no)

Audit data analysis was planned at three levels, reflecting the way the APACHE II method works (Table II):
(1) Tolerance levels were assigned to each continuous data variable (e.g. +/-5 beats/min for heart rate) and differences between audit and original data were compared to the tolerance level. Because the central APD data contained only the worst value (rather than the highest and lowest), a significant difference between audit and original data was counted as present or absent but not quantified.
(2) Severity scores (0-5) for each variable in the APACHE II dataset were compared and quantified between audit and original data.
(3) The total APACHE II score and the APACHE II predicted risk of death were compared.
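The first two comparison levels can be sketched in code. The following Python fragment is an illustrative sketch only: the function names and data structure are ours, not part of the audit software. The heart-rate bands follow the APACHE II example in Table II, and the five beats/min tolerance is the one described above.

```python
def hr_score(hr):
    """Return the APACHE II severity score for a heart-rate value,
    using the bands from the Table II example."""
    if hr >= 180 or hr <= 39:
        return 4
    if 140 <= hr <= 179 or 40 <= hr <= 54:
        return 3
    if 110 <= hr <= 139 or 55 <= hr <= 69:
        return 2
    return 0  # 70-109 is the normal band


def compare(original, audit, tolerance):
    """Level 1: is the raw difference outside the tolerance?
    Level 2: signed difference between the severity scores."""
    outside_tolerance = abs(original - audit) > tolerance
    score_difference = hr_score(original) - hr_score(audit)
    return outside_tolerance, score_difference


# With a heart-rate tolerance of 5 beats/min:
print(compare(130, 60, 5))  # large raw difference, same score -> (True, 0)
print(compare(52, 56, 5))   # small difference crosses a band  -> (False, 1)
```

Note how the two levels disagree in both directions: a raw difference can be large without changing the score, and a small difference can change the score by crossing a band edge.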
The audit was summarised as:
. the number of data fields with differences within each patient record;
. the number of differences within a specific field across all records;
. the total of the absolute values of the severity score differences;
. the size of the difference in total APACHE II scores; and
. the difference in the total number of predicted deaths.

Table II. APACHE II scores and death risk calculation
The following data values are discrete and were assigned a score if they were present: acute renal failure and Chronic Health Evaluation. For physiological values, the highest and lowest values were extracted from the patients' charts and the value with the highest score was selected; see the example below using heart rate (HR).

HR (beats/min)          Score
High abnormal range
>= 180                  +4
140-179                 +3
110-139                 +2
Normal range
70-109                  0
Low abnormal range
55-69                   +2
40-54                   +3
<= 39                   +4

Notes: The total APACHE II score is the sum of the scores of all of the worst values. Death risk is calculated using the total APACHE II score and the APACHE II diagnosis weighting (the reason for the patient's admission to ICU). Source: extracted from the APACHE II Severity of Disease Classification System

Results
Our pilot audit study identified some issues that need to be considered and addressed prior to the full audit study. When designing the pilot study it was decided that 50 records per site should be audited. However, the audit process was time-consuming
and took a minimum of one week to audit 50 records. As some of the auditors were employed part-time, the audit process spanned several weeks, which was disruptive for the auditors, the unit and the study. The full audit project is intended to be a regular occurrence, and by reducing the number of records audited per site to 25, it was deemed more likely that units and auditors would participate in what is a voluntary, unpaid activity. The auditors were able to view the original data at the completion of each record, so that differences between the original and audited data could be viewed and comments added to the database to explain these differences, for example, that data were missing from the paper records or that the paper records were difficult to read (e.g. illegible handwriting). This meant that data issues could be identified at the site whilst the audit was taking place. Some data could not be found in the patient records, an issue not envisaged in the methods and analysis. This caused problems with the audit program algorithms, because a difference between audit and original data was recorded when the comparison was not valid. This over-estimated the percentage error calculations and under-estimated the severity of illness score, because a missing value was assumed to be a value in the normal range. The APACHE II diagnoses (the reason for a patient's admission to ICU) were often difficult to code. Differences and difficulties with inter-observer APACHE diagnostic coding are well recognised (McHugh, 2004; Ledoux et al., 2005). The retrospective nature of this exercise, without access to the patients' specialists, led to concern among the auditors about coding accuracy. Where doubt existed, the auditors were able to seek assistance from intensive care specialists at the site. One auditor was not familiar with determining the admission reason code because an intensive care specialist performed this role on-site.
This meant that it was not possible to determine differences in APACHE II risk of death (ROD) for this particular audited site. Data analysis was complicated and difficult to understand, both for investigators and for units. The data audit produced a large amount of detailed summary data, which can be summarised horizontally within an individual patient's record, or vertically within a particular field across all records. Table III shows theoretical data comparisons for three patients. Originally, we planned to compare raw data values, but the process of selecting the worst value as either highest or lowest, and the banding of values for scoring, meant that the magnitude of a difference was not meaningful. For example, Patient 2 showed a large difference in heart rate (HR) values but both values produced the same APACHE II score, whereas Patient 3 had a small difference in HR but the audit data crossed a banding point and thus changed the APACHE II score. The per-variable (vertical) summary proved the most useful for establishing common problems; the horizontal summary can measure quality but does not locate problem areas. The total APACHE II score was initially assessed, but we observed that an increase in one score item could be balanced by a decrease in another, resulting in the same total (Patient 3 in Table III). For this reason, the absolute values of the score differences were taken and totalled. This method identified variations in individual scores. Even if the total APACHE II score and the subsequent ROD prediction had the same values in the audit and original data, this did not necessarily mean that the two data sets were identical. Kappa statistics, which describe the level of agreement (how often two measurements are the same), were used by other authors (Ledoux et al., 2005; Goldhill and Sumner, 1998). However, the variable scores are not binary but range from 0-5.
Table III. Theoretical results showing original and auditor data and the differences

           HR original  HR audit  HR difference   Na original  Na audit  Na difference
Patient 1  78 (0)       75 (0)    3 = No; 0       138 (1)      134 (1)   4 = No; 0
Patient 2  130 (2)      60 (2)    70 = Yes; 0     150 (1)      138 (0)   12 = Yes; 1
Patient 3  52 (3)       56 (2)    -4 = No; 1      151 (1)      160 (2)   -9 = Yes; -1

Summary per record:
Patient 1: data differences = 0; score differences = 0; absolute score = 0
Patient 2: data differences = 2; score differences = 1; absolute score = 1
Patient 3: data differences = 2; score differences = 0; absolute score = 2

Summary per field:
HR: data = 1; score = 1; absolute score = 1
Na: data = 2; score = 0; absolute score = 2
Total: data = 3; score = 1; absolute score = 3

Notes: Each cell shows the data value with its APACHE II score in parentheses; each difference cell shows the data difference, whether it exceeded the tolerance (Yes/No) and the score difference. The heart rate (HR) tolerance was set at five beats/min and the sodium (Na) tolerance at four mmol/L
The kappa score will identify the level of agreement but it does not quantify the range of disagreement. Using the absolute value of the score differences appears to identify the variables with the most differences. After we finished the pilot study, an analysis was performed to determine data completeness and accuracy in three sites; the site with missing APACHE II diagnosis codes was excluded from the analysis. Results showed that 7.6 per cent of data fields were incomplete and 23.8 per cent were incorrect. Some data fields were more likely to contain errors. These included Chronic Health Evaluation status, Glasgow Coma Score, age and the physiological data fields mean arterial blood pressure, blood oxygen values, potassium, heart rate, haematocrit, respiratory rate, creatinine, pH, sodium, temperature and white blood cell count. An audit report was produced for each ICU in the study, but it required an extensive explanatory note to understand the results.

Discussion
Data quality is critically important in any clinical registry (Peabody et al., 2004; Hogan and Wagner, 1997; Arts et al., 2003; McHugh, 2004). Data from these registries are used for many purposes including research, quality assurance and benchmarking (Peabody et al., 2004; Arts et al., 2003; Cook et al., 2002; Goldhill and Sumner, 1998; Dreisler et al., 2001; Volk et al., 1997; Haan et al., 2004; Khwaja et al., 2002; Yoon et al., 2006). For APACHE II scores and standardised mortality ratios (SMRs) to be used as a benchmarking and comparison mechanism between similar ICUs, data must be collected consistently and accurately (McHugh, 2004; Goldhill and Sumner, 1998). Initial results showed that the data audit was feasible, but only with a reduced number of records audited per unit. We expected that a large proportion of raw data values would vary, but that the impact might be negated by summation: chance predicts that some physiologic audit values will be higher and some lower than the original.
These could produce individual score differences but negate each other when added together. However, it has been suggested that in some data fields, such as blood pressure, very small differences in the value may make significant differences to the APACHE II score (Goldhill and Sumner, 1998). The Chronic Health Evaluation and APACHE II diagnostic codes rely on the assessor's judgement and so may be more likely to be biased in a non-random manner. Chronic Health Evaluation also scores five points, so differences in these values have a larger effect. Large databases such as the APD are being used to examine the effects of individual variables such as night-time discharge, highest glucose level and creatinine levels. It is therefore important not only that total APACHE II scores are correct, but also that the individual data are accurate. For the pilot study, it was believed that randomly selecting records for auditing was appropriate, but an alternative view is that sicker patients' records would be more affected by poor data quality, because differences in the entered data would have more impact on their scores. Thus, for the full audit study, it may be more appropriate to select patient records for auditing stratified by patient illness severity. The pilot study did not have enough records to verify this point, and non-random record selection makes it more difficult to extrapolate results to the whole database. It is equally possible that many variables are recorded as normal when in fact they are not, so that low-scoring patients may have significant errors. Analysing scores allowed us to see whether differences were uniformly spread across raw data ranges or were more likely to occur at the extremes.
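The horizontal (per-record) and vertical (per-field) summaries, and the use of absolute score differences so that opposite-signed errors do not cancel, can be sketched as follows. This is a minimal illustration under our own assumed data layout; the signed score differences mirror the theoretical Patients 1-3 in Table III.

```python
# Signed score differences (original minus audit) per patient and field.
# The dictionary layout and field names are illustrative assumptions.
score_diffs = {
    1: {"HR": 0, "Na": 0},
    2: {"HR": 0, "Na": 1},
    3: {"HR": 1, "Na": -1},
}

# Horizontal summary: total absolute score difference per patient record.
per_record = {pid: sum(abs(d) for d in diffs.values())
              for pid, diffs in score_diffs.items()}

# Vertical summary: total absolute score difference per field across all
# records; this is the view that best locates common collection problems.
field_names = sorted({name for diffs in score_diffs.values() for name in diffs})
per_field = {name: sum(abs(diffs[name]) for diffs in score_diffs.values())
             for name in field_names}

print(per_record)  # {1: 0, 2: 1, 3: 2}
print(per_field)   # {'HR': 1, 'Na': 2}
```

Note how Patient 3's signed differences sum to zero while the absolute summary still exposes two discrepancies, and how the per-field view points to sodium as the more error-prone field.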
With these types of audits, it is not always clear which data set (the original or the audit) is more accurate. It was expected that the audit data would be more correct because the auditors were aware of the audit process, knew that there was a limited number of records to audit and were also experienced data collectors. However, external auditors were not familiar with the audit site's charts or the location of misplaced data sources. Additionally, they did not have contemporaneous access to specialist advice for coding information, especially diagnostic codes. The structured interview usefully documented each site's data collection process, which will enable us to determine the relationship between data collection processes and data quality in the full audit study. It is not feasible for a large database to be error-free (Arts et al., 2003), but data quality can be improved by performing regular audits. Without an audit process, data errors and their causes will not be identified and hence data quality will not improve (Arts et al., 2003; Goldhill and Sumner, 1998; Dreisler et al., 2001; Volk et al., 1997; Haan et al., 2004; Khwaja et al., 2002; Yoon et al., 2006; Warsi et al., 2002; Fine et al., 2003; Herbert et al., 2004; Reilly et al., 2001). Implementing changes to data collection and/or entry after errors are discovered should sustain accurate data collection within the database. The data submitted to the APD are iteratively loaded, so any corrections made to the original data by contributors will be progressively uploaded to the central database.

Conclusion and next steps
Our pilot study showed that we had underestimated the time the audit would take and, therefore, we needed to reduce the number of records per site. Audit data analysis was more complicated than anticipated because audit data were missing. We had planned for missing audit data but had thought this would be matched by missing original data, and were surprised to find otherwise.
Data analysis was changed as it became apparent that our initial plans were not the best method. Unit feedback reports were difficult to understand and were revised. Despite careful planning, we found problems we had not anticipated, reinforcing the benefits of piloting a major audit program. The next step is to run the audit program in a further 16 units and measure data quality. Running the pilot audit has been a valuable exercise in ensuring we optimise the audit process.

References
ANZICS Adult Patient Database (2007), "APD data dictionary version 2.0.1", available at: http://sas.anzics.com.au/Portal/viewItem.do?com.sas.portal.ItemId=Link%2Bomi%3A%2F%2FFoundation%2Freposname%3DFoundation%2FDocument%3Bid%3DA59GWZUF.AZ000OOX (accessed March 2007).
Arts, D.G.T. and De Keizer, N.F. (2002), "Defining and improving data quality in medical registries: a literature review, case study and generic framework", Journal of the American Medical Informatics Association, Vol. 9 No. 6, pp. 600-11.
Arts, D.G.T., Bosman, R.J., de Jonge, E., Joore, J.C.A. and de Keizer, N.F. (2003), "Training in data definitions improves quality of intensive care data", Critical Care, Vol. 7 No. 2, pp. 179-84.
Cook, S.F., Visscher, W.A., Hobbs, C.L. and Williams, R.L., for the Project IMPACT Clinical Implementation Committee (2002), "Project IMPACT: results from a pilot validity study of a new observational database", Critical Care Medicine, Vol. 30 No. 12, pp. 2765-70.
Dreisler, E., Schou, L. and Admasen, S. (2001), "Completeness and accuracy of voluntary reporting to a national case registry of laparoscopic cholecystectomy", International Journal for Quality in Health Care, Vol. 1, pp. 51-5.
Fine, L.G., Keogh, B.E., Cretin, S., Orlando, M. and Gould, M.M., for the Nuffield-Rand Cardiac Surgery Demonstration Project Group (2003), "How to evaluate and improve the quality and credibility of an outcomes database: validation and feedback study on the UK Cardiac Surgery Experience", British Medical Journal, Vol. 326 No. 7379, pp. 25-8.
Goldhill, D.R. and Sumner, A. (1998), "APACHE II: data accuracy and outcome prediction", Anaesthesia, Vol. 53 No. 10, pp. 937-43.
Haan, C.K., Adams, M. and Cook, R. (2004), "Improving the quality of data in your database: lessons from a cardiovascular center", Joint Commission Journal on Quality and Safety, Vol. 30 No. 12, pp. 681-8.
Herbert, M.A., Prince, S.L., Williams, J.L., Magee, M.J. and Mack, M.J. (2004), "Are unaudited records from an outcomes registry database accurate?", Annals of Thoracic Surgery, Vol. 77 No. 6, pp. 1960-5.
Hogan, W.R. and Wagner, M.M. (1997), "Accuracy of data in computer-based patient records", Journal of the American Medical Informatics Association, Vol. 4 No. 5, pp. 342-55.
Khwaja, H.A., Syed, H. and Cranston, D.W. (2002), "Coding errors: a comparative analysis of hospital and prospectively collected departmental data", British Journal of Urology International, Vol. 89 No. 3, pp. 178-80.
Knaus, W.A., Draper, E.A., Wagner, D.P. and Zimmerman, J.E. (1985), "APACHE II: a severity of disease classification system", Critical Care Medicine, Vol. 13 No. 10, pp. 818-29.
Knaus, W.A., Wagner, D.P., Draper, E.A., Zimmerman, J.E., Bergner, M., Bastos, P.G., Sirio, C.A., Murphy, D.J., Lotring, T. and Damiano, A. (1991), "The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults", Chest, Vol. 100 No. 6, pp. 1619-36.
Ledoux, D., Finfer, S. and McKinley, S. (2005), "Impact of operator expertise on collection of the APACHE II score on the derived risk of death and standardised mortality ratio", Anaesthesia and Intensive Care, Vol. 33 No. 5, pp. 585-90.
McHugh, G.J.
(2004), "Quality and reliability of data collected in a regional hospital intensive care unit", Critical Care and Resuscitation, Vol. 6 No. 4, pp. 258-60.
Norton, L. and Slater, A. (2005), Report of the Australian and New Zealand Paediatric Intensive Care Registry, 2004, ANZICS, Melbourne.
Peabody, J.W., Luck, J., Jain, S., Bertenthal, D. and Glassman, P. (2004), "Assessing the accuracy of administrative data in health information systems", Medical Care, Vol. 42 No. 11, pp. 1066-72.
Polderman, K.H., Thijs, L.G. and Girbes, A.R.J. (1999), "Interobserver variability in the use of APACHE II scores", The Lancet, Vol. 353 No. 9150, p. 380.
Reilly, J., Mitchell, A., Blue, J. and Thomson, W. (2001), "Breast cancer audit: adapting national audit frameworks for local review of clinical practice", Health Bulletin, Vol. 59 No. 1, pp. 60-2.
Stow, P.J., Hart, G.K., George, C., Herkes, R., McWilliam, D. and Bellomo, R., for the ANZICS Database Management Committee (2006), "Development and implementation of a high-quality clinical database: the Australian and New Zealand Intensive Care Society Adult Patient Database", Journal of Critical Care, Vol. 21 No. 2, pp. 133-41.
Volk, T., Hahn, L., Hayden, R., Abel, J., Puterman, M.L. and Tyers, G.F. (1997), "Reliability audit of a regional cardiac surgery registry", Journal of Thoracic and Cardiovascular Surgery, Vol. 114 No. 6, pp. 903-10.
Warsi, A.A., White, S. and McCulloch, P. (2002), "Completeness of data entry in three cancer surgery databases", European Journal of Surgical Oncology, Vol. 28 No. 8, pp. 850-6.
Yoon, S.S., George, M.G., Myers, S., Lux, L.J., Wilson, D., Heinrich, J. and Zheng, Z.-J. (2006), "Analysis of data-collection methods for an acute stroke care registry", American Journal of Preventive Medicine, Vol. 31 No. 6, pp. S196-S201.

Further reading
Hlaing, T., Hollister, L. and Aaland, M. (2005), "Trauma registry data validation: essential for quality trauma care", Journal of Trauma, Vol. 61 No. 6, pp. 1400-7.

Corresponding author
Jacqueline Martin can be contacted at:
[email protected]