An algorithm to identify preterm infants in ...

pharmacoepidemiology and drug safety (2012) Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/pds.3264

ORIGINAL REPORT

An algorithm to identify preterm infants in administrative claims data†

Efe Eworuke1*, Christian Hampp2, Arwa Saidi4 and Almut G. Winterstein1,3

1 Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL, USA
2 Division of Epidemiology I, Office of Pharmacovigilance and Epidemiology, Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, Food and Drug Administration, MD, USA
3 Department of Epidemiology, Colleges of Medicine and Public Health and Health Professions, University of Florida, FL, USA
4 Department of Pediatrics, College of Medicine, University of Florida, FL, USA

ABSTRACT

Purpose To develop and validate an algorithm to identify preterm infants in the absence of birth certificates within Medicaid data.

Methods Medicaid fee-for-service claims data from Florida (FL) and Texas (TX) were linked to vital statistics data for infants who were continuously eligible during the first 3 months following birth or died within that period. Prematurity was defined as less than 34 weeks gestational age. Using FL as the exploratory dataset and vital statistics birth data as the gold standard, we developed a logistic regression model from diagnostic and procedure codes commonly associated with preterm care, creating a prematurity score for each infant. A score cutoff was selected that maximized sensitivity while maintaining a positive predictive value (PPV) ≥ 90%. Confirmatory analyses were conducted in the TX dataset.

Results The prevalence of prematurity was 5.2% (95%CI: 5.1–5.2) and 4.5% (95%CI: 4.4–4.6) in FL and TX, respectively. Using only gestational age International Classification of Disease version 9, Clinical Modification (ICD-9-CM) codes (765.20–765.27) associated with inpatient claims achieved sensitivity of 25.7% (FL) and 12.5% (TX), specificity of 99.9% (FL and TX), and PPV of 91.7% (FL) and 84.8% (TX). The model had excellent discriminatory validity with a c-statistic of 0.928 (95%CI: 0.925–0.931). The selected cutoff point achieved sensitivity of 52.6%, specificity of 99.8%, and PPV of 91.7% in FL. In TX, sensitivity was 46.8%, specificity was 99.9%, and PPV was 82.2%.

Conclusion Identification of prematurity based on gestational age ICD-9-CM codes is not sensitive. The prematurity score has superior construct validity and allows more comprehensive identification of preterm infants in the absence of birth certificates. Copyright © 2012 John Wiley & Sons, Ltd.

key words: prematurity; sensitivity; specificity; Medicaid; gestational age; birth certificates; claims data; pharmacoepidemiology

Received 11 October 2011; Revised 23 February 2012; Accepted 24 February 2012

*Correspondence to: E. Eworuke, Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, PO Box 100496, Gainesville, FL 32610, USA. E-mail: efe1odia@ufl.edu
† The views expressed are those of the authors and not necessarily those of the US Department of Health and Human Services or the FDA.

INTRODUCTION

In the USA, 12–13% of all infants are born prematurely, compared with 5–9% in Europe.1 Approximately 75% of perinatal deaths occur in infants born prematurely, with two-thirds of these deaths attributable to infants born less than 32 weeks of gestation.2 Prematurely born infants are at increased risk for neuro-developmental disabilities, such as cerebral palsy and mental retardation, and chronic lung conditions.3 Although studies to identify high-risk groups, assess prevention and investigate long-term effects of prematurity have been widely conducted,1,4–7 further research in etiology and preterm care is needed.

In the few studies that have utilized administrative databases for epidemiologic research in preterm infants, researchers have relied on birth certificate data to ascertain prematurity.8,9 A key advantage of administrative data is the option to follow mothers and infants over long periods at relatively low data acquisition cost.10 Diagnostic codes, often utilized to identify target conditions, reflect provider–payer interaction rather than the underlying prevalence of the disease. To improve validity (i.e., sensitivity and specificity) of diagnostic codes, multivariate models have been developed from a variety of variables available within administrative data to quantify the probability of disease. For instance, a kidney disease model developed from multiple variables improved sensitivity twofold when compared with the single diagnostic code for kidney disease.11

In claims data, prematurity can be identified by the International Classification of Disease version 9, Clinical Modification (ICD-9-CM) codes for gestational age, but the validity of these codes and the magnitude of bias introduced by potential misclassification remain unknown. Therefore, investigators have linked claims data with birth certificates, which provide gestational age based on the reported last menstrual period or a clinical estimate.9,12,13 However, with increasing complexity in confidentiality requirements (e.g., social security number (SSN) procurement), linkage of data sources is difficult, and the development of an algorithm to identify preterm births in the absence of birth certificates would greatly facilitate population-based studies. Such an algorithm would make claims data accessible for a wide range of research questions, including epidemiologic research on risk factors, sequelae, and lifetime burden of prematurity.

In this study, we aimed to develop a claims-based algorithm to identify preterm infants and validate it against gestational age estimates from birth certificates.

METHODS

Data sources and study population

We utilized 1999–2004 Medicaid Analytic Extract (MAX) files for Florida (FL) and Texas (TX) to establish a birth cohort for each state. MAX data include monthly updates on eligibility for Medicaid benefits, sociodemographic information, diagnostic and procedure claims for inpatient and outpatient encounters, and pharmacy dispensing data. We required continuous eligibility for fee-for-service benefits from birth until 3 months of age or death, whichever came first. The study population included Medicaid infants linked to vital statistics birth certificate data by using SSNs and dates of birth (DOBs) to provide gestational age estimates. Therefore, the resulting study sample consisted of only live births delivered between 15 and 50 weeks. This matched cohort was also linked to vital statistics death certificates by using SSNs and DOBs. To address missing SSNs on some death certificates, we verified records with death dates in MAX that could not be linked via SSN/DOB by using first and last names and date of death. We utilized the FL cohort to develop the predictive model for prematurity, and the TX dataset formed the confirmatory cohort.

Definition of prematurity

Prematurity was determined from gestational ages contained in vital statistics data. State birth certificate data are collected within 24–48 hours after birth by the hospital, which is where 99% of births in the USA are delivered.14 For development of the algorithm, we considered infants to be preterm if they were born before 34 completed weeks of gestation. Near term infants (34–36 completed weeks of gestation) have been found to be functionally similar to term infants (≥37 weeks) and utilize the same care as term infants.15 Therefore, we included near term infants in the term cohort.

Assessment of ICD-9-CM gestational age codes

We used the ICD-9-CM codes 765.21 through 765.27 (less than 24 weeks of gestation through 33–34 completed weeks of gestation) in the inpatient claims associated with the birth hospitalization to ascertain prematurity. Using gestational ages extracted from birth certificates as the gold standard, we determined sensitivity as the proportion of preterm infants correctly classified by the codes, specificity as the proportion of full-term infants correctly classified by the codes, and positive predictive value (PPV) as the proportion of infants classified as preterm who were truly preterm according to the birth certificates.

Algorithm development and validation

We obtained all inpatient and outpatient diagnostic and procedure codes during the first 3 months of life from the MAX data and extracted codes that appeared more frequently among prematurely born infants in univariate comparisons. In collaboration with a pediatric cardiologist (AS), these candidate predictors were reviewed for clinical plausibility and augmented with predictor variables based on clinical experience.
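The univariate screening step described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the prevalence-ratio criterion (`min_ratio`), and the toy claims data are all assumptions made for illustration, since the text does not state which comparison or threshold was used.

```python
from collections import Counter


def code_frequencies(infants):
    """Count how many infants in a group carry each code at least once."""
    counts = Counter()
    for codes in infants:
        counts.update(set(codes))
    return counts


def screen_codes(preterm, term, min_ratio=2.0):
    """Return (code, preterm prevalence, term prevalence) for codes whose
    prevalence among preterm infants is at least min_ratio times the
    prevalence among term infants. min_ratio is a hypothetical criterion."""
    pre_counts = code_frequencies(preterm)
    term_counts = code_frequencies(term)
    candidates = []
    for code, n_pre in pre_counts.items():
        p_pre = n_pre / len(preterm)
        p_term = term_counts.get(code, 0) / len(term)
        # Codes never seen in term infants are kept as candidates.
        if p_term == 0 or p_pre / p_term >= min_ratio:
            candidates.append((code, p_pre, p_term))
    return sorted(candidates, key=lambda row: row[1], reverse=True)


# Toy data, invented for illustration: one set of codes per infant.
preterm = [
    {"769", "774.2"},
    {"769"},
    {"774.2", "V30.00"},
    {"769", "770.7"},
]
term = [
    {"V30.00"}, {"V30.00", "774.2"}, {"V30.00"}, {"V30.00"},
    {"V30.00"}, {"V30.00"}, {"V30.00"}, {"V30.00"},
]

candidates = screen_codes(preterm, term)
candidate_codes = [code for code, _, _ in candidates]
```

On this toy input the screen keeps the three codes that are over-represented among preterm infants and drops the code carried by every term infant. In the study, codes selected this way were then reviewed for clinical plausibility before modeling.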
The final set consisted primarily of disorders associated with extreme immaturity and respiratory conditions such as respiratory distress syndrome, bronchopulmonary dysplasia, or pulmonary hemorrhage. Procedure codes included, for example, ophthalmoscopy, neonatal and pediatric critical care, electrolyte panel assay, and theophylline immunoassay (Appendix A). We also included the total length of hospital stay associated with delivery within 90 days of birth. We categorized length of hospital stay as follows: 0–4, 5–10, 11–20, 21–30, 31–40, and >40 days following data from the American Academy of Pediatrics and the American College of Obstetricians and Gynecologists.16 Death within 3 months of life was also considered a candidate predictor.

Using the candidate predictors, we developed a multivariate logistic regression model to calculate the probability of prematurity in the exploratory cohort (FL). The model used a forward stepwise inclusion process with an α-error entry criterion of 0.30. With the beta coefficients of the multivariate model, we calculated each infant’s predicted risk of prematurity based on the infant’s covariate values. To assess the model’s ability to discriminate between preterm and term infants, we utilized the area under the receiver operating characteristics curve (c-statistic) with 95% confidence intervals.17 We also plotted the sensitivity and PPV of prematurity by prematurity scores in both FL and TX. We report the threshold score that maintained the PPV above 90% to minimize false positives, which are often more problematic in research on conditions with low prevalence, such as prematurity.

Next, we conducted a confirmatory analysis in the TX dataset by obtaining a prematurity score for each infant as a function of the infant’s explanatory variables, using the respective beta coefficients determined in the exploratory (FL) model.
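The scoring and cutoff-selection steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names, the covariate names, the coefficient values, and the toy scores are hypothetical, and the actual model coefficients are not reproduced here.

```python
import math

# Upper bounds (days) of the length-of-stay categories given in the text:
# 0-4, 5-10, 11-20, 21-30, 31-40; anything longer is the ">40" category.
LOS_UPPER_BOUNDS = [4, 10, 20, 30, 40]


def los_category(days):
    """Return the index (0-5) of the length-of-stay category for a stay."""
    for index, upper in enumerate(LOS_UPPER_BOUNDS):
        if days <= upper:
            return index
    return len(LOS_UPPER_BOUNDS)


def predicted_risk(intercept, betas, covariates):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_k * x_k)))."""
    linear = intercept + sum(
        beta * covariates.get(name, 0) for name, beta in betas.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def validity(scores, truth, cutoff):
    """Sensitivity, specificity and PPV when score >= cutoff means preterm."""
    tp = fp = tn = fn = 0
    for score, is_preterm in zip(scores, truth):
        flagged = score >= cutoff
        if flagged and is_preterm:
            tp += 1
        elif flagged:
            fp += 1
        elif is_preterm:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else nan
    specificity = tn / (tn + fp) if tn + fp else nan
    ppv = tp / (tp + fp) if tp + fp else nan
    return sensitivity, specificity, ppv


def select_cutoff(scores, truth, min_ppv=0.90):
    """Among observed scores, pick the cutoff with the highest sensitivity
    whose PPV is at least min_ppv; returns None if no cutoff qualifies."""
    best = None
    for cutoff in sorted(set(scores)):
        sensitivity, specificity, ppv = validity(scores, truth, cutoff)
        if ppv >= min_ppv and (best is None or sensitivity > best[1]):
            best = (cutoff, sensitivity, specificity, ppv)
    return best


def c_statistic(scores, truth):
    """Share of preterm/term pairs in which the preterm infant scores higher
    (ties count one half); this equals the area under the ROC curve."""
    preterm = [s for s, t in zip(scores, truth) if t]
    term = [s for s, t in zip(scores, truth) if not t]
    wins = sum((p > q) + 0.5 * (p == q) for p in preterm for q in term)
    return wins / (len(preterm) * len(term))


# Toy scores and birth-certificate labels (1 = preterm), invented here.
scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05, 0.02]
truth = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
cutoff, sensitivity, specificity, ppv = select_cutoff(scores, truth)
```

On the toy data the selected cutoff is 0.80, which flags three of the four preterm infants (sensitivity 0.75) with no false positives. Lowering the cutoff further would capture the fourth preterm infant but push PPV below the 0.90 floor, which is the trade-off the threshold rule is meant to control.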

Table 1. Successful vital statistics to Medicaid matches in percent by gestational age

Florida: vital statistics recorded births [1999–2006] = 1 239 148
Texas: vital statistics recorded births [1999–2006] = 2 245 630

Gestational age categories    Florida, % matched (n)    Texas, % matched (n)
Severe (…                     44.6 (10 104)             49.4 (11 047)
…                             60.0 (12 564)             64.8 (14 264)
…                             54.9 (107 589)            59.9 (120 944)
… 36 weeks)                   51.8 (1 099 919)          58.0 (1 262 022)
Unknown                       76.3 (8972)               1.6 (837 353)

RESULTS

From 1999 to 2004, 1 239 148 and 2 245 630 births were recorded by FL and TX vital statistics, respectively. We observed that 99.3% (FL) and 98.5% (TX) of these births had valid gestational age (15–50 weeks) recorded in the birth certificates. We were able to match 79% and 55% of all MAX births in FL and TX, respectively. The lower percentage of infants matched in TX is attributed to the higher proportion of infants with non-usable (missing or duplicate) SSN found in TX (30.5%) compared with FL (9.4%). Of these successful matches, 98.9% (FL) and 98.4% (TX) had valid gestational age information from their birth certificate. Matching performance varied with gestational age and by states (Table 1). Restricting the study cohort to patients with fee-for-service benefits including Medicaid eligibility in the birth month resulted in 462 772 and 528 013 infants in FL and TX. Finally, the requirement of at least 3 months continuous eligibility from birth and inclusion of infants who died within 3 months from their birth resulted in a sample size of 293 133 in FL and 393 758 in TX (Figure 1).

Figure 1. Derivation of the study cohort. DOB, date of birth; FL, Florida; MAX, Medicaid Analytic Extract; SSN, social security number; TX, Texas

Medicaid (MAX) beneficiaries born between 01/01/1999 and 12/31/2004: FL 818,703; TX 1,506,355
Recorded vital statistics births between 01/01/1999 and 12/31/2004: FL 1,239,148; TX 2,245,630
Matched on SSN and DOB: FL 647,462 (79% of MAX); TX 831,889 (55% of MAX)
Fee-for-service recipients with eligibility in month of birth: FL 462,772; TX 528,013
Continuous eligibility until the earliest of age 3 months or death: FL 293,133; TX 393,758
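The match percentages in Figure 1 follow directly from the counts at each step. The short Python sketch below recomputes them; the counts are taken from the figure, while the step labels and the `retention` helper are illustrative names introduced here.

```python
# Counts at each cohort-derivation step, as reported in Figure 1.
steps = [
    ("MAX beneficiaries born 1999-2004", {"FL": 818_703, "TX": 1_506_355}),
    ("Matched on SSN and DOB", {"FL": 647_462, "TX": 831_889}),
    ("Fee-for-service, eligible in birth month", {"FL": 462_772, "TX": 528_013}),
    ("Continuous eligibility to 3 months or death", {"FL": 293_133, "TX": 393_758}),
]


def retention(steps, state):
    """For each step, return (label, n, % of previous step, % of first step)."""
    first = steps[0][1][state]
    previous = first
    rows = []
    for label, counts in steps:
        n = counts[state]
        rows.append((label, n, 100 * n / previous, 100 * n / first))
        previous = n
    return rows


fl_rows = retention(steps, "FL")
tx_rows = retention(steps, "TX")

for state, rows in (("FL", fl_rows), ("TX", tx_rows)):
    for label, n, pct_previous, pct_first in rows:
        print(f"{state}  {label}: n={n:,}  "
              f"{pct_previous:.1f}% of previous step, {pct_first:.1f}% of MAX")
```

Rounded to whole percentages, the second row for each state reproduces the 79% (FL) and 55% (TX) match rates reported in the figure.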

Prematurity prevalence in the matched study population was similar to the overall state prevalence according to vital statistics data in Texas. In Florida, prematurity prevalence was about 1% higher in the matched Medicaid population when compared with overall birth statistics (Table 2). The mean (median) gestational age for preterm infants was 30.9 (32.0) weeks in both cohorts, FL and TX (Table 3). Other characteristics such as gender, length of hospital stay, or mortality were similar across cohorts. The most common diagnoses among preterm infants in both cohorts included respiratory distress syndrome (29.3% in preterm infants vs. 0.5% in term infants for FL), neonatal jaundice associated with preterm delivery (38.1% in preterm infants vs. 1.4% in term infants for TX), and

Table 2. Prematurity prevalence in all registered births versus in the matched study population (%) (95%CI)

                                                                     Gestational age (weeks)
Population                                            N              15–34             15–36
All births according to vital statistics (Florida)    1 230 176      4.3 (4.2–4.3)     10.6 (10.5–10.6)
Study population (Florida)                            239 134        5.2 (5.1–5.2)     11.7 (11.6–11.8)
All births according to vital statistics (Texas)      1 408 277      4.2 (4.1–4.2)     10.4 (10.3–10.4)
Study population (Texas)                              393 758        4.5 (4.4–4.5)     10.7 (10.6–10.9)

Table 3. Description of study cohort Florida

Variable
Mean gestational age according to birth certificate, weeks (standard deviation)
Gender, female (%)
Disorders relating to extreme immaturity of infant:
  Unspecified weight
  Less than 500 g
  500–749 g
  750–999 g
  1000–1249 g
  1250–1499 g
  1500–1749 g
  1750–1999 g
  2000–2499 g
  2500 g and over
Disorders relating to other preterm infants:
  Unspecified weight
  Less than 500 g
  500–749 g
  750–999 g
  1000–1249 g
  1250–1499 g
  1500–1749 g
  1750–1999 g
  2000–2499 g
  2500 g and over
Unspecified weeks of gestation
Completed weeks of gestation: