Data mining and the Implementation of a Prospective Payment System for Inpatient Rehabilitation

Daniel Relles, Greg Ridgeway, and Grace Carter
{relles, gregr, carter}@rand.org
RAND, Santa Monica, CA 90407-2138

Abstract

This paper describes the development of a new Medicare Prospective Payment System (PPS) for inpatient rehabilitation care. Congress mandated such a system in the Balanced Budget Act of 1997. To help implement this system, we assembled four years of Medicare hospitalization data, linked them to rehabilitation hospitals' information about patients' impairments and functional status, and developed case mix groups using the CART algorithm, a common method for forming patient groups in health services research.

While CART readily produces simple and effective prediction rules, it adheres to a restrictive functional form, and its fitting algorithm does not necessarily find a global optimum. We wanted to know how these limitations affect our results, so we compared CART's performance with methods receiving attention in the data mining community and in the statistics literature. We estimated that the CART models explained about 90 percent of the potentially explainable variance in individual cost and that they predicted annual hospital costs essentially identical to the other methods' predictions.

Keywords: Health care financing, prospective payment, rehabilitation, regression trees, data mining

1 Introduction

Partitioning patient stays into groups of homogeneous resource use is a recurring theme in health services research. Partitioning schemes such as the Diagnosis Related Groups (DRGs), Resource Utilization Groups (Fries et al., 1994), and Psychiatric Patient Classes (Ashcraft et al., 1989) are in widespread use and form the fundamental basis for reporting and resource allocation.


The Health Care Financing Administration (HCFA), renamed the Centers for Medicare & Medicaid Services (CMS) in 2001, manages Medicare’s $4.2 billion budget for inpatient rehabilitation. As a result of the Balanced Budget Act of 1997, CMS must implement a prospective payment system based on classifying patients into case mix groups. This new classification system for patients in rehabilitation must be based on empirical evidence that resource use within each case mix group is relatively constant. Also, the system must make clinical sense and provide adequate compensation to hospitals providing the services.

The overall goal of our analysis is to group together patients with similar features, such as impairment, age, and functional ability, so that resource use within that category is relatively constant. In this paper, we describe the construction of a set of Function Related Groups (FRGs) that partition the population into groups that are medically similar and that have similar expected resource needs. We measure resource use by the logarithm of wage-adjusted dollars spent from admission to discharge. To group patients we consider using 21 impairment categories, measures of motor and cognitive function, and age. We use the CART (Classification and Regression Trees) algorithm (Breiman et al., 1984) within each impairment category to develop the partitioning that best predicts cost.

In addition to simply fitting the CART models, we wished to explore the strengths and limitations of our payment system further. Even after computing an unbiased estimate of the predictive performance of a particular regression tree, it is still difficult to judge how much better we might have done without CART's limitations. R-squared generally lies between 0.0 and 1.0, with higher values indicating better prediction, but when a model's R-squared is well below 1.0 we need a way to judge whether CART's performance is as good as could be expected from competing modeling strategies. To investigate this, we compared CART's performance with other methods that have received attention in the statistics and data mining literature: generalized additive models (GAM) and multiple additive regression trees (MART). Both methods are automated, flexible, and effective at fitting complex prediction formulas, and both have gained acceptance in the statistical and data mining communities.


Section 2 describes the study design and the data available for developing the system. Section 3 describes CART and our application of CART to the problem of determining rehabilitation FRGs. Section 4 discusses the other methods we examined to evaluate the performance of CART and the results of that evaluation. Section 5 offers some general conclusions. A complete description of this payment system is available in Carter et al. (2001).

2 Study Data and Design

The population of interest here is all Medicare patients who used inpatient rehabilitation services following an acute care stay. Our initial goal is to produce a patient-level dataset that has measures of resource use as well as medical condition information. Next, we want to group patients who are similar in terms of impairment, functional ability, and age, so that all the patients in a group have roughly the same cost. We obtained patient data for the facilities providing rehabilitation services for Medicare and used these data to stratify patients into 21 internally homogeneous clinical groups that have generally been accepted in the rehabilitation community. Then, we designed and ran a computational study to produce and evaluate a cost classification system.

2.1 Data

We combined data from two sets of patient files. Medicare data provided the population frame, information on resource use, and characteristics of each rehabilitation hospital stay. Rehabilitation hospital data provided information on impairment and functional status. On the Medicare side, we examined the discharge abstracts that HCFA collected on all Medicare patients at all rehabilitation hospitals in the course of administering the program. HCFA provided us with records of calendar year 1996 through 1999 discharges from the Medicare Provider Analysis and Review (MEDPAR) file. From this file we extracted information on departmental charges, age at admission, and characteristics of the stay. Payments for transfer cases and deaths are based on adjustments to the standard payments for cases discharged to the community; this paper focuses on developing the payment system for community-discharged cases.


We used data on costs and charges from the Hospital Cost Reporting Information System to estimate accounting cost from the MEDPAR charge data. The method used is described in Newhouse et al. (1989). We adjusted each cost estimate for area wages using the hospital wage index from the acute care PPS.
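The modeling outcome, log(wage-adjusted cost), can be sketched as below. This is a minimal illustration under stated assumptions, not the Newhouse et al. (1989) procedure: we assume cost is estimated by multiplying each department's charges by that department's cost-to-charge ratio and summing, and that wage adjustment simply divides the total by the hospital's wage index (a simplification; the acute care PPS applies its index only to a labor-related share). All function names, department names, and numbers are hypothetical.

```python
import math

def estimated_cost(charges_by_dept, ccr_by_dept):
    """Assumed cost estimate: sum over departments of charges * cost-to-charge ratio."""
    return sum(charges * ccr_by_dept[dept] for dept, charges in charges_by_dept.items())

def log_wage_adjusted_cost(charges_by_dept, ccr_by_dept, wage_index):
    """Log of estimated cost divided by the area wage index (simplified adjustment)."""
    cost = estimated_cost(charges_by_dept, ccr_by_dept)
    if cost <= 0 or wage_index <= 0:
        raise ValueError("cost and wage index must be positive")
    return math.log(cost / wage_index)

# Hypothetical stay with invented charges and ratios
charges = {"routine": 10_000.0, "therapy": 6_000.0}
ccrs = {"routine": 0.6, "therapy": 0.5}
print(estimated_cost(charges, ccrs))               # 9000.0
print(log_wage_adjusted_cost(charges, ccrs, 1.2))  # about 8.92
```

Working on the log scale makes multiplicative cost differences additive, which is why both the trees and the comparison methods are fit to log(cost) rather than dollars.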

On the rehabilitation side, we measured the functional status of individual rehabilitation patients using Functional Independence Measure (FIM) data. The FIM is an 18-item measure covering six domains: self-care (six activities of daily living), sphincter control (two items on bowel and bladder management), mobility (three transfer items), locomotion (two items on walking/wheelchair use and stairs), communication (two items on comprehension and expression), and social cognition (three items on social interaction, problem solving, and memory). Each of the 18 items is scored at one of seven levels of function, ranging from complete dependence (1) to complete independence (7). FIM data also contain an impairment code that gives the primary reason for the rehabilitation admission. We collected FIM data from the Uniform Data System for Medical Rehabilitation, from the Clinical Outcomes Systems data for medical rehabilitation, and from HealthSouth Hospitals.
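The motor and cognitive totals used throughout the paper are plain sums of the item levels described above. The sketch below assumes hypothetical field names for the 18 items; the optional `drop_tub_transfer` flag mirrors the predictor sets that remove the tub transfer item from the motor score.

```python
# Hypothetical field names for the 13 motor and 5 cognitive FIM items
MOTOR_ITEMS = [
    "eating", "grooming", "bathing", "dressing_upper", "dressing_lower",
    "toileting",                                                     # self-care (6)
    "bladder", "bowel",                                              # sphincter control (2)
    "transfer_bed_chair", "transfer_toilet", "transfer_tub_shower",  # mobility (3)
    "walk_or_wheelchair", "stairs",                                  # locomotion (2)
]
COGNITIVE_ITEMS = [
    "comprehension", "expression",                       # communication (2)
    "social_interaction", "problem_solving", "memory",   # social cognition (3)
]

def fim_scores(items, drop_tub_transfer=False):
    """Return (motor, cognitive) totals from a dict of FIM item levels (1-7)."""
    for name in MOTOR_ITEMS + COGNITIVE_ITEMS:
        if not 1 <= items[name] <= 7:
            raise ValueError(f"{name} must be scored 1-7, got {items[name]}")
    motor_items = [n for n in MOTOR_ITEMS
                   if not (drop_tub_transfer and n == "transfer_tub_shower")]
    motor = sum(items[n] for n in motor_items)
    cognitive = sum(items[n] for n in COGNITIVE_ITEMS)
    return motor, cognitive
```

With every item at level 1 the totals are (13, 5), and with every item at level 7 they are (91, 35), matching the ranges given in Table 1.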

The MEDPAR and FIM files described the same set of patients, and we needed to link them in order to develop our resource use models. For privacy reasons, no patient identifiers were available to link them. The literature on techniques for this problem is rich, and we turned to a probabilistic matching technique (Jaro, 1989) to accomplish the linking. Probabilistic matching takes a set of candidate match variables (here, admission date, discharge date, age, sex, race, and zip code) and develops a linear scoring function such that scores above a chosen cutoff indicate a high probability of a correct match. Using this technique, we matched roughly 95% of the FIM records with patients in the MEDPAR file to form our final dataset. The development of the MEDPAR/FIM dataset is described in Relles and Carter (2002). The merged MEDPAR/FIM data contained several variables useful for modeling and classification.
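A linear match score of the kind described above can be sketched in the Fellegi–Sunter style, the standard framework for probabilistic record linkage: each field contributes a positive log-likelihood-ratio weight when the two records agree and a negative one when they disagree. This is an illustration only; the m- and u-probabilities and the cutoff below are invented, and in practice they would be estimated from the data.

```python
import math

# Hypothetical (m, u) pairs per match variable:
# m = P(field agrees | true match), u = P(field agrees | non-match)
FIELD_PROBS = {
    "admit_date": (0.97, 0.01),
    "discharge_date": (0.97, 0.01),
    "age": (0.95, 0.05),
    "sex": (0.99, 0.50),
    "race": (0.95, 0.60),
    "zip": (0.90, 0.02),
}

def match_score(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum of agreement/disagreement weights: a score linear in the agreement indicators."""
    score = 0.0
    for field, (m, u) in field_probs.items():
        if rec_a[field] == rec_b[field]:
            score += math.log(m / u)              # agreement weight (positive)
        else:
            score += math.log((1 - m) / (1 - u))  # disagreement weight (negative)
    return score

def is_match(rec_a, rec_b, cutoff=10.0):
    """Link two records when the score exceeds a (hypothetical) cutoff."""
    return match_score(rec_a, rec_b) > cutoff
```

Rare-coincidence fields such as dates and zip code carry large agreement weights, while fields like sex, which agree by chance half the time, contribute little.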

Table 1 identifies these variables and indicates the stages of the process at which they were used. The selection variables define what we think of as the typical case. We exclude transfers to hospitals and to long-term care settings, deaths, stays of three days or less, and statistical outliers (cases whose log(cost) lies more than three standard deviations from the mean of their rehabilitation impairment category). Also, the clinical partitioning and resource use variables needed to be present and in range. The FIM data measure functional independence in two main dimensions, cognitive and motor. The sum of the thirteen motor components is an overall measure of motor ability, and the sum of the five cognitive components is likewise an overall measure of cognition. Case selection was based on the intersection of the rules shown in Table 2.


Table 1: MEDPAR/FIM Variables and Stages of Use

Purpose                 Variable     Source   Description
Selection               AGE          MEDPAR   age
                        DISSTAY      FIM†     discharge stay indicator
                        LOS          MEDPAR   length of stay
                        IMPCD        FIM      rehabilitation impairment codes
                        PROVCODE     MEDPAR   provider code
                        PROVNO       MEDPAR   provider number
                        TCOST        MEDPAR   total cost estimates, based on cost-to-charge ratios, adjusted by area wage index
Clinical partitioning   IMPCD        FIM†     impairment code
                        RIC          FIM†     Rehabilitation Impairment Category; indicates one of 21 clinical groups resulting from impairment code mappings
Resource use            TCOST        MEDPAR   total cost estimates, based on cost-to-charge ratios, adjusted by area wage index
Functional items        COGNITIVE*   FIM†     cognitive score total, the sum of 5 components: comprehension, expression, social interaction, problem solving, memory
                        MOTOR*       FIM†     motor score total, the sum of 13 components: eating, grooming, bathing, dressing (upper body), dressing (lower body), toileting, bladder management, bowel management, bed/chair/wheelchair transfer, toilet transfer, tub or shower transfer, walking or wheelchair, stair ascending and descending

* These individual components are organized into various types of indices, according to body areas and types of impairment. Each component of the motor and cognitive scores is an ordinal scale that ranges from 1 (complete dependence) to 7 (complete independence). Therefore the cognitive score can range from 5 to 35 and the motor score from 13 to 91.
† The patient's FIM data came from UDSmr, COS, or HealthSouth.


Table 2: Rules for Selecting Cases

Variable                  Selection requirement
AGE                       between 16 and 105
DISSTAY                   indicates discharged to the community
LOS                       more than three days, less than one year
IMPCD, TCOST              we excluded cases with log(wage-adjusted cost) more than three standard deviations from its average within RIC
IMPCD                     contained in an impairment list for assignment to one of the 21 rehabilitation categories (see Table 4)
TCOST, COGNITIVE, MOTOR   greater than zero
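The rules in Table 2 can be sketched as a filter. Assumptions: the field names are hypothetical, "less than one year" is read as under 365 days, a case's RIC is `None` when its impairment code is not on the mapping list, and the three-standard-deviation trim is applied after the per-case rules (the order of these steps is our assumption).

```python
import math
import statistics
from collections import defaultdict

def passes_case_rules(case):
    """Per-case rules from Table 2 (field names are hypothetical)."""
    return (
        16 <= case["age"] <= 105
        and case["discharged_to_community"]
        and 3 < case["los"] < 365        # assumption: "less than one year" = under 365 days
        and case["ric"] is not None      # impairment code maps to one of the 21 RICs
        and case["tcost"] > 0
        and case["cognitive"] > 0
        and case["motor"] > 0
    )

def select_cases(cases, n_sd=3.0):
    """Apply per-case rules, then drop log(cost) outliers beyond n_sd SDs of the RIC mean."""
    kept = [c for c in cases if passes_case_rules(c)]
    logs_by_ric = defaultdict(list)
    for c in kept:
        logs_by_ric[c["ric"]].append(math.log(c["tcost"]))
    bounds = {}
    for ric, logs in logs_by_ric.items():
        mean = statistics.mean(logs)
        sd = statistics.stdev(logs) if len(logs) > 1 else 0.0
        bounds[ric] = (mean - n_sd * sd, mean + n_sd * sd)
    return [c for c in kept
            if bounds[c["ric"]][0] <= math.log(c["tcost"]) <= bounds[c["ric"]][1]]
```

Computing the mean and standard deviation separately within each RIC keeps a very expensive but typical category (for example, traumatic spinal cord injury) from being trimmed against a cheaper one.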

Table 3 shows the effects of the selection rules in 1998 and 1999 on the number of cases available for analysis. The full population is reduced by at least a third, owing mostly to the nonparticipation of hospitals in our FIM sources. Missing cost information accounted for another three percent drop, mostly from all-inclusive providers for whom separating out rehabilitation charges was not possible. About a quarter of the remaining cases were discharged someplace other than the community. Other drops in sample sizes were small.

Table 3: Number of observations at each stage of selection

Population sizes                              1998        1999
Population of Medicare rehab patients      370,352     390,048
Matched cases at participating hospitals   234,622     259,017
With cost information                      228,622     250,254
With cost and FIM information              228,248     249,941
Discharged to community                    174,011     191,924
Exclude transfers to hospitals             170,270     187,258
Exclude age, cost, and LOS outliers        169,816     186,766
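The percentages quoted above can be checked directly against Table 3. This short sketch computes the share of cases lost at each stage relative to the preceding stage; the counts are copied from the table and the names are illustrative.

```python
# Case counts from Table 3, in stage order
STAGES = [
    "Population of Medicare rehab patients",
    "Matched cases at participating hospitals",
    "With cost information",
    "With cost and FIM information",
    "Discharged to community",
    "Exclude transfers to hospitals",
    "Exclude age, cost, and LOS outliers",
]
COUNTS = {
    1998: [370_352, 234_622, 228_622, 228_248, 174_011, 170_270, 169_816],
    1999: [390_048, 259_017, 250_254, 249_941, 191_924, 187_258, 186_766],
}

def stage_drops(counts):
    """Percentage of cases lost at each stage relative to the previous stage."""
    return [100 * (prev - cur) / prev for prev, cur in zip(counts, counts[1:])]

for year, counts in COUNTS.items():
    for stage, pct in zip(STAGES[1:], stage_drops(counts)):
        print(f"{year}  {stage:<42s} -{pct:4.1f}%")
```

In both years the first drop exceeds one third, the cost drop is roughly three percent, and the community-discharge drop is close to a quarter, consistent with the text.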

2.2 Case Stratification and Sample Sizes

The first step in developing case mix groups is to partition the data into clinically similar groups, called rehabilitation impairment categories (RICs), based on the primary reason for the rehabilitation admission. Previous work had established 21 such groups, within which we would fit models. Table 4 describes those groupings and the sample sizes available for modeling under the selection rules in Table 2. The sample size increased over time, largely due to an increase in the number of hospitals participating in the source databases. Table 4 also includes the final number of FRGs in each RIC, discussed later.

After establishing the 21 RICs, we fit models within each RIC predicting cost from patient features. This is equivalent to interacting RIC with all other covariates used to predict cost.

Table 4: RIC Definitions, Sample Sizes, and Number of FRGs

RIC  Rehabilitation Impairment Category        1996      1997      1998      1999   FRGs
 1   Stroke                                  32,687    35,026    37,012    37,340     14
 2   Traumatic brain injury                   1,383     1,629     1,871     2,053      5
 3   Non-traumatic brain injury               2,517     2,863     3,402     3,758      4
 4   Traumatic spinal cord                      738       810       930       953      4
 5   Non-traumatic spinal cord                3,782     4,340     5,295     5,837      5
 6   Neurological                             4,730     5,717     7,832     8,875      4
 7   Hip fracture                            16,017    17,167    18,774    20,627      5
 8   Replacement of lower extremity joint    31,151    37,383    40,931    43,427      6
 9   Other orthopedic                         5,292     6,547     8,022     9,310      4
10   Amputation, lower extremity              4,810     5,423     5,930     6,156      5
11   Amputation, other                          354       477       542       662      3
12   Osteoarthritis                           2,340     2,854     3,983     5,036      5
13   Rheumatoid, other arthritis              1,169     1,521     1,944     2,350      4
14   Cardiac                                  4,097     5,662     6,885     8,104      4
15   Pulmonary                                2,442     3,561     4,340     5,382      4
16   Pain Syndrome                            1,321     1,873     2,529     2,993      2
17   MMT, no brain or spinal cord injury      1,188     1,288     1,540     1,679      3
18   MMT, with brain or spinal cord injury      156       222       221       256      4
19   Guillain-Barre                             240       278       299       313      3
20   Miscellaneous                           10,097    13,398    17,423    21,553      5
21   Burns                                       70       103       111       102      2
     Total                                  126,581   148,142   169,816   186,766     95

2.3 Computational Experiment

The main goal of this study was to evaluate the out-of-sample predictive performance of the different prediction methods and compare them with CART. Out-of-sample evaluation fits a model on one sample and evaluates its predictive performance on a new sample of subjects, yielding realistic estimates of the prediction error likely to be observed once the PPS is implemented. An important element of a payment system is whether its payment formulas offer accurate prospective estimates of cost. In particular, we wished to determine whether CART was capable of capturing most of the cost information available in the predictor variables, age and functional ability.
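Out-of-sample R-squared can be illustrated with the simplest model that a set of FRGs implies: predict each case's log(cost) by the mean of its group in the fitting year, then score those predictions on a different year. This is a sketch, not the evaluation code used in the study; the function names and the fallback for groups unseen in training are our assumptions.

```python
from collections import defaultdict

def r_squared(actual, predicted):
    """1 - SSE/SST, with SST measured around the evaluation sample's own mean."""
    mean = sum(actual) / len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    sst = sum((y - mean) ** 2 for y in actual)
    return 1 - sse / sst

def fit_group_means(groups, y):
    """'Fit' a step-function model: the mean outcome within each group (e.g. each FRG)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, y):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}

def out_of_sample_r2(train_groups, train_y, test_groups, test_y):
    """Fit on one year's cases, then score the predictions on another year's cases."""
    means = fit_group_means(train_groups, train_y)
    fallback = sum(train_y) / len(train_y)   # assumption: unseen groups get the overall mean
    preds = [means.get(g, fallback) for g in test_groups]
    return r_squared(test_y, preds)
```

Because the fitted means never see the evaluation year, this R-squared reflects prospective accuracy rather than in-sample fit.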

Table 5 shows the layout of the computational experiment, which consisted of four experimental factors that we varied. The first factor was the algorithm used to predict cost from the patient covariates; subsequent sections of this paper discuss the three candidate methods in greater detail. Linear least squares regression is not among these candidates, since age and the functional measures almost certainly have nonlinear relationships with cost and, as expected, the method performed poorly in initial experiments. Moreover, linear regression is a special case of GAM, discussed later. For each method we considered five candidate sets of predictor variables, which Table 5 lists in increasing order of granularity. These were the only predictors allowable at this stage. Other variables were either not acceptable (e.g., sex, marital status, wheelchair status) or set apart as adjustments to a base payment (e.g., comorbidities and facility characteristics). To validate the models of log(cost), we fit each method with each of the five candidate predictor sets on data from each year and predicted cost in the other years. Naturally, we are interested in whether we can estimate the model in, say, 1997 and accurately predict cost in 1999. We initially fit separate models for each year and measured how well they performed on all other years, which yields 12 out-of-sample evaluations (four fitting years, each evaluated on the other three). We later extended the design after observing that some RICs (e.g., 04, 11, 18, 19, 21) were quite small and might benefit from pooling data across years. This led to experimenting with the fitting periods 1996-97 and 1998-99. Thus, Table 5 describes the full set of fits and predictions, with the exception that we did not fit and predict on the same year's data.
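The fit/evaluation design can be enumerated mechanically. This minimal sketch assumes, as we read the exclusion rule, that a pooled fitting period such as 1996-97 is never evaluated on either of its own years; the constant names are hypothetical.

```python
FIT_PERIODS = [(1996,), (1997,), (1998,), (1999,), (1996, 1997), (1998, 1999)]
EVAL_YEARS = [1996, 1997, 1998, 1999]
MODELS = ["CART", "GAM", "MART"]
N_PREDICTOR_SETS = 5

def fit_eval_pairs():
    """All (fitting period, evaluation year) pairs, skipping years used in fitting."""
    return [(fit, ev) for fit in FIT_PERIODS for ev in EVAL_YEARS if ev not in fit]

pairs = fit_eval_pairs()
single_year = [p for p in pairs if len(p[0]) == 1]
total_runs = len(MODELS) * N_PREDICTOR_SETS * len(pairs)
print(len(single_year), len(pairs), total_runs)  # 12 16 240
```

The 12 single-year pairs match the count given in the text; each pooled period adds two more evaluations, for 16 pairs and, crossed with three models and five predictor sets, 240 fit/evaluation runs under this reading.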


Table 5: The computational experiment

Experimental factor    Possible values
1. Model               CART
                       GAM
                       MART
2. Predictor set       Age, Standard FIM Motor and Cognitive Score
                       Age, Standard but remove transfer to tub from the Motor score
                       Age, Standard but decompose Motor into ADLs and mobility (without tub transfer)
                       Age, Standard but decompose Motor into transfer (without tub transfer), locomotion, sphincter, and self-care
                       Age, and the 18 individual FIM components
3. Fitting year        1996, 1997, 1998, 1999
                       1996-7, 1998-9
4. Evaluation year     1996, 1997, 1998, 1999

Section 3 details our use of the CART algorithm, and Section 4 describes the other two modeling methods that we considered.

3 Modeling Cost Using CART

The rehabilitation PPS is to be based on discharges classified according to function related groups. CART is the traditional method of generating FRGs (Stineman et al., 1997) and a reasonable method of determining rules to classify patients into groups that explain cost. Various algorithms have been proposed to build tree-structured regression models, many of which are variations on the CART theme.

CART requires a dependent variable (here, the logarithm of wage-adjusted cost) and it seeks to develop a predictor of the dependent variable through a series of binary splits from a candidate set of independent variables. Here the predictor variables are age, the FIM motor score, and the FIM cognitive score. The FIM cognitive score is simply the sum of the five components of cognition. Section 5.1 describes in more detail our use of the FIM motor score, for which we used a sum of 12 of the 13 motor components.

The CART algorithm is recursive. First, it examines the set of independent variables and searches the dataset for a partition that best explains variation in the dependent variable. For example, CART might examine the partition separating patients with motor scores above 50 from those with motor scores of 50 or less. For patients with motor scores of 50 or less, CART would predict the average log(cost) of all such patients; the same strategy applies to patients with motor scores above 50. We can evaluate the quality of the split using the squared prediction error. CART searches over all variables and split points, choosing the variable and split point for which the two new partitions minimize the estimated squared prediction error.
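The split search just described can be sketched as an exhaustive loop over variables and candidate thresholds, with each side predicted by its own mean. This is a plain illustration of the criterion, not the CART software used in the study; the function names, the "value <= threshold goes left" convention, and the toy data are assumptions.

```python
def sse(values):
    """Squared error from predicting every value by the mean of the group."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, y, variables):
    """Exhaustive search for the single binary split minimizing total squared error.

    rows: list of dicts mapping variable name -> numeric value
    y: responses (e.g. log(cost)) aligned with rows
    Returns (variable, threshold, sse), where the left node is value <= threshold,
    or None if no variable has two distinct values.
    """
    best = None
    for var in variables:
        # Every distinct value except the largest is a candidate threshold,
        # so both sides of the split are non-empty.
        for threshold in sorted({r[var] for r in rows})[:-1]:
            left = [yi for r, yi in zip(rows, y) if r[var] <= threshold]
            right = [yi for r, yi in zip(rows, y) if r[var] > threshold]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (var, threshold, total)
    return best

# Toy data: higher motor scores go with lower log(cost)
rows = [{"motor": m, "cognitive": 20} for m in [20, 30, 40, 60, 70, 80]]
y = [9.9, 9.8, 9.9, 9.1, 9.0, 9.1]
print(best_split(rows, y, ["motor", "cognitive"])[:2])  # ('motor', 40)
```

Applying the same search again inside each resulting partition gives the recursion described next.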

CART then recursively splits each partition until it satisfies a stopping criterion. Ideally, we want to stop the partitioning process when prospective prediction is optimal. As a surrogate, we considered 10-fold cross-validation (Breiman et al., 1984) to estimate prospective prediction error. However, this method estimated that the collection of trees across all RICs should have a total of 359 terminal nodes. This is not too surprising, since CART adds nodes as long as the decrease in mean squared error seems statistically significant, and with large samples even minor differences can be statistically significant. However, 359 terminal nodes means 359 FRGs, too many to administer. We took additional steps to decrease the number of nodes within each RIC. This included using the “1 standard error rule” (Breiman et al., 1984), which effectively stops the recursion when a cross-validated estimate of the prediction error is within one standard error of the minimum estimated prediction error. We also enforced “practical significance” on each node: the predicted costs in neighboring nodes must differ by more than $1500; merging nodes would not change the predictions by more than $1000 from their original values. With these steps we obtained a more manageable 95 terminal nodes or, equivalently, 95 FRGs.
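The one-standard-error rule selects, among a sequence of pruned trees, the smallest one whose cross-validated error is no more than one standard error above the minimum. A minimal sketch follows; the input lists (tree sizes, cross-validated errors, and their standard errors) are assumed to come from a prior cross-validation run, and the numbers in the example are invented.

```python
def one_se_rule(sizes, cv_errors, cv_ses):
    """Return the smallest tree size whose CV error is within one SE of the minimum.

    sizes: number of terminal nodes for each candidate pruned tree
    cv_errors: cross-validated prediction error for each candidate
    cv_ses: standard error of each cross-validated error estimate
    """
    best = min(range(len(sizes)), key=lambda i: cv_errors[i])
    limit = cv_errors[best] + cv_ses[best]   # minimum error plus its standard error
    eligible = [i for i in range(len(sizes)) if cv_errors[i] <= limit]
    return min(sizes[i] for i in eligible)

# Invented CV results: the minimum is at 16 nodes, but 8 nodes is within one SE
sizes = [1, 2, 4, 8, 16, 32]
errors = [0.40, 0.30, 0.25, 0.24, 0.235, 0.236]
ses = [0.01] * 6
print(one_se_rule(sizes, errors, ses))  # 8
```

The rule trades a statistically negligible loss in estimated accuracy for a much smaller tree, which is the direction the payment system needed.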

Another policy constraint required that predicted costs never increase with increasing levels of functional independence. Our technical expert panel believed that if CMS paid less for cases with less function, it would create incentives that many clinicians would find unacceptable. In fact, the data only rarely violate this monotonicity constraint.
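One way to check a fitted model against this constraint is to scan one functional score over its range while holding the other predictors fixed, flagging any step at which the prediction rises. This sketch assumes a hypothetical `predict` callable taking keyword arguments; it is an illustration, not the procedure used to enforce the constraint.

```python
def monotonicity_violations(predict, var, grid, other_settings):
    """Find steps where the prediction increases as `var` increases.

    predict: callable(**features) -> predicted log(cost) (hypothetical interface)
    var: name of the functional score to scan (e.g. "motor" or "cognitive")
    grid: increasing values of `var` to evaluate
    other_settings: list of dicts fixing the remaining predictors
    """
    violations = []
    for others in other_settings:
        preds = [predict(**{**others, var: x}) for x in grid]
        for i in range(1, len(grid)):
            if preds[i] > preds[i - 1]:
                violations.append((dict(others), grid[i - 1], grid[i]))
    return violations

# Hypothetical model: cost falls with motor score and with cognitive score
def tree_model(motor, cognitive):
    return 9.9 if motor <= 40 else (9.2 if cognitive > 20 else 9.4)

print(monotonicity_violations(tree_model, "motor", list(range(13, 92)),
                              [{"cognitive": 10}, {"cognitive": 30}]))  # []
```

The same call with `var="cognitive"` checks the other functional dimension.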


[Figure 1 not reproduced; axes: motor score (horizontal), cognitive score (vertical)]

Figure 1: CART predictions of log(cost) from motor and cognitive scores for N=74,352 stroke cases from 1998-9

To demonstrate, we used patient data from RIC 01 (Stroke) for 1998 and 1999 combined and fit a CART model predicting log(cost) from the motor and cognitive scores. Figure 1 shows how CART partitions the data in this example. The lines show the partitions, and the number in each partition is the average log(cost) of the patients with the associated motor and cognitive scores. Costs decrease as the shading gets darker. Motor score is the primary effect, although at high motor scores cognitive ability can be influential. Figure 2 shows the same CART model drawn as a decision tree. A positive answer to the question at a node moves to the left and a negative answer moves to the right, until we reach a prediction in one of the leaves of the tree.
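Reading a prediction off a tree like the one in Figure 2 amounts to a short loop. The tree below is hypothetical: its structure, thresholds, and leaf values are invented for illustration and are not the fitted stroke model.

```python
# Hypothetical two-level tree; thresholds and leaf values are invented.
TREE = {
    "question": ("motor", 40),           # "Is motor <= 40?"
    "yes": {"predict": 9.9},             # positive answer: go left
    "no": {                              # negative answer: go right
        "question": ("cognitive", 20),   # "Is cognitive <= 20?"
        "yes": {"predict": 9.4},
        "no": {"predict": 9.1},
    },
}

def predict(tree, case):
    """Walk from the root, answering each node's question, until reaching a leaf."""
    node = tree
    while "predict" not in node:
        var, threshold = node["question"]
        node = node["yes"] if case[var] <= threshold else node["no"]
    return node["predict"]

print(predict(TREE, {"motor": 60, "cognitive": 15}))  # 9.4
```

Each leaf corresponds to one FRG, so the leaf reached by a case determines its payment group.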
