Rasch Validation and Predictive Validity of the Action Research Arm


ORIGINAL ARTICLE

Rasch Validation and Predictive Validity of the Action Research Arm Test in Patients Receiving Stroke Rehabilitation

Hui-fang Chen, PhD,* Keh-chung Lin, ScD, OTR,* Ching-yi Wu, ScD, OTR, Chia-ling Chen, MD, PhD

ABSTRACT. Chen H-F, Lin K-C, Wu C-Y, Chen C-L. Rasch validation and predictive validity of the Action Research Arm Test in patients receiving stroke rehabilitation. Arch Phys Med Rehabil 2012;93:1039-45.

Objective: To validate the internal construct and predictive validity of the Action Research Arm Test (ARAT).

Design: Secondary study.

Setting: Seven medical centers.

Participants: Patients with stroke (N=191).

Interventions: Not applicable.

Main Outcome Measure: The internal construct validity of the ARAT score at pretreatment was examined using Rasch analysis. The predictive validity was examined by the correlations between performance on the ARAT before treatment and scores on the Wolf Motor Function Test, the Motor Activity Log, and the Stroke Impact Scale after treatment.

Results: The 4-point ARAT scale had a disordered rating scale structure. Further Rasch modeling suggested revising the original 4-point scale into a 3-point scale. The 19 items measured 1 construct. The item difficulty hierarchy indicated that excluding the gross subtest, a score of 3 on the first item of any other subtest indicated the highest motor ability, and a score of 1 (the revised lowest rating) on the second item indicated the lowest motor ability. Tasks of “place hand behind head” and “place hand on top of head” showed poor item fit and item bias relevant to participants’ ages. The ARAT items can reliably separate participants into 5.44 strata. Moderate to good correlations indicated good predictive validity.

Conclusions: The ARAT possesses good psychometric properties in stroke patients with mild to moderate motor severity and without severe cognitive impairment, and has evidence of unidimensionality, predictive validity, and reliability.
The revised 3-point rating scale is recommended when the ARAT is administered in this population. The “place hand behind head” and “place hand on top of head” tasks misfit the Rasch model’s expectations. Future studies are needed in the use of the ARAT on stroke patients with different levels of motor severity or with cognitive impairment.

Key Words: Motor activity; Rehabilitation; Reproducibility of results; Stroke; Upper extremity.

© 2012 by the American Congress of Rehabilitation Medicine

From the Department of Occupational Therapy and Graduate Institute of Behavioral Sciences, College of Medicine, Chang Gung University, Taoyuan (H.-F. Chen, Wu); School of Occupational Therapy, College of Medicine, National Taiwan University, Taipei (Lin); Division of Occupational Therapy, Department of Physical Medicine and Rehabilitation, National Taiwan University Hospital, Taipei (Lin); Department of Physical Medicine and Rehabilitation, Chang Gung Memorial Hospital, Taoyuan (C.-L. Chen); and the Graduate Institute of Early Intervention, Chang Gung University, Taoyuan (C.-L. Chen), Taiwan.
*Hui-fang Chen and Keh-chung Lin contributed equally to this manuscript.
Supported in part by the National Health Research Institutes (grant nos. NHRI-EX100-9920PI and NHRI-EX100-10010PI), the National Sciences Council (grant nos. NSC 97-2314-B-002-008-MY3 and NSC 99-2314-B-182-014-MY3), and the Healthy Ageing Research Center at Chang Gung University (grant no. EMR PD1A0891) in Taiwan.
No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit on the authors or on any organization with which the authors are associated.
Correspondence to Ching-yi Wu, ScD, OTR, Dept of Occupational Therapy and Graduate Institute of Behavioral Sciences, Chang Gung University, 259 Wen-Hwa 1st Rd, Kwei-Shan Tao-Yuan, Taiwan, e-mail: [email protected]. Reprints are not available from the author.
In-press corrected proof published online on Mar 14, 2012, at www.archives-pmr.org.
0003-9993/12/9306-01092$36.00/0 doi:10.1016/j.apmr.2011.11.033

Approximately 80% of stroke patients experience motor impairment of the upper limb that limits their ability to perform activities of daily living (ADLs).1 The overall goal of training after stroke is to assist an individual to return to a normal life.2 A wide range of upper extremity (UE) interventions, such as constraint-induced movement therapy (CIMT)3 and bilateral arm training (BAT),4 have been used to improve the use of the affected UE in daily life5 and functional ability6 in stroke patients. However, the choice of valid and reliable outcome measures for examining intervention effects is still challenging. Researchers and clinical practitioners have advocated that existing measures should be tested empirically and improved if necessary.7,8 There is a pressing need to ascertain whether the assessment tools commonly used in poststroke UE recovery research can accurately quantify impairment and characterize recovery.

The Action Research Arm Test (ARAT), a widely used measure of UE motor function after stroke,9,10 was derived from the Upper Extremity Function Test11 by Lyle.12 The ARAT includes 4 subtests: grasp, grip, pinch, and gross motor. Positive psychometric evidence supported the ARAT as a good tool for assessing recovery of UE function in stroke patients.8,13,14 However, scant data exist on the appropriateness of the 4-point scale, item difficulty hierarchy, and the predictive validity of the ARAT.15

List of Abbreviations
ADLs        activities of daily living
AOU         amount of use
ARAT        Action Research Arm Test
BAT         bilateral arm training
CIMT        constraint-induced movement therapy
DIF         differential item functioning
MAL         Motor Activity Log
MnSq        mean square
PCA         principal component analysis
QOM         quality of movement
SIS         Stroke Impact Scale
UE          upper extremity
WMFT        Wolf Motor Function Test
WMFT-FAS    functional ability scale of the Wolf Motor Function Test
WMFT-TIME   performance time of the Wolf Motor Function Test
Zstd        Z-score standardized fit statistic

Arch Phys Med Rehabil Vol 93, June 2012


VALIDATION OF THE ACTION RESEARCH ARM TEST, Chen

A limitation of the ARAT is that the 4-point scale is arbitrarily defined, and little empirical evidence on the appropriateness of using the 4-point scale is available.15,16 The ARAT is scored as 0 (no movement possible), 1 (can partially perform the test), 2 (can complete the test but takes an abnormally long time or has great difficulty), and 3 (movement performed normally). One study proposed a standardized approach to administering the ARAT materials and rules for scoring,16 but researchers have not examined whether the 4-point scale has a logical scale structure that sufficiently reflects patients’ UE function; that is, patients with lower overall UE motor capacity should obtain lower scores on a given item than patients with higher UE motor capacity. The proposed item difficulty hierarchy has also not been documented in a way that takes both scoring categories and tasks into consideration. Lyle12 proposed that obtaining a score of 3 on the first item of a subtest requires the highest UE motor function, whereas obtaining a score of 0 on the second item requires the least ability. The results of previous studies10,17 that used nonparametric item response theory to estimate the average difficulty of tasks partially supported the proposed item difficulty hierarchy.12 However, nonparametric methods are often less powerful than parametric methods such as Rasch analysis. Information on the UE ability required to obtain a specific score on a task, which is needed to validate Lyle’s12 suggestions, is not available to date. Although 1 previous study10 used Rasch analysis to examine the hierarchical properties of the ARAT and found that the 19-item ARAT constituted a unidimensional construct measuring UE function in stroke patients, only 4 of 19 items fit the Rasch model’s expectations.
The problematic item fit might result from the 4-point scoring system or potential factors that might influence responses on the ARAT after controlling patients’ UE function, termed as differential item functioning (DIF). To establish the hierarchical properties and the internal construct validity, it is necessary to reexamine the appropriateness of the 4-point scale and study if stroke patients with similar levels of UE impairment might respond differently on some UE tasks depending on certain clinical or demographic characteristics.

There is little evidence in the motor literature to support the predictive validity of the ARAT. Previous investigations7,18 that used basic ADLs assessments as the criterion (eg, the Functional Independence Measure19 and Barthel Index20) found weak to moderate correlations. These assessments do not measure UE function but rather measure how patients are able to compensate with the nonparetic arm for the UE deficit.21 Therefore, these findings did not provide strong evidence of the predictive validity. The predictive validity of the ARAT can be more adequately investigated by using health outcome measures specific to hand function (eg, the Wolf Motor Function Test [WMFT]),22 ADLs relevant to UE motor and physical function (eg, the Motor Activity Log [MAL]),23 and quality of life (eg, Stroke Impact Scale [SIS]).24

The present study used Rasch analysis to examine whether the ARAT is a valid measure at the scoring category level and item level and examined factors (time since stroke onset and age) that could affect item difficulty. We also investigated the predictive validity of the ARAT.

METHODS

Participants

Data were drawn from patients who were enrolled in our ongoing research investigating the efficacy of CIMT3 and BAT.4 These patients were screened and recruited from 7

participating sites. The inclusion criteria were (1) first-ever stroke, (2) Brunnstrom stage ≥II for the proximal UE,25 (3) no cognitive impairment (Mini-Mental State Examination score >21),26 (4) no excessive spasticity at any joint of the arm (Modified Ashworth Scale score <2),27 and (5) no severe physical conditions and medical problems. The exclusion criteria were (1) excessive pain in any joint that might limit participation in rehabilitation, (2) participation in experimental rehabilitation or drug studies within the past 6 months, and (3) balance problems that could cause danger when patients wore the constraint mitt. Institutional review board approval was obtained from all study sites and all participants signed a consent form.

Interventions and Procedures

Eligible participants were randomly assigned to receive CIMT, BAT, or the control intervention for 90 to 120 minutes every weekday for 3 to 4 weeks. The CIMT restricted use of the unimpaired arm and participants received intensive training of the impaired arm for 2 hours every weekday. Participants also wore a restrictive mitt for 6 hours daily outside of therapy sessions during the treatment period. The BAT group was encouraged to use the affected and unaffected UEs simultaneously. The control group intervention primarily focused on neurodevelopmental techniques, including stretching of the affected arm, strength, hand function and coordination, unilateral and bilateral task practice, and compensatory strategy learning. The interventions were provided by 3 certified occupational therapists.

Before the participants were evaluated, the 3 evaluators received intensive training provided by the senior investigators, and all of them completed a demonstration and written tests of competency. Researchers and evaluators met regularly to discuss the possible problems encountered in the administration of the clinical evaluations.
The 3 evaluators were blinded to the groups, and the same rater administered the clinical evaluation at baseline and after the 3- to 4-week intervention for each participant.

Instruments

The ARAT12 was the major outcome measure. The ARAT consists of 19 items, and a total score of 57 indicates normal performance. We chose 3 criterion measures to examine the predictive validity of the ARAT: the WMFT,22 a widely used measure for assessing UE motor function; the MAL,23 a measure of ADLs used for assessing self-perceived use of the limb in important daily activities; and the SIS,24 which documents the overall health status in stroke patients.

The WMFT consists of 17 tasks, including 15 performance and 2 strength tasks. Performance tasks are evaluated using a performance time of the Wolf Motor Function Test (WMFT-TIME) scale and a 6-point functional ability scale of the Wolf Motor Function Test (WMFT-FAS). Studies have reported sound psychometric properties of the WMFT in stroke patients.18,28,29

The MAL uses a semistructured interview in which patients report self-perceived amount of use (AOU) and the quality of movement (QOM) for the affected UE in 30 ADLs during a specified period. Each task is scored on a scale of 0 to 5 points. The MAL has adequate reliability and excellent validity.30,31

The SIS (version 3.0) consists of 59 items providing comprehensive information about health status in stroke patients and has sound psychometric properties.32 We chose scores for hand function and the composite physical domain, comprising the sum of scores in strength, hand function,


ADLs/instrumental ADLs, and mobility domains, as predictive outcomes in this study.

Data Analyses

We used Winsteps 3.70 softwareᵃ to investigate the internal construct validity using the Rasch rating scale model and SPSS 17.0ᵇ for other statistical analysis.

Rasch analysis. The Rasch measurement model offers several advantages over the traditional psychometric approaches. These include rating scale diagnostics to examine if the 4-point scale is used as intended, estimates of person ability to obtain a specific score on an ARAT task as well as item difficulty hierarchy, and DIF analysis. Rasch analysis provides mean square (MnSq) and Z-score standardized (Zstd) fit statistics to indicate how much the residuals vary relative to their expected variance as calculated by the model. Two types of unexpected ratings are summarized: outfit, which is more sensitive to unexpected observations by persons on items that are relatively very easy or very hard for them, and infit, which is more sensitive to unexpected patterns of observations by persons on items that are roughly targeted on them.33

Rasch analysis assumes that all items constitute a single construct. To test the unidimensionality, a principal component analysis (PCA) of residuals was used to examine if the Rasch dimension explained at least 50% of the variance.34

Rating scale diagnostics were used to examine the appropriateness of the 4-point scale. We used this diagnostic tool to calculate the average measures of persons choosing each rating category to detect idiosyncratic category usage. The criteria of a rating scale structure included35: at least 10 responses per rating category; the average measure of each rating category increases as the rating value increases; outfit MnSq of each rating category of less than 2; and a regular observation distribution of categories. If a rating category failed to meet these criteria, collapsing the rating category would be considered.
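To make the rating scale model and the two mean-square fit statistics concrete, the following is a minimal sketch, not the Winsteps implementation. It evaluates the standard Andrich rating scale model for given parameters and computes infit (information-weighted) and outfit (unweighted) mean squares for one item. The function names, the thresholds, and all numeric inputs are hypothetical illustrations; parameter estimation itself is not shown.

```python
import math


def rsm_category_probs(theta, delta, taus):
    """Category probabilities under the Andrich rating scale model.

    theta: person ability (logits); delta: item difficulty (logits);
    taus: step thresholds tau_1..tau_m shared by all items.
    Returns a list of probabilities for categories 0..m.
    """
    # Cumulative sums of (theta - delta - tau_j); category 0 has an empty sum.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def item_fit(observed, thetas, delta, taus):
    """Return (infit, outfit) mean squares for one item across persons."""
    sq_resid_sum = 0.0  # sum of squared raw residuals
    var_sum = 0.0       # sum of model variances
    z2_sum = 0.0        # sum of squared standardized residuals
    for x, theta in zip(observed, thetas):
        probs = rsm_category_probs(theta, delta, taus)
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        sq_resid_sum += (x - expected) ** 2
        var_sum += variance
        z2_sum += (x - expected) ** 2 / variance
    infit = sq_resid_sum / var_sum      # weighted by model variance
    outfit = z2_sum / len(observed)     # plain mean of squared z-residuals
    return infit, outfit


if __name__ == "__main__":
    # Hypothetical thresholds and responses on a 3-category (0..2) scale.
    taus = [-1.0, 1.0]
    print(rsm_category_probs(0.5, 0.0, taus))
    print(item_fit([0, 1, 2, 1], [-2.0, 0.0, 2.0, 0.5], 0.0, taus))
```

Because infit weights each squared residual by its model variance, responses from persons whose ability is close to the item difficulty dominate it, whereas outfit gives equal weight to every standardized residual and so reacts more to surprises on items far from a person's level.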
We used the infit statistics to evaluate the item because the infit statistics are sensitive to responses on items that are closely matched to a participant’s UE function. An item with an MnSq higher than 1.5 and Zstd higher than 2.0 is considered as a misfit.36,37 An item with an MnSq and Zstd exceeding 2, which degrades the whole measurement, would be removed.33

Two factors examined that could affect the item difficulty were age (≥65 and <65y) and the time since stroke onset (≥12 and <12mo). An item has a noticeable DIF when the DIF contrast (the difference between the mean UE motor function of 2 groups) is above 0.5 logits34 and the t statistic with a Bonferroni multiple-comparison correction was significant (P = .05/19 ≈ .003).

The item-person map was used to understand the relation between item difficulty and person ability. When the average UE motor function is relatively close to the average item difficulty and the range of item difficulty covers a substantial range of participants’ UE motor function, the ARAT is sensitive enough to detect the difference in UE motor function among patients.

Test reliability was assessed, including person separation, person (separation) reliability, and Cronbach alpha. Person separation estimates how many strata the ARAT can divide participants into. Person (separation) reliability is equivalent to traditional test reliability. For person separation reliability and Cronbach alpha, a value above .80 represents a good level of reliability and one exceeding .90 indicates an excellent level.38

Examination of predictive validity. This study examined whether the total ARAT score before treatment could predict the total scores of the WMFT-TIME, WMFT-FAS, MAL-AOU, MAL-QOM, SIS-hand, and SIS-physical in patients

Table 1: Characteristics of the 191 Participants

Characteristic              Value
Age (y)                     55.17±11.14
Sex
  Women                     51 (26.7)
  Men                       140 (73.3)
Side of hemiplegia
  Left                      99 (51.8)
  Right                     92 (48.2)
Stroke type
  Hemorrhage                70 (36.6)
  Infarction                86 (45)
  Ischemia                  31 (16.2)
  Unknown                   4 (2.1)
Months since stroke         17.19±15.29
Inpatient                   42 (22)
Outpatient                  149 (78)
Pretreatment evaluations
  ARAT                      35.78±16.7
Posttreatment evaluations
  WMFT-TIME                 89.43±84.13
  WMFT-FAS                  52.34±12.37
  MAL-AOU                   1.42±1.05
  MAL-QOM                   1.59±1.12
  SIS-hand function         50.61±29.28
  SIS-physical domain       64.76±15.72

NOTE. Values are mean ± SD or n (%).

after treatment. The associations were examined with Spearman rank correlation coefficients (ρ). Predictive ability of the measures was considered low for correlations 0 to 0.25, fair for 0.25 to 0.5, moderate to good for 0.5 to 0.75, and good to excellent for those above 0.75.39

RESULTS

The present study included 191 stroke patients who completed the ARAT, WMFT, MAL, and SIS before and after treatment. The mean time since stroke onset ± SD was 17.19±15.29 months, and 50.3% were classified at the chronic stage (onset ≥12mo). Table 1 lists their demographic and clinical characteristics. Sixty-five participants were in the CIMT group, 56 in the BAT group, and 70 in the control group. The 3 groups did not show significant differences in age, time since stroke onset, and the ARAT baseline scores (F=.03–.89, P>.05).

Rasch Analysis

The sample size required to achieve stable item calibrations ranges from 108 to 243 subjects for an accuracy of ±0.5 logit at a 99% confidence interval.40 The sample size in the present study (N=191) was thus enough to achieve stable item calibrations. A residual PCA showed 72.5% of the variance explained by the Rasch dimension, indicating the 19-item ARAT constitutes a unidimensional construct.

Disordering of the threshold measure was detected, indicating the 4-point scale could not effectively differentiate patients and showed the redundancy of 4 rating categories. The scoring category 1 was used in 7% of our samples compared with categories 0 in 20%, 2 in 36%, and 3 in 37%. Given the disordering of the step difficulties and the response frequency on each category of the items, we collapsed categories 0 and 1 and retained ratings 2 and 3. Reanalysis showed that the new 3-point rating scale met all essential criteria and functioned properly. The revised data were analyzed in the subsequent Rasch analysis.

Table 2 lists the estimates of item difficulty, SEs, infit MnSq, and infit Zstd for the revised ARAT. The tasks “place hand behind head” and “place hand on top of head” revealed poor fit (MnSq=1.71–1.72; Zstd=5.0–5.4). Significant DIFs related to age were detected in these 2 items. The older group (≥65y) scored significantly higher, on average, than the younger group (<65y). No significant DIF related to the time since stroke onset was found. The tasks “place hand behind head” and “place hand on top of head” did not meet the Rasch model’s expectations but did not exceed the criterion for removal.

The empirically derived order of the 19 items showed that the highest person ability was required for all subtests to obtain the highest rating (the revised rating category 3) in the first item within each subtest (see table 2). For the subtests of grasp, grip, and pinch, to obtain the lowest rating (the revised rating category 1) in the second item required the least person ability in UE motor function, whereas the third item within the gross subtest required the least person ability. Figure 1 shows that on average, the most difficult task was “ball bearing, held between middle finger and thumb,” and the easiest task was “hand to mouth.” Tasks requiring greater finger extensions (eg, grasping a 10cm3 block) or a greater range of movement (eg, hand behind the head) were rated more difficult than tasks requiring a small degree of finger extension (eg, grasping a 2.5cm3 block) or less range of movement (eg, hand to mouth). The overall average UE function at pretreatment was .27 logits, which was very close to the mean of item difficulty (see fig 1), indicating that the difficulty of the items was well targeted to the participants’ UE motor function.

A person separation index of 3.83 indicated that the ARAT items separated the 191 participants into 5.44 statistically distinct ability levels (strata)41 by UE motor function. Person reliability (.94) and Cronbach alpha (.97) confirmed high reliability of the ARAT.

Table 2: Fit Statistics of the Revised ARAT Item

     Difficulty (average measure)           Infit
No.  Overall  Rating 1  Rating 2  Rating 3   MnSq   Zstd  Item
Grasp subscale
  1     1.59     −2.12      1.05      4.65   1.14    1.3  Block, 10cm3
  2    −1.32     −4.75     −0.84      3.32   0.66   −3.3  Block, 2.5cm3
  3    −0.95     −4.34     −0.57      3.49   0.64   −3.5  Block, 5cm3
  4    −0.21     −3.48     −0.15      3.84   0.70   −2.8  Block, 7.5cm3
  5    −0.54     −3.90     −0.35      3.76   0.65   −3.3  Cricket ball
  6    −0.60     −4.02     −0.34      3.72   0.62   −3.7  Sharpening stone
Grip subscale
  7     1.24     −2.43      0.75      4.74   0.86   −1.3  Pour water from one glass to another
  8    −0.92     −4.27     −0.50      3.45   0.82   −1.6  Displace 2.25-cm alloy tube from one side of table to the other
  9    −1.01     −4.10     −0.48      3.23   0.98   −0.1  Displace 1-cm alloy tube from one side of table to the other
 10     0.08     −3.12     −0.11      4.13   0.93   −0.6  Put washer over bolt
Pinch subscale
 11    −0.01     −1.70      1.27      5.18   0.86   −1.2  Ball bearing, held between ring finger and thumb
 12     0.89     −3.46     −0.52      3.35   0.98   −0.1  Marble, held between index finger and thumb
 13     2.26     −2.45      0.36      4.37   1.23    0.2  Ball bearing, held between middle finger and thumb
 14    −0.71     −3.13     −0.16      3.95   1.01    0.2  Ball bearing, held between index finger and thumb
 15     0.72     −1.89      1.16      4.75   1.02    0.2  Marble, held between ring finger and thumb
 16     1.89     −2.53      0.31      4.14   1.09    0.9  Marble, held between middle finger and thumb
Gross subscale
 17     0.75     −2.48      0.65      3.88   1.72    5.4  Hand to behind the head
 18    −0.42     −3.10     −0.14      3.32   1.71    5.0  Hand to top of head
 19    −2.72     −5.60     −1.44      2.22   1.37    2.9  Hand to mouth
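The reported number of strata and the person reliability can be cross-checked from the person separation index. The sketch below assumes the conventional Rasch formulas, strata = (4G + 1)/3 and reliability = G²/(1 + G²), where G is the separation index; the article cites a reference for strata but does not print either formula, so treating these as the ones used is an assumption, and the function names are illustrative.

```python
def person_strata(separation):
    """Number of statistically distinct strata, assuming (4G + 1) / 3."""
    return (4.0 * separation + 1.0) / 3.0


def separation_reliability(separation):
    """Separation reliability, assuming G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


if __name__ == "__main__":
    g = 3.83  # person separation index reported in the Results
    print(round(person_strata(g), 2))           # 5.44
    print(round(separation_reliability(g), 2))  # 0.94
```

Under these assumptions, both values agree with the 5.44 strata and the person reliability of .94 reported in the Results.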

Predictive Validity

The ARAT score before treatment (table 3) had a fair correlation with the composite physical domain of the SIS after treatment (ρ=.45). Correlations were moderate to good between the ARAT and the WMFT-TIME (ρ=−.66), MAL-AOU, MAL-QOM, and SIS-hand function (ρ=.58–.66). The relationship between the ARAT and the WMFT-FAS was good to excellent (ρ=.76).

DISCUSSION

Our examination found that the item difficulty hierarchy is consistent with Lyle’s12 suggestions. The revised 3-point scale could be used to assess UE motor function in stroke patients with moderate to mild impairment in UE motor ability given that 96% of our samples had mild to moderate UE dysfunction. Except for the “place hand behind head” and “place hand on top of the head” tasks, the remaining 17 items met the Rasch model expectations; the 2 items with misfit need more investigation before being removed from the ARAT test. The moderate to excellent predictive validity and high reliability show that the ARAT is a good tool for assessing mild to moderate UE dysfunction in stroke patients.

Our analyses revealed a redundancy of the original 4-point scale. The scoring category 1 was infrequently used in our samples (7%), compared with other categories (20%–37%). Hence, we combined scoring categories 1 and 0 to create 3 new scoring categories. As a result of our modifications to the response scale, the revised rating categories were labeled as 1, can perform no part of test or partially perform the test within 60 seconds; 2, completes test, but takes an abnormally long time (5–60s) or has great difficulty; and 3, performs test normally within 5 seconds.

To the best of our knowledge, this study is the first to take scoring categories and item information together to validate and support the proposed ARAT administration rules.12 After


[Figure 1: Person-item map. Vertical axis: logits, from 6 (top) to −6 (bottom). Persons are plotted to the left of the center line and the 19 ARAT items to the right, ordered by average difficulty from “ball bearing, held between middle finger and thumb” (most difficult) to “hand to mouth” (easiest).]

Fig 1. The column of numbers to the left is logit. The symbol “#” to the left of the center line represents 2 participants’ UE motor function. The symbol “.” to the left of the center line represents a participant’s UE motor function. The most able people and the most difficult items are at the top, and vice versa. Items plotted along the center line are based on the average difficulty of the items. Abbreviations: M, mean; S, SD; T, 2 SDs.

the gross subtest was excluded, we found that within each subtest, the average measure of category 3 in the first item was the highest, whereas the lowest value was in category 1 (the revised lowest rating) of the second item. That is, a patient rated 3 in the first item was capable of performing the remaining tasks and was very likely to obtain 3 in the remaining tasks within that subtest. A participant who obtained 1 (the revised lowest rating) in the second task was very likely to obtain the same ratings on all tasks within that subtest. Therefore, the evaluator can follow the rules to administer the 3 subtests of grasp, grip, and pinch.

Item fit statistics and DIF analyses conducted on the revised 19 ARAT items of the 4 subtests suggested that 2 items of the gross subscale, “place hand behind head” and “place hand on top of head,” needed further investigations. Most ARAT tasks require only forward flexion for performing the task. These 2 tasks, however, involve upward flexion, which requires a larger degree of forearm flexion and a smaller degree of forearm pronation compared with other tasks42 and might reflect a different aspect of UE motor function. Also, the young group (<65y) had 3 times more participants than the older group (≥65y), which further increased the DIF between these 2 groups. Future studies can create a balanced sample size in the 2 groups to see if these 2 items are still more difficult for younger patients (<65y) than for older patients before creating and administering 2 different forms based on the patient’s age.

The findings of this study differed from those of Koh et al.10 The discrepancy might mainly be due to participants’ clinical characteristics and the revised scoring categories. In the Koh study, UE motor dysfunction was severe in 50% of participants (ARAT<5), moderate in 33%, and mild in 17% (ARAT>51). In our study, the ARAT total score was below 5 in 4.2%, between 5 and 51 in 70.7%, and exceeded 51 in 25.1% of our patients.
Most of Koh’s patients had severe to moderate motor severity, with a lower median baseline ARAT of 5 (interquartile range, 0–40), whereas most of our samples had moderate to mild motor severity, with a median ARAT of 38 (interquartile range, 25–52). Second, their study did not explore all possibilities to collapse scoring categories. Our study used the revised 3-point scale to efficiently reflect patients’ UE function and evaluated its measurement properties in stroke patients. Future studies may examine the appropriateness of the revised scoring categories (ie, the revised 3-point scale) in patients with severe or severe to moderate UE dysfunction to understand if the ARAT using the 3-point scale is more appropriate for stroke patients with specific levels of UE motor ability.

The ARAT has the precision to differentiate patients with different UE motor function after stroke. The item-person map indicated that the difficulty of the items was well targeted to the participants’ UE function. Item difficulty range covered a substantial distance on the targeted construct, and the revised scoring categories of individual tasks can capture different levels of UE ability in stroke patients. Also, the ARAT could divide our samples into at least 5 groups by the level of their UE motor ability and had good predictive validity for UE function, ADLs, and quality of life. These findings, consistent with previous

Table 3: Predictive Validity of the ARAT

Pretreatment scale: ARAT

Posttreatment scale   Spearman ρ   95% CI
WMFT-TIME             −0.66*       −0.57 to −0.73
WMFT-FAS               0.76*        0.70 to 0.82
MAL-AOU                0.62*        0.53 to 0.70
MAL-QOM                0.66*        0.58 to 0.74
SIS-Hand               0.58*        0.49 to 0.67
SIS-Physical           0.45*        0.33 to 0.56

Abbreviation: CI, confidence interval.
*P<.001.


research in the ARAT, supported conclusions that the ARAT is sensitive and able to distinguish individuals with stroke.16,18

Study Limitations

The generalizability of our findings is limited to patients with mild to moderate UE motor dysfunction and to patients without severe cognitive impairment. Patients with other characteristics might perform differently on the ARAT, particularly the rating categories. Future studies can recruit patients with severe or severe to moderate severity of UE motor function, or individuals with severe cognitive dysfunction, to see if the revised 3-point scale can be used in patients with different levels of severity.

Further investigations are warranted to study the predictive validity of the ARAT against outcomes at a long-term follow-up (eg, 6 and 12mo or even longer after intervention). A measure with strong longitudinal validity not only helps plan a design for the optimal development of individually tailored treatments but also elucidates the time-course related effects of rehabilitation treatment. In addition, functional imaging and kinematics have been used to evaluate the efficacy of rehabilitation techniques.43,44 Changes in functional imaging and kinematic performance can serve as criterion measures to examine the predictive validity of the ARAT. The correlations between the ARAT baseline scores and functional imaging and kinematic performance after treatment need to be studied in future research.

Third, this study enrolled patients with a wide range of time since stroke onset (range, 1–88mo) to broadly elucidate patients’ UE motor function before and after stroke rehabilitation. Longitudinal studies have suggested a nonlinear pattern of recovery of neurologic impairment and disability over time,1 implying that stroke patients might perform differently on the ARAT at various times. To validate our findings, future studies may recruit a larger sample to study patients differing in time poststroke onset.
CONCLUSIONS

The study advanced knowledge of the utility of ARAT in the scoring scale, administration rules, and predictive validity in stroke rehabilitation. The findings support decision rules in the ARAT administration suggested by Lyle,12 where once a participant scores the highest rating (3) in the first item of a subtest, the remaining items are skipped and scored 3, and if a participant scores the lowest rating in the second task of a subtest, the remaining items are skipped and scored the lowest rating. Use of the revised 3-point scale is recommended when the ARAT is administered in patients with mild-to-moderate UE motor dysfunction after stroke. The tasks “place hand behind head” and “place hand on top of head” might assess an aspect of UE motor function that is distinct from the remaining 17 items. Given the high reliability and satisfactory predictive validity, the ARAT is a useful measure for assessing motor function of stroke survivors during recovery or treatment course.

References

1. Nakayama H, Jorgensen HS, Raaschou HO, Olsen TS. Recovery of upper extremity function in stroke patients: the Copenhagen Stroke Study. Arch Phys Med Rehabil 1994;75:394-8.
2. Clarke PJ, Black SE, Badley EM, Lawrence JM, Williams JI. Handicap in stroke survivors. Disabil Rehabil 1999;21:116-23.
3. Taub E, Uswatte G, Pidikiti R. Constraint-induced movement therapy: a new family of techniques with broad application to physical rehabilitation—a clinical review. J Rehabil Res Dev 1999;36:237-51.

4. Whitall J, Waller SM, Sorkin JD, et al. Bilateral and unilateral arm training improve motor function through differing neuroplastic mechanisms: a single-blinded randomized controlled trial. Neurorehabil Neural Repair 2011;25:118-29.
5. Wu CY, Lin KC, Chen HC, Chen IH, Hong WH. Effects of modified constraint-induced movement therapy on movement kinematics and daily function in patients with stroke: a kinematic study of motor control mechanisms. Neurorehabil Neural Repair 2007;21:460-6.
6. McCombe Waller S, Liu W, Whitall J. Temporal and spatial control following bilateral versus unilateral training. Hum Mov Sci 2008;27:749-58.
7. Lin JH, Hsu MJ, Sheu CF, et al. Psychometric comparisons of 4 measures for assessing upper-extremity function in people with stroke. Phys Ther 2009;89:840-50.
8. Platz T, Pinkowski C, van Wijck F, Kim IH, di Bella P, Johnson G. Reliability and validity of arm function assessment with standardized guidelines for the Fugl-Meyer test, Action Research Arm Test and Box and Block test: a multicenter study. Clin Rehabil 2005;19:404-11.
9. Rowland TJ, Cooke DM, Gustafsson LA. Role of occupational therapy after stroke. Ann Indian Acad Neurol 2008;11:99-107.
10. Koh CL, Hsueh IP, Wang WC, et al. Validation of the Action Research Arm Test using item response theory in patients after stroke. J Rehabil Med 2006;38:375-80.
11. Carroll D. A quantitative test of upper extremity function. J Chronic Dis 1965;18:479-91.
12. Lyle RC. A performance test for assessment of upper limb function in physical rehabilitation treatment and research. Int J Rehabil Res 1981;4:483-92.
13. van der Lee J, Degroot V, Beckerman H, Wagenaar R, Lankhorst G, Bouter L. The intra- and interrater reliability of the Action Research Arm Test: a practical test of upper extremity function in patients with stroke. Arch Phys Med Rehabil 2001;82:14-9.
14. Hsieh CL, Hsueh IP, Lin PH. Inter-rater reliability and validity of the Action Research Arm Test in stroke patients. Age Ageing 1998;27:107-13.
15. McDonnell M. Action Research Arm Test. Aust J Physiother 2008;54:220.
16. Yozbatiran N, Der-Yeghiaian L, Cramer SC. A standardized approach to performing the Action Research Arm Test. Neurorehabil Neural Repair 2008;22:78-90.
17. van der Lee JH, Roorda LD, Beckerman H, Lankhorst GJ, Bouter LM. Improving the Action Research Arm Test: a unidimensional hierarchical scale. Clin Rehabil 2002;16:646-53.
18. Hsieh YW, Wu CY, Lin KC, Chang YF, Chen CL, Liu JS. Responsiveness and validity of three outcome measures of motor function after stroke rehabilitation. Stroke 2009;40:1386-91.
19. Keith RA, Granger CV, Hamilton BB, Sherwin FS. The Functional Independence Measure: a new tool for rehabilitation. Adv Clin Rehabil 1987;1:6-18.
20. Wade DT, Collin C. The Barthel ADL index: a standard measure of physical disability. Int Disabil Stud 1988;10:64-7.
21. Duncan PW, Samsa G, Weinberger M, et al. Health status of individuals with mild stroke. Stroke 1997;28:740-5.
22. Wolf SL, Lecraw DE, Barton LA, Jann BB. Forced use of hemiplegic upper extremities to reverse the effect of learned nonuse among chronic stroke and head-injured patients. Exp Neurol 1989;104:125-32.
23. Taub E, Miller NE, Novack TA, et al. Technique to improve chronic motor deficit after stroke. Arch Phys Med Rehabil 1993;74:347-54.
24. Duncan P, Bode RK, Lai SM, Perera S; Glycine Antagonist in Neuroprotection Americans Investigators. Rasch analysis of a new stroke-specific outcome scale: the Stroke Impact Scale. Arch Phys Med Rehabil 2003;84:950-63.


25. Brunnstrom S. Movement therapy in hemiplegia. New York: Harper & Row; 1970.
26. Folstein MF, Folstein SE, McHugh PR. “Mini-mental state”: a practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 1975;12:189-98.
27. Bohannon R, Smith M. Interrater reliability of a modified Ashworth scale of muscle spasticity. Phys Ther 1987;67:206-7.
28. Fritz SL, Blanton S, Uswatte G, Taub E, Wolf SL. Minimal detectable change scores for the Wolf Motor Function Test. Neurorehabil Neural Repair 2009;23:662-7.
29. Lin KC, Hsieh YW, Wu CY, Chen CL, Jang Y, Liu JS. Minimal detectable change and clinically important difference of the Wolf Motor Function Test in stroke patients. Neurorehabil Neural Repair 2009;23:429-34.
30. van der Lee JH, Beckerman H, Knol D, de Vet HC, Bouter LM. Clinimetric properties of the Motor Activity Log for the assessment of arm use in hemiparetic patients. Stroke 2004;35:1410-4.
31. Uswatte G, Taub E, Morris D, Light K, Thompson PA. The Motor Activity Log-28: assessing daily use of the hemiparetic arm after stroke. Neurology 2006;67:1189-94.
32. Duncan PW, Wallace D, Lai SM, Johnson D, Embretson S, Laster LJ. The Stroke Impact Scale version 2.0: evaluation of reliability, validity, and sensitivity to change. Stroke 1999;30:2131-40.
33. Linacre JM. What do infit and outfit, mean-square and standardized mean? Rasch Meas Trans 2002;16:878.
34. Linacre JM. A user’s guide to Winsteps Ministep: Rasch-model computer programs. Chicago: Winsteps.com; 2010.
35. Linacre JM. Optimizing rating scale category effectiveness. J Appl Meas 2002;3:21.


36. Linacre JM. Rasch power analysis: size vs. significance: standardized chi-square fit statistics. Rasch Meas Trans 2003;17:918.
37. Linder HY, Linacre JM, Hermansson LM. Assessment of capacity of myoelectric control: evaluation of construct and rating scale. J Rehabil Med 2009;41:467-74.
38. Salter K, Jutai JW, Teasell R, Foley NC, Bitensky J, Bayley M. Issues for selection of outcome measures in stroke rehabilitation: ICF activity. Disabil Rehabil 2005;27:315-40.
39. Colton T. Statistics in medicine. Boston: Little, Brown and Co; 1974.
40. Linacre JM. Sample size and item calibration stability. Rasch Meas Trans 1994;7:328.
41. Fisher WJ. Reliability statistics. Rasch Meas Trans 1992;6:238.
42. Magermans DJ, Chadwick EK, Veeger HE, van der Helm FC. Requirements for upper extremity motions during activities of daily living. Clin Biomech (Bristol, Avon) 2005;20:591-9.
43. Zhao Y, Shi YX, Tian FH, Yang KH. Modified constraint-induced movement therapy versus traditional rehabilitation in patients with upper-extremity dysfunction after stroke: a systematic review and meta-analysis. Arch Phys Med Rehabil 2011;92:972-82.
44. Chen YP, Duff M, Lehrer N, et al. A novel adaptive mixed reality system for stroke rehabilitation: principles, proof of concept, and preliminary application in 2 patients. Top Stroke Rehabil 2011;18:212-30.

Suppliers

a. Winsteps, PO Box 811322, Chicago, IL 60681.
b. SPSS Inc, 233 S Wacker Dr, 11th Fl, Chicago, IL 60606.

Arch Phys Med Rehabil Vol 93, June 2012
