
Reading Research and Instruction Winter 2004, 43(2) 1-16


Readability Level of Standardized Test Items and Student Performance: The Forgotten Validity Variable

Margaret A. Hewitt and Susan P. Homan
University of South Florida

Abstract

Test validity issues considered by test developers and school districts rarely include individual item readability levels. In this study, items from a major standardized test were examined for individual item readability level and item difficulty. The Homan-Hewitt Readability Formula was applied to items across three grade levels. Correlations between readability level and item difficulty at all three grade levels support the belief that the higher an item's readability level, the more students miss that item. A possible conclusion is that students miss items because of reading problems, not because of a lack of content knowledge. These data appear to support measuring the readability of individual items on standardized tests.

Standardized testing remains a powerful force in today's public schools. During the 1980s, a threat from the "authentic testing" movement seemed to be mounting (Sacks, 1997; Wiggins, 1993). Alternative or authentic student assessments typically rely on samples of actual student work or judgments of student performance, which are used as indices of the process of student learning, not just the end product (Resnick & Resnick, 1992; Wiggins, 1993; Wolf, 1993). However, the current focus on national and state testing provides evidence of the growing investment in standardized tests. In just 16 years, from 1980 to 1996, the Educational Testing Service's total sales jumped 256%, from $106 million to $380 million, triple the 85% rise in consumer prices over the same period (Sacks, 1997). According to Popham (1999), educators are experiencing more and more pressure to demonstrate their effectiveness, with their chief indicator being student performance on standardized achievement tests. With this increasing emphasis on standardized tests, additional examination of all aspects of these tests appears warranted.

Although many articles and books discuss the reliability and validity of standardized tests (Resnick & Resnick, 1992; Sacks, 1997; Wiggins, 1989; Wolf, 1993), the issue of validity in terms of the readability level of the test items is rarely mentioned. Homan, Hewitt, and Linder (1994) reported that test manuals typically do not provide information about the design principles involved in creating test items and answers, and readability formulas are rarely applied (Drum, Calfee, & Cook, 1981). When test developers do consider readability, they treat the entire test as one continuous prose unit and apply a traditional readability formula to that unit, which yields a single average readability level for the total test. When test items are grouped together in this way, as traditional readability formulas require (Klare, 1984), individual test items will invariably fall above or below the test readability average and the intended reading level (Hewitt & Homan, 1991).

The problem of matching reader ability and text difficulty has been of continuing concern to educators. This issue is closely related to readability (Gilliland, 1975), which has been more broadly defined as the "comprehensibility of written text" (Hewitt & Homan, 1991). Even though questions about how to measure readability most accurately are continually raised, readability formulas are still widely used in business, newspapers, and government (Fry, 1987). Some of the concerns raised about readability formula use are delineated by Bertram, Bolt, and Newman (1981), Berliner and Calfee (1996), and Davison (1981). Berliner and Calfee (1996) describe the legitimate goal of readability formulas as considering the factors that contribute to text comprehensibility. They believe a major problem has been the overemphasis on the readability of text while other factors are ignored. We believe the situation with test items is the opposite: readability level is not given enough consideration. While we agree it should not be the major consideration, we contend it should be seriously considered.

The publication of The Living Word Vocabulary (Dale & O'Rourke, 1981) has made examining the readability level of a unit as small as one sentence or test item more feasible. The authors of The Living Word Vocabulary tested over 50,000 word meanings and determined the grade level at which each was familiar. The word-difficulty variable in the Homan-Hewitt Readability Formula is determined by the grade-level familiarity reported in The Living Word Vocabulary.

Bertram et al. (1981) discuss three weaknesses of readability formulas. The first concern is the belief that most formulas consider only sentence length and word difficulty and ignore factors such as cohesion, complexity of ideas, and required schemata. The second is the failure to account for reader-specific factors such as interest and purpose for reading. The third is the lack of statistical backing for most readability formulas. The Homan-Hewitt Readability Formula for individual test items does take most of these concerns into consideration. A stepwise regression was used to develop the Homan-Hewitt Formula.

Because this formula is intended for test items, the purpose and interest of the student are not of primary concern. In the present study, the social studies subtest was used because it was anticipated that students had been taught, and were therefore familiar with, this content.

We share the concern expressed by Davison (1981). She contends that if a text is being rewritten or revised to match a particular level of reading ability, the changes may be made on the basis of readability rather than content. Davison states that changes should be made because of inherent difficulty or problems of ambiguity, not just to influence the score a text might receive from a readability formula. Many legitimate concerns about readability formula use are specific to its application to continuous text and are less relevant to the readability of individual test items.

The situation for test developers in this time of high-stakes testing is quite different. We are concerned about items that inappropriately punish students with reading problems. We are not suggesting that test developers write items to match Homan-Hewitt readability levels. However, we would like to see missed items evaluated for readability level. We want the users of high-stakes test results to be aware of the possible danger to validity if the items missed were written at an above-grade-level readability. Given today's student profile, it is more important than ever to have both high reliability and high validity in standardized tests. Even with the current efforts of test developers, these tests often reveal more about the quantity and quality of opportunities a student has had than about what a student knows (Popham, 1999). Any additional step to improve validity would seem desirable.

Drum et al. (1981) stated that when a child is struggling to recognize words, there is diminished attention to the content of passages. They reported that any condition that increases the vocabulary load of a test would depress student performance. However, these factors are rarely taken into consideration by test developers, even though standardized and other test scores are sometimes used as the basis for decisions that seriously affect the lives of those being tested (Homan et al., 1994). In their article validating the Homan-Hewitt Readability Formula, Homan et al. questioned whether the readability level of test items was related to differential student performance.

The Homan-Hewitt Readability Formula was developed specifically for use with individual test items. The validity study published in the Journal of Educational Measurement (Homan et al., 1994) describes the development and validation of the formula, which estimates the readability level of single-sentence test items. Its initial development was based on the assumption that differences in readability level would affect item difficulty. The formula was validated by estimating the readability levels of sets of test items predicted to be written at second- through eighth-grade levels and then administering the tests to 782 students in grades 2 through 5.
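
The relationship at issue, between an item's estimated readability level and its difficulty (the proportion of students who miss it), is a simple bivariate correlation. The sketch below is an illustration only, not the authors' analysis; it assumes Python with SciPy available, and the item values are invented.

```python
# Illustration only: correlate per-item readability estimates with item
# difficulty (the proportion of students missing each item). The numbers
# below are invented; they are not data from the study.
from scipy.stats import pearsonr

# (estimated readability grade level, proportion of students missing the item)
items = [
    (2.1, 0.18), (3.4, 0.22), (3.9, 0.31), (4.6, 0.35),
    (5.2, 0.41), (5.8, 0.39), (6.5, 0.48), (7.1, 0.52),
]

readability = [grade for grade, _ in items]
difficulty = [missed for _, missed in items]

r, p = pearsonr(readability, difficulty)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")
# A positive, significant r is the pattern the article reports: the harder an
# item is to read, the more students miss it.
```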

After experimentation with several different models and variables beginning as early as 1980, the current Homan-Hewitt Readability Formula was developed (Hewitt & Homan, 1989; Homan & Hewitt, 1983; Homan et al., 1994). Existing readability research provided assistance in determining the most effective predictor variables. Vocabulary load and syntactic complexity have proven to be the most robust predictors of readability (Klare, 1984). The Homan-Hewitt Readability Formula uses The Living Word Vocabulary (Dale & O'Rourke, 1981), in addition to a variable for word length, as a gauge of vocabulary load. Word length was added after careful reexamination of the formulas of Fry (1987), Raygor (1977), Flesch (1943), and Bjornsson (1968); it was measured by counting the number of words containing seven or more letters per sentence.

The Living Word Vocabulary (Dale & O'Rourke, 1981) provides familiarity scores for over 50,000 words. Students in grades 4 through college level were tested on word meanings using a multiple-choice format, and scores and grade levels are listed for each word. For the Homan-Hewitt Formula, words from sentences at all grade levels had to be familiar at the 4th-grade level to at least 80% of the students; words that did not meet this criterion were considered unfamiliar.

Number of words per sentence has been the typical measure of syntactic complexity. The Homan-Hewitt Formula instead used Hunt's T-unit (1965), a measure of clauses per sentence, to measure syntactic complexity. The formula was developed using sentences from the comprehension sections of standardized tests and informal reading inventories as a criterion; both sources use standardized norming procedures rather than readability formulas to assess reading levels. Three hundred sentences were chosen (approximately 35 to 40 from each grade level, 1 through 8) and individually coded for grade level.

The three predictor variables used in the current formula are (a) number of difficult words, measured by familiarity in The Living Word Vocabulary; (b) word length, measured by counting how many words have seven or more letters; and (c) sentence complexity, measured by the average number of words per Hunt's T-unit. A stepwise regression analysis was used to select the predictor variables that would contribute most to accounting for variation in the reading difficulty of the sentences. A linear regression was developed using the three predictor variables. The criterion variable was the readability level assigned to each sentence by its source (passages established by standardized norming procedures); for example, if the passage was said to be at the 3rd-grade level, the sentence was assigned a grade level of 3. From the 300 sentences in the sample, 180 were randomly selected (an equal number from each grade level) and used for the regression model. Table 1 presents the R² and R² change scores for the predictor variables used in the regression model.

Table 1
Regression Results and Cross-Validation of Readability Levels for the Homan-Hewitt Readability Formula

Source                                R²      R² Change
Regression predictors
  WUNF (unfamiliar words)            .383     .383
  WNUM (average words per T-unit)    .460     .077
  WLON (long words)                  .496     .037

Note. R = .70, F(3, 176) = 56.84, p
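
To make the three predictor variables concrete, the following rough sketch computes them for a single hypothetical item and combines them linearly. It assumes Python; the familiar-word set stands in for a real lookup against The Living Word Vocabulary, the item is taken as already segmented into Hunt's T-units, and the coefficients b0 through b3 are placeholders introduced for illustration, not the published Homan-Hewitt weights.

```python
# Sketch of the three predictor variables (WUNF, WLON, WNUM), under stated
# assumptions: FAMILIAR_WORDS is a stand-in for The Living Word Vocabulary
# lookup, the item is pre-segmented into Hunt's T-units, and the regression
# coefficients are placeholders, not the published formula's weights.

FAMILIAR_WORDS = {"the", "dog", "ran", "to", "its", "owner", "when", "she", "called"}

def predictor_variables(t_units):
    """t_units: one test item as a list of T-units, each a list of words."""
    words = [w.lower() for unit in t_units for w in unit]
    wunf = sum(1 for w in words if w not in FAMILIAR_WORDS)  # unfamiliar words (WUNF)
    wlon = sum(1 for w in words if len(w) >= 7)              # words of 7+ letters (WLON)
    wnum = len(words) / len(t_units)                         # average words per T-unit (WNUM)
    return wunf, wlon, wnum

# Hypothetical single-sentence item, segmented into one T-unit.
item = [["The", "dog", "ran", "to", "its", "owner", "when", "she", "called"]]
wunf, wlon, wnum = predictor_variables(item)

# Placeholder linear combination; b0-b3 are NOT the published coefficients.
b0, b1, b2, b3 = 1.0, 0.5, 0.3, 0.2
estimated_grade = b0 + b1 * wunf + b2 * wlon + b3 * wnum
print(f"WUNF={wunf}, WLON={wlon}, WNUM={wnum:.1f}, estimated grade {estimated_grade:.1f}")
```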