Norming tests of basic reading skills


James R.M. Alexander and Frances Martin
University of Tasmania

In developing the concepts of phonological and lexical subtypes of dyslexia, criteria have been proposed based on the projection of linear regression for raw test scores. Substantial discrepancies in subtype prevalence may arise from nonlinearities in the test scores. A norming process developed for a new nonword test, the Martin and Pratt Nonword Reading Test (Martin & Pratt, 2000), was applied to the Word Identification subtest of the Woodcock Reading Mastery Test (Woodcock, 1987) and the Regular and Irregular Word Tests published in Coltheart and Leahy (1996). These tests were administered to a representative sample of 863 children aged 6 to 15 years in the Southern Tasmanian State school population. An inverse normal transform results in a distribution which is approximately normal within age groups. On this scale the age effect was well approximated by a linear increase with the logarithm of (age - 5 years). This process can be adapted to provide norms for word lists more economically and allows convenient spreadsheet formulae for norms. Substantial differences from other norms may be attributed to school district family income differences found in this sample. Male means are lower than female for all tests, but this reflects comparable high performance and disproportionate poor performance by males on reading tests.

In reading an alphabetic language such as English, children learn some general rules for converting letters or short sequences of letters to sounds (such as grapheme-phoneme rules). Competence in converting letters to sound can be demonstrated by asking them to read nonwords. They must also learn specific associations between written words and their pronunciation when these are irregular or inconsistent with similar spellings. Competence in lexical access can be demonstrated by children’s ability to read irregular words. Dual route models (Coltheart, Curtis, Atkins, & Haller, 1993) represent these two processes as the alternative ways to read a word (nonlexical/phonological and lexical respectively). Other effective models of reading, such as neural net models (e.g., Plaut, McClelland, Seidenberg, & Patterson, 1996) may incorporate these processes as matters of degree of generalisation acquired for specific strings rather than two distinct processes, but some corresponding distinction must be incorporated to match human capabilities.

Castles and Coltheart (1993) applied 30-item lists of regular, irregular, and nonwords to developmental dyslexics and to normally developing readers aged 7 to 15 years. They used two approaches to defining subtypes: relying either on linear regression of nonwords and irregular words on age, or on linear regressions of nonwords and irregular words against each other. Castles and Coltheart (1993) reported that they found no evidence of departures from linearity, but their figures show ceiling effects for older children, and the age effect is unlikely to be linear over an extended range. Manis, Seidenberg, Doi, McBride-Chang, and Petersen (1996) replicated Castles and Coltheart’s (1993) approach and included a reading age match (RAM) control group. Although they used longer lists of nonword and exception (irregular) words, there is evidence of ceiling effects on the nonwords for chronological age match (CAM) controls.
Using CAMs as the basis for regressions resulted in substantial proportions (about 30%) of dyslexics being classified in each subtype. When the regression criteria were based on the RAM group, however, they found 24% of the dyslexics categorised as phonological subtype and only one case (2%, and 4% from reanalysing Castles & Coltheart’s 1993 data with their criteria) as surface subtype. From another reanalysis of Castles and Coltheart’s (1993) data, Stanovich, Siegel, and Gottardo (1997) showed that both their irregular and nonwords display significant quadratic trends against reading age. The slope is steeper at lower reading ages, corresponding to dyslexic performance levels. Using RAM regression limits, they confirmed Manis and colleagues’ (1996) finding that surface dyslexics identified by projecting from CAMs largely overlap with RAMs, but phonological dyslexics are distinct. This asymmetry implies a nonlinear relationship between irregular and nonwords, with a steeper increase in irregular words for older participants. The inconsistencies in the prevalence of surface dyslexia by different criteria stem partly from questionable use of regression criteria. However, if the relationships are not linear, then linear regression will not be appropriate unless the nonlinearity can be corrected by appropriate scaling.

Coltheart and Leahy (1996) presented limited normative data from a Sydney sample for 30-item lists of regular, irregular, and nonwords, which differ by six items from those reported by Castles and Coltheart (1993), the Coltheart and Leahy items being apparently easier. The Coltheart and Leahy sample excluded children who were not making satisfactory progress in reading according to their teachers. For each age, they provided limits for Band A (lowest score after deleting outliers) and for Band B (2 standard deviations below the mean). Edwards and Hogben (1999) added a Perth sample to provide normalised standard scores and percentiles from 7 to 12 years for the Coltheart and Leahy (1996) word lists.
These show substantial ceiling effects for the Regular Word Test. The ceiling effect is less severe for the Irregular Word Test (17% over 90% correct at 12 years) but substantial for the Nonword Test (17% over 90% correct at 7 years, 45% by age 10). Ceiling effects are not necessarily a problem for assessing disabilities when only poor performance is of interest, and are relatively common for normal participants on clinical assessments. They do, however, create the need for extreme caution in the interpretation of parametric statistics if these are applied, especially those with strong assumptions such as linear regression or analysis of covariance. No transformation (norming) can recover the information lost through a ceiling effect for the individual cases concerned. However, it may be possible to mitigate ceiling effects on statistics by placing the group of scores concerned at the midpoint of their estimated range on an appropriate scale, rather than necessarily below this.

This work was supported by a grant from the Tasmanian Department of Education. The authors gratefully acknowledge the assistance of the schools and students involved, Professor Chris Pratt and the research assistants. We also acknowledge the constructive comments of three anonymous reviewers. Address for correspondence: James Alexander, School of Psychology, University of Tasmania, GPO Box 252-30, Hobart TAS 7001, Australia. Email: J [email protected] Australian Journal of Psychology, Vol. 52, No. 3, 2000, pp. 139-148.

If resources are available, it is clearly preferable to develop a full-range version of the test and to norm it on a large and representative sample. This was the approach adopted by Martin and Pratt (2000) for nonword reading. They have developed two parallel forms of 54 nonwords. Both forms are free of substantial floor and ceiling effects for children aged 6.5 to 16.0 years. This paper is concerned with the application of the norming process used for the Martin and Pratt Nonword Reading Test (Martin & Pratt, 2000) to the Coltheart and Leahy (1996) Regular and Irregular Word Tests and to the Word Identification subtest from the Woodcock Reading Mastery Tests - Revised (WRMT-R; Woodcock, 1987). Given the demonstrated (Coltheart & Leahy, 1996; Edwards & Hogben, 1999; Manis et al., 1996) ceiling effects in the Irregular Word Test for older children, and the much more severe ceiling effects for the Regular Word Test, this is clearly a substantial challenge for any norming process.
Any scaling process, whether item selection for raw scores or transformation for norming a given scale, involves stipulating scaling assumptions. No empirical information is then available from statistics determined by stipulation for populations similar to the norming one. For instance, Wechsler IQs were scaled by stipulating, across the intended age range for normal participants, means (100), standard deviations (15), and distributions (normal) (Anastasi, 1990). While it is possible to establish how some other population differs from the norm, it is not possible from such a scale to consider whether the intended variable has a normal distribution or equal spread across ages. Such questions could, however, be addressed by the scaling adopted for the Mental Age-based IQs of the older versions of the Stanford-Binet (Anastasi, 1990). These scales only constrained means (to 100, largely by item selection). It was thus an empirical finding that standard deviations were approximately constant (about 16, though with reliable variations) across ages, and that distributions were approximately normal. These findings with a relatively loosely constrained scale establish that tighter constraints, as adopted for the Wechsler series, are unlikely to misrepresent IQ, but it does not follow that they will be appropriate for other variables, particularly those intended to assess more basic or specific skills. Specific skills, such as knowing the names of the letters of the alphabet, are likely to have spreads greater than zero only within the age range between those where effectively no children can name any items and where effectively all children can name all items, if this is the case. Within that range, standard deviations presumably increase, then decrease. It is not obvious that scaling to normal distributions and equal standard deviations across ages would be desirable.
It may also be inappropriate to constrain normed scores to be normally distributed if there are substantial categorical differences (e.g., for sex). Discovering whether these situations apply may require some reasonably distributed initial scale.

The psychometric basis of tests of the component skills of reading has been less effective than that of IQ tests. Whereas many tests offer age-equivalent scores, these are often empirical or projected and lack the technical sophistication of the Stanford-Binet mental ages from 1916. This test recognised that when the developmental progression flattened (at about 16 years for IQ scales) it was necessary to adopt a different basis for above-average performance. Age-equivalent scores such as those from the WRMT-R (Woodcock, 1987) are unsuitable for parametric statistics, being extremely skewed for older and younger children and not consistently defined for very high and low performance. The WRMT-R Instructional Profile plots grade-equivalents on an approximately logarithmic scale (Woodcock, 1987). In addition, chronological/reading age discrepancies or ratios do not appear to be consistently distributed across the school years. Aside from age and grade equivalent scores, the WRMT-R (Woodcock, 1987) provides, for a selection of reading assessments, both purportedly interval measures, W scores, based on Rasch scaling and also standard scores which are Wechsler scaled (across ages constrained to Ms = 100, SDs = 15). Varying degrees of skew correction towards normal distribution occur in deriving WRMT-R standard scores from the distributions found on the W scale. Rasch scaling (Wright & Stone, 1979) is based on an assumed ability-item response function (logistic distribution with equal discriminability between items), which does not directly constrain the distribution of ability within the population. This makes it difficult to say what effect its scaling assumption may have for a particular application. The logistic distribution is apparently adopted because (a) it is a good approximation to the normal distribution and (b) it is mathematically more tractable for the derivations used in developing the model.
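The closeness of the logistic and normal distributions noted in point (a) can be checked numerically. The sketch below is illustrative only: the scaling constant 1.702 is the value conventionally used to match a logistic curve to the normal ogive and is an assumption here, not a figure taken from this paper, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

# ASSUMPTION: 1.702 is the conventional scaling constant for matching the
# logistic curve to the standard normal ogive; it is not given in the paper.
SCALE = 1.702


def logistic_cdf(z, scale=SCALE):
    """Cumulative logistic function evaluated at a scaled z value."""
    return 1.0 / (1.0 + math.exp(-scale * z))


def max_abs_difference(lo=-4.0, hi=4.0, steps=801):
    """Largest gap between the scaled logistic and normal CDFs on a grid."""
    normal = NormalDist()
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(abs(logistic_cdf(z) - normal.cdf(z)) for z in grid)


if __name__ == "__main__":
    print(f"largest difference on the grid: {max_abs_difference():.4f}")
```

Under this scaling the two cumulative curves stay within about one percentage point of each other across the grid, which is the sense in which results derived with the logistic might be expected to carry over approximately to the normal.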
It seems likely that results established using the logistic distribution would apply with a reasonable degree of approximation using the more familiar normal distribution. The norming approach adopted in this study was to seek a transformation of the raw scores that provided a reasonable approximation to a normal distribution of individual differences with similar standard deviations for each age group if this were consistent with the data. Subtracting the smoothed age trend from this, and linear re-scaling, would leave approximate Wechsler scaled standard scores. It is possible that such a transform does not exist. The first preference for a transform was the inverse normal transform of percentage correct, which approximates the use of the logistic in the initial stage of Rasch scaling. The normal distribution was used in earlier Thurstone scaling (Wright & Stone, 1979). Further steps would depend on the pattern of departure from the scaling objectives.

It is highly desirable that norming samples be representative, commonly taken to mean representative of the national population. It is arguable that local norms, representative of narrower populations such as the local school system, may be preferable when these are available. If local norms are used, information about how the local population differs from the wider population is required to allow generalisation across studies. Poor reading skills are commonly associated with socioeconomic disadvantage. However, the Edwards and Hogben (1999) norms from Sydney and Perth cover major urban centres with more favourable socioeconomic indicators, such as income and unemployment levels, than the Australian average. This is especially true for the Sydney sample, which has also excluded poor readers. Raz and Bryant (1990) studied socioeconomic differences in reading in the United Kingdom. They found that the difference between middle-class and disadvantaged children in phonological abilities became serious after they commenced school at about 5 years. Some social-class differences in reading comprehension remained after allowing for IQ and phonological abilities. Studies conducted in the United States have found rural disadvantages in cognitive abilities that are not accounted for by family income differences (e.g., Coon, Carey, & Fulker, 1992). These authors suggested that the differences may be due to less variety of stimulation or less competition in rural environments compared to urban environments. These studies suggest that the use of norms from advantaged urban areas is likely to be least appropriate where reading disability is of greatest prevalence if similar effects are substantial in Australia.

Recognising that research measures, such as selected word lists, are unlikely to be provided with fully developed norms, this study aims to demonstrate a procedure that provides distributions sufficiently close to normal and sufficiently linear with a transform of age, for parametric statistics to be appropriate. It is a further aim of this study to provide norms that include economically disadvantaged urban and rural areas in order to estimate the effect of these demographic factors on reading norms.

Australian Journal of Psychology - December 2000

METHOD

Sample

A sample of 863 students aged 6.0 to 16.0 years was selected to represent the southern Tasmanian State School student population. A small proportion of students was sampled from a private independent high school to represent the proportion of students from the state primary system who follow that pattern of enrolment. All schools were co-educational. Children were randomly selected from class lists, excluding only those inclusion students with severe disabilities. The sample included 433 females and 430 males. All students were tested in the second half of the school year, in Grades 0 (called Prep in the Tasmanian State School system, prior to Grade 1) to Grade 10. As age/grade relationships differ between education systems, results are reported by age.

Measures

Age (decimalised, 6.5 represents 6 years 6 months at the date of testing) and sex were recorded for individual children. The average family income for the Australian Bureau of Statistics (ABS) census district in which each school was located (which does not usually correspond with the boundaries of the school recruitment area) from the 1991 census (ABS, 1992) was recorded. Schools were also classified as urban (Hobart and its outer suburbs) versus nonurban (all other) on the basis of school location. This does not correspond to the ABS urban categorisation, which includes towns over 1,000 population. The Martin and Pratt Nonword Reading Test (Martin & Pratt, 2000) consists of two parallel forms, A and B, of 54 nonwords each, which were combined to give a total of 108 nonwords for this study. The Regular and Irregular Word Tests (30 words each) which derive from Castles (1994) are reported by Coltheart and Leahy (1996) as matched on word frequency. All 60 words were presented in random order. The WRMT-R Word Identification subtest (Woodcock, 1987), 106 words, including regular and irregular words, was also administered. These words appear to have been approximately rectangularly distributed on the Rasch scale.

Procedure

Schools were selected to represent the family income and urban/nonurban distribution for the region. Children were selected randomly from class lists in selected schools. Postgraduate psychology students were trained in administering all tests and then carried out the assessments. All testing was carried out in a quiet room in each participant’s school. The Word Identification subtest of the WRMT-R was administered according to standard instructions (Woodcock, 1987), as were both forms of the Martin and Pratt Nonword Reading Test (Martin & Pratt, 2000). That is, testing ceased on each of these tests when the child reached the specified ceiling. All children attempted to read all words from the Regular and Irregular Word Test (Coltheart & Leahy, 1996). For all tests, self-corrections were permitted, and if a child had not responded to any word after prompting (“Can you tell me what that word says?”), the experimenter moved on to the next word. All responses for all tests were scored as either correct or incorrect. Tests were administered in counterbalanced order in one session for each child. The single session ranged in length from 20 minutes to 40 minutes depending on the word-recognition skills of the individual child. The study received ethical approval from the University of Tasmania Human Ethics Committee and from the Tasmanian Department of Education Ethics Committee.

RESULTS

Development of Norms

Children were divided into 1-year class intervals from 6.0 to 9.9, and 2-year intervals from 10.0 to 15.9. The number of items correct for each test was converted to a proportion treating number correct as the midpoint of a class interval. Interpolated percentiles for grouped data were obtained for nine percentages corresponding to the normal integrals for z scores from -2.0 to +2.0 in 0.5 increments: 2.3%, 6.7%, 15.9%, 30.9%, 50%, and so on. These percentiles will be equally spaced if the scores are normally distributed. Figure 1 shows the proportion correct corresponding to these percentiles across age for the four tests. The Martin and Pratt Nonword Reading Test shows a degree of contraction of lower scores for younger children and of higher scores for older children. These contractions represent reduced discrimination towards these extremes. Three children scored zero over both forms, and no child was correct on all 108 items. There appears to be some negative acceleration in the increase with age, though this is reduced by the lower contraction for lower percentiles and increased by the upper contraction for higher percentiles. Similar effects occur for the Word Identification subtest, with a more obviously negatively accelerated increase with age. These effects are more substantial for the Irregular Words and extreme for the Regular Words. Detailed norming of the Martin and Pratt Nonword Reading Test used cumulative percentage frequencies for each score for 1-year intervals. This established that an inverse normal transformation of proportion correct gives a close approximation (? > 0.97) to a normal distribution of scores between the 2.5th and 97.5th percentiles for almost all age groups. The inverse normal transformation of scores does not generally constrain the resulting distribution to be normal; it has been found to do so for this test.
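The percentages quoted above, and the inverse normal transformation of proportion correct, can be reproduced with a short sketch using Python's standard library. The function names are hypothetical, and the class-interval formula (k + 0.5)/(n + 1) is only one assumed reading of "treating number correct as the midpoint of a class interval"; the text does not state the formula.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def target_percentages():
    """Normal integrals, as percentages, for z = -2.0 to +2.0 in 0.5 steps."""
    z_values = [-2.0 + 0.5 * i for i in range(9)]
    return [round(100 * STANDARD_NORMAL.cdf(z), 1) for z in z_values]


def proportion_correct(number_correct, n_items):
    """Proportion correct with the raw score treated as a class midpoint.

    ASSUMPTION: a score of k is taken to occupy the interval [k, k + 1)
    out of n_items + 1 possible scores, so its midpoint is k + 0.5.
    """
    return (number_correct + 0.5) / (n_items + 1)


def inverse_normal_score(number_correct, n_items):
    """z score whose normal integral equals the proportion correct."""
    return STANDARD_NORMAL.inv_cdf(proportion_correct(number_correct, n_items))


if __name__ == "__main__":
    print(target_percentages())
    for score in (0, 27, 54, 81, 108):  # out of 108 nonwords
        print(score, round(inverse_normal_score(score, 108), 2))
```

Under this assumed formula, zero and perfect scores still map to finite z values, which is in keeping with the idea of placing ceiling scores at the midpoint of their estimated range rather than at an undefined extreme.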
The curve-fitting approach allowed means and standard deviations of the normal distributions fitted to the transformed scale between the 2.5th and 97.5th percentiles to be obtained, which excludes the effects of outliers. The means were well fitted as a linear increase in the logarithm of (age - 5 years). The standard deviations showed no systematic change with age. These fitted functions were used, with appropriate re-scaling to equate forms A and B, to provide an interval score without age corrections (approximately normally distributed within age groups) and an age-normed standard score (intended to have a mean of 100 and a standard deviation of 15 at each age). These are shown in a form intended for spreadsheet calculation (Appendix A).
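A minimal sketch of this age-norming step follows. It fits a straight line to age-group means against the logarithm of (age - 5 years) and converts a transformed score to an age-normed standard score. Base-10 logarithms are an assumption (the text does not state the base), and the age-group means, the common standard deviation, and all function names are invented for illustration; the fitted equations actually used are those given in Appendix A.

```python
import math


def log_age5(age_years):
    """Logarithm of (age - 5 years). ASSUMPTION: base 10."""
    return math.log10(age_years - 5.0)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def standard_score(z, age_years, slope, intercept, sd):
    """Age-normed score with an intended mean of 100 and SD of 15."""
    expected = intercept + slope * log_age5(age_years)
    return 100.0 + 15.0 * (z - expected) / sd


# Hypothetical age-group midpoints and mean transformed scores
# (illustrative values only, not data from the study).
ages = [6.5, 7.5, 8.5, 9.5, 11.0, 13.0, 15.0]
means = [-1.2 + 2.0 * log_age5(a) for a in ages]
slope, intercept = fit_line([log_age5(a) for a in ages], means)

if __name__ == "__main__":
    # A 10-year-old scoring one assumed SD (0.8) above the fitted age mean.
    z_child = intercept + slope * log_age5(10.0) + 0.8
    print(round(standard_score(z_child, 10.0, slope, intercept, sd=0.8), 1))
```

Because the standard deviation showed no systematic change with age, a single value can be used for all ages in the rescaling, which is what keeps the resulting formula simple enough for a spreadsheet.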

Figure 1. Proportion correct for selected percentiles across age. [Panels: 1a, Martin and Pratt Nonword Reading Test; 1b, WRMT-R Word Identification subtest; 1c, Irregular Word Test; 1d, Regular Word Test. Horizontal axis: age (6 to 15 years). Vertical axis: proportion correct (0.0 to 1.0). Lines: 97.7th, 93.3rd, 84.1st, 69.1st, 50th, 30.9th, 15.9th, 6.7th, and 2.3rd percentiles.]

A simplified version of the norming process used for the Martin and Pratt Nonword Reading Test is illustrated in Figure 2a, and applied to the other tests in Figures 2b to 2d. The proportions correct used in Figure 1 are converted to z scores by the inverse normal function, and plotted against LogAge5, the logarithm of (age - 5 years). Excluding the lines representing the 97.7th and 2.3rd percentiles, which are least reliably estimated, the lines are closer to parallel (implying constant spread), equally spaced (implying normally distributed), and straight (allowing a simple age correction) than in Figure 1. The departures from these criteria do not appear systematic across the age range for the Nonword, Word Identification, and Irregular Word Tests, except that 6-year-olds are well below this estimated age correction for the word reading tests. For the Martin and Pratt Nonword Reading Test, 14- to 15-year-olds appear to have less spread, and for Word Identification, 7-year-olds appear to have more spread. The Regular Word Test is not appropriately scaled by this procedure, but as more than 50% of participants above 9 years scored 28 to 30 out of a possible 30, good scaling could not be expected. It would be possible to improve the fits (e.g., subtracting a higher value than 5 from age would improve the linearity for the 6-year-olds for the word-reading tests). However, it is desirable to establish whether variations are reliable and not otherwise accounted for before adjusting scaling to fit the sample, as scaling to fit random error in the norm sample would include that random error in all subsequent uses of the norms. The degree of fit of the initial scaling to the assumptions is clearly sufficient to allow the appropriate use of parametric statistics, except arguably for Regular Words.

Figure 2. Normal transform of proportion correct for selected percentiles across log transform of age. [Panels: 2a, Martin and Pratt Nonword Reading Test; 2b, WRMT-R Word Identification subtest; 2c, Irregular Word Test; 2d, Regular Word Test. Horizontal axis: LogAge5 (0.1 to 1.0). Vertical axis: normal transform of proportion correct (-2.5 to 2.5). Lines: 97.7th, 93.3rd, 84.1st, 69.1st, 50th, 30.9th, 15.9th, 6.7th, and 2.3rd percentiles.]

Age-normed scores were derived for the four tests with intended means of 100 and standard deviations of 15. These are expressed as spreadsheet functions in Appendix A. They are preceded by interval scale equations that do not adjust for age and are comparable in their intended uses to the WRMT-R W scale. Analyses of age-normed scores over an extended age range may suffer from differential restriction of extreme scores. Older children are limited in high scores and younger children restricted in low scores. To reduce this effect and limit the influence of extreme scores, scores above 130 were re-coded as 130 and those below 70 as 70, effectively limiting the range to two standard deviations either side of the mean on the intended scale, to give restricted age-normed scores used for statistical analyses. This restriction will reduce standard deviations of 15 to 14 for a normal distribution.

Schooling Effects

The departure of the 6-year-olds from the proposed age function for the three word reading tests, but not for the Martin and Pratt Nonword Reading Test, might arise from a grade effect. Most 6-year-olds were in Grade 0 (Prep, where reading teaching is limited). To determine whether there were age effects within grades, children from Grades 0 to 2 were divided into three groups of 4-month intervals of age at initial enrolment. As the Tasmanian State School system followed a relatively strict age-for-grade policy during this period, only 12 participants fell outside this 1-year interval and were excluded for these analyses. As students were almost all tested late in the school year, this categorisation by age at enrolment largely overlaps with a categorisation based on age at test in slightly larger intervals, and equivalent results were found analysing by subcategories of age at test. ANOVAs were conducted for three grades and three age subcategories for the restricted age-normed scores, which tests whether the initial age correction is inadequate.

For the Martin and Pratt Nonword Reading Test there was no significant departure from the initial age norming for grade or age subcategories. For the Word Identification standard scores, which compare these children with the United States norm sample, there are significant grade effects, F(2, 228) = 6.8, p = 0.001, and age subcategory effects, F(2, 228) = 3.1, p = 0.049. Student-Newman-Keuls (SNK) post-hoc tests at the 0.05 level found performance at Grade 2 (M = 99.6) to be significantly less than that of Grade 0 (M = 105.5) and Grade 1 (M = 108.6) children. Children in Grades 0 and 1 of the Tasmanian State system are performing above the United States norm sample of the same age. For the marginally significant age subcategory effect, the younger children within the grade (M = 108.0) are significantly higher than the older children (M = 101.9) by SNK. While the interaction is not significant, this age difference is most obvious within Grade 0 (M = 113 versus M = 100). This might reflect an overcorrection in detailed age norming for the United States norms, attempting to correct for grade or teaching-pattern effects.

The norming process used for the Martin and Pratt Nonword Reading Test was also applied to the Word Identification raw scores to provide local norms. With local norms, children in Grade 0 (M = 95.4) performed at a significantly (SNKs) lower level than children in Grades 1 (M = 106.7) and 2 (M = 102.7). However, the age-within-grade effect did not approach significance, F(2, 228) = 0.57, p = 0.5, nor did the interaction. The Irregular Word Test shows a much larger grade effect, with children in Grade 0 (M = 84.9) performing at a significantly lower level than children in Grades 1 (M = 99.3) and 2 (M = 97.1) by SNK. There was no age-within-grade effect. The Regular Words showed a similar pattern, with Grade 0 (M = 83.6) performance significantly lower than Grades 1 (M = 96.9) and 2 (M = 96.7) by SNK, and no age-within-grade effect.

Steeper corrections for age would not be justified on these analyses, which establish that it is necessary to allow for grade if Prep children (Grade 0) are to be scaled similarly to the later grades. A correction in which the Grade 0 mean was increased to 100 was therefore added to the age norms for the Irregular (+15) and Regular (+16) Word Tests and for the local norms for the Word Identification subtest (+5) for some further analyses. This Prep correction cannot be assumed to be valid where different patterns of early reading teaching occur. It is based on assessment late in the teaching year and is not included in the age-norming equations in Appendix A.

Family Income

ANOVAs were conducted for age, sex, and income differences in 2-year age intervals for the Prep-corrected, restricted, age-normed scores. The income categories are average family incomes for the district where the school is located from the Australian Bureau of Statistics 1991 census (ABS, 1992). Age differences were significant only for the Word Identification subtest, using United States norms, F(4, 833) = 9.5, p < 0.001. The 6 to 7 year group performed at a significantly higher level than all older groups by SNK. For local norms of the Word Identification subtest, the 6 to 7 and 8 to 9 year groups were significantly higher than the 12 to 13 year group. There were no significant interactions. Table 1 summarises the sex and income differences for the four tests, using the comparable local norms for the Word Identification subtest. It includes also medians for the regular words, which are substantially ceiling-limited for older children, with standard deviation estimates derived from the difference between the 50th and 15.9th percentiles.

Children were also classified as urban (Hobart and outer suburbs) versus nonurban (all other) on the basis of school location. ANOVAs of age, sex, and urban/nonurban found urban means significantly greater than nonurban by three points for the Nonword, Word Identification (United States norms), Irregular Word, and Regular Word Tests. There were no significant interactions with age or sex. However, after allowing for income differences by analysis of covariance, none of the urban/nonurban differences were significant.

Comparison with Other Norms

The Edwards and Hogben (1999) norms from Sydney and Perth for the Irregular and Regular Word Tests offer cumulative percentiles and normalised Z scores for age groups 7 to 12 years.
Their approach, treating raw scores as categories, does not provide a Z score for the maximum score obtained, which affects a substantial number of participants for Regular Words and will give higher values than treating raw scores as class intervals. The Regular and Irregular scores for our participants aged 7 to 12 were converted to Z scores from their tables, with Z scores for maximum scores estimated by the class-interval approach. These Z scores were converted to scores with a mean of 100 and a standard deviation of 15 and restricted to the range 70 to 130 (which reduces the expected standard deviation to 14). For the Tasmanian sample, the mean Sydney/Perth norm scores did not vary with age (for year categories: F(5, 551) = 0.4, p = 0.9 for Regular Words; F(5, 551) = 0.3, p = 0.9 for Irregular Words), showing comparable age adjustments. The mean Sydney/Perth norm score for Regular Words was 92.8 (SD = 16.3) and for Irregular Words the mean was 95.1 (SD = 18.2). This suggests that the Tasmanian scores are lower and more dispersed than the Sydney/Perth norm sample. Median Sydney family income from the 1991 census was $A35,400 and for Perth this was $A29,500 (ABS, 1998). Selecting Tasmanian children from schools in urban areas with incomes over $30,000 gives a mean Regular Words Sydney/Perth norm score of 98.7 (SD = 14.6) and for Irregular Words a mean of 102.8 (SD = 15.7) for 109 children (mean income = $35,300).

The WRMT-R Word Identification scores (United States norms) similarly allow a comparison of the Tasmanian sample with a representative United States sample, which is less urbanised than the Australian population. The Tasmanian 7-year-olds are substantially above the United States norm (M = 108.3, SD = 16.4), which may reflect a teaching-pattern effect. However, for 9 to 11 years (M = 97.7, SD = 17.7) and for 12 to 15 years (M = 97.1, SD = 17.0), they are lower. The lower mean performance for these older children is consistent with a modest economic disadvantage. The increased standard deviations (expected to be 14, as the influence of outliers has been limited) would be consistent with substantially greater socioeconomic inequality in Tasmania, which seems unlikely, though unemployment (4% in the United States sample and above 10% in the Tasmanian population) may increase the inequality.

Sex Differences

Sex differences (see Table 1) were significant for all tests except the Regular Word Test. Levene's test for inequality of variances was significant for all except the Word Identification subtest with United States norms. The general trends were for females to have lower standard deviations, especially in older groups. Males had lower means and higher standard deviations, showing that males are disproportionately frequent in the lower ranges of these reading tests, but not necessarily less frequent in higher ranges. Table 2 shows the percentages of males and females in high and low ranges of the unrestricted Prep-adjusted norms of the four reading tests. If a stringent criterion is adopted for poor reading (< 70), then males outnumber females by about 2 to 1; with a less stringent criterion (< 85), the ratio is about 1.6 to 1. However, the trend is for slightly more boys to be good readers (> 115) on most measures.
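The way a lower mean combined with a larger standard deviation inflates only the lower tail can be illustrated with a short calculation. The sketch below is not part of the study's analysis: it assumes exactly normal score distributions and borrows the Table 1 means and standard deviations for the Martin and Pratt Nonword Reading Test (which are restricted scores, whereas Table 2 uses unrestricted norms), so its percentages are only indicative and will not reproduce Table 2. The names `FEMALE`, `MALE`, and `tail_percentages` are introduced here for illustration.

```python
from statistics import NormalDist

# Assumption: scores are exactly normal, with the Table 1 mean and SD for the
# Martin and Pratt Nonword Reading Test (females 100.6 / 13.2, males 98.6 / 15.3).
FEMALE = NormalDist(mu=100.6, sigma=13.2)
MALE = NormalDist(mu=98.6, sigma=15.3)


def tail_percentages(dist):
    """Percentage of a normal distribution falling in each score range."""
    return {
        "below_70": 100 * dist.cdf(70),
        "below_85": 100 * dist.cdf(85),
        "above_115": 100 * (1 - dist.cdf(115)),
    }


female = tail_percentages(FEMALE)
male = tail_percentages(MALE)

for key in female:
    ratio = male[key] / female[key]
    print(f"{key}: females {female[key]:.1f}%, males {male[key]:.1f}%, "
          f"male/female ratio {ratio:.2f}")
```

Under these assumptions the male/female ratio is largest for the most stringent low cut-off and close to 1 for scores above 115, which is the qualitative pattern described in the text.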

Table 1

School Area Family Income (in Thousands $A/Year in 1991) and Sex Differences in Mean Age-normed Scores for the Martin and Pratt Nonword Reading Test, the WRMT-R Word Identification Local Norms, and the Irregular and Regular Word Tests (Standard Deviations in Parentheses)

| | Martin and Pratt Nonword Reading Test | WRMT-R Word Identification | Irregular Word Test | Regular Word Test means | Regular Word Test medians |
|---|---|---|---|---|---|
| Income: p | ≤ .001 | ≤ .001 | ≤ .001 | ≤ .001 | |
| 30-42 | 101.5 (13.9) | 103.6 (14.1) | 102.0 (14.3) | 99.2 (12.4) | 101.9 (14.0) |
| 25-29 | 99.5 (14.2) | 100.6 (15.1) | 98.7 (15.8) | 97.2 (13.6) | 100.1 (16.7) |
| 20-24 | 97.5 (14.6) | 97.5 (14.3) | 95.0 (15.0) | 95.2 (13.7) | 96.8 (17.5) |
| Sex: p | .022 | .024 | .002 | .135 | |
| Female | 100.6 (13.2) | 101.8 (13.3) | 100.5 (14.2) | 98.1 (12.0) | 99.9 (13.7) |
| Male | 98.6 (15.3) | 99.6 (15.8) | 97.0 (16.1) | 96.5 (14.4) | 99.3 (18.8) |

Australian Journal of Psychology - December 2000


Correlations

To examine the relations in individual differences between these variables, Prep students were excluded and the students were divided into three age groups: 6 to 8 years, 9 to 11 years, and 12 to 15 years. Correlations between the restricted age-normed standard scores for these age groups are shown in Table 3. The general trends in individual differences shown by these correlations are for a slight decrease of correlations with age (except for Martin and Pratt Nonword Reading with WRMT-R WI), and for the three word reading tests to be more strongly interrelated than the Martin and Pratt Nonword Reading Test is with the word reading tests. The Martin and Pratt Nonword Reading Test is marginally least associated with the Irregular Word Test, r declining from .61 to .51 with age. Correlations of this size (after limiting the influence of extreme scores) allow substantial variation in students' relative performance on Nonword versus Irregular Words.

DISCUSSION

While it is desirable that tests of basic reading skills be developed to professional standards on a well-developed theoretical basis, as for the WRMT-R, this is an expensive and conservative process. Newer developments may depend on specific subtests, such as the Castles and Coltheart (1993) or Coltheart and Leahy (1996) Regular and Irregular Word Tests, with substantial limitations in scaling. Ceiling effects are not necessarily a problem for individual assessment when the application is restricted to low performance, but, as with the attempt to categorise subtypes of dyslexia, scaling problems may substantially affect conclusions based on more powerful statistical procedures, such as linear regression, and render any parametric statistics questionable. The approach used to scale the Martin and Pratt Nonword Reading Test achieves the same objectives for this test as the more complex approach used by the WRMT-R in achieving age-normed standard scores, approximately normal distributions, and equal standard deviations across ages. The Rasch-based approach of the WRMT-R has a better developed theoretical basis, but any scaling benefits are lost when this is re-scaled to correct skew and make standard deviations equal, as in the WRMT-R age-normed standard score derivation. The Martin and Pratt Nonword Reading Test scaling approach has also been found to be applicable to the Word Identification and Irregular and Regular Word Tests, though with decreasing effectiveness for the more limited tests. It is clear that these re-scaled scores are more appropriate for parametric statistical analyses than the original raw scores.

The effectiveness of the inverse normal transformation in making distributions closer to normal, despite the differences in item-selection processes for the four tests, mainly reflects the contraction of raw total correct scores towards the extremes of the tests. This indirectly reflects the broad spread of individual participants' item responses around average item-difficulty levels. The effectiveness of age corrections proportional to the logarithm of (age - 5 years) could be interpretable in terms of a power law of practice, suggesting that a 6-year-old child has the equivalent of about 1 year of relevant experience.

Analyses based on scaled scores showed that the departures from the general age trend by the youngest children could be largely attributed to a schooling-pattern effect. For the three word-reading tests, there is evidence that children in the second half of the Prep year before Grade 1 in the Tasmanian school system perform below the levels projected from the age trends for children in later grades. This is consistent with a causal effect of schooling, which has been shown for phonological and reading measures by analysing age-of-entry cut-off effects in the United States (Morrison, Smith, & Dow-Ehrensberger, 1995). Attempting to correct this by a steeper age correction, however, may produce perverse age-within-grade differences, and the data suggest that this occurs with the United States age norms for the Word Identification subtest.
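The contraction of raw scores towards the extremes, and the way the inverse normal transformation undoes it, can be seen in a small numerical sketch. It assumes a 30-item list (as for the Regular and Irregular Word Tests) and the class-interval convention described in Appendix A, in which a raw score k on an n-item test is mapped to the proportion (k + 0.5)/(n + 1). The function name `inverse_normal_scale` is introduced here for illustration only.

```python
from statistics import NormalDist


def inverse_normal_scale(raw, n_items):
    """z score for a raw score treated as the mid-point of a class interval."""
    proportion = (raw + 0.5) / (n_items + 1)
    return NormalDist().inv_cdf(proportion)


N_ITEMS = 30  # assumed list length, as for the Regular and Irregular Word Tests

# z score for every possible raw score, and the size of each one-item step.
z = [inverse_normal_scale(k, N_ITEMS) for k in range(N_ITEMS + 1)]
steps = [upper - lower for lower, upper in zip(z, z[1:])]

print(f"step from 15 to 16 correct: {steps[15]:.3f} z units")
print(f"step from 29 to 30 correct: {steps[29]:.3f} z units")
```

One extra item correct is worth more on the transformed scale near the floor or ceiling of the test than in the middle, so differences that are compressed in the raw scores are spread out again after transformation.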
The norming approach used here achieves the objective of producing no significant age-within-grade effects. It is not possible to provide a general correction for grade-assignment effects because of differences between school systems, but it is likely that they will be negligible after the first 2 years. For the Martin and Pratt Nonword Reading Test, there is not a significant Prep effect, suggesting that by the second half of the school year these children were not lacking an introduction to the basic decoding skills within their developmental capabilities. However, differences in the Prep effect between

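Table 2

Percentage of Males and Females in High and Low Ranges of Normed Scores

| Score range | Martin and Pratt Nonword Reading Test: Females | Martin and Pratt Nonword Reading Test: Males | WRMT-R Word Identification subtest: Females | WRMT-R Word Identification subtest: Males | Irregular Word Test: Females | Irregular Word Test: Males | Regular Word Test: Females | Regular Word Test: Males |
|---|---|---|---|---|---|---|---|---|
| Below 70 | 1.4 | 4.7 | 3.0 | 5.3 | 3.5 | 7.9 | 3.7 | 8.4 |
| Below 85 | 13.2 | 21.4 | 10.9 | 17.9 | 13.9 | 23.0 | 14.8 | 22.1 |
| Above 115 | 13.1 | 14.7 | 13.4 | 17.7 | 13.6 | 12.3 | 4.6 | 7.7 |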

Table 3

Correlations Between Age-normed Scores for the Four Tests

| Test | 6 to 8 years: NW | 6 to 8 years: WIL | 6 to 8 years: IRR | 9 to 11 years: NW | 9 to 11 years: WIL | 9 to 11 years: IRR | 12 to 15 years: NW | 12 to 15 years: WIL | 12 to 15 years: IRR |
|---|---|---|---|---|---|---|---|---|---|
| WIL | .66 | | | .67 | | | .68 | | |
| IRR | .61 | .92 | | .56 | .86 | | .51 | .80 | |
| REG | .68 | .91 | .89 | .66 | .86 | .82 | .58 | .76 | .73 |

Note. NW = Martin and Pratt Nonword Reading Test; WIL = local norms for WRMT-R Word Identification subtest; IRR = Irregular Word Test; REG = Regular Word Test.


the tests must partly reflect scaling limitations, as it is much larger for the Irregular and Regular Word Tests than for the WRMT-R Word Identification subtest. This probably reflects a relative absence of items within the limited range of the Prep children on the Coltheart and Leahy (1996) lists. The Martin and Pratt Nonword Reading Test has found items suitable for most 6-year-olds, but its discrimination between the lowest performers is limited before 7 years.

There are significant differences for all tests between higher and lower income school districts, about two to three points per $A5,000 of annual family income. As average annual family incomes differ between states and rural/urban areas as well as between schools, the mean performance in other school systems is likely to differ from 100. There were urban/nonurban differences, but these could be largely accounted for by the family income differences between urban and nonurban areas in Southern Tasmania. Some United States studies have found rural/urban differences in cognitive abilities that are not accounted for by family income differences (Coon et al., 1992), but that may reflect the negligible rural/urban differences in average family income in those societies studied, associated with different bases for socioeconomic segregation. As the data in the current study come predominantly from government-funded schools, they are unlikely to reflect differences in school funding. Average family income for the school district is probably partly an indirect index of more powerful individual differences that vary with socioeconomic status (SES). However, some United States studies have found differences in cognitive abilities between communities after controlling for differences in SES of the individual families (Coon et al., 1992), supporting the view that environmental influences on cognitive development may arise from neighbourhood or school influences as well as from the family.
Badian (1994) found a correlation of 0.31 between Grade 1 word reading and SES, which was 0.28 after controlling for a verbal IQ measure; similar values were found for reading comprehension. Whitehurst and Lonigan (1998) provided a detailed model for some SES-associated factors, predicting Grade 1 and Grade 2 reading in disadvantaged Head Start children. This suggests paths from mother's IQ and education through literacy environment to preschool language skills, and largely via decoding skills to reading. Raz and Bryant (1990) found social-class differences in some home measures related to reading, such as number of books owned, but not in others, such as how often the child looked at books. They found that the number of books owned played a negligible role in reading ability within the social groups, suggesting that it may be a marker of other SES differences. Whatever the cause of social class differences in reading, it cannot be assumed that differences between schools reflect variations in teaching effectiveness, without appropriate controls for established demographic effects.

There is sufficient evidence to conclude that females average about 2 points higher than males on these tests, but it is misleading to interpret this as a general male disadvantage, as there are as many boys who are good readers as girls in this sample. The disadvantage is restricted to boys who are average and below-average readers. Other studies have shown differences in the variance of males and females. Reynolds et al. (1996) obtained word-reading scores for 1,319 twin pairs aged 8 to 16. Females scored consistently but nonsignificantly higher across the age range. The male/female ratio of standard deviations (the square root of the more commonly reported variance ratios) was 1.07, compared with 1.13 to 1.20 in the present study. Feingold (1992) reviewed sex differences in variability for cognitive abilities and found that they varied from negligibly or slightly larger for females for some language and speed measures to substantially higher for males for space relations and mechanical reasoning, reading comprehension, and spelling. Feingold (1992) also showed that some differences changed over time (1947-1980) or grade (e.g., for spelling, the male/female standard deviation ratio increased from 0.96 in Grade 8 to 1.18 in Grade 12), and reported appreciable differences between cultures and nations. If these changes in ratios are reliable, they seem to support environmental rather than genetic causes. Reynolds et al. (1996) interpreted their twin data as showing equal heritability for males' and females' word reading, with greater effects of both genetic and environmental factors on males. The alternative interpretation of their data is higher heritability for males. However, if the differences in variability reflect competition/conformity differences between males and females, these may differ in non-twins.

Hedges and Nowell (1995) reviewed data from a number of very large and representative United States samples. They concluded that males consistently have larger variance in the ability and achievement measures studied. The sex ratios in standard deviations for reading comprehension range from 1.01 to 1.08, though the lower ratios may reflect selective dropout of poor readers for studies based on Grade 12 populations. For complete populations or younger samples, the ratio of males to females in the bottom 10% for reading comprehension is 1.5 to 1.7, compared with 0.8 to 0.9 in the top 10%. The Tasmanian sample shows slightly higher sex ratios in standard deviations for nonword and word reading, reflecting the sex ratios for prevalence from the top 16% (about 1) to the bottom 16% (about 1.6).

Ackerman and Dykman (1993) noted that a number of studies have found smaller male/female prevalence ratios in school population studies of poor readers than in school-referred or clinic samples of reading disability. This is often attributed to referral biases, perhaps related to the prevalence of behaviour problems or attentional deficits in boys with poor reading (Sanson, Prior, & Smart, 1996; Vogel, 1990). Ackerman and Dykman (1993) also noted some findings that sex prevalence ratios are less (about 1.5 males/female) for children with low IQ (backward readers, the majority of poor readers) than for specific reading-disabled children with above average IQs and poor reading (ratio 5 males/female). Maughan, Pickles, Hagell, Rutter, and Yule (1996) reported expected sex prevalence in the United Kingdom to be roughly equal in backward readers, compared to 3 to 1 (males/female) in specific reading disability.

The correlations between these tests show a general tendency to decrease slightly with age. Nonword reading is less related to the three word reading tests than the word-reading tests are with each other; this is consistent with the correlation between the WRMT-R WI and Word Attack (nonword reading) subtests, 0.72 in Grade 3 to 0.66 in Grade 8 (Woodcock, 1987). In particular, individual differences in regular word reading are more related to irregular word reading than they are to nonword reading, contrary to the trend of raw score correlations reported by Coltheart and Leahy (1996). However, Coltheart and Leahy reported raw score correlations, which include the effects of common developmental trends and skewed distributions. Our analysis suggests that individual differences in the reading of regular words in such tests are more dependent on lexical processes than on phonological decoding, and this is so across the school years.

For a range of word-reading tests, it has been shown that distributions are more appropriate for parametric statistics if raw scores are transformed to the inverse normal of the proportion of items correct. On this scale, age effects across the school years are linearly related to the logarithm of (age - 5 years). For the first few years of schooling (up to about 8 years of age), however, there is evidence for teaching-pattern effects, which means that the performance of individual children needs to be assessed by comparison with the distribution of those with a similar pattern of education.

APPENDIX A

Spreadsheet Formula Norms

For any substantial testing program, it will be convenient to enter data into a computer spreadsheet program. This section describes how interval scales and age-normed standard scores may be obtained as spreadsheet functions. The age standard scores obtained by these functions offer more continuous adjustments for age than are offered by tabled values at 6-month intervals for younger children. The formulae refer to Microsoft Excel functions, but equivalent functions may be available in other spreadsheet or computer programs.

Interval Scales from Number Correct on the Martin and Pratt Nonword Reading Test

The proportion correct on the test (which must be calculated in the strict statistical form, treating each raw score value (NWA or NWB) as the mid-point of a class interval, so the range for 54 items is 55) is converted to a z score by use of the inverse cumulative normal function (so 0.025 becomes approximately -2, and 0.975 becomes approximately +2), called NORMSINV in Excel. These are then scaled separately for Form A and Form B to adjust for differences between the forms. The scaling aims for standard deviations which approximate 15 for fixed age. Age effects remain on means, which are adjusted to make scaled values positive and means equal to 100 for age 6.0, and thus equal for Forms A and B.

Form A. [NWIA] = 23.20*NORMSINV((NWA + 0.5)/55) + 128.04   (1)

Form B. [NWIB] = 22.24*NORMSINV((NWB + 0.5)/55) + 127.63   (2)

In these formulae, [NWIA] means type the entry that follows it into the cell that is to contain the result (i.e., type the entry beginning "= 23.20* ..." into cell E3 if this is to contain NWIA), replacing the input names, e.g., NWA, by references to the cell containing the corresponding value (which might be D3, for instance). Once entered for the top cell in a column, the formula can then be Pasted Down to provide appropriate calculations for every row that has the data required in the same columns.

Age-adjusted Normed Scores for the Martin and Pratt Nonword Reading Test

The interval scaled value, NWIx, whether derived from Form A or Form B (or NWI, the average of NWIA and NWIB), is adjusted for age differences proportional to the logarithm to base 10 (LOG in Excel, not to be confused with LN) of (AgeD - 5) years, and the mean adjusted to give a mean of 100 for samples representative of the Southern Tasmanian State School student population. Age at the date of assessment must be represented as a decimal (AgeD, with months of age converted to a fraction of 12; if months are not known, it is recommended that 0.5 be added to age in years).

[NWagex] = NWIx - 47.5*LOG(AgeD - 5)   (3)

Direct estimation of age-normed scores from raw scores can be obtained by substituting (1) and (2) in (3).

Form A. [NWageA] = 23.20*NORMSINV((NWA + 0.5)/55) - 47.5*LOG(AgeD - 5) + 128.04   (4)

Form B. [NWageB] = 22.24*NORMSINV((NWB + 0.5)/55) - 47.5*LOG(AgeD - 5) + 127.63   (5)

Scales from Number Correct on the WRMT-R Word Identification Subtest

In these formulae, the WRMT-R Word Identification subtest raw score, WI, is converted to an interval scale, WII, and an age-normed standard score, WIage.

[WII] = 30*NORMSINV((WI + 0.5)/107) + 136   (6)

[WIage] = WII - 64*LOG(AgeD - 5)   (7)

[WIage] = 30*NORMSINV((WI + 0.5)/107) - 64*LOG(AgeD - 5) + 136   (8)

Scales from Number Correct on the Irregular Word Test (Coltheart & Leahy, 1996)

In these formulae, the Irregular Word Test raw score, CLI, is converted to an interval scale, CLII, and an age-normed standard score, CLIage.

[CLII] = 27*NORMSINV((CLI + 0.5)/31) + 126   (9)

[CLIage] = CLII - 58*LOG(AgeD - 5)   (10)

[CLIage] = 27*NORMSINV((CLI + 0.5)/31) - 58*LOG(AgeD - 5) + 126   (11)

Scales from Number Correct on the Regular Word Test (Coltheart & Leahy, 1996)

In these formulae, the Regular Word Test raw score, CLR, is converted to an interval scale, CLRI, and an age-normed standard score, CLRage.

[CLRI] = 19.2*NORMSINV((CLR + 0.5)/31) + 102   (12)

[CLRage] = CLRI - 40*LOG(AgeD - 5)   (13)

[CLRage] = 19.2*NORMSINV((CLR + 0.5)/31) - 40*LOG(AgeD - 5) + 102   (14)
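For readers working outside a spreadsheet, the same calculations can be sketched in Python. This is an illustrative translation under stated assumptions, not code supplied with the tests: it assumes the multipliers, divisors, constants, and age coefficients as read from the formulae in this appendix, that the age term is subtracted, and that `NormalDist().inv_cdf` and `math.log10` stand in for Excel's NORMSINV and LOG. The names `NORM_CONSTANTS`, `decimal_age`, `interval_scale`, and `age_normed_score` are hypothetical.

```python
import math
from statistics import NormalDist

# Assumed constants, read from the appendix formulae:
# (divisor = number of items + 1, multiplier, constant, age coefficient)
NORM_CONSTANTS = {
    "NWA": (55, 23.20, 128.04, 47.5),  # Martin and Pratt Nonword Reading Test, Form A
    "NWB": (55, 22.24, 127.63, 47.5),  # Martin and Pratt Nonword Reading Test, Form B
    "WI": (107, 30.0, 136.0, 64.0),    # WRMT-R Word Identification subtest
    "CLI": (31, 27.0, 126.0, 58.0),    # Irregular Word Test
    "CLR": (31, 19.2, 102.0, 40.0),    # Regular Word Test
}


def decimal_age(years, months=None):
    """AgeD: months as a fraction of 12, or 0.5 added when months are unknown."""
    return years + (0.5 if months is None else months / 12)


def interval_scale(test, raw):
    """Interval-scaled value: multiplier * NORMSINV((raw + 0.5) / divisor) + constant."""
    divisor, multiplier, constant, _ = NORM_CONSTANTS[test]
    if not 0 <= raw <= divisor - 1:
        raise ValueError("raw score outside the range of the test")
    z = NormalDist().inv_cdf((raw + 0.5) / divisor)
    return multiplier * z + constant


def age_normed_score(test, raw, age_d):
    """Age-normed score: interval scale minus age coefficient * LOG(AgeD - 5)."""
    if age_d <= 5:
        raise ValueError("AgeD must be greater than 5 years")
    age_coefficient = NORM_CONSTANTS[test][3]
    return interval_scale(test, raw) - age_coefficient * math.log10(age_d - 5)


# Half the Form A items correct at exactly 6.0 years: both NORMSINV(0.5) and
# LOG(1) are zero, so the score equals the Form A constant.
print(round(age_normed_score("NWA", 27, decimal_age(6, 0)), 2))  # 128.04
```

Keeping the constants in one table mirrors the spreadsheet layout, where each test differs only in its multiplier, divisor, constant, and age coefficient.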

REFERENCES

Ackerman, P.T., & Dykman, R.A. (1993). Gender and reading disability. Journal of Learning Disabilities, 26, 498.

Anastasi, A. (1990). Psychological testing (6th ed.). New York: Macmillan.

Australian Bureau of Statistics. (1992). Urban centres and localities: Tasmania. 1991 census. Hobart: Author.

Australian Bureau of Statistics. (1998). Selected family and labour force characteristics: Australia. 1996 census. Canberra: Author.

Badian, N.A. (1994). Preschool prediction: Orthographic and phonological skills, and reading. Annals of Dyslexia, 44, 3-25.

Castles, A. (1994). Varieties of developmental dyslexia. Unpublished PhD thesis, Macquarie University.

Castles, A., & Coltheart, M. (1993). Varieties of developmental dyslexia. Cognition, 47, 149-180.

Coltheart, M., Curtis, B., Atkins, P., & Haller, M. (1993). Models of reading aloud: Dual-route and parallel-distributed-processing approaches. Psychological Review, 100, 589-608.

Coltheart, M., & Leahy, J. (1996). Assessment of lexical and nonlexical reading abilities in children: Some normative data. Australian Journal of Psychology, 48, 136-140.

Coon, H., Carey, G., & Fulker, D.W. (1992). Community influences on cognitive ability. Intelligence, 16, 169-188.

Edwards, V.T., & Hogben, J.H. (1999). New norms for comparing children's lexical and nonlexical reading: A further look at subtyping dyslexia. Australian Journal of Psychology, 51, 37-49.

Feingold, A. (1992). Sex differences in variability in intellectual abilities: A new look at an old controversy. Review of Educational Research, 62, 61-84.

Hedges, L.V., & Nowell, A. (1995). Sex differences in mental test scores, variability, and numbers of high-scoring individuals. Science, 269, 41-45.

Manis, F.R., Seidenberg, M.S., Doi, L.M., McBride-Chang, C., & Petersen, A. (1996). On the bases of two subtypes of developmental dyslexia. Cognition, 58, 157-195.

Martin, F., & Pratt, C. (2000). The Martin and Pratt Nonword Reading Test. Melbourne: ACER.

Maughan, B., Pickles, A., Hagell, A., Rutter, M., & Yule, W. (1996). Reading problems and antisocial behaviour: Developmental trends in comorbidity. Journal of Child Psychology and Psychiatry, 37, 404-418.

Morrison, F.J., Smith, L., & Dow-Ehrensberger, M. (1995). Education and cognitive development: A natural experiment. Developmental Psychology, 31, 789-799.

Plaut, D.C., McClelland, J.L., Seidenberg, M.S., & Patterson, K. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56-115.

Raz, I.S., & Bryant, P. (1990). Social background, phonological awareness and children's reading. British Journal of Developmental Psychology, 8, 209-225.

Reynolds, C.A., Hewitt, J.K., Erickson, M.T., Silberg, J.L., Rutter, M., Simonoff, E., Meyer, J., & Eaves, L.J. (1996). The genetics of children's oral reading performance. Journal of Child Psychology and Psychiatry, 37, 425-434.

Sanson, A., Prior, M., & Smart, D. (1996). Reading disabilities with and without behaviour problems at 7-8 years: Prediction from longitudinal data from infancy to 6 years. Journal of Child Psychology and Psychiatry, 37, 529-541.

Stanovich, K.E., Siegel, L.S., & Gottardo, A. (1997). Converging evidence for phonological and surface subtypes of reading disability. Journal of Educational Psychology, 89, 114-127.

Vogel, S.A. (1990). Gender differences in intelligence, language, visual-motor abilities, and academic achievement in students with learning disabilities: A review of the literature. Journal of Learning Disabilities, 23, 44-52.

Whitehurst, G.J., & Lonigan, C.J. (1998). Child development and emergent literacy. Child Development, 69, 848-872.

Woodcock, R.W. (1987). Woodcock Reading Mastery Tests - Revised: Examiner's manual. Minnesota: American Guidance Service.

Wright, B.D., & Stone, M.H. (1979). Best test design. Chicago: Mesa.
