Learning and Individual Differences 43 (2015) 100–105


A cognitive processing model of reading comprehension in English as a foreign language using the linear logistic test model

Purya Baghaei a,⁎, Hamdollah Ravand b,c

a English Department, Islamic Azad University, Mashhad Branch, Mashhad, Iran
b Vali-e-Asr University of Rafsanjan, Iran
c University of Jiroft, Iran

⁎ Corresponding author.

Article info

Article history: Received 29 January 2015; Received in revised form 2 September 2015; Accepted 5 September 2015

Keywords: Reading comprehension; Cognitive processing; Linear logistic test model; Q-matrix

http://dx.doi.org/10.1016/j.lindif.2015.09.001

Abstract

Reading comprehension in a foreign language (FL) is a complex process with many underlying cognitive components. Many second language researchers have tried to explain reading comprehension in terms of taxonomies of subskills and processes. However, the nature of these components is not yet known. Previous research using exploratory and confirmatory factor analysis has yielded contradictory results. The purpose of this study is to investigate the underlying cognitive components and processes of FL reading comprehension using the linear logistic test model (LLTM; Fischer, 1973). For the purpose of the present study, the data of 400 applicants taking an advanced high-stakes reading comprehension test were used, and the cognitive processes underlying the test were derived: reading for details, making inferences, reading for main idea, syntax, and vocabulary. The LLTM showed that making inferences is the hardest process to employ and vocabulary the easiest. The implications for teaching and testing reading comprehension are discussed.

© 2015 Elsevier Inc. All rights reserved.

1. Introduction

Reading comprehension in a foreign language is a complex process with many underlying cognitive components. Many researchers have tried to explain reading comprehension in terms of taxonomies of subskills (Alderson & Lukmani, 1989; Hughes, 2003; Jang, 2009; Lumley, 1993; Munby, 1978). However, the nature of these subskills is not yet known. We still do not know 'whether separable comprehension subskills exist, and what such subskills might consist of and how they might be classified' (Alderson, 2000, p. 10). Evidence for the empirical separability of subskills supports the commonplace practice of dividing language skills into smaller units as a basis for syllabus design and for test development and validation.

Some researchers argue that reading is a single global construct (Carver, 1992; Rost, 1993). Alderson (2005), based on the results obtained from the piloting of different DIALANG components, concluded that subskills do not contribute to distinguishing between different levels of reading proficiency and hence are inseparable. Schedl, Gordon, Carey, and Tang (1996) studied the dimensionality of the reading section of the TOEFL (Test of English as a Foreign Language) and concluded that items which were supposed to measure different reading subskills do not explain differences in item difficulty or test dimensionality. Qualitative studies which seek to record content experts' agreement on the subskills needed to answer reading items have also failed
to show consensus among experts (Alderson & Lukmani, 1989). Conversely, other researchers have found evidence for and argued for the divisibility of reading comprehension (e.g., Anderson, Bachman, Perkins, & Cohen, 1991; Bachman, Davidson, & Milanovic, 1996; Carr & Levy, 1990; Davis, 1968; Drum, Calfee, & Cook, 1981; Grabe, 1991, 2000; Koda, 2007; Lumley, 1993; Nevo, 1989; Weir, Huizhong, & Yan, 2000).

Lennon (1962) argues that it is only possible to reliably measure four subskills in reading ability: word knowledge, comprehension of explicitly stated meaning, comprehension of implicit/inferential meaning, and appreciation. However, Carroll's (1993) factor analytic studies of cognitive tests identified four factors in reading: (general) reading comprehension, special reading comprehension, reading decoding, and reading speed. Hudson (1996) suggested that reading comprehension involves processing skills such as local textual comprehension, global textual comprehension, and inference-making. In much the same vein, Weir and Porter (1996) conceptualized reading comprehension as having four categories of processes and skills: (1) local careful reading, (2) global careful reading, (3) local expeditious reading, and (4) global expeditious reading.

Freedle and Kostin (1991, 1992, 1993), using multiple regression, identified a set of item and text characteristics that predicted the difficulty of reading comprehension items of the Scholastic Aptitude Test (SAT), the Graduate Record Examination (GRE), and the Test of English as a Foreign Language (TOEFL). Freedle and Kostin (1991) found that eight variables predicted 58% of item difficulty variance on the SAT. In their next study (1992) they found that seven variables explained 41% of the variation in item difficulties on the GRE. Finally, they found that 11
item characteristics accounted for 58% of the variance in item difficulties in the TOEFL (Freedle & Kostin, 1993).

The current general view among applied linguists is that reading comprehension in a foreign language is composed of several distinct subskills, but there is no consensus on the nature of these subskills, as different researchers have identified different subskills for second language reading comprehension with or without empirical evidence (Alderson & Lukmani, 1989; Hughes, 2003; Lumley, 1993; Munby, 1978).

Previous studies on the cognitive processes underlying reading comprehension have mostly used either qualitative or correlational methods (e.g., factor analysis or multiple regression). Correlational approaches such as multiple regression or factor analysis are, according to Gorin (2005) and Sonnleitner (2008), limited in the following ways: (1) high correlations between cognitive processes and item difficulty do not guarantee causality, and (2) correlational methods are strongly affected by the range of item difficulties. For example, with items of similar difficulty the cognitive processes in question would not account for the small differences in item difficulties, whereas item sets with a wide range of difficulty would lead to high correlations with the difficulty-causing cognitive processes. Multiple regression studies, as noted by Buck, Tatsuoka, and Kostin (1997), are further limited in that they use item scores to predict item difficulties according to the processes/subskills involved in answering the items. The problem is that the predictors are too many and the items coded for each predictor (i.e., process) are too few. Cohen and Cohen (1983) suggest that about 30 items should be coded for each predictor for stable results in multiple regression. In a study with 10 item predictors, for example, there should be at least 300 items, while in multiple regression studies, which often involve more than 10 attributes, there are rarely more than 100 items. In contrast, IRT-based methods such as the LLTM enable researchers to model cognitive operations and estimate item difficulties in a non-correlational manner.

A neglected methodological approach in foreign language reading research is the use of cognitive processing models such as the LLTM and cognitive diagnostic modeling (CDM). In these models the sources of item difficulty are identified and parameterized to test hypotheses about the cognitive components that underlie item solving. Determining sources of item difficulty not only explicates the validity of uses and interpretations of scores at the item level, but is also useful for rule-based item generation and for predicting item difficulty on the basis of an item's cognitive features. Latent component cognitive processing models such as the LLTM or the component latent trait model (CLTM; Embretson, 1984) have sporadically been used to empirically study sources of item difficulty in first and FL reading (Embretson & Wetzel, 1987; Gorin, 2005; Sonnleitner, 2008).

The need to break down learning concepts into smaller manageable units or 'learning quanta' for optimal teaching and learning has long been recognized in education (Fischer, 1973; Taber, 2004). Parameterizing these smaller chunks helps in understanding areas of difficulty and in devising remedial programs to help struggling learners. One psychometric model to accomplish this goal is Fischer's (1973) LLTM.
The LLTM is an extension of the Rasch model (Rasch, 1960/1980) which imposes linear constraints on the difficulty parameter. The model assumes that the overall difficulty of an item, estimated with the standard Rasch model, is the sum of the difficulties of the cognitive processes needed to solve the item. The model estimates the difficulty of those processes and indicates whether the item difficulty parameters (estimated by the standard Rasch model) can successfully be reconstructed from the difficulties of the cognitive processes (Baghaei & Kubinger, 2015).
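In standard notation (a sketch of the usual formulation of the Rasch model and the LLTM constraint; the symbols below are the conventional ones and are not reproduced from this article), the probability that person v answers item i correctly and the LLTM decomposition of item difficulty can be written as

$$P(X_{vi}=1 \mid \theta_v) = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)}, \qquad \beta_i = \sum_{j=1}^{m} q_{ij}\,\eta_j + c,$$

where $\theta_v$ is the person ability, $\beta_i$ the Rasch item difficulty, $q_{ij}$ the Q-matrix weight of cognitive operation $j$ in item $i$, $\eta_j$ the difficulty of operation $j$ (the "basic parameter"), and $c$ a normalization constant.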


Identifying cognitive operations and processes which are sources of item difficulty through the LLTM allows researchers to: (1) understand the nature of the construct and its underlying components; (2) construct new items with predetermined item difficulty parameters without the need to pilot them to estimate their difficulties (Fischer & Pendl, 1980); (3) predict the difficulty of items which have unique combinations of the estimated cognitive components, which is particularly helpful in item banking and adaptive testing; and (4) develop appropriate teaching activities that deal with the specific cognitive operations and processes.

It must be noted that the concept of subskills or processes under the LLTM is different from that under factor analysis. While in factor analysis the subskills are hypothesized to be separate dimensions of ability, this is not the case in the LLTM, as a prerequisite for the LLTM is that the unidimensional Rasch model should hold first. Therefore, cognitive operations in the LLTM are not independent dimensions of ability, unlike those recognized in factor analysis; otherwise the unidimensional Rasch model would not have fitted the data in the first place. Bejar (1983) states that unidimensionality does not mean that a single operation is at work when test-takers answer the items. In fact, there might be several operations and processes involved, but as long as they work in unison unidimensionality holds. The processes under the LLTM, while being psychologically separate, work in unison and are heavily interconnected.

Research in the area of foreign language abilities shows that correlational methods fail to distinguish such processes, as the final conclusion of most such studies is that language subskills are not discriminable. For instance, in a recent study of listening comprehension in English as a second language, Goh and Aryadoust (2014), using CFA, found that their postulated listening subskills were highly correlated, "making the models inadmissible because discriminant validity has been violated" (p. 16). Only when a higher-order aggregate-level CFA was employed did they manage to show the separability of the postulated subskills. They further note that "… subskills … were empirically divisible, should the right modeling approach be used" (p. 19). It is, therefore, emphasized that the theoretical status of subskills under the LLTM is that of simultaneous parallel processes which are highly interdependent, possibly, as suggested by Goh and Aryadoust (2014), due to the existence of a higher-order general factor. We argue that the model is superior to correlational methods in the study of foreign language subskills, since language processing entails parallel processing of interconnected or associative neural networks in the brain (Bechtel & Abrahamsen, 1991). We believe that the LLTM is an answer to the long-running debate in the field of language testing on the dimensionality of language constructs (McNamara, 1996).

2. Previous applications of the LLTM

The LLTM has been applied in educational and psychological contexts to identify two categories of processes which might contribute to item difficulty: construct-related processes and construct-irrelevant processes. Kubinger (2008, 2009) pointed to some possible applications of the LLTM for identifying the effect of construct-irrelevant processes such as item position effects, speeded presentation of items, content-specific learning, and item response format. Embretson and Wetzel (1987) employed the LLTM to test a model of multiple-choice (MC) paragraph comprehension items.
They proposed a model with two processing stages for paragraph comprehension: (1) a text representation process, which referred to textual characteristics such as the number of propositions and arguments in passages, and (2) a decision process with three events: translating the visual stimuli of the alternatives into a meaningful representation, locating the relevant text for evaluating alternatives, and falsifying or confirming alternatives. The results of their study showed that decision processes contribute considerably more to item difficulty than text representation processes and, therefore, that reading comprehension item difficulty depends more on response decision processes than on the paragraph. They conclude that two uncorrelated factors are involved in MC reading items: verbal ability, i.e., understanding the text, and reasoning ability, i.e., the ability to select the correct alternative.

Gorin (2005) investigated variations in item difficulty due to experimentally manipulating four
item characteristics: propositional density and syntax, negative wording and passive voice, information order, and response alternatives. Contrary to the findings of Embretson and Wetzel (1987), she found no effect for propositional density, information order, or response alternatives. More recently, Sonnleitner (2008), informed by Embretson and Wetzel's model, used the LLTM to study the effect of two groups of characteristics (i.e., input-related and response-related) on the item difficulties of a German reading comprehension test. He found that item difficulties cannot be explained solely on the basis of input-related characteristics; of the 11 characteristics which had significant effects on item difficulties, four turned out to be response-related.

As mentioned above, taxonomies of reading comprehension subskills have been validated with multiple regression and factor analytic models. In this study, however, the aim is to investigate the underlying cognitive components of reading comprehension in a foreign language with a cognitive processing model. The assumption is that there are a number of cognitive processes which contribute to test item difficulty; that is, the difficulty of items is a function of the operations needed to solve them. Identifying these operations, parameterizing them, and determining their contribution to item difficulty explains what reading comprehension in a foreign language is and broadens our understanding of the FL reading construct, which in turn helps in devising teaching and testing methods.

3. Methodology

3.1. Instrument and participants

The test analyzed in this study is the reading comprehension section of the Iranian National University Entrance Examination (INUEE), a four-option multiple-choice high-stakes test held annually to admit candidates to master's programs in English studies. The test is an advanced assessment designed for candidates holding a bachelor's degree who seek to pursue their studies for a master's degree in state universities. The test is composed of four sections: grammar (10 items), vocabulary (20 items), cloze (10 items), and reading comprehension (20 items). Candidates are supposed to answer the test in 60 min.

The 20 reading comprehension items and a sample of 400 candidates (67% females and 33% males) who took the test in 2012 were selected for this study. The main part of the sample (97% of the students) responded to each of the 20 items. Two percent of the students had one missing response, and the maximum number of missing responses was 19, in one observation. The missing responses were left as missing data because no information was available on why the students omitted these items. Because of the extremely low percentage of missing responses, they should not have seriously affected the results.

As to the reading passages, the first was a relatively long passage of 720 words on natural selection theory, followed by 7 questions. The second passage, of about 524 words, discussed waste and the precautions taken by governments against its harmful effects, and was followed by 6 comprehension questions. The third passage consisted of 584 words, discussed the change in family structure and function over the centuries, and was followed by 7 questions.

3.2. Analysis

3.2.1. Q-matrix specification

The Q-matrix used in this study was adopted from Ravand (in press), who investigated the application of the generalized deterministic input, noisy "and" gate model (G-DINA; de la Torre, 2011) to the reading comprehension section of the INUEE. Unfortunately, no information is available on the theoretical model, if any, that informed the construction of the reading comprehension items of this examination; therefore, Ravand took the following steps to ensure that the subskills identified were valid: (a) the author drew up an initial list of reading subskills from reading theories and the literature (e.g., Grabe, 2009; Weir, Hawkey, Green, & Devi, 2009); (b) six university instructors with at least three years of experience in teaching and testing reading comprehension were invited to brainstorm on the possible attributes measured by each item on the test. According to Lee and Sawaki (2009, p. 176), to develop the Q-matrix for tests where a detailed cognitive model of task performance is not available, "brainstorming about possible attributes that elaborate on an existing test specification might serve as a good point of departure". Attributes on which at least two thirds (i.e., four) of the coders agreed were included in the Q-matrix. A Fleiss kappa agreement rate of .59 indicated moderate agreement among the coders (Landis & Koch, 1977a, 1977b); and (c) the Q-matrix was empirically validated and revised using the procedure proposed by de la Torre and Chiu (2010). The proposed procedure makes general rather than compensatory/noncompensatory assumptions about the relationships between subskills and the probability of a correct answer.

According to the Q-matrix construction phase of the study, there were five attributes underlying performance on the reading comprehension section of the INUEE: reading for details, reading for inference, and reading for main idea (henceforth referred to as Detail, Inference, and Main Idea, respectively), plus syntax and vocabulary, as shown and defined in Table 1. For a detailed account of the Q-matrix development and revision process, refer to Ravand (in press).
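For readers unfamiliar with the format, a Q-matrix of this kind is simply a binary items-by-attributes matrix. The fragment below is a minimal sketch in R (the language used for the analyses reported here); the 0/1 entries and item labels are purely illustrative and are not the Q-matrix reported by Ravand (in press).

```r
# Illustrative Q-matrix fragment: rows are items, columns are the five attributes.
# A 1 means the attribute is assumed to be involved in solving the item.
q <- matrix(c(
  1, 0, 0, 0, 1,   # hypothetical item 1: Detail and Vocabulary
  0, 1, 0, 0, 0,   # hypothetical item 2: Inference only
  0, 0, 1, 1, 0    # hypothetical item 3: Main Idea and Syntax
), nrow = 3, byrow = TRUE,
dimnames = list(paste0("item", 1:3),
                c("Detail", "Inference", "MainIdea", "Syntax", "Vocabulary")))
q
```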

3.2.2. LLTM analysis

A prerequisite for the LLTM is that the standard Rasch model should fit the data first (Fischer, 1973; see also Baghaei & Kubinger, 2015). The eRm package (Mair, Hatzinger, & Mair, 2014) in R was used to run the Rasch model and the LLTM analyses. Andersen's (1973) likelihood ratio (LR) test with the mean of raw scores as the partitioning criterion showed that the 20 items do not fit the Rasch model, χ2 = 49.52, df = 19, p < 0.001. A graphical model check showed that three items lie far from the 45° line. In this approach, items are calibrated separately within two subsamples and, after being brought onto a common scale, are cross-plotted against each other. If the data fit the Rasch model, we expect the items to fall close to a 45° line; items falling off the line are misfitting items. After deleting the three items, the Rasch model was fitted again. Andersen's LR test showed that the 17 remaining items fit the Rasch model with the mean of raw scores as the splitting criterion, χ2 = 20.32, df = 16, p = .206. Furthermore, Andersen's LR test was run with the median of raw scores and gender as splitting criteria. In both cases the 17 items fitted the Rasch model: χ2 = 21.50, df = 16, p = .16 (median) and χ2 = 19.84, df = 15, p = .178 (gender).
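A minimal sketch of this analysis pipeline with the eRm package is given below; the file and object names are hypothetical, and the design matrix W stands for a Q-matrix with one row per administered item (an item-complete version of the fragment sketched above), so the sketch is illustrative rather than a reproduction of the authors' script.

```r
# Sketch of the Rasch and LLTM analyses with eRm; names are hypothetical.
library(eRm)

resp <- read.csv("inuee_reading_responses.csv")   # persons x items matrix of 0/1 scores

rasch_fit <- RM(resp)                             # standard Rasch model (CML estimation)
lr_mean   <- LRtest(rasch_fit, splitcr = "mean")  # Andersen's LR test, mean raw-score split
plotGOF(lr_mean)                                  # graphical model check against the 45-degree line

# W: design matrix linking items to the five cognitive operations (one row per item)
lltm_fit <- LLTM(resp, W = W)

# Likelihood ratio comparison of the Rasch model and the LLTM
lr_stat <- 2 * (rasch_fit$loglik - lltm_fit$loglik)
df_diff <- rasch_fit$npar - lltm_fit$npar
pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
```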

Table 1
Definition of subskills.

Attribute: Description

Reading for details: Locating specific information by matching syntactic and/or lexical information or identifying a paraphrase or synonym of the literal meaning of a word, phrase or sentence in the relevant part of the sentence
Generating inferences: Using information not directly stated in the passage (e.g., background knowledge or topical knowledge) to draw inferences and conclusions
Syntax: Using grammar, punctuation, parts of speech, etc. to understand sentence meaning and structure
Vocabulary: Using contextual clues, phonological, and vocabulary knowledge to determine the meaning of a word or phrase
Reading for main idea: Identifying what the most important idea in a passage is or what the passage is about

Table 2
Item difficulty parameters, standard errors and fit values.

Item   Estimate   SE    Outfit MNSQ   Infit MNSQ
1       −.23      .11   1.17          1.11
2       −.25      .11   1.06          1.07
3       1.08      .13   1.01           .96
4        .47      .12    .95           .96
5       −.59      .10   1.06          1.02
6      −1.69      .11    .93           .98
7       −.17      .11   1.10          1.05
8       −.45      .10    .98           .96
9       −.02      .11    .79           .88
10       .35      .11    .96           .97
11      −.81      .10    .94           .95
12      −.54      .10    .94           .95
13      1.12      .14    .89           .94
14       .83      .13    .81           .89
15       .60      .12    .90          1.02
16       .83      .13    .85           .96
17      −.51      .10   1.03           .99

The Cronbach's alpha reliability of the test with 17 items was .74; this moderate reliability is due to the small number of items. Table 2 shows the difficulty parameters for the 17 items, their standard errors, and their infit and outfit mean square values (Wright & Stone, 1979). The infit and outfit values are within the acceptable range of .70–1.30 (Bond & Fox, 2007).

The LLTM was applied to the data with the Q-matrix containing the five processes assigned to the 17 items. Table 3 shows the easiness of the five processes, their standard errors, and their 95% confidence intervals. The easiness estimates for the operations, like item easiness parameters, show how easy or difficult each cognitive process is. The table also shows that the contribution of each process to item difficulty is significantly different from 0, p < 0.05. Inference-making is the hardest process to use and vocabulary the easiest. Positive easiness parameters indicate that the process makes the item easier rather than harder; as the table shows, vocabulary, with a positive easiness estimate, makes items easier. Reading for detail is the second hardest process to employ, followed by main idea and syntax. The correlation coefficient between the LLTM-reconstructed and the Rasch model (RM) item parameters was 0.75. A likelihood ratio test was used to compare the fit of the LLTM with that of the RM. Results showed that the RM fits significantly better than the LLTM, χ2 = 309, df = 11, p < 0.01.

4. Discussion

In this study an attempt was made to explain the cognitive processes underlying an advanced foreign language reading comprehension test. Five processes were hypothesized to be involved in answering the items, namely, reading for details, making inferences, understanding main ideas, syntax, and vocabulary. Making inferences turned out to be the most difficult process and vocabulary the easiest. It seems that for advanced EFL learners vocabulary is not a source of difficulty in reading comprehension. These findings are in line with those of Svetina, Gorin, and Tatsuoka (2011), who used rule-space methodology to study the cognitive components of reading in English as a first and second language.

Table 3
Basic parameter estimates with their standard errors and confidence intervals.

Basic parameter   Easiness   SE    95% CI
Detail             −.59      .08   (−.77, −.42)
Inference         −1.00      .08   (−1.17, −.84)
Main idea          −.43      .09   (−.62, −.24)
Syntax             −.27      .09   (−.46, −.08)
Vocabulary          .54      .07   (.40, .68)


Their findings showed that the most difficult reading skill for readers in both the first and second language is understanding implicit ideas, and the easiest is word meaning. This hierarchy of difficulty of L2 reading attributes concurs with previous research (Grabe & Stoller, 2002; Lumley, 1993). Harding, Alderson, and Brunfaut (2015) argued that "it is probably reasonable to accept that both L1 and L2 readings involve a number of different levels of ability" (p. 4). According to Harding et al. (2015), syntax and vocabulary are lower-level attributes, and understanding the main idea, making inferences, and understanding specific details are higher-level L2 reading processes. Understanding the main idea of a reading passage involves knowledge of vocabulary, grammar, and discourse, and employing different cognitive processes (Pressley, 2002). In a similar vein, inference-making is a complex attribute and hence difficult to master (Long, Seely, Oppy, & Golding, 1996). Inference-making involves understanding both the literal and implied meanings of a text. Both main idea and inference-making were identified as the most difficult subskills by Grabe (2009) because they involve higher-level processing of the information in the passages.

The subskills found in the present study concur with theory and research on reading comprehension. From the viewpoint of the major components of reading comprehension proposed by Grabe (2009), we found that syntactic knowledge, vocabulary knowledge, inference-making, main idea comprehension, and recall of relevant details were consequential processes. Although Grabe's classification of the subcomponents of reading includes 14 categories, he acknowledges that the constraints of standardized assessment limit the number of components that can be captured in reading assessment tasks. The findings of the present study challenge Rost's (1993) suggestion that the subskills may become intermingled and undifferentiated at high proficiency levels. Our findings suggest that the subskills can be separated even at advanced proficiency levels if IRT-based non-correlational methods are employed.

Identifying the processes which underlie reading comprehension is essential for explicating the nature of reading and for helping teachers develop effective methods of teaching. Interventions to help struggling readers should be guided by the reading processes (Rapp, Broek, McMaster, Kendeou, & Espin, 2007). The LLTM can provide explanatory information on why test takers respond as they do. It decomposes tasks into the strategies, processes, and knowledge required to perform successfully on each task, thereby helping teachers to replace students' faulty strategies (Embretson, 1983). Identification of the processes also aids construct validation and item construction. If, as noted by Messick (1989), the target of educational measurement is to make inferences, even if only tacitly, about test takers' mental or psychological processes, the LLTM can help fulfill this target. Having identified the components that make reading comprehension items difficult and calibrated their difficulties, language testers can construct items with desired levels of difficulty without having to pilot them.

The likelihood ratio test showed that the Rasch model fits significantly better than the LLTM.
A better fit of the Rasch model compared to the LLTM has been observed by other researchers too (Fischer, 2005; Fischer & Formann, 1982). The reason is that large samples and few parameters are used to compare the models (Fischer & Formann, 1982), and such model comparison tests "ought not to be over-rated" (p. 412). The correlation between the item parameters estimated by the Rasch model and those reconstructed by the LLTM was .75, which means that 56% of the variance in item difficulty can be explained by our reading model postulated in terms of the five reading subskills. This amount of variance explanation is substantial considering the reading literature (Embretson & Wetzel, 1987) and the small number of cognitive processes postulated in the study. Reading is an extremely complex skill, and examinees can employ processes which have not been hypothesized by the test developers.
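As a concrete illustration of how the reconstruction and the 56% figure (r = .75, so r² ≈ .56) follow from the fitted models, a sketch continuing the hypothetical eRm objects above; note that eRm parameterizes items in terms of easiness, so signs may need to be flipped when speaking of difficulty.

```r
# Reconstruct item parameters from the LLTM basic parameters and compare them with
# the Rasch estimates; W, lltm_fit and rasch_fit are the hypothetical objects above.
eta_hat   <- lltm_fit$etapar            # estimated basic (process) parameters
recon     <- as.vector(W %*% eta_hat)   # LLTM-reconstructed item parameters
rasch_par <- rasch_fit$betapar          # Rasch item (easiness) parameters

r <- cor(recon, rasch_par)
r^2   # squared correlation: proportion of item difficulty variance explained by the processes
```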


The poorer fit of the LLTM compared to the RM in this study may be due to either or both of the following reasons: (1) more text comprehension (construct-relevant) processes should have been included in the model, and (2) response decision (construct-irrelevant) processes should also have been modeled. Embretson and Wetzel (1987) showed that in reading comprehension tests response decision processes, i.e., the cognitive operations involved in selecting the correct alternative in MC items, affect MC item difficulty more than text representation processes. In the present study, only text comprehension processes were modeled. Sonnleitner (2008) found that the LLTM could not predict the item difficulties of his reading comprehension data when only text comprehension processes were included in the model; when both types of processes were included, he found a high correlation (r = .99) between the LLTM-reconstructed item parameters and the Rasch model item parameters. Inclusion of response decision processes, which form a reasoning component uncorrelated with text comprehension (Embretson & Wetzel, 1987), would have accounted for a much higher portion of the variance in item difficulties.

Another plausible explanation for the relatively low correlation between the Rasch item difficulties and the cognitive processes identified is that cognitive models such as the LLTM usually assume uniform response processes for all respondents. It might be more viable to assume "different response strategies of different groups of respondents for the same set of tasks" (Rupp & Templin, 2008, p. 236). Multiple response strategies can be captured by the CDM proposed by de la Torre and Douglas (2008), wherein different solution strategies are represented via different Q-matrices, and also by the approach suggested by Embretson (1997), where different solution strategies are represented through a weighted mixture of conjunctive Rasch models. After all, test takers might have utilized cognitive processes different from those envisaged by the expert judges.

Furthermore, it is unrealistic to assume that the postulated cognitive operations or basic parameters explain all the variance in item difficulty. There are always random item and person effects, or noise, in the data which cannot be attributed to faults in the postulated cognitive model (Zeuch, Holling, & Kuhn, 2011). Application of the random effects LLTM (Janssen, Schepers, & Peres, 2004; Van den Noortgate, De Boeck, & Meulders, 2003) can demonstrate to what extent random item and person effects influence item difficulty and solution processes.

5. Limitations and further research

Limitations of the study relate either to the psychometric model (i.e., the LLTM) or to the design of the study. The LLTM is very sensitive to Q-matrix misspecification; however, there is no empirical method of Q-matrix validation specifically designed for this approach. Therefore, Q-matrix development is a largely subjective process usually carried out by teachers and content experts, which makes the process prone to error. The LLTM version employed in the present study does not include an error term accounting for possible misspecifications in the Q-matrix. The conventional LLTM is "like a regression model that explains all the variance, which is almost always rejected" (De Boeck et al., 2011). A random-effects extension of the LLTM that includes an error term allowing for imperfect prediction has been proposed (Janssen et al., 2004; Van den Noortgate et al., 2003).
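In the notation used earlier, this extension (often called the LLTM+e) relaxes the exact linear decomposition; a sketch of the standard formulation, not reproduced from this article, is

$$\beta_i = \sum_{j=1}^{m} q_{ij}\,\eta_j + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2_\varepsilon),$$

so that item difficulties need only be approximated, rather than perfectly reconstructed, by the basic parameters.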
The same concerns have been echoed by Green and Smith (1987), who enumerate the following limitations of the conventional LLTM: (1) it is not possible to include in the model all the cognitive operations involved in problem solving, and one runs the risk of focusing on observable aspects of the items instead of the actual processes and strategies that examinees use to arrive at the solutions; (2) the model assumes that the difficulty of the items is a linear combination of cognitive operation (CO) difficulties, an assumption which may not be warranted in light of our knowledge of COs for solving items; and (3) the model assumes that the same COs are used by all examinees, while different examinees may use different ways to arrive at the solutions.

In the present study, only some input-related processes were modeled. Future studies should model both types of processes to obtain a fuller picture of the cognitive operations involved in foreign language reading comprehension. This study conducted the LLTM on only one form of an advanced reading comprehension test. The study should be replicated with test formats other than multiple-choice and with other samples similar to the participants of the present study before general claims can be made about the cognitive processes underlying reading comprehension performance. Identification of the cognitive operations underlying performance on the reading comprehension section of the INUEE was carried out through expert and student judgment. However, a think-aloud verbal protocol analysis of test takers' responses would have resulted in a more authentic determination of the attributes required to perform successfully on the test.

6. Conclusion

The study shows the viability of the five postulated solution processes involved in answering reading comprehension items and their power to explain item complexity. Generating inferences has the highest impact on item difficulty, while reliance of items on vocabulary makes items easier for advanced students. The findings help explain FL reading comprehension, develop programs for struggling readers, and optimize item construction and automatic rule-based item generation in large item pools.

References

Alderson, J. (2000). Assessing reading. New York: Cambridge University Press.
Alderson, J.C. (2005). Diagnosing foreign language proficiency: The interface between learning and assessment. London: Continuum.
Alderson, J.C., & Lukmani, Y. (1989). Cognition and reading: Cognitive levels as embodied in test questions. Reading in a Foreign Language, 5, 253–270.
Andersen, E.B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.
Anderson, N.J., Bachman, L., Perkins, K., & Cohen, A. (1991). An exploratory study into the construct validity of a reading comprehension test: Triangulation of data sources. Language Testing, 8, 41–66.
Bachman, L.F., Davidson, F., & Milanovic, M. (1996). The use of test method characteristics in the content analysis and design of EFL proficiency tests. Language Testing, 13, 125–150.
Baghaei, P., & Kubinger, K.D. (2015). Linear logistic test modeling with R. Practical Assessment, Research & Evaluation, 20, 1–11. Available online: http://pareonline.net/getvn.asp?v=20&n=1
Bechtel, W., & Abrahamsen, A. (1991). Connectionism and the mind: An introduction to parallel processing in networks. Oxford, England: Basil Blackwell.
Bejar, I.I. (1983). Achievement testing: Recent advances. Beverly Hills, CA: Sage.
Bond, T.G., & Fox, C.M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). Mahwah, NJ: Erlbaum.
Buck, G., Tatsuoka, K., & Kostin, I. (1997). The subskills of reading: Rule-space analysis of a multiple-choice test of second language reading comprehension. Language Learning, 47, 423–466.
Carr, T.H., & Levy, B.A.E. (1990). Reading and its development: Component skills approaches. San Diego, CA: Academic Press.
Carroll, J.B. (1993). Human cognitive abilities. Cambridge, UK: Cambridge University Press.
Carver, R.P. (1992). What do standardized tests of reading comprehension measure in terms of efficiency, accuracy, and rate? Reading Research Quarterly, 27, 347–359.
Cohen, J., & Cohen, P. (1983). Applied multiple regression: Correlational analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Davis, F.B. (1968). Research in comprehension in reading. Reading Research Quarterly, 4, 499–545.
De Boeck, P., Bakker, M., Zwitser, R., Nivard, M., Hofman, A., Tuerlinckx, F., & Partchev, I. (2011). The estimation of item response models with the lmer function from the lme4 package in R. Journal of Statistical Software, 39, 1–28.
de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76, 179–199.
de la Torre, J., & Douglas, J. (2008). Model evaluation and multiple strategies in cognitive diagnosis: An analysis of fraction subtraction data. Psychometrika, 73, 595–624.
Drum, P.A., Calfee, R.C., & Cook, L.K. (1981). The effects of surface structure variables on performance in reading comprehension tests. Reading Research Quarterly, 16, 486–514.
Embretson, S. (1984). A general latent trait model for response processes. Psychometrika, 49, 175–186.
Embretson, S.E. (1983). Construct validity: Construct representation and nomothetic span. Psychological Bulletin, 93, 179–197.

Embretson, S.E. (1997). Multicomponent response models. In W.J. van der Linden, & R.L. Hambleton (Eds.), Handbook of modern item response theory (pp. 305–321). New York: Springer.
Embretson, S.E., & Wetzel, C.D. (1987). Component latent trait models for paragraph comprehension tests. Applied Psychological Measurement, 11, 175–193.
Fischer, G.H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359–374.
Fischer, G.H. (2005). Linear logistic test models. Encyclopedia of Social Measurement, 2, 505–514.
Fischer, G.H., & Formann, A.K. (1982). Some applications of logistic latent trait models with linear constraints on the parameters. Applied Psychological Measurement, 4, 397–416.
Fischer, G.H., & Pendl, P. (1980). Individualized testing on the basis of the dichotomous Rasch model. In L.J.T. van der Kamp, W.F. Langerak, & D.N.M. de Gruijter (Eds.), Psychometrics for educational debates (pp. 171–188). New York: Wiley.
Freedle, R., & Kostin, I. (1991). The prediction of SAT reading comprehension item difficulty for expository prose passages. ETS Research Report RR-91-29. Princeton, NJ: Educational Testing Service.
Freedle, R., & Kostin, I. (1992). The prediction of GRE reading comprehension item difficulty for expository prose passages for each of three item types: Main ideas, inferences, and explicit statements. ETS Research Report RR-91-59. Princeton, NJ: Educational Testing Service.
Freedle, R., & Kostin, I. (1993). The prediction of TOEFL reading item difficulty: Implications for construct validity. Language Testing, 10, 133–170.
Goh, C.C., & Aryadoust, V. (2014). Examining the notion of listening subskill divisibility and its implications for second language listening. The International Journal of Listening, 28, 1–25.
Gorin, J.S. (2005). Manipulating processing difficulty of reading comprehension questions: The feasibility of verbal item generation. Journal of Educational Measurement, 42, 351–373.
Grabe, W. (1991). Current developments in second language reading research. TESOL Quarterly, 25, 375–406.
Grabe, W. (2000). Reading research and its implications for reading assessment. In A. Kunnan (Ed.), Fairness and validation in language assessment (Studies in Language Testing 9) (pp. 226–262). Cambridge: Cambridge University Press.
Grabe, W. (2009). Reading in a second language: Moving from theory to practice. Cambridge, England: Cambridge University Press.
Grabe, W., & Stoller, F.L. (2002). Teaching and researching reading. New York: Pearson Education.
Green, K.E., & Smith, R.M. (1987). A comparison of two methods of decomposing item difficulties. Journal of Educational and Behavioral Statistics, 12, 369–381.
Harding, L., Alderson, C., & Brunfaut, T. (2015). Diagnostic assessment of reading and listening in a second or foreign language: Elaborating on diagnostic principles. Language Testing, 32, 317–336.
Hudson, T. (1996). Assessing second language academic reading from a communicative competence perspective: Relevance for TOEFL 2000. Princeton, NJ: Educational Testing Service.
Hughes, A. (2003). Testing for language teachers (2nd ed.). New York: Cambridge University Press.
Jang, E.E. (2009). Cognitive diagnostic assessment of L2 reading comprehension ability: Validity arguments for fusion model application to language assessment. Language Testing, 26, 31–73.
Janssen, R., Schepers, J., & Peres, D. (2004). Models with item and item group predictors. In P. de Boeck, & M. Wilson (Eds.), Explanatory item response models: A generalized linear and nonlinear approach (pp. 189–210). New York: Springer.
Koda, K. (2007). Reading and language learning: Crosslinguistic constraints on second language reading development. Language Learning, 57, 1–44.
Kubinger, K.D. (2008). On the revival of the Rasch-model based LLTM: From constructing tests using item generating rules to measuring item administration effects. Psychological Science Quarterly, 50, 311–327.
Kubinger, K.D. (2009). Applications of the linear logistic test model in psychometric research. Educational and Psychological Measurement, 69, 232–244.
Landis, J.R., & Koch, G.G. (1977a). An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, 33, 363–374.


Landis, J.R., & Koch, G.G. (1977b). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Lee, Y.-W., & Sawaki, Y. (2009). Cognitive diagnosis approaches to language assessment: An overview. Language Assessment Quarterly, 6, 172–189.
Lennon, R.T. (1962). What can be measured? The Reading Teacher, 15, 326–337.
Long, D.L., Seely, M.R., Oppy, B.J., & Golding, J.M. (1996). The role of inferential processing in reading ability. In B.K. Britton, & A.C. Graesser (Eds.), Models of understanding text (pp. 189–214). Hillsdale, NJ: Lawrence Erlbaum Associates.
Lumley, T. (1993). The notion of subskills in reading comprehension tests: An EAP example. Language Testing, 10, 211–234.
Mair, P., Hatzinger, R., & Mair, M.J. (2014). eRm: Extended Rasch modeling [Computer software]. R package version 0.15-4. http://CRAN.R-project.org/package=eRm
McNamara, T.F. (1996). Measuring second language performance. London: Longman.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18, 5–11.
Munby, J. (1978). Communicative syllabus design. Cambridge: Cambridge University Press.
Nevo, N. (1989). Test-taking strategies on a multiple-choice test of reading comprehension. Language Testing, 6, 199–215.
Pressley, M. (2002). Comprehension strategies instruction: A turn-of-the-century status report. In C.C. Block, & M. Pressley (Eds.), Comprehension instruction: Research-based best practices (pp. 11–27). New York: Guilford Press.
Rapp, D.N., Broek, P.V.D., McMaster, K.L., Kendeou, P., & Espin, C.A. (2007). Higher-order comprehension processes in struggling readers: A perspective for research and intervention. Scientific Studies of Reading, 11, 289–312.
Rasch, G. (1960/1980). Probabilistic models for some intelligence and attainment tests (Expanded ed.). Chicago, IL: University of Chicago Press. (Originally published 1960, Copenhagen: Pædagogiske Institut.)
Ravand, H. (2015). Applications of a cognitive diagnostic model to a high-stakes reading comprehension test. Journal of Psychoeducational Assessment (in press).
Rost, D.H. (1993). Assessing different components of reading comprehension: Fact or fiction? Language Testing, 10, 79–92.
Rupp, A.A., & Templin, J.L. (2008). Unique characteristics of diagnostic classification models: A comprehensive review of the current state-of-the-art. Measurement, 6, 219–262.
Schedl, M., Gordon, A., Carey, P.A., & Tang, K.L. (1996). An analysis of the dimensionality of TOEFL reading comprehension items. TOEFL Research Report No. 53. Princeton, NJ: Educational Testing Service.
Sonnleitner, P. (2008). Using the LLTM to evaluate an item-generating system for reading comprehension. Psychology Science Quarterly, 50, 345–362.
Svetina, D., Gorin, J.S., & Tatsuoka, K.K. (2011). Defining and comparing the reading comprehension construct: A cognitive-psychometric modeling approach. International Journal of Testing, 11, 1–23.
Taber, K.S. (2004). Learning quanta: Barriers to stimulating transitions in student understanding of orbital ideas. Science Education, 89, 94–116.
de la Torre, J., & Chiu, C.-Y. (2010). A general method of empirical Q-matrix validation using the G-DINA model discrimination index. Paper presented at the annual meeting of the National Council on Measurement in Education, Denver, CO.
Van den Noortgate, W., De Boeck, P., & Meulders, M. (2003). Cross-classification multilevel logistic models in psychometrics. Journal of Educational and Behavioral Statistics, 28, 369–386.
Weir, C.J., & Porter, D. (1996). The multi-divisible or unitary nature of reading: The language tester between Scylla and Charybdis. Reading in a Foreign Language, 10, 1–19.
Weir, C., Hawkey, R., Green, A., & Devi, S. (2009). The relationship between the academic reading construct as measured by IELTS and the reading experiences of students in their first year of study at a British university. In L. Taylor (Ed.), IELTS Research Reports, 9 (pp. 15–190). Canberra: IELTS Australia Pty Ltd & the British Council.
Weir, C., Huizhong, Y., & Yan, J. (2000). An empirical investigation of the componentiality of L2 reading in English for academic purposes (Vol. 12). Cambridge: Cambridge University Press.
Wright, B.D., & Stone, M.H. (1979). Best test design. Chicago: MESA Press.
Zeuch, N., Holling, H., & Kuhn, J.T. (2011). Analysis of the Latin square task with linear logistic test models. Learning and Individual Differences, 21, 629–632.