How can invariant measurement based on Rasch models inform educational assessments?
George Engelhard, Jr., The University of Georgia
Presentation at the International Conference on Educational Measurement, Evaluation and Assessment, November 2017 (Abu Dhabi, United Arab Emirates)
November 4, 2017
Three keystones
Power of educational assessment (purposes and decisions)
Network of stable measures (invariant measurement)
Rasch measurement theory
Three influential mentors
Benjamin Bloom
Benjamin D. Wright
Georg Rasch
1 - Influential mentors: Professor Benjamin Bloom
Assessment as powerful technology
Bloom (1970, p. 26): “It is no great exaggeration to compare the power of testing on human affairs with the power of atomic energy. Both are capable of great positive benefit to all of mankind and both contain equally great potential for destroying mankind. If mankind is to survive, we must continually search for the former and seek ways of controlling or limiting the latter.”
Examples
We assess what we value: in the United States, reading and mathematics are stressed, with less attention to science and social studies
Promotion and graduation tests
Certification and licensure tests
Assessment is where the rubber meets the road …
2 - Influential mentors: Professor Ben Wright
Examples
Objective and invariant measurement
Using the Rasch model to solve measurement problems
Theory into practice
Wright (1968, p. 87): “First, the calibration of measuring instruments must be independent of those objects that happen to be used for the calibration. Second, the measurement of objects must be independent of the instrument that happens to be used for the measuring.”
“Science is impossible without an evolving network of stable measures” (Wright, 1997, p. 33)
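Wright's first requirement (item calibration independent of the persons used) can be illustrated with a small simulation. This is a minimal sketch, not Wright's own procedure: it assumes dichotomous responses generated from the Rasch model, two hypothetical items (difficulties −0.5 and 0.5 logits) and two hypothetical groups of persons, one centred at −2 logits and one at +2. Under the Rasch model, among persons who answer exactly one of the two items correctly, the odds of getting item 1 rather than item 2 right equal exp(δ2 − δ1) whatever θ is, so the log of that count ratio calibrates the item difference without reference to the sample.

```python
import math
import random


def p_correct(theta: float, delta: float) -> float:
    """Rasch probability of a correct (scored 1) response."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def pairwise_difference(thetas, delta1, delta2, rng):
    """Estimate delta2 - delta1 from persons who answer exactly one of the two items correctly."""
    n10 = n01 = 0  # n10: item 1 right, item 2 wrong; n01: the reverse
    for theta in thetas:
        x1 = rng.random() < p_correct(theta, delta1)
        x2 = rng.random() < p_correct(theta, delta2)
        if x1 and not x2:
            n10 += 1
        elif x2 and not x1:
            n01 += 1
    return math.log(n10 / n01)  # the theta terms cancel in this ratio


rng = random.Random(2017)  # fixed seed so the sketch is reproducible
delta1, delta2 = -0.5, 0.5  # hypothetical item difficulties (logits)
low_group = [rng.gauss(-2.0, 0.5) for _ in range(50_000)]  # hypothetical low-ability sample
high_group = [rng.gauss(2.0, 0.5) for _ in range(50_000)]  # hypothetical high-ability sample

est_low = pairwise_difference(low_group, delta1, delta2, rng)
est_high = pairwise_difference(high_group, delta1, delta2, rng)
print(f"low group: {est_low:.2f}, high group: {est_high:.2f}, true: {delta2 - delta1:.2f}")
```

Both groups should recover a difference close to 1 logit, even though the low group rarely succeeds on either item and the high group usually succeeds on both.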
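3 - Influential mentors: Georg Rasch
Specific objectivity
Rasch Model
$$\phi_{ni1} = \frac{\exp(\theta_n - \delta_{i1})}{1 + \exp(\theta_n - \delta_{i1})}$$
where φ_ni1 is the probability that person n gives a correct response (scored 1) to item i, θ_n is the location of person n and δ_i1 is the difficulty of item i on the same latent continuum.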
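The model can be evaluated directly. This minimal sketch (the function names and the example locations are hypothetical) computes φ_ni1 and checks the property Rasch called specific objectivity: the log-odds difference between two persons on an item equals θ_n − θ_m, whichever item is used for the comparison.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    """phi_ni1 = exp(theta - delta) / (1 + exp(theta - delta))."""
    return math.exp(theta - delta) / (1.0 + math.exp(theta - delta))


def log_odds(p: float) -> float:
    """Convert a probability to log-odds (logits)."""
    return math.log(p / (1.0 - p))


# A person located exactly at an item's difficulty has a 50% chance of success.
print(rasch_probability(0.0, 0.0))

# Specific objectivity: comparing two persons gives the same answer on every item.
theta_n, theta_m = 1.5, -0.5  # hypothetical person locations (logits)
for delta in (-2.0, 0.0, 2.0):  # hypothetical item difficulties (logits)
    diff = log_odds(rasch_probability(theta_n, delta)) - log_odds(rasch_probability(theta_m, delta))
    print(f"item at {delta:+.1f}: log-odds difference = {diff:.3f}")
```

Because the log-odds of success are simply θ_n − δ_i1, the item difficulty cancels from any comparison of two persons, and the person location cancels from any comparison of two items.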
Rasch’s motivation
Overview
I. Educational assessment: purposes and decisions; scores
II. Invariant measurement: item-invariant measurement of persons; person-invariant calibration of items; invariant continuum
III. Rasch measurement theory: Wright Map; item and person fit; person response functions
I. Educational Assessments
Educational assessments are used to make decisions on the basis of scores.
Macro to micro focus (international, national, state, school, teacher, or student)
Operational definitions of the educational outcomes that we value as a society
What is reading? What is mathematics? What is science? What is English language proficiency?
Definition of a score
“… the term score is used generically in its broadest sense to mean any coding or summarization of observed consistencies or performance regularities on a test, questionnaire, observation procedure, or other assessment devices such as work samples, portfolios, and realistic problem simulations.” Messick (1995, p. 741)
Assessment of individual students
Focus on individual students
Purposes of classroom assessments:
Before instruction: readiness, placement
During instruction: formative, diagnostic
After instruction: summative, strategic
What do students know, what can students do, and what should students learn next …?
Test Standards
(AERA, APA, & NCME, 2014)
Validity refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests (p. 11)
Standard 1.0: Clear articulation of each intended test score interpretation for a specified use should be set forth, and appropriate validity evidence in support of each intended interpretation should be provided (p. 23)
Validity
“Validity is not a property of the test or assessment as such, but rather of the meaning of the test scores. These scores are a function not only of the items or stimulus conditions, but also of the persons responding as well as the context of the assessment.” Messick (1995, p. 741)
Scores
Scores are a function not only of the items, but also of the persons responding as well as the context of the assessment:
Scores = f(persons, items, context)
Context: Purpose and decisions based on scores
Student motivation
[Figure: persons, items and context jointly produce scores, shown on a continuum running from Low/Easy to High/Hard]
II. Invariant measurement
“The scientist is usually looking for invariance whether he knows it or not” (Stevens, 1951, p. 20)
“The scientist seeks measures that will stay put while his back is turned” (Stevens, 1951, p. 21)
Relativity/invariance
Albert Einstein (Physics): “Einstein, however, was not truly a relativist … beneath all of his theories, including relativity, was a quest for invariants … and the goal of science was to discover it” (Isaacson, 2007, p. 3)
Invariant measurement
“Invariance of item and person measures remains the exception rather than the rule … the context-dependent nature of estimates in human science research … seems to be the antithesis of the invariance we expect across thermometers and temperatures” (Bond & Fox, 2015, p. 85)
Science is impossible without an evolving network of stable measures (Wright, 1997, p. 33)
Wright Map as part of a “road map” to teaching and learning
No one way: origin, path and destination will vary for each person
Learning maps (dynamic learning maps and learning progressions)
Maps!
III. Rasch measurement theory
Wright Map: latent continuum (line)
Item and person fit
Person response functions
$$\phi_{ni1} = \frac{\exp(\theta_n - \delta_{i1})}{1 + \exp(\theta_n - \delta_{i1})}$$
The latent continuum (line): Wright map
[Figure: Wright map, with persons and items located on one latent continuum running from Low/Easy to High/Hard]
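A Wright map places persons and items on one logit scale so that each can be read against the other. This is a minimal text-only sketch, not the layout produced by any particular Rasch program; it assumes hypothetical person and item locations that have already been estimated in logits, groups them into bands of 0.5 logits, marks each person with an X on the left and lists item labels on the right, with High/Hard at the top.

```python
import math
from collections import defaultdict


def wright_map(person_thetas, item_deltas, step=0.5):
    """Return the text lines of a Wright map, highest logit band first."""
    def band(value):
        return math.floor(value / step) * step  # lower edge of the band

    persons = defaultdict(int)  # band -> number of persons
    items = defaultdict(list)  # band -> item labels
    for theta in person_thetas:
        persons[band(theta)] += 1
    for label, delta in item_deltas.items():
        items[band(delta)].append(label)

    all_bands = list(persons) + list(items)
    lo, hi = min(all_bands), max(all_bands)
    lines = []
    b = hi
    while b >= lo:
        left = "X" * persons.get(b, 0)  # one X per person in this band
        right = " ".join(items.get(b, []))
        lines.append(f"{b:+5.1f} {left:>10} | {right}")
        b -= step
    return lines


# Hypothetical estimates (logits) for seven persons and four items.
persons = [-1.2, -0.4, -0.3, 0.1, 0.6, 0.7, 1.4]
items = {"item1": -1.0, "item2": -0.2, "item3": 0.5, "item4": 1.3}
lines = wright_map(persons, items)
print("\n".join(lines))
```

Persons printed level with an item have about a 50% chance of success on it; items below them are easier and items above them are harder.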
Metametrics: Lexiles https://lexile.com/tools/lexile-map/
Continuum (θ) from Low/Easy to High/Hard with item sets A, B and C

Item set     Item 1   Item 2   Item 3   Score
A: Hard        1        0        0        1
B: Medium      1        1        0        2
C: Easy        1        1        1        3
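The item sets A, B and C above can be read through the Rasch model: the same person is expected to earn different raw scores on hard, medium and easy sets, even though the person's location θ does not change. This is a minimal sketch, assuming hypothetical item difficulties for each set and a single person at θ = 0.

```python
import math


def expected_score(theta, deltas):
    """Expected raw score: the sum of Rasch success probabilities over the items."""
    return sum(1.0 / (1.0 + math.exp(-(theta - d))) for d in deltas)


# Hypothetical difficulties (logits) for the three items in each set.
item_sets = {
    "A: Hard": [1.0, 1.5, 2.0],
    "B: Medium": [-0.5, 0.0, 0.5],
    "C: Easy": [-2.0, -1.5, -1.0],
}

theta = 0.0  # one person, one location on the continuum
for name, deltas in item_sets.items():
    print(f"{name:<10} expected raw score = {expected_score(theta, deltas):.2f}")
```

The raw score changes with the items administered; θ is the quantity that is meant to stay put.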
Guttman and Rasch
Continuum (θ) from Low/Easy to High/Hard with three response patterns, each with a score of 3 … ???

Person   Item 1   Item 2   Item 3   Item 4   Item 5   Item 6   Score
A          1        1        1        0        0        0        3
B          0        1        1        0        1        0        3
C          0        0        0        1        1        1        3
Bring the person back into measurement
Identify unusual response patterns
Different ways of getting a “3”
What are the implications for the decisions made for these students who received a “3” in different ways?
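Under the Rasch model the raw score is a sufficient statistic for θ, so persons A, B and C receive the same location; the differences between them show up only in the response pattern. This is a minimal sketch, assuming the six items are ordered from easiest to hardest with hypothetical, evenly spaced difficulties symmetric around zero. With that symmetry the maximum-likelihood θ for a raw score of 3 is exactly 0, because the six success probabilities at θ = 0 sum to 3. The sketch reports the number of Guttman errors and the unweighted (outfit) and information-weighted (infit) mean squares; values near 1 are expected, values well below 1 indicate a more deterministic pattern than the model predicts, and large values flag an unusual pattern.

```python
import math

# Hypothetical difficulties (logits), item 1 easiest ... item 6 hardest.
DELTAS = [-1.25, -0.75, -0.25, 0.25, 0.75, 1.25]
THETA = 0.0  # maximum-likelihood theta for a raw score of 3 on these symmetric items

PATTERNS = {
    "A": [1, 1, 1, 0, 0, 0],
    "B": [0, 1, 1, 0, 1, 0],
    "C": [0, 0, 0, 1, 1, 1],
}


def guttman_errors(x):
    """Count item pairs where an easier item is failed but a harder item is passed."""
    return sum(1 for i in range(len(x)) for j in range(i + 1, len(x))
               if x[i] == 0 and x[j] == 1)


def person_fit(x, theta, deltas):
    """Return (outfit, infit) mean-square statistics for one response pattern."""
    p = [1.0 / (1.0 + math.exp(-(theta - d))) for d in deltas]
    var = [pi * (1.0 - pi) for pi in p]  # binomial variance of each response
    sq_resid = [(xi - pi) ** 2 for xi, pi in zip(x, p)]
    outfit = sum(r / v for r, v in zip(sq_resid, var)) / len(x)  # mean squared standardized residual
    infit = sum(sq_resid) / sum(var)  # information-weighted mean square
    return outfit, infit


for person, x in PATTERNS.items():
    outfit, infit = person_fit(x, THETA, DELTAS)
    print(f"{person}: score={sum(x)} Guttman errors={guttman_errors(x)} "
          f"outfit={outfit:.2f} infit={infit:.2f}")
```

All three persons share the same θ, but A fits the Guttman ordering perfectly, B shows some irregularity and C reverses the ordering entirely, which is the kind of evidence a decision based on the “3” alone would miss.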
Four Components of Scores
Theta: location on the line
SEM: probabilistic uncertainty
Person fit: validity of the response pattern
Visual display for unusual response patterns (crossing person response functions, residual analyses)
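The first two components, theta and SEM, can be computed from a raw score and calibrated item difficulties. This is a minimal sketch, assuming hypothetical item difficulties and maximum-likelihood estimation by Newton-Raphson (one common choice among several estimation methods): the estimate solves Σ P_i(θ) = raw score, and the SEM is 1/√(Σ P_i(1 − P_i)), the inverse square root of the test information at θ. Extreme scores (all 0 or all 1) have no finite maximum-likelihood estimate, so the sketch rejects them.

```python
import math


def estimate_theta(raw_score, deltas, tol=1e-8, max_iter=100):
    """Maximum-likelihood theta and its SEM for a raw score under the Rasch model."""
    if not 0 < raw_score < len(deltas):
        raise ValueError("extreme scores have no finite maximum-likelihood estimate")
    theta = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(theta - d))) for d in deltas]
        info = sum(pi * (1.0 - pi) for pi in p)  # test information at theta
        step = (raw_score - sum(p)) / info  # Newton-Raphson update
        theta += step
        if abs(step) < tol:
            break
    p = [1.0 / (1.0 + math.exp(-(theta - d))) for d in deltas]
    sem = 1.0 / math.sqrt(sum(pi * (1.0 - pi) for pi in p))
    return theta, sem


deltas = [-1.25, -0.75, -0.25, 0.25, 0.75, 1.25]  # hypothetical, easiest to hardest
for score in range(1, 6):
    theta, sem = estimate_theta(score, deltas)
    print(f"raw score {score}: theta = {theta:+.2f}, SEM = {sem:.2f}")
```

The SEM is smallest where the items are concentrated and grows toward the ends of the continuum, which is why the same raw score can carry more or less uncertainty depending on the items administered.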
Learning about Rasch measurement theory
[In press]
Summary
Assessment is a technology that defines what we value in education
Assessment systems define a stable system of measures to represent the constructs
Test scores are used to make decisions about individual students, so we should consider four components of scores: theta, SEM, person fit and visual display
Validity of response patterns
Final Word Professor Ben Wright
What is the construct?
Where is the Wright map?
Is the Wright map a valid representation of the construct?