Using and Improving Coding Guides For and By Automatic Coding of PISA Short Text Responses
There and Back Again

Fabian Zehner (1,3), Frank Goldhammer (2,3), and Christine Sälzer (1,3)

ASSESS 2015; Atlantic City, NJ; Nov 14, 2015

(1) Technical University of Munich, (2) German Institute for International Educational Research (DIPF), (3) Centre for International Student Assessment (ZIB) e.V.

Outline

1 Introduction
2 Automatic Coding
3 Methods
4 Results
5 Discussion

Automatic Coding

– short text responses play a crucial part in educational assessment
– technologies for automatic evaluation have been progressing over the last two decades
– but most systems rely on relatively large amounts of data
  → different research groups are striving to train models with less, but most informative, data (Basu, Jacobs, & Vanderwende, 2013; Dronen, Foltz, & Habermehl, 2014; Ramachandran & Foltz, 2015; Sukkarieh & Stoyanchev, 2009; Zesch, Heilman, & Cahill, 2015)


Automatic Coding (cont'd)

The Term Automatic Coding

Coding (instead of Scoring, Grading, ...)
– allows neutral, nominal categories
– example: Extraversion vs. Conscientiousness
– "I take time out for others." is coded as one of: Extraversion, Conscientiousness, Neuroticism, Agreeableness, Openness to experience

Scoring, Grading, ...
– refer to a natural order of categories
– example: grade A is better than grade B
– "The story is about a girl falling into and wandering through a fantasy world." is graded on the scale A > B > C > D > E > F

– not only a matter of terminology but also of methodology
– e.g., consider regression

Automatic Coding & Coding Guides

– coding guides for human coding are often at hand
  ◦ they comprise reference responses: prototypes for their coding
  → offer the possibility to start model training
– coding guides often need to be adapted to empirical data ...
  1. if response types had not been considered
  2. to distinguish similar responses with different codes
  → both can be supported by automatic systems

Concept: Coding Guide trains Automatic System; Automatic System improves Coding Guide

Example coding guide (released PISA reading item):

The article on the opposite page appeared in a Japanese newspaper in 1996. Refer to it to answer the questions below.

Question 2: BULLYING (R118Q02; codes 0 1 8 9)

Why does the article mention the death of Kiyoteru Okouchi?

BULLYING SCORING 2

QUESTION INTENT: Developing an Interpretation: linking local and global cohesion

Full credit

Code 1: Relates the bullying-suicide incident to public concern and / or the survey OR refers to the idea that the death was associated with extreme bullying. Connection may be explicitly stated or readily inferred.

To explain why the survey was conducted. To give the background to why people are so concerned about bullying in Japan. He was a boy who committed suicide because of bullying. To show how far bullying can go. It was an extreme case. He hanged himself and he left a note saying that he was bullied in many hurtul ways. e.g. bulllies took his money and they also dunked him in a nearby stream many times. [A description of the extremity of the case.] This is mentioned because they feel it is important to try and stop bullying and for parents and teachers to keep a close eye on the children because they might do the same thing if it goes on for too long without help. [A very long winded way of saying that the incident showed how much public awareness needed to be raised.]

No credit

Code 0: Vague or inaccurate answer, including suggestion that the mention of Kiyoteru Okouchi is sensationalist.

He was a Japanese school boy. There are many cases like this all over the world. It’s just to grab your attention. Because he was bullied. [Seems to be answering the question, “why did he commit suicide?”, not why is it mentioned in the article, so fails to define connection. Not implicit enough.] Because the extent of bullying gone unnoticed. [Can’t make sense of it. confuses cause and effect.]

Code 8: Off task.
Code 9: Missing.

source: http://www.oecd.org/pisa/38709396.pdf [2015-11-10], p. 60

Employed Automatic System (Zehner, Sälzer, & Goldhammer, 2015)

Example: Starting with a short text response ...

  A girl falling into and wandering through a fentasy world.

The slide builds transform this response step by step:

  1. the punctuation mark is struck out: A girl falling into and wandering through a fentasy world
  2. the text is lowercased: a girl falling into and wandering through a fentasy world
  3. the text is split into tokens: [a] [girl] [falling] [into] [and] [wandering] [through] [a] [fentasy] [world]
  4. the spelling is corrected: [fentasy] becomes [fantasy]
  5. suffixes are struck out: [falling] becomes [fall], [wandering] becomes [wander]

... to a numerical representation of its semantics ... (LSA; Deerwester, Dumais, Furnas, & Landauer, 1990)

Vectors are shown for five of the tokens (first two components and last component of each):

  [girl]    → (−.03, .04, ..., .21)
  [fall]    → (−.11, .23, ..., .00)
  [wander]  → (.06, −.73, ..., −.10)
  [fantasy] → (−.16, −.02, ..., .81)
  [world]   → (−.37, .04, ..., −.51)

These token vectors are combined into a single response vector:

  (−.13, −.09, ..., .08)

(Zehner, Sälzer, & Goldhammer, 2015, p. 4)

... up to the automatic code
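The path from raw response to response vector can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the spelling dictionary, the naive suffix stripping, and the three-component vectors (first two and last components from the slide) are toy stand-ins, and combining the token vectors by a component-wise mean is an assumption that happens to be consistent with the values shown.

```python
import string

# Toy stand-ins (assumptions for illustration only).
SPELLING = {"fentasy": "fantasy"}  # hypothetical correction table
WORD_VECTORS = {                   # first two and last components from the slide
    "girl":    [-0.03,  0.04,  0.21],
    "fall":    [-0.11,  0.23,  0.00],
    "wander":  [ 0.06, -0.73, -0.10],
    "fantasy": [-0.16, -0.02,  0.81],
    "world":   [-0.37,  0.04, -0.51],
}


def preprocess(text):
    """Strip punctuation, lowercase, tokenize, correct spelling, strip '-ing'."""
    text = text.translate(str.maketrans("", "", string.punctuation)).lower()
    tokens = [SPELLING.get(token, token) for token in text.split()]
    # Naive suffix stripping as a stand-in for a real stemmer.
    return [t[:-3] if t.endswith("ing") and len(t) > 5 else t for t in tokens]


def response_vector(tokens):
    """Component-wise mean of the vectors of all tokens that have one."""
    vectors = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


tokens = preprocess("A girl falling into and wandering through a fentasy world.")
vector = response_vector(tokens)
print(tokens)
print([round(v, 3) for v in vector])
```

Tokens without an entry in the toy vector table are simply skipped here; how the real system treats such tokens is not stated on the slide.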

Integrating Coding Guides

1. number of clusters via sum of within-variances (without annotated data)
2. process reference responses analogously to empirical responses
3. project reference responses into the semantic space and assign them to the most similar clusters

Evaluation:
– frequency distribution of reference responses across clusters
– distribution of response distances to their cluster centroid within clusters; $\Delta_{\vec{c},\vec{y}} = \arccos\left(\frac{\vec{c}\cdot\vec{y}}{|\vec{c}|\,|\vec{y}|}\right)$
– Conflict I: cluster without reference response; use k = 1 empirical responses
– Conflict II: cluster with different reference responses
– Conflict III: compulsorily assigned reference responses need to be omitted

[Figure: sum of within-variances by number of clusters (axis label: Clusters; further label: "Rest"-Component)]

[Figure: two plots of response distances around the cluster Centroid, with angles marked 0, π/2, and π]
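Step 3 and the first two conflict checks can be sketched as follows. This is a hedged illustration, not the authors' code: the function names, the two-dimensional centroids, and the coded reference responses are hypothetical, and only the angular distance itself is taken from the formula on the slide. Conflict III is left out because the slide does not say how a compulsory assignment is detected.

```python
import math


def angular_distance(c, y):
    """Angle between vectors c and y: arccos(c.y / (|c| |y|)); assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(c, y))
    norms = math.sqrt(sum(a * a for a in c)) * math.sqrt(sum(b * b for b in y))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def assign_references(centroids, references):
    """Collect, per cluster, the codes of the reference responses nearest to it."""
    assigned = {cluster_id: [] for cluster_id in centroids}
    for vector, code in references:
        nearest = min(
            centroids, key=lambda cid: angular_distance(centroids[cid], vector)
        )
        assigned[nearest].append(code)
    return assigned


def find_conflicts(assigned):
    """Conflict I: no reference response; Conflict II: differing codes in one cluster."""
    conflict_1 = [cid for cid, codes in assigned.items() if not codes]
    conflict_2 = [cid for cid, codes in assigned.items() if len(set(codes)) > 1]
    return conflict_1, conflict_2


# Hypothetical toy data: three cluster centroids, three coded reference responses.
centroids = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [-1.0, 0.0]}
references = [([0.9, 0.1], 1), ([0.8, -0.2], 0), ([0.1, 0.9], 1)]

assigned = assign_references(centroids, references)
conflict_1, conflict_2 = find_conflicts(assigned)
print(assigned, conflict_1, conflict_2)
```

In this toy case, cluster c3 receives no reference response (Conflict I) and cluster c1 receives reference responses with codes 1 and 0 (Conflict II).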

Materials and Data

– PISA 2012 (15-year-olds and ninth-graders in Germany)
– 8 items assessing reading, 1 item each for math and science

Item                                 Domain    Aspect(a)  Correct  n        Words(b)
1·Explain Protagonist’s Feeling      reading   B          83%      4,152    12.3 (4.6)
2·Evaluate Statement                 reading   C          43%      4,234    15.6 (9.0)
3·Interpret the Author’s Intention   reading   B          10%      4,234    12.5 (6.3)
4·List Recall                        reading   A          59%      4,223     5.6 (3.0)
5·Evaluate Stylistic Element         reading   C          56%      4,234    14.7 (6.2)
6·Verbal Production                  reading   B          80%      4,152    12.4 (6.9)
7·Select and Judge                   reading   C          68%      4,152    13.6 (7.0)
8·Explain Story Element              reading   B          69%      4,223    14.4 (5.5)
9·Math                               math      M          35%      4,205    14.0 (6.8)
10·Science                           science   S          58%      4,181    11.1 (5.2)
Total                                                     56%      41,990   12.6 (6.1)

(a) A = Access & Retrieve, B = Integrate & Interpret, C = Reflect & Evaluate, M = Uncertainty & Data, S = Explain Phenomena Scientifically, according to the PISA framework (OECD, 2013)
(b) average word count for non-empty responses (SD)

Analyses

general setup: 300 LSA dimensions, arccosine, Ward, spelling correction

Analysis I
– illuminates needed changes in the coding guide
– with regard to conflicts I, II and III

Analysis II followed two interests
1. performance cg vs. man (operationalized as human-computer agreement, κh:c and λh:c)
2. performance and number of clusters (∝ coding effort)
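As an illustration of the agreement measure, the sketch below computes chance-corrected agreement between human and computer codes. It assumes that κ denotes Cohen's kappa, which the slides do not state explicitly; λ is not covered because its definition is not given here. The function name and the two code lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(human, computer):
    """Cohen's kappa for two equally long lists of nominal codes."""
    if len(human) != len(computer) or not human:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(human)
    observed = sum(h == c for h, c in zip(human, computer)) / n
    human_counts = Counter(human)
    computer_counts = Counter(computer)
    # Agreement expected by chance from the two marginal distributions.
    expected = sum(human_counts[k] * computer_counts[k] for k in human_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for eight responses (1 = full credit, 0 = no credit).
human = [1, 1, 0, 0, 1, 0, 1, 1]
computer = [1, 1, 0, 1, 1, 0, 0, 1]
kappa = cohens_kappa(human, computer)
print(round(kappa, 3))
```

Because kappa treats the codes as nominal categories, it fits the coding perspective of the talk, in which categories need not have a natural order.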

References

Introduction

Automatic Coding

Methods

Results

Discussion

Results – Analysis: Coding Guide Improvement

Item #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total

Number of Clusters Ref. Resp. 52 17 48 21 70 31 7 17 53 32 53 21 46 15 31 17 46 12 55 15 461 198

Conflict I II III 39 (75%) 1 7 (41%) 34 (71%) 1 5 (24%) 50 (71%) 1 7 (23%) 1 (14%) 2 3 (18%) 32 (60%) 2 7 (22%) 39 (74%) 0 4 (19%) 35 (76%) 0 3 (20%) 22 (71%) 1 4 (24%) 37 (80%) 1 1 (8%) 44 (80%) 2 1 (7%) 333 (72%) 11 42 (21%)

Note. Conflict I: clusters without reference response, II: clusters with contradicting reference responses, III: reference responses without empirical correspondence

References

Introduction

Automatic Coding

Methods

Results

Discussion

Results – Analysis: Coding Guide Improvement

Item #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total

Number of Clusters Ref. Resp. 52 17 48 21 70 31 7 17 53 32 53 21 46 15 31 17 46 12 55 15 461 198

Conflict I II III 39 (75%) 1 7 (41%) 34 (71%) 1 5 (24%) 50 (71%) 1 7 (23%) 1 (14%) 2 3 (18%) 32 (60%) 2 7 (22%) 39 (74%) 0 4 (19%) 35 (76%) 0 3 (20%) 22 (71%) 1 4 (24%) 37 (80%) 1 1 (8%) 44 (80%) 2 1 (7%) 333 (72%) 11 42 (21%)

Note. Conflict I: clusters without reference response, II: clusters with contradicting reference responses, III: reference responses without empirical correspondence

References

Introduction

Automatic Coding

Methods

Results

Discussion

Results – Analysis: Coding Guide Improvement

Item #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total

Number of Clusters Ref. Resp. 52 17 48 21 70 31 7 17 53 32 53 21 46 15 31 17 46 12 55 15 461 198

Conflict I II III 39 (75%) 1 7 (41%) 34 (71%) 1 5 (24%) 50 (71%) 1 7 (23%) 1 (14%) 2 3 (18%) 32 (60%) 2 7 (22%) 39 (74%) 0 4 (19%) 35 (76%) 0 3 (20%) 22 (71%) 1 4 (24%) 37 (80%) 1 1 (8%) 44 (80%) 2 1 (7%) 333 (72%) 11 42 (21%)

Note. Conflict I: clusters without reference response, II: clusters with contradicting reference responses, III: reference responses without empirical correspondence

References

Introduction

Automatic Coding

Methods

Results

Discussion



Results – Analysis II: Performance

[Figure: panels for cg and man. y-axes: Human-Computer Agreement (κh:c) and Human-Computer Agreement (λh:c), each from .00 to 1.00. x-axis: Number of Clusters, from 2 to 500. One line per item: 1·Expl. Protagonist's Feeling, 2·Evaluate Statement, 3·Int. the Author's Intention, 4·List Recall, 5·Evaluate Stylistic Element, 6·Verbal Production, 7·Select and Judge, 8·Explain Story Element, 9·Math, 10·Science]
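As an illustration of the human-computer agreement κh:c, the following sketch computes a chance-corrected agreement coefficient between human and computer codes. It assumes that κh:c denotes Cohen's kappa, which the slides do not state explicitly; the function name and the example codes are hypothetical, and λh:c is not covered.

```python
from collections import Counter

def cohens_kappa(human, computer):
    """Cohen's kappa for two equally long sequences of nominal codes."""
    n = len(human)
    # Observed proportion of identical codes.
    observed = sum(h == c for h, c in zip(human, computer)) / n
    # Agreement expected by chance from the two marginal distributions.
    h_counts, c_counts = Counter(human), Counter(computer)
    expected = sum(h_counts[code] * c_counts[code] for code in h_counts) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1 - expected)

# Invented example codes: 3 of 4 responses receive the same code.
kappa = cohens_kappa([1, 1, 0, 0], [1, 1, 0, 1])
```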


Discussion

– great potential for improvement of coding guides
– cg-approach: empirical performance varied unreliably up to 100 clusters; beyond that point, it deviated little from the original man-system
– k = 1 → probably too much impact by chance in big clusters
– hence, we recommend k = 3 or 5
– but systematic analyses of how to balance k and the number of clusters are still to be done
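The role of k can be sketched as follows, assuming that the k responses nearest to a cluster centroid are sampled and that the majority of their codes becomes the cluster code. The function name `code_from_nearest` and the toy data are hypothetical.

```python
from collections import Counter

def code_from_nearest(responses, k=3):
    """responses: list of (distance_to_centroid, code) pairs of one cluster.

    Returns the majority code among the k nearest responses,
    or None if the cluster is empty or the vote is tied.
    """
    nearest = sorted(responses, key=lambda r: r[0])[:k]
    counts = Counter(code for _, code in nearest).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie between the two most frequent codes
    return counts[0][0]

# Toy cluster: the single nearest response carries a different code
# than the majority of the three nearest ones.
cluster = [(0.10, 0), (0.12, 1), (0.15, 1), (0.40, 0)]
with_k1 = code_from_nearest(cluster, k=1)
with_k3 = code_from_nearest(cluster, k=3)
```

In this toy cluster, k = 1 and k = 3 lead to different cluster codes, which is the kind of chance effect the recommendation of a larger k is meant to reduce.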


References

Basu, S., Jacobs, C., & Vanderwende, L. (2013). Powergrading: A clustering approach to amplify human effort for short answer grading. Transactions of the Association for Computational Linguistics, 1, 391–402.
Deerwester, S., Dumais, S. T., Furnas, G. W., & Landauer, T. K. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391–407.
Dronen, N., Foltz, P. W., & Habermehl, K. (2014). Effective sampling for large-scale automated writing evaluation systems. arXiv preprint arXiv:1412.5659.
Fedorov, V. V. (1972). Theory of optimal experiments. New York: Academic Press.
OECD. (2013). PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving and financial literacy. OECD Publishing.
Ramachandran, L., & Foltz, P. (2015). Generating reference texts for short answer scoring using graph-based summarization. In Association for Computational Linguistics (Ed.), Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 207–212).
Scott, D. W. (1992). Multivariate density estimation: Theory, practice, and visualization. New York, NY: Wiley.
Sukkarieh, J. Z., & Stoyanchev, S. (2009). Automating model building in c-rater. In Proceedings of the 2009 Workshop on Applied Textual Inference (pp. 61–69).
Zehner, F., Sälzer, C., & Goldhammer, F. (2015). Automatic coding of short text responses via clustering in educational assessment. Educational and Psychological Measurement. Retrieved from http://epm.sagepub.com/content/early/2015/06/06/0013164415590022
Zesch, T., Heilman, M., & Cahill, A. (2015). Reducing annotation efforts in supervised short answer scoring. In Association for Computational Linguistics (Ed.), Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 124–132).


Thank you for your attention

[email protected]


Appendix


How to Sample New Empirical Prototypes

– Conflict I and partly III: need to sample empirical responses as new prototypes for the coding guides

Which Responses are Prototypes for their Clusters?
– regression: optimal design algorithms very effective (Dronen et al., 2014; e.g., Fedorov exchange, Fedorov, 1972)

Dronen et al., 2014, p. 7


Sampling Prototypes

[Figure: point cloud; labels not recoverable]

Clustering, a Different Story
– types known → prototypes required
– often, responses close to the centroid assumed as prototypes (Zehner et al., 2015; Zesch et al., 2015)
– list heuristic in Ramachandran and Foltz (2015): sorted by highest similarity and most connections → densest region
– kernel density estimates optimal, but not feasible here (hyperdimensionality), approximation:
  – dense regions comprise many responses with relatively low pairwise distances
  – constitutes the definition of kde: smallest region with the highest number of responses (cf. Scott, 1992)
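One way to read the approximation above is to order each response's pairwise distances within its cluster and to treat the response whose smallest distances are lowest on average as lying in the densest region. The sketch below follows that reading. The angle-based distance, the parameter `m`, and all function names are assumptions made for illustration, not the procedure used in the study.

```python
import math

def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def ordered_pairwise_distances(vectors):
    """For each response, its distances to all other responses, ascending."""
    return [
        sorted(angle(u, v) for j, v in enumerate(vectors) if j != i)
        for i, u in enumerate(vectors)
    ]

def densest_response(vectors, m=3):
    """Index of the response with the lowest mean of its m smallest distances."""
    if len(vectors) < 2:
        raise ValueError("need at least two responses in the cluster")
    curves = ordered_pairwise_distances(vectors)
    scores = [sum(c[:m]) / len(c[:m]) for c in curves]
    return min(range(len(vectors)), key=scores.__getitem__)

# Toy cluster in two dimensions: four similar responses and one outlier.
toy = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.15], [0.97, 0.2], [0.0, 1.0]]
prototype_index = densest_response(toy, m=2)
```

In the toy data, the third response (index 2) has the closest neighbours on both sides and is therefore selected. How many of the smallest distances to average is left open by the slides.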


Employment of the Theoretical Framework

How to Determine a Cluster Code
– codes of the assigned reference responses determine the cluster code
– reference responses with $\Delta_{\vec{c}_i,\vec{y}} \geq \bar{x}_i + 1.6\,sd_i$ are omitted (compulsorily assigned)
– in case the reference responses' codes are ...
  ◦ the same: cluster code = reference responses' code
  ◦ different:
    • cluster flagged for manual inspection
    • cluster code = majority of code
    • new empirical response in case of ties
– in case there is no reference response assigned → sample a new empirical one

Empirical Responses as New Reference Responses
– k = 1 responses that are nearest to the centroid
– no manual coding needed in this study because the data already were annotated
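The decision rules above can be sketched as a small function. It assumes that x̄_i and sd_i are the mean and standard deviation of the distances between the responses of cluster i and its centroid, which the slides do not spell out; the function name, its arguments, and the returned triple are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

def cluster_code(ref_responses, member_distances, cutoff=1.6):
    """Return (code, flagged, needs_new_sample) for one cluster.

    ref_responses: (distance_to_centroid, code) pairs of the reference
        responses assigned to the cluster.
    member_distances: distances of the cluster's responses to the centroid
        (assumed basis for the mean and sd in the omission rule).
    """
    if len(member_distances) >= 2:
        threshold = mean(member_distances) + cutoff * stdev(member_distances)
    else:
        threshold = float("inf")  # no sd available, keep every reference response
    # Omit reference responses that lie too far from the centroid.
    kept = [code for dist, code in ref_responses if dist < threshold]
    if not kept:
        return None, False, True  # no reference response: sample an empirical one
    counts = Counter(kept).most_common()
    if len(counts) == 1:
        return counts[0][0], False, False  # all codes agree
    if counts[0][1] == counts[1][1]:
        return None, True, True  # tie: flag and sample a new empirical response
    return counts[0][0], True, False  # majority code, flagged for inspection

distances = [0.2, 0.25, 0.3, 0.35, 0.4]
agree = cluster_code([(0.2, 1), (0.3, 1)], distances)
majority = cluster_code([(0.2, 1), (0.3, 1), (0.35, 0)], distances)
tie = cluster_code([(0.2, 1), (0.3, 0)], distances)
too_far = cluster_code([(0.9, 1)], distances)
```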


Analyses

General setup: 300 LSA dimensions, arccosine, Ward, spelling correction

Analysis I
– illuminates needed changes in the coding guide
– with regard to conflicts I, II and III

Analysis III
– shows empirical evidence which empirical responses should be sampled as new prototypes

Analysis II followed two interests
1. performance cg vs. man (operationalized as κh:c and λh:c)
2. performance and number of clusters (∝ coding effort)


Results – Analysis III: Sampling Prototypes

[Figure: y-axis: Distance (radian); x-axis: Index of Ordered Pairwise Distance, 1 to 18]

Note. Two exemplary figures of increasingly ordered pairwise distances of responses within one cluster (item 10, cluster 3). Each line constitutes one response, the black, dashed ones are the five responses closest to the cluster centroid.


Results – Analysis III: Sampling Prototypes

[Figure: y-axis: Distance (radian); x-axis: Index of Ordered Pairwise Distance, 1 to 18]

Note. Two exemplary figures of increasingly ordered pairwise distances of responses within one cluster (item 10, cluster 19). Each line constitutes one response, the black, dashed ones are the five responses closest to the cluster centroid.
