Presentation Slides, July 7th, Brighton, UK

Natural Language Processing in support of Learning: Metrics, Feedback and Connectivity

Adriana Berlanga, Francis Brouns, Peter van Rosmalen, Kamakshi Rajagopal, Marco Kalz, & Slavi Stoyanov
Open Universiteit Nederland

AI-ED 2009, July 7th 2009, Brighton, UK

Outline
•  Background & LTfLL: Language Technologies for Lifelong Learning
•  Positioning of the learner in a domain
•  Providing formative feedback on a learner's Conceptual Development
   –  Approach
   –  Showcases
   –  Future work
•  Questions

Survey: ‘critical’ support activities (Arts et al.; Van Rosmalen et al., 2008)
•  Assessment of student work
   –  Formative feedback (including plagiarism)
•  Answering questions
   –  Routing questions
   –  Formulating a personalised answer
•  Monitoring progress
   –  Drop-out prevention; personal advice
•  Supporting groups and communities
   –  Selecting and creating groups
   –  Providing overviews of & feedback on activities

This work inspired LTfLL (www.ltfll-project.org):
–  FP7-TEL: a 3-year project, 2008-2011
–  11 partners (8 countries, 6 languages)

LTfLL Objective
To create a set of next-generation support and advice services that will enhance individual and collaborative building of competences and knowledge creation in educational as well as organizational settings. The project makes extensive use of language technologies and cognitive models in the services.

LTfLL - Themes
•  Theme 1: positioning of the learner in a domain
•  Theme 2: support and feedback services
•  Theme 3: social and informal learning

Theme 1: Positioning
•  Determine the learner's knowledge in a domain, given a specific context (e.g. in support of Assessment of Prior Learning, or with regard to a specific topic, competence or learning goal)
•  Determine, in a (semi-)automatic way, the learner's prior knowledge – by analyzing her portfolio and the domain of study – in order to recommend learning materials or courses to follow
•  Locate the most suitable learning materials or courses to follow
•  Provide formative feedback with regard to the learner's profile in the domain of study, and recommend remedial actions to overcome conceptual gaps

Formative feedback
•  Services will offer semi-automatic measurement of conceptual development within a particular expertise area (expertise development: knowledge processes → formative feedback)
•  Diagnosing conceptual development
   –  A person's knowledge of a domain, by looking at how s/he organizes the concepts of that domain
   –  Novice vs. expert approach

The approach: Novice vs. Expert
Novices and experts differ in
•  How they express the concepts underlying a domain
•  How they discriminate relevant from non-relevant information
•  How they use and relate the concepts to one another

Expertise development (evidence from: medicine – networks, encapsulations, scripts; health sciences – networks, scripts; business administration – networks, scripts; law – networks, encapsulation +/-, …)

| Expertise level | Knowledge structure | Learning | Problem solving | Reasoning process |
| --- | --- | --- | --- | --- |
| Novice | Networks (incomplete and loosely linked) | Knowledge accretion, integration and validation | Long chains of detailed reasoning steps through networks | Step-by-step process |
| Intermediate | Networks (tightly linked and integrated) | Encapsulation | Reasoning through encapsulated network; abbreviated | Big steps (but still one at a time) |
| Expert | Illness scripts | Illness script formation | Illness script activation and instantiation | Groups of steps activated as a whole |
| Experienced expert | Memory traces of previous cases | Instantiated scripts | Automatic reminding | |

Boshuizen et al., 2004; Nievelstein, 2004

“Expert” Model
•  Defines the expected set of concepts and relations that represent the domain of knowledge at a specific point in the development of a learner
•  It is not absolute
•  Derive it (semi-)automatically; three variants along a relative-to-absolute continuum:
   1.  ‘Archetypical expert’ model: state-of-the-art information (e.g., scientific literature)
   2.  ‘Theoretical expert’ model: documents of a particular course or context (e.g., course material, tutor notes, presentations)
   3.  ‘Emerging expert’ model: the concepts and relations a group of people (co-workers, peers, …) use to describe a domain


Measuring conceptual development
•  Knowledge elicitation: measure the learner's understanding of the relationships among a set of concepts. Methods: concept maps, think-aloud, card sorting, word association.
•  Knowledge representation: define representations of the elicited knowledge that reflect the underlying organization of the data. Methods: cluster analysis, tree constructions, dimensional representations, pathfinder nets.
•  Evaluation of the representation: relative to some standard; compare the cognitive structures of experts and novices.

Exploring the approach: investigating the use of different ‘expert’ models
1.  Theoretical expert model
   –  Formal education (continuous or discontinuous? gaps and transitions; Prince; Boshuizen & Schmidt)
   –  Medical students; course and tutor materials
   –  Leximancer and Pathfinder
2.  Emergent expert model (Arts et al.)
   –  Informal learning
   –  Employees
   –  Leximancer

Theoretical Expert Model (Leximancer and Pathfinder)
•  Knowledge elicitation: a think-aloud protocol to elicit the students' knowledge; the think-aloud protocols were transcribed.
•  Knowledge representation: Leximancer was used to generate concept maps for the novices (think-alouds) and for the theoretical expert model (tutor notes, learning materials).
•  Evaluation of the representation: Pathfinder to compare the cognitive structures of novices & model and identify similarities and differences.

[Figure: generation of expert and student concept maps with Leximancer]

Initial findings
Verification: output discussed with an expert.
•  The concept maps differ in their level of detail:
   –  Student's concept map: detailed concepts (the biology)
   –  Model: encapsulated concepts, a panoramic view of the knowledge (the disease)
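To make the comparison step concrete, here is a minimal illustrative sketch in Python (not Leximancer or Pathfinder, whose algorithms are more involved): it treats each concept map as a set of concept-pair links and reports shared links, expert-only links (candidate gaps) and student-only links. The function name and toy maps are hypothetical.

```python
# Illustrative comparison of an "expert" and a student concept map,
# each represented as a set of concept-pair links (hypothetical data).
def map_overlap(expert_links, student_links):
    expert = {frozenset(link) for link in expert_links}
    student = {frozenset(link) for link in student_links}
    return expert & student, expert - student, student - expert

expert_map = [("disease", "symptom"), ("disease", "treatment")]
student_map = [("disease", "symptom"), ("enzyme", "substrate")]
shared, gaps, extras = map_overlap(expert_map, student_map)
print(sorted(map(sorted, gaps)))  # links the student has not yet formed
```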


Emergent Expert Model (Leximancer)
•  Knowledge elicitation: a think-aloud protocol to elicit the employees' knowledge; the think-aloud protocols were transcribed.
•  Knowledge representation: Leximancer was used to generate a single concept map of all the think-alouds.
•  Evaluation of the representation: Leximancer to compare the cognitive structures of novices & model and identify similarities and differences.

Initial findings
•  Indicating procedural knowledge, mentioning how to solve a problem: “the how”
•  Explaining the reasons and conditions of a problem: “the why”

Feedback Report (Leximancer)
  These are the concepts you mentioned the most: …
  From your peers, these are the most mentioned concepts: …
  The differences are: …
  This means that you might find it useful to:
   •  Read this material
   •  Do this activity
   •  Contact this person
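A minimal sketch of how such a report could be generated from concept-frequency counts; the function and the toy data are hypothetical, not the LTfLL service itself.

```python
# Hypothetical feedback-report generator: compare the learner's
# most-mentioned concepts with those of the peer group.
from collections import Counter

def feedback_report(learner_counts, peer_counts, top_n=5):
    learner_top = [c for c, _ in Counter(learner_counts).most_common(top_n)]
    peer_top = [c for c, _ in Counter(peer_counts).most_common(top_n)]
    missing = [c for c in peer_top if c not in learner_top]
    lines = [
        f"Concepts you mentioned the most: {', '.join(learner_top)}",
        f"Concepts your peers mentioned the most: {', '.join(peer_top)}",
        f"The differences are: {', '.join(missing) or 'none'}",
    ]
    if missing:
        lines.append("You might find it useful to read material on: " + ", ".join(missing))
    return "\n".join(lines)

print(feedback_report({"heat": 4, "organ": 2}, {"heat": 5, "disease": 4, "organ": 3}))
```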

Future work
•  Emergent model (representation, number, quantitative metrics)
•  Validation of the reliability and usability of the emerging expert map & report
•  Design and develop service v.1
•  Pilot with medical students (English)

Questions?
Question mark photo by Leo Reynolds. Licensed under Creative Commons.


Contact: [email protected] or [email protected]
Project website: www.ltfll-project.org
Publications: DSpace – dspace.ou.nl/simple-search?query=LTfLL

[Figure: comparison of expert and student map with Pathfinder]

Lexical similarity metrics for vocabulary learning modeling in Computer-Assisted Language Learning (CALL)

Ismael ÁVILA and Ricardo GUDWIN
University of Campinas

Introduction
•  The L1 can create a basis for learning the vocabulary of an L2: the L1 lexicon helps the learner to infer the meanings of words in the L2.
•  Techniques to compare the word-level distance between L1 and L2 are necessary to model this cross-linguistic influence (incl. quantitatively).

•  With this metric an ITS can anticipate which L2 words are more easily learned due to transfers from the L1, and which ones produce interferences.
•  We present here a technique for measuring lexical similarity in terms of its effect on the learner's perceptual ability to recognize L2 words with the help of the L1 lexicon.
•  The ITS can use this metric to initialize the learner model (LM) or to sequence the lexical units in terms of their easiness for a particular L1 audience.

Lexical similarity
•  Lexical similarities may be due to:
   –  Common origin: e.g. Spanish “corazón” and Portuguese “coração”
   –  Borrowings: e.g. Japanese “arigato” and Portuguese “obrigado”
   –  Coincidences: e.g. Greek “oikia” and Tupi “oca”
•  Examples: Direction (en) ↔ Direction (fr); House (en) ↔ Haus (de); Casa (it) ↔ Casa (pt)
•  The similarity level has two main parallel dimensions: orthographic and phonetic. Each of them may vary from a level of “no similarity” to a level of “absolute match”.
•  Regardless of their origins, these similarities affect the language learning process and have to be considered by the ITS.

Methods to measure string distance
•  Levenshtein distance uses the minimum number of insertions, deletions and letter substitutions needed to transform one string into another:
      LD(s1, s2) = min(n_ins + n_del + n_subst)
•  Feature distance is given by the number of features (usually N-grams, substrings of N consecutive letters) in which two strings differ:
      FD(s1, s2) = max(N1, N2) − m(s1, s2)
   where N1 and N2 are the numbers of N-grams in s1 and s2, and m(s1, s2) is the number of matching N-grams.
•  The Levenshtein distance leads to slightly better classification accuracy, but the Feature distance allows for much faster searching.
•  To account for the fact that one letter change is more relevant in short words than in long ones, normalized versions of LD have been used.
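Both distances are standard and compact enough to state in code. A minimal sketch (textbook algorithms, not the authors' implementation); the feature distance below uses sets of distinct N-grams, a common simplification.

```python
# Levenshtein distance and an N-gram feature distance (standard algorithms).
def levenshtein(s1, s2):
    """LD(s1, s2) = min(n_ins + n_del + n_subst), via dynamic programming."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        cur = [i]
        for j, c2 in enumerate(s2, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (c1 != c2)))   # substitution
        prev = cur
    return prev[-1]

def feature_distance(s1, s2, n=2):
    """FD(s1, s2) = max(N1, N2) - m(s1, s2) over distinct N-gram features."""
    g1 = {s1[i:i + n] for i in range(len(s1) - n + 1)}
    g2 = {s2[i:i + n] for i in range(len(s2) - n + 1)}
    return max(len(g1), len(g2)) - len(g1 & g2)

print(levenshtein("physics", "fisica"), feature_distance("physics", "fisica"))
```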


Lexical similarity & language proximity
•  An automated method avoids the subjectivity that is inherent in human-made comparisons: e.g. Gala (el) ↔ Leche (es).
•  We want to measure effective similarity, not linguistic kinship, for similarity, even accidental, is what matters for ease of learning.

Lexical Similarity: perceptual aspects
•  A written or printed word is, in the first place, a visual stimulus.
•  Word recognition is easier after fixation on the leftmost rather than the rightmost letter of a word (the initial, in many languages).
•  Fixation on the leftmost letter makes the whole word fall in the right visual half-field, in direct connection to the dominant left hemisphere.
•  Word-processing accuracy and speed depend on two factors:
   –  The perceptibility of the individual letters as a function of the fixation location
   –  The extent to which the most visible letters isolate the target word from its competitors
•  The leftmost letters have a special role in word recognition (isolation from competitors).
•  Reading and word recognition are not simply based on orthographic information, but involve the activation of phonological codes.

Lexical Similarity: semiotic aspects
•  Intuitive word-recognition factors are used as a common-sense technique when we create abbreviations: tks (thanks), pg (page), cmd (command) or ctrl (control).
•  Matching initials and consonants is more likely to enable word recognition than matching the same number (same LD) of other letters without the initial or with vowels included (resp. tak, ae, oma, coto).

•  The recognition of an L2 word due to a similarity with correlated L1 words is an inference based on diagrammatic (iconic) features.
•  This “intersymbolic iconicity” explains all the recognitions based on similarity, regardless of their cause: common origin, borrowings or simple coincidence.
•  Example: Slon (cz) ??? Elefant (dn) ↔ Elefante (pt)

The proposed LS metric
•  In our technique we assign more value to the diagrammatic role of consonants than to other matchings, and we emphasize the role of initials.
•  It may be necessary to normalize consonants and clusters to a same notation: for instance, “š”, “ŝ” and “sch” to “sh”.
•  The comparisons of the consonant or vowel sequences consider letter groupings such as “cntrl” or “oo”.
•  Weights are adjusted so that the maximum similarity is 1 (totally matching words) and the minimum is 0 (totally different words).

The equation for intersymbolic similarity is:

      IS = α(γ1·I + γ2·C + γ3·V) + β·P      (1)

where:
   IS: intersymbolic similarity (maximum = 1, minimum = 0)
   I: initials
   C: consonants
   V: vowels
   P: phonemes (can be decomposed like the orthographical part: γ4·I + γ5·C + γ6·V)
   α: weight of the orthographical similarity (adjusted according to the context)
   β: weight of the phonetic similarity (adjusted according to the context)
   γn: weights of the factors of similarity (e.g. γ1 = 0.4; γ2 = 0.4; γ3 = 0.2)
   α + β = 1, γ1 + γ2 + γ3 = 1, and γ4 + γ5 + γ6 = 1

The proposed LS metric: example
The intersymbolic similarities of the Italian word “tempo” for speakers of Portuguese, Spanish, English, German and Finnish are:

L1 (tempo) → L2 (tempo): Initials: t=t; Consonants: tmp=tmp; Vowels: eo=eo
   IS = 0.6*(0.4*1 + 0.4*1 + 0.2*1) + 0.4*1 = 1
L1 (tempo) → L2 (tiempo): Initials: t=t; Consonants: tmp=tmp; Vowels: eo≈ieo
   IS = 0.6*(0.4*1 + 0.4*1 + 0.2*0.66) + 0.4*0.9 = 0.92
L1 (tempo) → L2 (time): Initials: t=t; Consonants: tmp≈tm; Vowels: eo≠ie
   IS = 0.6*(0.4*1 + 0.4*0.66 + 0.2*0) + 0.4*0.4 = 0.48
L1 (tempo) → L2 (Zeit): Initials: t≈Z(ts); Consonants: tmp≈Zt; Vowels: eo≈ei
   IS = 0.6*(0.4*0.5 + 0.4*0.16 + 0.2*0.33) + 0.4*0.2 = 0.28
L1 (tempo) → L2 (aika): Initials: t≠a; Consonants: tmp≠k; Vowels: eo≠aia
   IS = 0.6*(0.4*0 + 0.4*0 + 0.2*0) + 0.4*0 = 0

The proposed LS metric: example
Original word: “physics”; transformations:
   to Czech “fizyka”      (sisssss)   LD=13
   to Polish “fyzika”     (sixsxss)   LD=9
   to Afrikaans “fisika”  (sisxxss)   LD=9
   to Italian “fisica”    (sisxxxs)   LD=7
   to French “physique”   (xxxxxssi)  LD=5

The results for intersymbolic similarity are:
   IS1 = 0.6*(0.4*0.8 + 0.4*0.65 + 0.2*0.8) + 0.4*0.8 = 0.764
   IS2 = 0.6*(0.4*0.8 + 0.4*0.65 + 0.2*0.9) + 0.4*0.8 = 0.776
   IS3 = 0.6*(0.4*0.8 + 0.4*0.72 + 0.2*0.8) + 0.4*0.8 = 0.781
   IS4 = 0.6*(0.4*0.8 + 0.4*0.80 + 0.2*0.8) + 0.4*0.8 = 0.800
   IS5 = 0.6*(0.4*1.0 + 0.4*0.90 + 0.2*0.9) + 0.4*0.8 = 0.884

The proposed LS metric: discussion
•  Whereas LD measured distances ranging from 5 to 13, the IS produced similar scores for the five L2 words, arguably because the technique captures the fact that all of the words are more or less recognizable from the original word.
•  Conversely, an opposite situation, in which two words produce a smaller LD but score worse on IS, would be “glamour” (en) and “amour” (fr): their LD=2 is smaller, but their IS=0.52 indicates less actual similarity.

Conclusions
•  We believe that the IS captures the crucial features that make a word more easily recognizable by learners.
•  We can assume that there is a threshold (based on IS) below which recognition will no longer be possible.
•  A field study is being designed to investigate how this threshold relates to the lexicon of each subject's L1 and to other known L2s.

Conclusions (cont.)
•  This technique aims to offer a practical word-level similarity metric for comparing words from different languages, so that the measure can be used as an input to initialize the LM or to evaluate word-level errors in the context of CALL applications. It is not intended to replace other formalisms, nor to create new computational treatments of lexical rules.

Cohesion, Semantics and Learning in Reflective Dialog

Arthur Ward, John Connelly, Sandra Katz, Diane Litman, Christine Wilson
Learning Research and Development Center, University of Pittsburgh

Outline
•  Motivation: why study cohesion?
   –  A way to study interactivity in tutorial dialog
•  Previous work: automatic “lexical” cohesive ties
   –  Now try a more sophisticated measure
•  Tag definitions: a set of “semantic” cohesive ties
•  Corpus: pre/post-tests & transfer questions
•  Applying the tags
•  Results
   –  Abstraction & specialization important for learning and transfer

Interactivity in Tutorial Dialog
•  Human tutoring is very effective (Bloom 1984; Cohen, Kulik & Kulik 1982). Why?
   –  Maybe because it is interactive (Chi et al. 2001, 2008; Graesser et al. 1995)
•  What specific interactive mechanisms help?
•  Other ways to study interactivity in dialog
•  Measurable using “cohesive ties”

Cohesive Ties (Halliday & Hasan 1976)
•  Cohesion: how a text “hangs together”
•  Repetition of words, use of pronouns, ellipsis, etc.

Previous work (Ward & Litman 2006, 2008)
•  Counted cohesive ties between tutor & student
   –  Repetition of words, word stems, hyponyms/hypernyms (identified using WordNet)
•  Correlated with learning; automatically computable
•  But missed many of Halliday & Hasan's cohesive devices

Current work
•  Manually tag cohesive ties that are not automatically identifiable, in a different corpus
•  Like before, focus on when tutor and student refer to each other's contributions

Cohesion Tag Set (lexical ties, e.g. word repetition, like before)
•  Exact: word or word-stem repetition
•  Synonym: two words with similar meanings
•  Paraphrase: phrase repetition with substitution
•  Pronoun: pronominal reference (“she”, “it”)
•  Superordinate-class: more general referring term
•  Class-member: more specific referring term
•  Collocation: complementarity (“up-down”)
•  Negation: direct contradiction
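For the earlier automatic “lexical” ties, a minimal illustrative sketch (not the authors' tagger): it counts word/stem repetitions between two dialog turns, with a crude suffix-stripper standing in for the WordNet-based hyponym/hypernym matching.

```python
# Count repetition ties between two turns via shared word stems.
def stem(word):
    w = word.lower().strip(".,!?'\"`")
    for suffix in ("ing", "ed", "es", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)]
    return w

def lexical_ties(turn_a, turn_b):
    """Stems shared by two turns (repetition ties)."""
    return {stem(w) for w in turn_a.split()} & {stem(w) for w in turn_b.split()}

s = "yes, because gravity pulls the firecracker down"
t = "What about the horizontal directions on your diagram?"
print(lexical_ties(s, t))  # {'the'}; stopwords would be filtered in practice
```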


The Corpus
•  Reflective tutoring dialogs with a human tutor (Katz et al. 2003)
   –  After problem solving in Andes (VanLehn et al. 2005)
•  Study procedure:
   –  16 students solved 12 physics problems each
   –  Answered 3-8 reflection questions
   –  Counter-balanced pre- & post-tests
•  Resulting corpus has 953 reflective dialogs
   –  2,218 student turns
   –  2,136 tutor turns
•  9 quantitative mechanics questions, similar to Andes problems
•  27 qualitative physics questions
   –  New questions, not like Andes problems: “far transfer” questions
   –  Example: “Suppose the maximum tension that the bungee cord could maintain without snapping was 700 N. What would happen …”
•  Students learned significantly by both measures

[Cohesion Tag Example: a series of annotated dialog excerpts shown on slides]

Tagging the Corpus
•  Training: 518 student & tutor turns
   –  Refining the tag definitions
•  Initial tagging pass
   –  Lexical features only
   –  Spans agreed by discussion
•  Final tagging pass
   –  Re-evaluated 3 tags using contextual features: “superordinate-class,” “class-member,” “collocation”
   –  Eliminated ties that didn't make sense: mis-matched topics or referents, or ties that didn't seem to involve knowledge construction
•  A 2nd tagger re-tagged a random 10%; Kappa = .57

Final Tagging Example
S: “yes, because gravity pulls the firecracker down and gives it motion in the `y` direction.”
T: “Good, that's right. What about in the horizontal directions? for example the 'x' direction on your diagram?”
•  In the first pass: tagged lexical relations, without reference to semantic context
   –  “down” is a specific “direction”, so tag down-direction as “superordinate-class”


Final Tagging Example (cont.)
•  In the second pass:
   –  Notice that the student had already used “direction”
   –  The tutor did not make a new generalization, so remove the tag

Analysis
•  A linear model for each cohesion tag
•  Predict post-test score from:
   –  Pre-test score (because it is correlated with post-test score)
   –  Standardized math score (a useful predictor of learning in Andes)
   –  Tag count (normalized by the number of student or tutor turns)
•  Separate models for:
   –  High pre-testers, low pre-testers, all students
   –  Qual (“near”), quant (“far”) & all questions
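A minimal sketch of one such per-tag linear model, fit by ordinary least squares; the data below are illustrative, not the study's.

```python
# Predict post-test from pre-test, standardized math score, and tag count.
import numpy as np

pre  = np.array([0.40, 0.60, 0.50, 0.80, 0.30])
math = np.array([0.50, 0.70, 0.60, 0.90, 0.40])
tags = np.array([0.10, 0.30, 0.20, 0.40, 0.05])   # tag count / number of turns
post = np.array([0.45, 0.70, 0.55, 0.85, 0.35])

X = np.column_stack([np.ones_like(pre), pre, math, tags])
coef, *_ = np.linalg.lstsq(X, post, rcond=None)
print(dict(zip(["intercept", "pre", "math", "tag"], coef.round(3))))
```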

Results
•  Example for the “student superordinate-class” tag: all students, all questions
[Results table; legend: “T:” = Tutor, “S:” = Student, “Super-Ord” = superordinate-class, “Class-mem” = class-member]

Discussion
•  Previous work showed that automatic measures of cohesion correlated with learning
•  Current work suggests cohesion also correlates in the new corpus
   –  “Semantic” ties correlate; no results for “exact” in this corpus
   –  Abstraction/specialization seem to be important cohesive mechanisms in tutoring
•  Span identification is the hardest part

Span Identification is Hard
•  Example:
   –  S: “No the force the airbag exerts back on the man after he goes into is one.”
   –  T: “The airbag force and the force of the person on the airbag is such a pair. good. All forces come in such pairs! What is the 'reaction force' for the driver's weight?”
•  Overlapping spans:
   –  “force” - “forces”: exact
   –  “force the airbag exerts” - “airbag force”: paraphrase
   –  “force the airbag exerts back on the man” - “pair”: superordinate class
•  Spans often don't correspond to syntactic structures
•  Words often participate in more than one span
•  Spans are sometimes split (“those forces”)

Future Work
•  Investigate automatic detection
   –  Maybe accurate spans aren't needed?
•  Could improve student models by detecting student abstraction
•  Could improve tutoring by including more tutor abstraction/specialization at appropriate places
   –  What's an appropriate place?

Thanks
•  Learning Research & Development Center
•  ONR N000140710039
•  The ITSpoke group
•  Pam Jordan

Speling Mistacks & Typeos: Can Your ITS Handle Them?

Adam M. Renner (a), Philip M. McCarthy (b), Chutima Boonthum (c), Danielle S. McNamara (a)
(a) University of Memphis, Psychology / Institute for Intelligent Systems
(b) University of Memphis, English / Institute for Intelligent Systems
(c) Hampton University, Computer Science

Intelligent Tutoring Systems
•  Provide assessment of user input
•  Guided feedback based on the user's response
•  Many ITSs use conversational dialogue
   –  NLP for assessment; determines feedback
   –  Input matched to a benchmark and assessed for similarity
•  Assessment limited by the proficiency of the user
   –  High school students or younger
   –  Make typing errors/spelling mistakes

ITS User-Language
•  Contains a high rate of typographical & grammatical errors
   –  Not a new issue in NLP
•  Traditional spellchecking not suitable (e.g., MS Word, email): we need what the student intended
•  ITSs necessitate automatic corrections
   –  Why2-Atlas (VanLehn et al., 2002)
   –  CIRCSIM-Tutor (Elmi & Evens, 1998)
   –  Many more just ignore errors
•  NLP tools thought resistant to errors
   –  LSA (Landauer et al., 2007): semantic overlap across two whole texts
   –  Short responses? Responses with multiple errors?
   –  NLP tools are trained on edited text
   –  When used in an ITS, similarity assessment is inevitably affected

Problems with Evaluating User-Language
•  Lack of “colloquial” paraphrase corpora
   –  Microsoft Research Paraphrase Corpus (Dolan, Quirk, & Brockett, 2004): only a binary rating (is/is not a paraphrase)
   –  Echo Chamber (Brockett & Dolan, 2005)
   –  Paraphrase Game (Chklovski, 2005)
•  Limitations in “cleaning” ITS input
   –  Datasets artificially created (Fossati & Di Eugenio, 2008)
   –  Target populations are relatively proficient (Why2-Atlas: college undergraduates; CIRCSIM-Tutor: 1st-year medical students)
   –  Use lexicons; computationally expensive

User-Language Paraphrase Corpus
•  1998 target sentence/student response pairs
•  Paraphrase attempts by high school students, during interactions with iSTART (McNamara, Levinstein, & Boonthum, 2004)
•  Paraphrases evaluated on widely used computational indices
   –  Latent semantic analysis (LSA; Landauer, McNamara, Dennis, & Kintsch, 2007)
   –  Entailment (Rus et al., 2007)
   –  Type-Token Ratio (TTR; Graesser, McNamara, et al., 2004)
   –  Mean Edit Distance (MED; McCarthy et al., 2007)
•  Paraphrases also evaluated by trained experts on 10 dimensions with Likert ratings

Research Questions
•  How are established computational indices affected by the types of errors found in typed user-language?
•  Do user errors affect the NLP assessment and feedback produced by an established ITS?
•  Does correcting user errors improve the capacity of ITS assessment to correspond to human ratings?
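Two of the simpler indices are easy to sketch. The definitions below are illustrative stand-ins (the study used the established Coh-Metrix/Entailer implementations), and the MED variant here is an assumption, not McCarthy et al.'s exact formula.

```python
# Illustrative type-token ratio and a crude mean-edit-distance stand-in.
def type_token_ratio(text):
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens)

def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def mean_edit_distance(response, target):
    """Average distance from each response word to its closest target word."""
    r, t = response.lower().split(), target.lower().split()
    return sum(min(edit_distance(w, v) for v in t) for w in r) / len(r)

target = "an increase in temperature indicates it has gained heat energy"
print(type_token_ratio("increace in tempiture has gaind heat energy"))
print(round(mean_edit_distance("increace in tempiture", target), 2))
```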


iSTART
•  High school students (U.S. grades 9-12)
•  Reading strategy training
   –  Paraphrasing, Elaboration, Making Bridging Inferences, Comprehension Monitoring
•  Paraphrase the following:
   –  Target: “Over two thirds of the heat generated by a resting human is created by organs of the thoracic and abdominal cavities and the brain.”
   –  Student: “a lot of heat made bya lazy person is made by systems of your stomack and thinking box.” [errors in original]

iSTART Evaluation Process
•  Based on the match between paraphrase and target sentence
•  Respond to or remove frozen expressions (e.g., “I think this is saying…”)
•  Word & Soundex matching against the benchmark for length, relevance, & similarity
   –  Irrelevant (IRR): too few words match
   –  Too short (SH): response is shorter than a specified threshold
   –  Too similar (SIM1): length and word match are close to the benchmark
•  Word match & LSA cosines for quality
   –  Adequate paraphrase (SIM2)
   –  Better than a paraphrase (OK)
•  Detailed formulae: McNamara, Boonthum, et al. (2007)

Soundex
•  Compensates for misspellings (Christian, 1998)
•  Vowels removed
•  Like-sounding consonants mapped onto the same symbol (e.g., b, f, p, v)
•  Lexicon-free
•  Word frequency problem: students make more mistakes on new or uncommon words

Procedure
•  Identified, coded, & corrected all errors, based on validated models of grammar (e.g., Foster & Vogel, 2004)
•  Interrater agreement for a subset (n = 200): Kappa = .70, p < .001; a single rater coded the entire corpus
•  83% of responses contained some form of error
   –  52% had some form of spelling error
   –  63% of spelling errors were internal to the target sentence
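Classic Soundex coding is compact enough to sketch in full; note this is the published algorithm in general, and iSTART's exact matching details may differ.

```python
# Classic Soundex: keep the first letter, drop vowels, map like-sounding
# consonants to one digit, collapse runs, pad/truncate to four characters.
def soundex(word):
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    out, prev = word[0].upper(), codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "hw":   # h and w do not break a run of like-coded letters
            prev = code
    return (out + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))          # R163 R163
print(soundex("tempiture"), soundex("temperature"))  # T513 T516 (near match)
```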

Error types & frequencies

| Error type | n (%) |
| --- | --- |
| Spelling (internal) | 665 (33%) |
| Spelling (external) | 386 (19%) |
| Capitalization | 1157 (58%) |
| S-V agreement | 367 (18%) |
| Article agreement | 75 (4%) |
| Preposition agreement | 53 (3%) |
| Determiner agreement | 59 (3%) |
| Spacing | 174 (9%) |
| Punctuation | 344 (17%) |
| Conjunction agreement | 43 (2%) |
| Possessive agreement | 71 (4%) |
| Extra/omitted/substitute | 230 (12%) |

Results
•  Significant effect of error correction on the computational similarity indices (partial eta-squared):
   –  LSA .178, Entailment .268, TTR .240, MED .111
•  Spelling (internal) accounts for a large portion of the variance (adjusted R²):
   –  LSA .35, Entailment .45, TTR .46, MED .17


Results: example
Target sentence: “An increase in temperature of a substance is an indication that it has gained heat energy.”
Student response: “increace in tempiture has gaind heat energy.”
Revised response: “Increase in temperature has gained heat energy.”
   LSA .54 → .90; Entailer .41 → .78; TTR .86 → .62; MED .78 → .60

Results: feedback correspondence
•  Compared iSTART feedback's correspondence to human ratings of Paraphrase Quality
•  Removed cases that required no correction or were entirely garbage (n = 328)
•  Separate ANOVAs for original and corrected paraphrases
   –  Dependent: Paraphrase Quality; fixed factor: iSTART response
   –  Original paraphrases: F(5, 1636) = 53.324, p < .001
   –  Corrected paraphrases: F(5, 1636) = 58.543, p < .001

Table 1: Crosstabulation of iSTART responses to user paraphrases (rows: original paraphrase; columns: corrected)

| Original \ Corrected | Better | Good | Too Similar | Too Short | Irrelevant | Frozen | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Better | 691 | 45 | 37 | 4 | 0 | 0 | 777 |
| Good | 12 | 194 | 98 | 0 | 0 | 0 | 304 |
| Too Similar | 7 | 7 | 527 | 0 | 0 | 0 | 541 |
| Too Short | 11 | 0 | 1 | 206 | 2 | 1 | 221 |
| Irrelevant | 6 | 0 | 0 | 6 | 120 | 7 | 139 |
| Frozen | 0 | 0 | 0 | 0 | 0 | 16 | 16 |
| Total | 727 | 245 | 663 | 216 | 122 | 24 | 1998 |

•  Cramér's V = .849, p < .001
•  Marginal Homogeneity (MH) = 5.892, p < .001

Separate pairwise comparisons of Paraphrase Quality (adjustment for multiple comparisons: Bonferroni)

| Comparison | Original: Mean Diff. (SE), Sig. | Corrected: Mean Diff. (SE), Sig. |
| --- | --- | --- |
| Frozen vs. Irrelevant | .152 (.402), 1 | .081 (.361), 1 |
| Frozen vs. Too short | -.776 (.370), .581 | -.922 (.299), .032 |
| Frozen vs. Too Sim | -1.955 (.363), < .001 | -2.176 (.288), < .001 |
| Frozen vs. Good | -2.071 (.366), < .001 | -2.421 (.297), < .001 |
| Frozen vs. Better | -1.897 (.361), < .001 | -2.106 (.288), < .001 |
| Irrelevant vs. Too short | -.918 (.209), < .001 | -1.002 (.245), .001 |
| Irrelevant vs. Too Sim | -2.107 (.196), < .001 | -2.257 (.231), < .001 |
| Irrelevant vs. Good | -2.223 (.203), < .001 | -2.502 (.242), < .001 |
| Irrelevant vs. Better | -2.0249 (.192), < .001 | -2.187 (.231), < .001 |
| Too short vs. Too Sim | -1.189 (.115), < .001 | -1.255 (.112), < .001 |
| Too short vs. Good | -1.305 (.127), < .001 | -1.500 (.133), < .001 |
| Too short vs. Better | -1.131 (.111), < .001 | -1.185 (.111), < .001 |
| Too similar vs. Good | -.116 (.103), 1 | -.245 (.107), .331 |
| Too similar vs. Better | .058 (.082), 1 | .070 (.077), 1 |
| Good vs. Better | .174 (.097), 1 | .315 (.106), .044 |

Discussion
•  Established NLP approaches are not as robust to user-language as believed
   –  Response length is not enough to wash out individual errors
   –  The ULPC represents the types & amount of errors real students make
•  Most variance is accounted for by internal misspellings
   –  Provides direction for future research: automatic spelling corrections only for words in the benchmark
   –  Will be silent & computationally light
•  ITS feedback algorithms may be optimized if user-language can be filtered prior to processing
   –  Misclassification is OK for motivation
   –  Accuracy is not OK: a simple rewording can pass for a good paraphrase; a paraphrase can pass for better
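The benchmark-only correction idea above can be sketched as follows (hypothetical code, not iSTART's): a response word is silently replaced only when it is within a small edit distance of some benchmark word.

```python
# Correct a word only toward the benchmark vocabulary.
def correct_to_benchmark(word, benchmark, max_dist=1):
    def lev(a, b):  # standard Levenshtein distance
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                               prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]
    best = min(benchmark, key=lambda b: lev(word, b))
    return best if lev(word, best) <= max_dist else word

benchmark = {"increase", "temperature", "substance", "gained", "heat", "energy"}
print([correct_to_benchmark(w, benchmark) for w in "increace has gaind".split()])
# -> ['increase', 'has', 'gained']
```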

Thank you!
•  We would like to thank: Vasile Rus, Ben Duncan, John Myers, Rebekah Guess
•  Research supported by: IIS-0735682, R305A080589


The episodic memory metaphor in text categorization with Random Indexing

Yann Vigile Hoareau & Adil El Ghali
CHArt & Lutin (Paris 8 / EPHE)
DEFT'09, Paris, 22 June 2009

Application of an episodic memory modeling perspective on text categorization using Random Indexing
•  From episodes to concepts
   –  A famous model of episodic memory: MINERVA 2 (description; simulation of the effect of episode frequency)
   –  A Word Vector model: Random Indexing (description; text categorization and the application of the distributional hypothesis)
•  The DEFT'09 text-mining contest
•  Results
•  Perspectives

From episodes to concepts
•  A famous model of episodic memory: MINERVA 2
[Figure: the episode model, from Hintzman (1988)]
•  Effect of the frequency of episodes on the echo (Hintzman, 1988): the mean and the variance of the echo increase with frequency
[Figure: simulation of the effect of episode frequency]

A model of Word Vectors: Random Indexing
•  The common principles behind Word Vectors:
   –  Implementing the distributional hypothesis
   –  Dealing with large corpora
   –  Working on a context window
   –  Building a matrix that holds the uses of words as a function of their contexts
   –  Reducing the matrix
   –  Using vectorial methods to manipulate words or groups of words
•  Create a matrix containing the index vectors
   –  c is the number of documents or contexts in the corpus
   –  n is the number of dimensions (N ~ 1000)
   –  Index vectors are sparse, randomly generated vectors; they consist of a few +1 and -1 entries
•  Create a matrix containing the term vectors
   –  t is the number of terms composing the corpus
   –  The process is incremental; to start, all cell values are initialized to 0
   –  Each time a term appears in a document, accumulate the index vector corresponding to that document onto the term vector corresponding to the term
•  At the end of the process, term vectors that appeared in similar contexts have accumulated similar index vectors
The DEFT'09 text-mining contest
•  A French text-mining contest
•  DEFT'09's main task was opinion categorization: subjectivity/objectivity detection in multilingual journal corpora (fr, en, it)
•  60% of the corpora for training
•  Limited-time test period

Principles
•  Build a semantic memory from all the available episodes
•  Organize episodes into categories following the principles of episodic memory models
   –  Split the categories into homogeneous sub-categories
   –  The created sub-categories are considered as local episodic memories
•  Assigning a category
[Equation for the category-assignment measure: garbled beyond recovery in the source]
[Figure: effect of the frequency of episodes on the echo (Hintzman, 1988); the mean and variance of the echo increase with frequency]

Results
[Results tables: not recoverable from the source]

Perspectives
•  Application to larger opinion categorization tasks: the TREC'09 Blog track
•  Interfacing with other word-vector methods (LSA, HAL, …)
•  Possible applications in Education
   –  Educational resources management
      –  Resource retrieval: help users determine the value of an educational resource (factual vs. opinion)
      –  Resource classification in both thematic and opinion dimensions
   –  Assessment or essay scoring

References (partially recoverable)
•  Hoareau, Y. V., El Ghali, A., & Legros, D. (2009). … catégorisation de textes avec Random Indexing [… text categorization with Random Indexing]. DEFT'09, Paris, 22 June 2009.
•  … Random Indexing and the episodic memory metaphor: application to text categori… [truncated in the source]