Slides: bit.ly/kan-gcms15
Instructors, Learners and Machines: Learning instructor intervention from MOOC forums Muthu Kumar Chandrasekaran, Chencan Xu, Pengyu Li, Min-Yen Kan, Bernard C.Y. Tan, Kiruthika Ragupathi, & NUS-HCI Group
16 Aug 2015
Andrew Ng’s morning coffee
prologue
GCMS - Min-Yen Kan
Pictures courtesy: www.ige3.unige.ch, i.livescience.com & usatcollege.files.wordpress.com
1. Learning Instructor Intervention in MOOCs – Instructors for Learners
2. Enabling Peer Annotations in MOOCs – Learners for Learners
3. Automating Annotations in MOOCs – Machine for Learners
1. LEARNING INSTRUCTOR INTERVENTION IN MOOCS Instructors for Learners
2. Enabling Peer Annotations in MOOCs Learners for Learners
3. Automating Annotations in MOOCs Machine for Learners
Chandrasekaran et al. (2015). Learning Instructor Intervention from MOOC Forums: Early Results and Issues. Educational Data Mining (EDM '15), Madrid, Spain.
Deliberate Practice: Problems with Scalability
MOOCs operate at huge scale and involve distance learning. Their discussion forums are correspondingly massive. We need to do more with the resources we have.
Successful Intervention
Scaling instructor intervention?
Instructors cannot reply to, or even read, every post on a MOOC forum. There are compelling pedagogical reasons to intervene – but how much, and when?
We propose a system to identify threads that merit an instructor's attention!
Practical Outcomes
• Forum triage tools
• Prescriptive guidelines for intervention
Freely Annotated Data!
Corpus
D14 Corpus

Forum type | All: # threads | All: # posts | Intervened: # threads | Intervened: # posts
Homework   |  3,868 | 31,255 | 1,385 |  6,120
Lecture    |  2,392 | 13,185 | 1,008 |  3,514
Errata     |    326 |  1,045 |   134 |    206
Exam       |    822 |  6,285 |   405 |  1,721
Total      |  7,408 | 51,770 | 2,932 | 11,561

Data from 14 MOOCs (D14) from diverse subject areas, with differing numbers of threads and interventions. The feature study was done on this corpus.
D61 Corpus (scaled up)

Forum type | All: # threads | All: # posts | Intervened: # threads | Intervened: # posts
Total      | 26,643 | 205,835 | 7,740 | 31,779

Data from 61 MOOCs (D61) is about 3 times larger. Our best set of features was tested on D61.
Classifier
• Logistic regression classifier.
• We use class weights w to counterbalance the inherent class imbalance in this data, which otherwise biases prediction towards majority-class instances.
• Class weights are learned from the training set by greedily optimising for maximum F1 score.
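A minimal sketch, using scikit-learn, of the class-weighted logistic regression described above. The candidate weight grid and the helper name are illustrative assumptions, not the paper's actual implementation.

```python
# Class-weighted logistic regression with a greedy search over minority-class
# weights, keeping whichever weight maximises F1 on a held-out split.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def fit_with_greedy_class_weight(X_train, y_train, X_dev, y_dev,
                                 candidate_weights=(1, 2, 3, 5, 8, 13)):
    """Try increasing weights on the minority (intervened) class; return
    the best classifier and its F1 on the held-out split."""
    best_f1, best_clf = -1.0, None
    for w in candidate_weights:
        clf = LogisticRegression(class_weight={0: 1, 1: w}, max_iter=1000)
        clf.fit(X_train, y_train)
        f1 = f1_score(y_dev, clf.predict(X_dev))
        if f1 > best_f1:
            best_f1, best_clf = f1, clf
    return best_clf, best_f1
```

In the paper's setting, X would hold the unigram and thread-level feature vectors and y the intervened / not-intervened labels.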
New feature/marker: Forum type
Encodes intervention priority as perceived by the instructor.
Ratio of intervened to non-intervened threads over D14 across the 4 forum types
New feature: Entity references to course materials
New feature: non-lexical references
• URLs
• Timestamps from videos
Other features
• Unigrams (~98,000 unique terms)
• Thread properties
– Length: as # posts, # comments, and total; also as # sentences.
– Structure: as average # comments per post.
• Affirmation of the original post by fellow students.
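A minimal sketch of how the thread-property features above might be computed. The thread representation (a list of posts, each with text and comments) is an assumed simplification, not the actual forum data format.

```python
# Compute length and structure features for one forum thread.
def thread_features(thread):
    """thread: list of posts; each post is a dict with 'text' (str)
    and 'comments' (list of comment strings)."""
    n_posts = len(thread)
    n_comments = sum(len(p["comments"]) for p in thread)
    # Crude sentence count over post bodies; a real system would use a tokenizer.
    n_sentences = sum(p["text"].count(c) for p in thread for c in ".?!")
    return {
        "num_posts": n_posts,
        "num_comments": n_comments,
        "total_length": n_posts + n_comments,
        "num_sentences": n_sentences,
        "avg_comments_per_post": n_comments / n_posts if n_posts else 0.0,
    }
```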
Forum type and other features improve significantly over unigrams

# | Features                          | Precision | Recall | F1
1 | Unigrams                          | 41.98 | 61.39 | 45.58
2 | 1 + forum type                    | 41.36 | 69.13 | 48.01
3 | 2 + lexical entity references     | 41.09 | 66.57 | 47.22
4 | 3 + affirmations                  | 41.20 | 68.94 | 47.68
5 | 4 + thread properties             | 42.99 | 70.54 | 48.86
6 | 5 + # of sentences                | 43.08 | 69.88 | 49.77
7 | 6 + non-lexical entity references | 42.37 | 74.11 | 50.56
8 | Ablating entity references        | 45.96 | 79.12 | 54.79
Predicting interventions is difficult; performance varies widely.

Course              | Intervention Ratio | F1 Individual (20% test set) | F1 D14 (full course as test set)
ml-005              | 0.45 | 64.96 | 56.56
rprog-003           | 0.32 | 49.62 | 48.70
calc1-003           | 0.60 | 51.29 | 68.91
smac-001            | 0.17 | 25.00 | 33.26
compilers-004       | 0.02 | 14.28 |  4.91
maththink-004       | 0.49 | 63.56 | 63.29
medicalneuro-002    | 0.76 | 75.36 | 81.94
musicproduction-006 | 0.01 |  0.00 |  1.03
gametheory2-001     | 0.19 | 28.57 | 30.16
Average             | 0.36 | 41.59 | 45.54
Weighted Macro Avg  | 0.40 | 49.04 | 50.56
Does scaling up the corpus help?

Corpus   | P     | R     | F1
14 MOOCs | 45.96 | 79.12 | 54.79*
61 MOOCs | 42.80 | 76.29 | 50.96*

Varying intervention ratios make the training and test set distributions different.
* Uses the best-performing feature set from the previous experiment, i.e., all features except course references.
Limitations
• Variation among courses in the # of threads
• Intervention decisions may be subjective
• Simple baselines outperform learned models
• Previous results are not replicable
Diversity across courses
The # of threads and their intervention ratios in forums over D14
Diversity across different courses in volume of threads and interventions
Simple baselines work better

Course              | F1 Individual (20% test set) | F1 @100%R | F1 on D14 (full course as test set) | F1 @100%R
ml-005              | 64.96 | 63.79 | 72.35 | 61.83
rprog-003           | 49.62 | 47.39 | 48.55 | 49.31
calc1-003           | 51.29 | 74.83 | 70.63 | 75.33
smac-001            | 25.00 | 34.67 | 34.15 | 29.28
compilers-004       | 14.28 |  3.28 |  4.82 |  4.75
maththink-004       | 63.56 | 63.08 | 61.11 | 65.49
medicalneuro-002    | 75.36 | 88.66 | 78.06 | 85.67
musicproduction-006 |  0.00 |  4.35 |  1.09 |  1.72
gametheory2-001     | 28.57 | 45.16 | 27.12 | 30.56
Average             | 41.59 | 46.43 | 45.18 | 47.09
Weighted Macro Avg  | 49.04 | 51.51 | 54.79 | 53.22
Is intervention subjective?
Further indicated by weak inter-annotator agreement among instructors (κ = 0.53).
Professor A: prefers not to intervene. Students use the forum for peer learning.
Photo credits: UCL Institute of Education. Used under Creative Commons License
Professor B: prefers to intervene as often as possible, to engage students and correct misconceptions.
Photo credits: UCL Institute of Education. Used under Creative Commons License
Variables that influence intervention
• Course discipline and topic
• Time within the course
• Individual instructor personality
• Availability
Working towards best practices for intervention
Future Work: Intervention framework roadmap
• Thread ranking – mitigates intervention subjectivity
• Re-intervention – makes intervention decisions at the post level
• Role-based – optimises recommendations for instructor / TA
• Real-time
Future Work: Annotation Plan
Phase 0: Pilot
– Single course
– Create novice annotation guideline
– Test expert/novice annotation fidelity
Phase 1: Small
– NUS MOOC data only
– Understand expert/novice differences
– Refine novice annotation plans
Phase 2: Medium-scale
– MOOC Consortium data spanning many disciplines
– Run full-scale novice crowdsourced annotations

Simplified Intervention Typology
Peer Interventions
• Feedback Request
• Paraphrase
• Juxtaposition
• Refinement
• Clarification
• Completion
Instructor Interventions
• Justification Request
• Extension
• Reasoning Critique
• Integration / Summing up
Replicable; annotatable by novices.

Proposed by the team, based on the framework of Berkowitz, M. W., & Gibbs, J. C. (1983). Measuring the development of features of moral discussion. Merrill-Palmer Quarterly, 29, 399–410; further refined by Teasley (1999).
Enabling implementation / model building
Novice Annotation
Can novices approximate expert annotation?
– Other studies show mixed results, attributed to various factors
Working towards:
1. Students
• Limited scalability; requires in-place annotation
2. Mechanical Turk
• Uses a worldwide pool of people's spare time to annotate
• Needs simple instructions that don't take long to interpret
• Must control for cheating
1. Learning Instructor Intervention in MOOCs – Instructors for Learners
2. ENABLING PEER ANNOTATIONS FOR MOOCS – Learners for Learners
3. Automating Annotations in MOOCs – Machine for Learners
Monserrat et al. (2014). L.IVE: An Integrated Interactive Video-based Learning Environment. ACM CHI 2014.
Current Platforms: Separated Learning
• Video
• Forum
• Assessment
L.IVE file descriptor
Outcome: Rich annotation possible by peers or instructors
1. Learning Instructor Intervention in MOOCs Instructors for Learners 2. Enabling Peer Annotations in MOOCs Learners for Learners
3. AUTOMATING ANNOTATIONS IN MOOCS Machine for Learners
3.1 NoteVideo
3.2 Automated Entity Linking
Monserrat et al. (2013). NoteVideo: Facilitating Navigation of Blackboard-style Lecture Videos. ACM CHI 2013, 1139–1148.
Distribution of blackboard activities in a typical Khan Academy video
User Study (n=15)
Significantly better at 3 of 4 tasks
Error Distance comparable
Final: Design Implications
Scrubber: shows the sequence / flow of visual action
• Cannot determine information by random access
• Small thumbnail; a bigger thumbnail means more bandwidth
Transcript: allows search of text not easily identifiable in visual objects
• Only highlights hits and still shows unrelated transcript
• Mapping between text and visual object cannot be retrieved at a glance
NoteVideo: spatial layout of visual objects facilitates random access
• Sequence of play not always clear
• Difficult to find information if there is no clear visual cue
1. Learning Instructor Intervention in MOOCs – Instructors for Learners
2. Enabling Peer Annotations in MOOCs – Learners for Learners
3. AUTOMATING ANNOTATIONS IN MOOCS – Machine for Learners
3.1 NoteVideo
3.2 Automated Entity Linking
Automatic Entity Linking
System-added href. Could be done:
– As a post-process
– As the original poster is writing the post
(Appropriate section of "Module 3, Slide 5")
Problem Statement
Mention recognition: identify concrete entity mentions that appear in MOOC forums.
Unique identifier scheme: add hyperlinks to mentions using a designed scheme, which needs to be transparent and readable to humans.
Scheme resolution: resolve a scheme instance to find the actual URL of the entity.
Current: Single Concrete Instances
We currently identify single, concrete, within-course entities (SCI).
Examples:
✓ Problem 7.8
✓ quiz 3
✓ module 13
✓ slide 5
✗ the video recommended by Prof
✗ Problem of overfitting
✗ Problems mentioned in last class
Four main SCI entities:
1. Problem – a problem within a problem set, e.g., Practice problem 7.68, problem7.7 of text, Problem 3 of Quiz 1.
2. Quiz – a course quiz, e.g., Quiz 1, Quiz 2, Week3 quiz.
3. Lecture – a course lecture, e.g., Module 3, lecture 5, module23.
4. Slide – a course slide, e.g., slide 5, slide 10, slide 11.
Preliminary statistics
From our manual annotation of two courses, we find ~20% of posts have entity mentions.

Course                | # RegExp matches | # manually checked | # verified correct
3d-motion             |  19 | 19 | 19
acoustics1-001        |  19 |  6 |  6
advancedchemistry-001 |  58 | 14 |  9
amnhearth-002         |  10 |  5 |  5
analyze-001           | 113 | 26 | 24
apstat-001            |  78 | 14 | 14
automata-002          | 111 | 11 | 11
bioinfomethods2-001   |   9 |  6 |  6
vlsicad-002           |   4 |  4 |  4
virtualassessment-001 |  24 |  7 |  5

We then used simple regular expressions (keyword + number) to match entity mentions. Precision was more than 90%.
Entity Mention Recognition
Pattern 1 – keyword + number. Keyword list: Question, Problem, Quiz, Exam, Homework, Assignment, Week, Module, Video, Lecture, Slide
Pattern 2 lecture name:
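Pattern 1 above (keyword + number) can be approximated with a single regular expression. The keyword list follows the slide; the exact patterns used in the actual system may differ, so this is an illustrative sketch.

```python
# Match a keyword optionally followed by whitespace, then a (possibly
# decimal) number, e.g. "Problem 7.68", "quiz 3", "module13".
import re

KEYWORDS = ["question", "problem", "quiz", "exam", "homework", "assignment",
            "week", "module", "video", "lecture", "slide"]
MENTION_RE = re.compile(
    r"\b(" + "|".join(KEYWORDS) + r")\s*(\d+(?:\.\d+)?)", re.IGNORECASE)

def find_mentions(text):
    """Return (keyword, number) pairs for each entity mention found in text."""
    return [(m.group(1).lower(), m.group(2)) for m in MENTION_RE.finditer(text)]
```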
Transparent Scheme Design
Prefix: http:///mxr/
Middle: platform name / course ID, e.g., coursera/ml-002/
Suffix, e.g.:
lecture/4 or lecture/supervised_learning
lecture/3/section/3
lecture/3/slide or lecture/4/section/3/slide
lecture/3/slide/19
quiz/3, lecture/4/quiz
quiz/3/question/4, lecture/4/quiz/question/5
Should be guessable by users, similar to bootstrapping conventions in #hashtags, e.g., #lecture5. Scheme still in progress.
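Composing a scheme instance from the prefix / middle / suffix parts above might look like the following sketch. The helper name is hypothetical, and the default host is taken from the later browser-extension example slide; the scheme itself is still in progress.

```python
# Join scheme parts (prefix, platform, course ID, suffix segments) into a
# single human-readable scheme URL.
def scheme_url(platform, course_id, *suffix_parts,
               prefix="http://wing.comp.nus.edu.sg/mxr"):
    parts = [prefix, platform, course_id] + [str(p) for p in suffix_parts]
    return "/".join(parts)
```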
Scheme Resolution
Designed scheme → transform function → actual URL
1. Automatically analyse the web structure and extract the actual URL
2. Crowdsource the resolution from students
A snapshot of the HTML source in Coursera
Delivery by Browser Extension
Options: hyperlink, sidebar, below-post preview
Example: http://wing.comp.nus.edu.sg/mxr/coursera/ml/lecture/14/section/4
16 Aug 2015
GCMS - Min-Yen Kan
Future Work – Scaling Up
1. Larger-scale annotation / resolution
2. Investigate mention variation and ambiguity
3. Adapt to MOOC webpage design changes
4. Finer-grained alignment
5. Integration with manual annotation tools
Finer Granularity – Content Based Alignment
Conclusion / Calling for MOOC Data Consortium Partners
epilogue
The MOOC Data Consortium: Enabling reproducible large-scale research
Email: [email protected]
Website: wing.comp.nus.edu.sg/downloads/moocdata
Coursera has given its official support and recognition, for researchers needing to study and replicate prior work.
Coursera's Statement of Support:
"As a platform for delivering world-class education and advancing the frontiers of online pedagogy, Coursera encourages the use of its platform to facilitate novel research across a broad range of disciplines, while concurrently protecting the privacy of learners. We support the described research focusing on forum activity and the proposal that this research span courses from across our partner institutions."
Conclusion: Instructors, Learners, Machines
Learning at scale means understanding individual courses and their quirks
– Non-reproducibility of results is a key issue stalling MOOC research
– #convention before (system-learned) customization
Rich interlinking of resources
– Annotated by learners as well as machines
Publications:
• Chandrasekaran et al. (2015). Learning Instructor Intervention from MOOC Forums: Early Results and Issues. Educational Data Mining (EDM '15), Madrid, Spain.
• Monserrat et al. (2014). L.IVE: An Integrated Interactive Video-based Learning Environment. ACM CHI 2014.
• Monserrat et al. (2013). NoteVideo: Facilitating Navigation of Blackboard-style Lecture Videos. ACM CHI 2013, 1139–1148.