Methods for Collecting Large-Scale Non-Expert Text Coding
Drew Conway Kickstarter HQ June 26, 2013 Wednesday, June 26, 13
http://www.flickr.com/photos/cawley/3242403224/
Drew Conway
Com Sci, math and stats
More awesome Political Science
... I am a political scientist
Political Science
Median Voter Theorem
Theorem: In a majority-rule system, the preference of the median voter will prevail
Assumption: The political/ideological preferences of voters can be projected onto a single numeric dimension
http://thomasmoreinstitute.wordpress.com/2010/04/28/the-uk-election-and-the-curse-of-the-median-voter/
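The theorem and its single-dimension assumption can be illustrated with a small simulation. This is a minimal sketch, not part of the talk: the voter distribution, the challenger grid, and the function names `pairwise_winner` and `median_beats_all` are all assumptions made for illustration. Each voter backs whichever of two positions is closer to their own ideal point.

```python
import random
import statistics


def pairwise_winner(voters, a, b):
    """Return the position (a or b) preferred by more voters.

    Each voter prefers the position closer to their ideal point.
    Ties go to `a` (an arbitrary choice for this sketch).
    """
    votes_a = sum(1 for v in voters if abs(v - a) < abs(v - b))
    votes_b = sum(1 for v in voters if abs(v - b) < abs(v - a))
    return a if votes_a >= votes_b else b


def median_beats_all(voters, challengers):
    """Check that the median voter's position wins every pairwise vote."""
    median = statistics.median(voters)
    return all(pairwise_winner(voters, median, c) == median for c in challengers)


# Hypothetical electorate: 101 ideal points on one left-right dimension.
# An odd count guarantees the median is an actual voter's position.
random.seed(0)
voters = [random.gauss(0, 1) for _ in range(101)]
challengers = [x / 10 for x in range(-30, 31)]
print(median_beats_all(voters, challengers))
```

The check only makes sense once every voter has a number on that single dimension, which is why the question of how to calculate those numbers matters.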
Median Voter Theorem
How do we calculate these numbers? http://voteview.com/blog/?p=564
We make it up...
But, we have to! http://www.flickr.com/photos/estherlairlandesa/4649566079/
A tale of two disciplines
Physics: Build instrument → Measure
Political Science: Observe action → Infer
http://www.flickr.com/photos/becca02/6727193557/
http://en.wikipedia.org/wiki/File:Obama_Health_Care_Speech_to_Joint_Session_of_Congress.jpg
One thing we have a lot of: text
Politicians
‣ Speeches
‣ Constituent communication
Parties
‣ Platform / manifestos
‣ Position statements
Countries
‣ Diplomatic cables
‣ Military declarations
Expert Coding!
How expert coding (typically) works
Manifesto → Expert + Code Book → DATA!

Manifesto (http://en.wikipedia.org/wiki/Official_Monster_Raving_Loony_Party):
1. Health & Safety: We propose to ban Self Responsibility on the grounds that it may be dangerous to your health.
2. M.P’s Expenses: We propose that instead of a second home allowance M.P’s will have a caravan which will be parked outside the Houses of Parliament. This will make it easier as flipping a caravan is easier than flipping homes
3. Eurofit: The European Constitution which will be sorted out by going for a long Walk. “As everyone knows that walking is good for the constitution”

DATA!
Party                  Year   Score
Monster Raving Loony   2010   -2
What expert coding looks like
http://www.flickr.com/photos/uiowa/8047195100/
What’s wrong with experts?
They’re slow
They’re biased
They’re expensive
They’re wrong
Can we use non-experts to code political manifestos?
How can we measure the quality/validity of non-expert codings?
Use Mechanical Turk to code many manifesto fragments.
Experimental approach
Baseline data
‣ Texts: 18 “big 3” British party manifestos, 1987-2010
‣ Experts: 5 advanced political science graduate students + 2 tenured faculty
‣ Coding: deliberately simple schema
Expert codings
Experimental design
‣ Hypothesis: A stronger filter on Turkers leads to better coding
‣ Filter: Use MT qualification test as gatekeeper
Three experiments
‣ No Qualification: Anyone in
‣ Low-Threshold: 4/6 Correct
‣ High-Threshold: 5/6 Correct
MT codings
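The gatekeeping logic of the three experiments can be sketched as a simple score threshold on a six-question qualification test. This is an illustration under stated assumptions, not the code used in the study: the answer key, the category labels, and the names `THRESHOLDS`, `score`, and `admitted` are hypothetical. Only the thresholds (anyone, 4/6, 5/6) come from the slides.

```python
# Minimum number of correct answers (out of 6) needed to enter each experiment.
# Thresholds are from the slides; the dictionary keys are hypothetical names.
THRESHOLDS = {
    "no_qualification": 0,  # anyone in
    "low_threshold": 4,     # 4/6 correct
    "high_threshold": 5,    # 5/6 correct
}


def score(answers, answer_key):
    """Count how many qualification answers match the key."""
    return sum(1 for given, expected in zip(answers, answer_key) if given == expected)


def admitted(answers, answer_key, experiment):
    """Return True if a worker's test score meets the experiment's threshold."""
    return score(answers, answer_key) >= THRESHOLDS[experiment]


# Hypothetical answer key and one worker's answers (4 of 6 correct).
answer_key = ["economic", "social", "neither", "economic", "social", "neither"]
worker = ["economic", "social", "neither", "economic", "neither", "social"]

for name in THRESHOLDS:
    print(name, admitted(worker, answer_key, name))
```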
How do we think about coding a manifesto fragment?
Example text coding HIT from the experiment
How do we implement this (aka, the glue)?
Expert codings → Random sample, as JSON → S3
[{ 'text_unit_id': ..., 'sentence_text': ..., .... }, ... ]
S3 → Dynamically generate HITs
EC2 → Push HITs + retrieve results → MT
MT codings → Statistical analysis of results → Scholarship, FTW!
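The first step of the pipeline, drawing a random sample of expert-coded text units and serializing it as JSON, might look like the sketch below. It is an assumption-laden illustration: only the keys `text_unit_id` and `sentence_text` appear on the slide, the other fields are elided there and are left out here, and the function name `sample_text_units` and the example sentences are hypothetical. Uploading to S3 and creating HITs are not shown.

```python
import json
import random


def sample_text_units(expert_codings, k, seed=None):
    """Draw k text units at random and return them as a JSON array string.

    Only the two fields named on the slide are copied into the payload.
    """
    rng = random.Random(seed)
    sample = rng.sample(expert_codings, k)
    payload = [
        {"text_unit_id": unit["text_unit_id"], "sentence_text": unit["sentence_text"]}
        for unit in sample
    ]
    return json.dumps(payload)


# Hypothetical expert-coded sentences, for illustration only.
units = [
    {"text_unit_id": 1, "sentence_text": "We will cut income tax."},
    {"text_unit_id": 2, "sentence_text": "We will reform sentencing."},
    {"text_unit_id": 3, "sentence_text": "Britain deserves better."},
]
print(sample_text_units(units, k=2, seed=1))
```

Passing a seed makes the sample reproducible, which helps when the same batch has to be regenerated for a later analysis.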
Results of initial MT experiments

Agreement by experiment
                 Results                                 Kappa Statistic
Experiment       Sentences   # MT Coders   % Agreement   k*     Std. Error   z
No Qual.         1,315       89            0.65          0.47   0.13         22.6
Low-Threshold    1,393       56            0.70          0.54   0.12         26.7
High-Threshold   1,250       23            0.62          0.41   0.13         18.3
* A k value between 0.4-0.6 is considered “moderate” agreement

Agreement by expert-coding
Experiment       Expert Coding   MT % Agreement
No Qual.         Economic        0.77
                 Social          0.92
                 Neither         0.22
Low-Threshold    Economic        0.87
                 Social          0.98
                 Neither         0.20
High-Threshold   Economic        0.77
                 Social          0.91
                 Neither         0.09
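The slides do not say which kappa variant produced the k values above. As a hedged illustration of the general idea, chance-corrected agreement, here is a minimal sketch of Cohen's kappa for two label sequences (for example, the expert code and one MT code per sentence). The function names and the example labels are assumptions. The interpretation labels follow the widely used Landis and Koch scale, whose "moderate" band matches the footnote.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected by chance if both raters labelled independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Map a kappa value to the conventional Landis and Koch label."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical codes for four sentences.
expert = ["economic", "social", "neither", "economic"]
turker = ["economic", "social", "economic", "economic"]
k = cohens_kappa(expert, turker)
print(round(k, 3), landis_koch(k))
```

With many coders per sentence, a multi-rater statistic such as Fleiss' kappa would be the more natural choice; the two-rater version is used here only because it is the shortest correct example.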
Separating Social and Economic Sentences

                 Results                                 Kappa Statistic
Experiment       Sentences   # MT Coders   % Agreement   k*     Std. Error   z
Econ-only        942         15            0.62          0.23   0.10         4.28
Soc-only         955         32            0.60          0.17   0.09         0.95
* A k value between 0.4-0.6 is considered “moderate” agreement

Experiment       Expert Coding   MT % Agreement
Economic-only    Economic        0.92
                 Neither         0.28
Social-only      Social          0.97
                 Neither         0.19
Non-experts have a very hard time with a “null” coding!
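A per-category breakdown is what exposes the problem with the “null” coding. The slides do not define "MT % Agreement" precisely, so the sketch below assumes it is the share of MT codings that match the expert code, grouped by the expert's category. The function name `agreement_by_expert_code` and the example pairs are hypothetical.

```python
from collections import defaultdict


def agreement_by_expert_code(pairs):
    """Share of MT codings matching the expert code, per expert category.

    `pairs` is an iterable of (expert_code, mt_code) tuples.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for expert_code, mt_code in pairs:
        totals[expert_code] += 1
        if mt_code == expert_code:
            matches[expert_code] += 1
    return {code: matches[code] / totals[code] for code in totals}


# Hypothetical (expert, MT) code pairs.
pairs = [
    ("economic", "economic"),
    ("economic", "social"),
    ("neither", "economic"),
    ("neither", "neither"),
    ("neither", "social"),
    ("social", "social"),
]
print(agreement_by_expert_code(pairs))
```

An overall agreement figure can look acceptable while one category, here "neither", is coded poorly, which is why the breakdown is reported alongside the kappa statistics.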
Conclusions
The results provide considerable evidence that crowd-sourcing is an effective alternative method for generating quantitative categorizations from text. With a well-designed mechanism for collecting non-expert codings en masse, the results compare quite well to those of multiple experts.
Along with revealing that non-experts have tremendous difficulty identifying “null” categories, there are other important takeaways from this research:
‣ Coder performance quality reaches a “stable” level; therefore, high-quality coders should be identified quickly and incentivized to continue to contribute
‣ Non-experts are also capable of picking up policy direction from text, as evidenced by the shifts in party positions between 1987 and 1997
Joint work with...
Michael Laver NYU
Kenneth Benoit LSE
Slava Mikhaylov UCL
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2260437
Coder performance stability
[Figure panels: No Qualification, Low-threshold, High-threshold]
Performance becomes very stable after approximately 20 HITs
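One way to look at stability is to track each coder's running agreement rate as they complete more HITs. The sketch below is an assumption, not the method behind the figure: the stability rule (the running rate varies by no more than `tolerance` over `window` consecutive HITs) and the names `running_agreement` and `stable_after` are invented for illustration.

```python
def running_agreement(correct_flags):
    """Cumulative share of HITs coded in agreement with the experts.

    `correct_flags` holds one boolean per HIT, in the order completed.
    """
    rates = []
    hits = 0
    for count, flag in enumerate(correct_flags, start=1):
        hits += bool(flag)
        rates.append(hits / count)
    return rates


def stable_after(rates, window=5, tolerance=0.02):
    """Return the 1-based HIT index where the running rate first stays within
    `tolerance` for `window` consecutive HITs, or None if it never does.

    This rule is a hypothetical definition of "stable" for this sketch.
    """
    for start in range(len(rates) - window + 1):
        chunk = rates[start:start + window]
        if max(chunk) - min(chunk) <= tolerance:
            return start + 1
    return None


# Hypothetical coder: noisy at first, then consistently correct.
flags = [True, False, True, True] + [True] * 40
rates = running_agreement(flags)
print(stable_after(rates))
```

A rule of this kind could be used to spot reliable coders early, in line with the recommendation to identify and retain them quickly.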
Party shifts: economic
Party shifts: social