Methods for Collecting Large-Scale Non-Expert Text Coding


Drew Conway, Kickstarter HQ, Wednesday, June 26, 2013

http://www.flickr.com/photos/cawley/3242403224/

Drew Conway

Computer science, math, and stats


More awesome Political Science

...I am a political scientist

Political Science

Median Voter Theorem

Theorem: In a majority-rule system, the preference of the median voter will prevail.

Assumption: The political/ideological preferences of voters can be projected onto a single numeric dimension

http://thomasmoreinstitute.wordpress.com/2010/04/28/the-uk-election-and-the-curse-of-the-median-voter/
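To make the theorem and its one-dimension assumption concrete, here is a minimal sketch. It assumes (beyond what the slide states) an odd number of voters, each of whom prefers whichever alternative is closer to their own ideal point; the voter positions are hypothetical. Under those assumptions the median ideal point wins a pairwise majority vote against any other position.

```python
from statistics import median


def majority_prefers(ideal_points, a, b):
    """True if a strict majority of voters is strictly closer to position a than to b.

    Assumption: each voter prefers the alternative nearer their ideal point on a
    single left-right dimension (the slide's one-dimension assumption).
    """
    closer_to_a = sum(1 for x in ideal_points if abs(x - a) < abs(x - b))
    return closer_to_a > len(ideal_points) / 2


# Hypothetical ideal points for five voters on one ideological dimension.
voters = [-2.0, -0.5, 0.3, 1.1, 2.4]
m = median(voters)  # the median voter's ideal point, 0.3

# The median position beats every other position on a grid of challengers.
challengers = [i / 10 for i in range(-30, 31)]
print(all(majority_prefers(voters, m, b) for b in challengers if b != m))  # True
```

With an odd number of voters, at least (n + 1) / 2 of them sit on the median's side of the midpoint between the median and any challenger, which is why the check holds for every challenger.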

Median Voter Theorem

How do we calculate these numbers? http://voteview.com/blog/?p=564

We make it up...

But we have to! http://www.flickr.com/photos/estherlairlandesa/4649566079/

A tale of two disciplines

Physics: Build instrument → Measure

Political Science: Observe action → Infer

http://www.flickr.com/photos/becca02/6727193557/ http://en.wikipedia.org/wiki/File:Obama_Health_Care_Speech_to_Joint_Session_of_Congress.jpg

One thing we have a lot of: text

Politicians
‣ Speeches
‣ Constituent communication

Parties
‣ Platforms / manifestos
‣ Position statements

Countries
‣ Diplomatic cables
‣ Military declarations

Expert Coding!

How expert coding (typically) works

Expert + Code Book + Manifesto → DATA!

Manifesto (http://en.wikipedia.org/wiki/Official_Monster_Raving_Loony_Party):

1. Health & Safety: We propose to ban Self Responsibility on the grounds that it may be dangerous to your health.
2. M.P’s Expenses: We propose that instead of a second home allowance M.P’s will have a caravan which will be parked outside the Houses of Parliament. This will make it easier as flipping a caravan is easier than flipping homes
3. Eurofit: The European Constitution which will be sorted out by going for a long Walk. “As everyone knows that walking is good for the constitution”

Data:

Party                  Year   Score
Monster Raving Loony   2010   -2

What expert coding looks like

http://www.flickr.com/photos/uiowa/8047195100/

What’s wrong with experts?

‣ They’re slow
‣ They’re biased
‣ They’re expensive
‣ They’re wrong

Can we use non-experts to code political manifestos?

How can we measure the quality/validity of non-expert codings?

Use Mechanical Turk to code many manifesto fragments.

Experimental approach

Baseline data
‣ Texts: 18 “big 3” British party manifestos, 1987-2010 (three parties across six general elections)
‣ Experts: 5 advanced political science graduate students + 2 tenured faculty
‣ Coding: deliberately simple schema
→ Expert codings

Experimental design
‣ Hypothesis: A stronger filter on Turkers leads to better coding
‣ Filter: Use an MT qualification test as gatekeeper

Three experiments

No Qualification: anyone in
Low-Threshold: 4/6 correct on the qualification test
High-Threshold: 5/6 correct on the qualification test

→ MT codings
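The admission rule behind the three experiments can be sketched as a small gate. This is a local illustration under stated assumptions, not the study's code: it assumes "4/6 correct" means "at least 4 of 6 qualification questions answered correctly", and the names `EXPERIMENT_THRESHOLDS` and `admitted` are hypothetical. On Mechanical Turk itself, such a rule would be enforced by attaching a qualification requirement to the HITs; that part is not shown here.

```python
from typing import Optional

# Hypothetical encoding of the three experiments: the minimum number of correct
# answers (out of 6) on the qualification test; None means no test at all.
EXPERIMENT_THRESHOLDS = {
    "No Qualification": None,
    "Low-Threshold": 4,
    "High-Threshold": 5,
}


def admitted(num_correct: int, threshold: Optional[int], num_questions: int = 6) -> bool:
    """Return True if a worker with `num_correct` right answers may take the coding HITs."""
    if not 0 <= num_correct <= num_questions:
        raise ValueError("num_correct must be between 0 and num_questions")
    if threshold is None:  # "Anyone in"
        return True
    return num_correct >= threshold


# Which experiments would admit a worker who answered 4 of 6 correctly?
print([name for name, t in EXPERIMENT_THRESHOLDS.items() if admitted(4, t)])
# ['No Qualification', 'Low-Threshold']
```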

How do we think about coding a manifesto fragment?


Example text coding HIT from the experiment


How do we implement this (aka, the glue)?

Expert codings → random sample, as JSON:

    [{"text_unit_id": ..., "sentence_text": ..., ...}, ...]

→ S3
→ EC2: dynamically generate HITs
→ MT: push HITs + retrieve results
→ MT codings
→ Statistical analysis of results
→ Scholarship, FTW!
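The first step of the pipeline, drawing a random sample of expert-coded sentences and serializing it as JSON, might look like the sketch below. Only the two field names visible on the slide (`text_unit_id`, `sentence_text`) are used; the slide elides the rest, and the function name and placeholder corpus are hypothetical. Uploading the resulting string to S3 (for example with boto3's S3 client `put_object`) is left out.

```python
import json
import random


def sample_text_units(sentences, k, seed=None):
    """Draw k sentences at random, without replacement, as JSON-ready records.

    Hypothetical helper: only the two field names shown on the slide are used.
    """
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(sentences)), k))
    return [{"text_unit_id": i, "sentence_text": sentences[i]} for i in chosen]


# Placeholder corpus standing in for the expert-coded manifesto sentences.
corpus = [f"Manifesto sentence {n}" for n in range(10)]
batch = sample_text_units(corpus, k=3, seed=42)
payload = json.dumps(batch, indent=2)  # the JSON document that would go to S3
print(payload)
```

Passing a fixed `seed` makes the sample reproducible, which helps when the same batch has to be regenerated on another machine (for example, the EC2 host that builds the HITs).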

Results of initial MT experiments

Columns 2-4 report results; columns 5-7 report the kappa statistic.

Experiment       Sentences   # MT coders   % Agreement   κ*     Std. error   z
No Qual.             1,315            89          0.65   0.47         0.13   22.6
Low-Threshold        1,393            56          0.70   0.54         0.12   26.7
High-Threshold       1,250            23          0.62   0.41         0.13   18.3

* A κ value between 0.4 and 0.6 is considered “moderate” agreement.
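The slides do not say which kappa statistic was used. Since each sentence was coded by several Turkers, one common multi-coder choice is Fleiss' kappa, sketched below as an assumption: it compares observed pairwise agreement within each sentence with the agreement expected by chance from the overall category shares. This version also assumes every sentence received the same number of codings, which real MT data may not satisfy, so it is not guaranteed to reproduce the table's values.

```python
from collections import Counter


def fleiss_kappa(codings):
    """Fleiss' kappa for items that were each coded by the same number of coders.

    `codings[i]` is the list of category labels the coders assigned to item i.
    """
    n = len(codings[0])
    if n < 2 or any(len(item) != n for item in codings):
        raise ValueError("every item needs the same number (>= 2) of codings")
    N = len(codings)
    counts = [Counter(item) for item in codings]
    categories = set().union(*counts)

    # Observed agreement: share of agreeing coder pairs within each item, averaged.
    p_bar = sum(
        (sum(c[cat] ** 2 for cat in categories) - n) / (n * (n - 1)) for c in counts
    ) / N
    # Chance agreement: squared overall share of each category, summed.
    p_e = sum((sum(c[cat] for c in counts) / (N * n)) ** 2 for cat in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when every coding uses one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical codings: three sentences, each coded by four Turkers.
example = [
    ["Economic", "Economic", "Economic", "Neither"],
    ["Social", "Social", "Social", "Social"],
    ["Neither", "Economic", "Social", "Neither"],
]
print(round(fleiss_kappa(example), 3))  # 0.319
```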

Agreement by experiment and by expert coding (MT % agreement, grouped by the expert coding of each sentence)

Experiment       Economic   Social   Neither
No Qual.             0.77     0.92      0.22
Low-Threshold        0.87     0.98      0.20
High-Threshold       0.77     0.91      0.09
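The slides do not define "MT % agreement" for this breakdown. One plausible reading, used here purely as an assumption, is the share of individual MT codings that match the expert coding, grouped by the expert's category. The function name and the toy data below are hypothetical.

```python
from collections import defaultdict


def agreement_by_expert_category(expert_codes, mt_codes):
    """Share of MT codings that match the expert code, grouped by the expert code.

    expert_codes: {text_unit_id: expert category}
    mt_codes:     {text_unit_id: [MT category, ...]}
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for unit_id, expert in expert_codes.items():
        for code in mt_codes.get(unit_id, []):
            totals[expert] += 1
            matches[expert] += code == expert  # bool counts as 0 or 1
    return {cat: matches[cat] / totals[cat] for cat in totals}


# Hypothetical data: three sentences with one expert code and several MT codes each.
expert = {1: "Economic", 2: "Social", 3: "Neither"}
mt = {
    1: ["Economic", "Economic", "Social", "Economic"],
    2: ["Social", "Social"],
    3: ["Economic", "Neither", "Social", "Social"],
}
print(agreement_by_expert_category(expert, mt))
# {'Economic': 0.75, 'Social': 1.0, 'Neither': 0.25}
```

Grouping by the expert's category is what exposes the low numbers in the "Neither" column: a sentence the expert codes as null can be miscoded by Turkers in two different directions.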

Separating social and economic sentences

Columns 2-4 report results; columns 5-7 report the kappa statistic.

Experiment   Sentences   # MT coders   % Agreement   κ*     Std. error   z
Econ-only          942            15          0.62   0.23         0.10   4.28
Soc-only           955            32          0.60   0.17         0.09   0.95

* A κ value between 0.4 and 0.6 is considered “moderate” agreement.

Experiment      Expert coding   MT % agreement
Economic-only   Economic        0.92
Economic-only   Neither         0.28
Social-only     Social          0.97
Social-only     Neither         0.19

Non-experts have a very hard time with a “null” coding!

Conclusions

‣ The results provide considerable evidence that crowd-sourcing is an effective alternative method for generating quantitative categorizations from text. With a well-designed mechanism for collecting non-expert codings en masse, the results compare quite well to those of multiple experts.
‣ Along with revealing that non-experts have tremendous difficulty identifying “null” categories, this research has several other important takeaways:
  ‣ Coder performance reaches a “stable” level; therefore, high-quality coders should be identified quickly and given incentives to keep contributing.
  ‣ Non-experts are also capable of picking up policy direction from text, as shown by their detection of shifts in party positions between 1987 and 1997.


Joint work with...

Michael Laver, NYU

Kenneth Benoit, LSE

Slava Mikhaylov, UCL

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2260437


Coder performance stability (panels: No Qualification, Low-threshold, High-threshold)

Performance becomes very stable after approximately 20 HITs.
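The slides report that coder performance stabilizes after about 20 HITs but do not say how stability was measured. A minimal sketch, assuming each HIT a coder completes can be scored as correct or incorrect against the expert coding, and defining "stable" (an assumption of this sketch) as the running accuracy staying within a tolerance `tol` for every later HIT. Both function names and the default parameters are hypothetical.

```python
from itertools import accumulate


def running_accuracy(correct_flags):
    """Cumulative share of correct HITs after each completed HIT."""
    totals = accumulate(int(flag) for flag in correct_flags)
    return [t / (i + 1) for i, t in enumerate(totals)]


def first_stable_hit(correct_flags, tol=0.02, min_after=5):
    """1-based HIT count from which running accuracy never moves more than `tol`.

    Requires at least `min_after` further HITs as evidence; returns None otherwise.
    """
    acc = running_accuracy(correct_flags)
    for h in range(len(acc) - min_after):
        if all(abs(a - acc[h]) <= tol for a in acc[h + 1:]):
            return h + 1
    return None


# Hypothetical coder who alternates correct and incorrect answers over 40 HITs.
print(first_stable_hit([True, False] * 20))  # 26
```

Raising `tol` or lowering `min_after` makes the rule declare stability earlier; any real analysis would need to pick these against the actual coder data.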

Party shifts: economic


Party shifts: social
