Context-Aware Sentiment Detection

2 downloads 0 Views 1MB Size Report
Feb 10, 2011 - ... sentiment terms and ambiguous terms. ▻ Uses context terms for disambiguation. ▷ Derived from online reviews (Amazon, TripAdvisor).
Context-Aware Sentiment Detection Albert Weichselbraun February 10, 2011

Agenda

I

Motivation On the Need for Contextualization I Indicators for Missing Context I

I

Method Context-Aware Sentiment Detection I Creation of Contextualized Sentiment Lexicons I Example I Cross-corpus Contextualized Sentiment Lexicons I

I

Evaluation

I

Outlook and Conclusions

2 / 28

Motivation

I

I

Pang et. al (2002): state of the art machine learning approaches do not unfold their full potential when applied to sentiment detection Lexicon-based approach I I I

no labeled training corpus necessary applicable across domains throughput

Motivation

3 / 28

On the Need for Contextualization

Positive The repair of my car was satisfying. This movie’s plot is unpredictable. The long peace brought wealth and safety to the people.

Negative I had many complaints after my camera’s repair. The breaks of this car are unpredictable. This peace is a lie.

Motivation

4 / 28

Frequency Diagram

Motivation

5 / 28

Frequency Diagram

Motivation

6 / 28

Frequency Diagram

Motivation

7 / 28

Frequency Diagram

Motivation

8 / 28

Method - Contextualized Sentiment Lexicon

I

Use a contextualized sentiment lexicon I I I

I

Based on ordinary sentiment lexicons Contains stable sentiment terms and ambiguous terms Uses context terms for disambiguation

Derived from online reviews (Amazon, TripAdvisor)

Method

9 / 28

Method - Context Aware Sentiment Detection Refined Sentiment Detection X stotal = n(ti )[s(ti ) + s 0 (ti |c)]

with

(1a)

ti ∈doc

( −1.0 n(ti ) = +1.0

if ti has been negated otherwise.

(1b)

Context Detection c = {c1 , ...cn } Q p(C+ ) · ni=1 p(ci |C+ ) Qn p(C+ |c) = i=1 p(ci )

(2a) (2b)

Method

10 / 28

Method - Contextualized Sentiment Lexicon

Method

11 / 28

Method - Contextualized Sentiment Lexicon

I

Identify ambiguous terms (s 0 ) → frequency diagrams σi

≥ v

(3)

µi + σ i

≥ w

(4a)

µi − σ i

≤ −w

(4b)

I

Learn context terms (c) for disambiguation → conditional probabilities

I

Recalculate the sentiment value of the contextualized sentiment terms

Method

12 / 28

Example The service staff was friendly. They accomplished the repair of my car’s motor very quickly. After driving it for another three months I can say that the motor is as reliable as it was before.

positive context terms reliable long-lasting affordable pick-up-service replacement-car cooperative ...

negative context terms slowly re-do unreliable waiting expensive cheater ...

Method

13 / 28

Method - Context Lookup

The service staff was friendly. They accomplished the repair of my car’s motor very quickly. After driving it for another three months I can say that the motor is as reliable as it was before.

repair Context Term (ci ) reliable friendly quickly

P(C+ |ci ) 0.80 0.70 0.65

⇒ repair is used in a positive context → positive sentiment

Method

14 / 28

Method - Real World Examples

Ambiguous Term busy complaint cool

SVorig

Example

1 -1 -1

expensive

-1

quality

1

better cost

1 -1

The hotel is located on a busy road. My only complaint would be the service. Our room felt like a really cool European apartment with a rooftop terrace. The room was one of the more expensive hotels in Vienna but still excellent. Poor quality copies with one edge always dark. Let’s hope they work better. Toner cost is way behind competitors.

Method

15 / 28

Cross-corpus Contextualized Sentiment Lexicons Helpful

Harmful

1.0

0.8 0.6

0.5

1.0

1.0 0.0

0.5

C'-

0.8 0.0

1.0

0.6

-0.5

0.4

0.4 0.2

0.4 0.0

0.8 0.0 -1.0

Sentiment Value

0.0

0.0

t/C'Relative Frequency

0.6

C-

-0.5

Sentiment Value

0.8

1.0

-1.0

0.2

1.0

0.6

0.5

t/C-

0.4

Relative Frequency

0.0

Sentiment Value

0.2

-0.5

C'+

Relative Frequency

1.0

t/C'+

0.2

0.8 0.6 0.4 0.2 -1.0

Relative Frequency

t

C+

0.0

Relative Frequency

1.0

t/C+

-1.0

-0.5

0.0

0.5

Sentiment Value

1.0

-1.0

-0.5

0.0

0.5

1.0

Sentiment Value

Method

16 / 28

Cross-corpus Contextualized Sentiment Lexicons

Three-step process I

Determine the helpfulness of all context terms

I

Discard harmful context terms

I

Merge remaining context terms into a large contextualized lexicon

Method

17 / 28

Cross-corpus Contextualized Sentiment Lexicons Step 1 - Determine the helpfulness of context terms

Helpful

Helpful

Neutral

Neutral

Harmful

Harmful

Method

18 / 28

Cross-corpus Contextualized Sentiment Lexicons Step 2 - Discard harmful context terms

Harmful

Helpful

Harmful

Helpful Neutral

Neutral

Method

19 / 28

Cross-corpus Contextualized Sentiment Lexicons Step 3 - Merging

Method

20 / 28

Evaluation - Approach

Evaluations 1. Comparison to a baseline – Do we outperform a lexicon-based approach which does not consider context? 2. Intra-domain sentiment detection – Does the removal of unstable sentiment terms has a positive effect? 3. Cross-domain sentiment detection – Determine the cross-domain performance of a generic contextualized sentiment lexicon. 4. Comparison to a machine learning approach – Intra-domain and cross-domain performance.

Evaluation

21 / 28

Evaluation - Setting

I I

Method: 10-fold cross validation Test corpora: I I I

Equal number of positive and negative reviews. Amazon: 2,500 reviews TripAdvisor: 1,800 reviews

Evaluation

22 / 28

Evaluation - Context Aware Sentiment Detection Corpus: Amazon R

Baseline P F1

Context Aware R P F1

Pos

0.80

0.64

0.71

0.75

0.75

0.74

Neg

0.53

0.74

0.62

0.71

0.79

0.73

Corpus: TripAdvisor R

Baseline P F1

Context Aware R P F1

Pos

0.96

0.60

0.74

0.97

0.66

0.79

Neg

0.34

0.90

0.49

0.46

0.95

0.61 Evaluation

23 / 28

Evaluation - Intra-Domain Performance

Test corpus: Amazon Domain-specific (Amazon) R P F1 Pos Neg

0.75 0.71

0.75 0.79

0.74 0.73

Generic P F1

R 0.77 0.67

0.72 0.77

0.74 0.72

Test corpus: TripAdvisor Domain-specific (TripAdvisor) R P F1 Pos Neg

0.97 0.46

0.66 0.95

0.79 0.61

R 0.89 0.66

Generic P F1 0.74 0.87

0.81 0.75

Evaluation

24 / 28

Evaluation - Cross Domain Performance

Test corpus: Amazon Domain-specific (TripAdvisor) R P F1 Pos Neg

0.76 0.58

0.67 0.73

0.71 0.64

Generic P F1

R 0.77 0.67

0.72 0.77

0.74 0.72

Test corpus: TripAdvisor Domain-specific (Amazon) R P F1 Pos Neg

0.84 0.58

0.69 0.8

0.75 0.66

R 0.89 0.66

Generic P F1 0.74 0.87

0.81 0.75

Evaluation

25 / 28

Evaluation - Na¨ıve Bayes (NLTK)

Test TripAdvisor Amazon TripAdvisor Training Amazon

⊕ ⊕

87 89 75 60

32 70 81 77

Evaluation

26 / 28

Summary

I

Lexicon-based approaches: I I I I

I

Simple, no labelled data required Applicable across domains High throughput Can serve as a baseline

Machine Learning approaches: I I

Powerful, but domain-specific Require labelled training data

→ The introduced approach combines these advantages (cross-domain, high throughput, high performance)

Evaluation

27 / 28

Outlook and Conclusions

I

Considering context in sentiment detection

I

Creation cross-domain contextualized sentiment lexicons

I

Outperforms generic approaches Future work:

I

I I

Different context scopes (paragraph, documents, text windows) Consider other machine learning approaches

Outlook and Conclusions

28 / 28

Suggest Documents