Liner2 — a Generic Framework for Named Entity Recognition

Michał Marcińczuk, Jan Kocoń, Marcin Oleksy

Institute of Informatics, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, Wrocław, Poland

April 4, 2017

The 6th Workshop on Balto-Slavic Natural Language Processing, Valencia, Spain. The work was funded by the Polish Ministry of Science and Higher Education (CLARIN ERIC, 2016–2018).

Introduction »

Introduction

Scope: Polish only; named entity recognition and normalization (lemmatization).

Goal: to test how much effort is needed to adapt existing NER models for Polish to the new requirements (the BSNLP NER Shared Task) and what level of performance can be obtained.


NER models for Polish »

NE in KPWr

1. KPWr (Corpus of Wrocław University of Technology), https://clarin-pl.eu/dspace/handle/11321/270,
2. 1349 short documents (ca. 200 words each, 15 text genres) under a Creative Commons license, annotated with named entities,
3. more than 82 categories of named entities organized in a 3-level hierarchy.


NER models for Polish »

NE hierarchy in KPWr (https://clarin-pl.eu/dspace/handle/11321/294)

1. event – names of events organized by humans [6 subtypes],
2. facility – names of buildings and stationary constructions (e.g. monuments) developed by humans [9 subtypes],
3. living – names of people and other living beings [6 subtypes],
4. location – names of geographical (e.g. mountains, rivers) and geopolitical entities (e.g. countries, cities) [31 subtypes],
5. organization – names of organizations, institutions and organized groups of people [10 subtypes],
6. product – names of artifacts created or manufactured by humans (products of mass production, arts, books, newspapers, etc.) [26 subtypes],
7. adjective – adjectival forms of proper names [3 subtypes],
8. numerical – numerical identifiers which refer to entities [5 subtypes],
9. other – names which do not fit into the previous categories [10 subtypes].


NER models for Polish »

Distribution of top NE categories

[Pie chart] living (12k), location (6.3k), organization (4.6k), product (2.6k), adjective (1.5k), facility (1.3k), other (0.8k), event (0.7k), numex (0.1k).

Liner2 »

Liner2 overview (https://clarin-pl.eu/dspace/handle/11321/231)

- implemented in Java,
- a set of modules for sequence labelling (statistical, rule- and dictionary-based),
- the statistical model uses Conditional Random Fields (the CRF++ library),
- a rich set of features — 56 basic features (orthographic, morphological, lexicon-based and wordnet-based) and several complex features (see the sketch below),
- dictionaries for NER — NELexicon (2.3M names obtained from different sources on the Internet, including Wikipedia; https://clarin-pl.eu/dspace/handle/11321/247) and a dictionary of named entity triggers, PNET (http://zil.ipipan.waw.pl/PNET),
- processes tokenized texts.
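To make the feature list above concrete, here is a minimal Java sketch of the kind of orthographic features a CRF-based NER tagger can compute per token; the feature names and the logic are illustrative assumptions, not Liner2's actual feature set.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these feature names and rules are assumptions,
// not the 56 features actually implemented in Liner2.
public class OrthographicFeatures {

    // Computes a few simple orthographic features for a single token.
    static List<String> extract(String token) {
        List<String> features = new ArrayList<>();
        if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
            features.add("starts-with-uppercase");
        }
        if (!token.isEmpty() && token.chars().allMatch(Character::isUpperCase)) {
            features.add("all-uppercase");
        }
        if (token.chars().anyMatch(Character::isDigit)) {
            features.add("contains-digit");
        }
        if (token.contains("-")) {
            features.add("contains-hyphen");
        }
        // Word shape: map uppercase letters to 'A', lowercase to 'a', digits to '0'.
        String shape = token.replaceAll("\\p{Lu}", "A")
                            .replaceAll("\\p{Ll}", "a")
                            .replaceAll("[0-9]", "0");
        features.add("shape=" + shape);
        return features;
    }

    public static void main(String[] args) {
        System.out.println(extract("Wrocław")); // [starts-with-uppercase, shape=Aaaaaaa]
        System.out.println(extract("2017"));    // [contains-digit, shape=0000]
    }
}
```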


Liner2 »

Liner2 applications

- NER — https://clarin-pl.eu/dspace/handle/11321/263,
- TIMEX — https://clarin-pl.eu/dspace/handle/11321/302,
- Event — https://clarin-pl.eu/dspace/handle/11321/301.

Task               P [%]   R [%]   F [%]
NER boundaries     86.04   83.02   84.50
NER top9           73.73   69.00   71.30
NER n82            67.65   58.83   62.93
TIMEX boundaries   86.68   81.01   83.75
TIMEX 4class       84.97   76.67   80.61
Event mentions     80.88   77.82   79.32

Table: Precision (P), recall (R) and F-measure (F) for various tasks obtained with Liner2.


Liner2 »

Liner2 applications

- NER — https://clarin-pl.eu/dspace/handle/11321/263,
- TIMEX — https://clarin-pl.eu/dspace/handle/11321/302,
- Event — https://clarin-pl.eu/dspace/handle/11321/301.

Task               P [%]   R [%]   F [%]
NER boundaries     86.04   83.02   84.50
BSNLP NER          ?       ?       ?
NER top9           73.73   69.00   71.30
NER n82            67.65   58.83   62.93
TIMEX boundaries   86.68   81.01   83.75
TIMEX 4class       84.97   76.67   80.61
Event mentions     80.88   77.82   79.32

Table: Precision (P), recall (R) and F-measure (F) for various tasks obtained with Liner2.
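For reference, F in the tables above is the balanced F-measure, i.e. the harmonic mean of precision and recall. The tiny Java sketch below (the class and method names are ours, purely illustrative) reproduces the NER boundaries row.

```java
// Balanced F-measure: the harmonic mean of precision and recall.
public class FMeasure {

    static double f1(double precision, double recall) {
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // NER boundaries row: P = 86.04, R = 83.02 -> F = 84.50
        System.out.printf("%.2f%n", f1(86.04, 83.02));
    }
}
```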


BSNLP NER Shared Task »

Differences compared to the KPWr NER

1. NE mention boundaries: in the KPWr corpus nested names are annotated as a sequence of disjoint atomic names, e.g. [Motorola] [Moto X]; likewise, names of facilities together with their location are annotated separately, e.g. [Citi Handlowy] w [Poznaniu],
2. NE categorization (next slide).


BSNLP NER Shared Task »

NE category mapping

KPWr category    BSNLP category   model
nam_loc          LOC              top9
nam_fac          LOC              top9
nam_liv          PER              top9
nam_org_nation   PER              n82
nam_org          ORG              top9
nam_eve          MISC             top9
nam_pro          MISC             top9
nam_adj          ignored          top9
nam_num          ignored          top9
nam_oth          ignored          top9

Table: Mapping from KPWr categories of named entities to BSNLP categories.
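The mapping above can be encoded as a simple lookup table; the Java sketch below is a direct transcription of it (the class name and the "ignored" fallback are our illustrative choices, not part of Liner2).

```java
import java.util.Map;

// A direct encoding of the KPWr -> BSNLP mapping table above.
public class CategoryMapping {

    static final Map<String, String> KPWR_TO_BSNLP = Map.of(
            "nam_loc", "LOC",
            "nam_fac", "LOC",
            "nam_liv", "PER",
            "nam_org_nation", "PER",
            "nam_org", "ORG",
            "nam_eve", "MISC",
            "nam_pro", "MISC");
            // nam_adj, nam_num and nam_oth have no BSNLP category (ignored).

    public static void main(String[] args) {
        System.out.println(KPWR_TO_BSNLP.getOrDefault("nam_fac", "ignored")); // LOC
        System.out.println(KPWR_TO_BSNLP.getOrDefault("nam_adj", "ignored")); // ignored
    }
}
```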


BSNLP NER Shared Task »

Official evaluation

Task                       P [%]   R [%]   F [%]
Names matching
  Relaxed partial          66.24   63.27   64.72
  Relaxed exact            65.40   62.78   64.07
  Strict                   71.10   58.81   66.61
Normalization              75.50   44.44   55.95
Coreference
  Document level            7.90   42.71   12.01
  Language level            3.70    8.00    5.05
  Cross-language level       n/a     n/a     n/a

Table: Results obtained by Liner2 in the BSNLP NER Shared Task.


Error analysis »

Test set post-evaluation

- the test set was annotated separately by two annotators according to the BSNLP NER Shared Task guidelines; the annotators had taken part in the annotation of the KPWr corpus in the past,
- the annotations were then reconciled to create a gold standard,
- the annotation and agreement verification were done using the Inforex system (https://clarin-pl.eu/dspace/handle/11321/13).


Error analysis »

Inforex — annotation


Error analysis »

Inforex — agreement


Error analysis »

Inforex — verification


Error analysis »

Agreement on the test set

Agreement was calculated using the Positive Specific Agreement (PSA) measure on the level of NE mentions.

Names matching (strict)                 PSA
NE boundaries                           97%
NE boundaries and categories            94%
NE boundaries, categories and lemmas    93%

Table: Inter-annotator agreement on the test set.
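As background (the slide does not spell this out), Positive Specific Agreement is usually computed as PSA = 2a / (2a + b + c), where a is the number of mentions annotated identically by both annotators and b, c are the mentions marked by only one of them; this is equivalent to the F1-score obtained when one annotator's annotations are treated as gold. A minimal Java sketch with hypothetical counts:

```java
// Positive Specific Agreement: 2a / (2a + b + c), where
//   a = mentions annotated identically by both annotators,
//   b = mentions annotated only by annotator A,
//   c = mentions annotated only by annotator B.
// The counts below are hypothetical and unrelated to the table above.
public class Psa {

    static double psa(int both, int onlyA, int onlyB) {
        return 2.0 * both / (2.0 * both + onlyA + onlyB);
    }

    public static void main(String[] args) {
        System.out.printf("%.2f%n", psa(970, 30, 30)); // 0.97
    }
}
```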


Error analysis »

Official vs our evaluation

                          Evaluation   P [%]            R [%]            F [%]
Names matching (strict)   Official     71.10            58.81            66.61
                          Our          83.39 (+12.29)   70.19 (+11.38)   76.22 (+9.61)
Normalization             Official     75.50            44.44            55.95
                          Our          71.57 (-3.93)    60.24 (+15.80)   65.42 (+9.47)

Table: Comparison of evaluation results.


Summary »

Conclusions

- we obtained lower results for named entity recognition and lemmatization than we expected,
- there is a discrepancy between the official evaluation and ours,
- our understanding of the guidelines differs from what was expected; for instance, we did not annotate incomplete named entities like "Komisja" (Eng. Commission) which refers to "Komisja Europejska" (Eng. European Commission); there may be other differences,
- revision of the gold standard annotation of the sets and further tuning are required.


The end »

Thank you for your attention.

