A review of relation extraction systems using distant supervision

Supervisor : Xavier Tannier

Author : Mahmut CAVDAR

Institution : LIMSI - CNRS

Secretariat : 01 69 15 81 58 email : [email protected]

Contents

1 Introduction
2 Problem
3 Methods
  3.1 DeepDive
  3.2 Multi-instance multi-label relation extraction
4 Conclusion and perspectives
Bibliography

Chapter 1

Introduction

Over the past few years in particular, advances in computing technology (e.g. storage capacity, computational capability, transmission rate) have made it possible to implement ideas that were previously infeasible. Like other fields of computer science, natural language processing has benefited from these improvements and now offers new applications. The field has seen great success in moving from classical statistical methods to machine-learning approaches (linear or non-linear models such as neural networks). New ideas called for new approaches, and new approaches in turn gave rise to further ideas. Recently, applications such as machine translation, speech recognition, and chatbots have emerged from natural language processing. Information extraction is another such application, made relevant by a society that produces huge volumes of mostly unstructured content.

Information extraction aims to extract factual information from free text: in other words, to identify a predefined set of concepts (e.g., database records) in a domain-specific corpus of texts. Figure 1.1 shows an example of a piece of news about a terrorist attack and the structured information extracted from it.

“Gunmen kill at least 28 Coptic Christians in central Egypt.”
⇓
Type : Attack
Location : central Egypt
DeadCount : 28
Weapon : Gun
Victim : Coptic Christians

Figure 1.1 – Example of automatically extracted information from a news item

Most information extraction models are based on supervised learning. Depending on the volume of the data used, this classical method suffers for two reasons: labeling training data is time-consuming, and the effort is hard to repeat for a new application. Here, the unsupervised, bootstrapping, and distant supervision approaches offer reasonable solutions. Briefly, unsupervised relation extraction extracts a large set of relational tuples without requiring hand-labeled corpora; users do not specify their desired type of relation or information. In the bootstrapping approach, the user initially provides a small number of positive examples (seeds); these examples are used iteratively to generate new extraction patterns and new positive examples from the corpus. In distant supervision, instead of user-provided seed instances, a database of facts is used to automatically generate training examples. The distant supervision approach is very efficient in terms of scale (very large numbers of relations, e.g., web KBP). There are different distant supervision methods for the relation extraction problem. Chapter 2 presents the problem definition and notation, Chapter 3 discusses the different approaches, and the last chapter concludes the paper.
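The core idea of distant supervision described above can be sketched in a few lines of Python. The knowledge base, sentences, and relation names below are hypothetical illustrations, not data from any of the reviewed systems; the heuristic is the usual one: any sentence mentioning both entities of a known fact is (noisily) labeled with that fact's relation.

```python
# Minimal sketch of distant supervision: align a small knowledge base of
# facts with raw sentences to generate noisy labeled training examples.

KB = {
    ("Steve Jobs", "Apple"): "Co-Founder",
    ("Larry Page", "Google"): "Co-Founder",
}

sentences = [
    "Steve Jobs and Steve Wozniak founded Apple in 1976.",
    "Larry Page met Sergey Brin at Stanford.",
    "Apple released a new phone yesterday.",
]

def generate_training_examples(kb, sentences):
    """Label every sentence that mentions both entities of a known fact."""
    examples = []
    for sentence in sentences:
        for (e1, e2), relation in kb.items():
            if e1 in sentence and e2 in sentence:
                examples.append((sentence, e1, e2, relation))
    return examples

examples = generate_training_examples(KB, sentences)
# Only the first sentence mentions both "Steve Jobs" and "Apple",
# so only it gets the (noisy) Co-Founder label.
```

The noise is visible even in this toy: the heuristic would equally well label a sentence about Jobs leaving Apple as Co-Founder, which is exactly the problem the MIML approach in Section 3.2 addresses.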


Chapter 2

Problem

In this paper we focus on distant supervision for relation extraction between two entities. Let R denote the relation space, E the set of entities, and X the set of documents. The relation extraction task is defined as a function f : X → E × E × R, learned from a given data set {r1(e1, e2), r2(e3, e4), ..., rn(ek, em)} where rn ∈ R and ek, em ∈ E. We define our task as a function that takes as input a document collection X, a set of entity mentions extracted from X (distant supervision data from a knowledge base), a set of requested relation labels l, and an extraction model, and outputs the set of relations r extracted from X. For example, suppose we want to extract that NATO is to join the anti-Islamic State coalition. We define a relation r(e1, e2), where r is the relation name, e.g., join in our example, and e1 and e2 are two entities, e.g., NATO and coalition in our example.
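The notation above maps directly onto a programmatic signature. The sketch below is only an illustration of the types involved (the names and the trivial extractor are invented for this example, not part of any reviewed system): f consumes a document collection X and produces relation tuples r(e1, e2).

```python
# Type-level sketch of the task f : X -> E x E x R from the notation above.
from typing import Callable, List, Set, Tuple

Entity = str
Relation = str
RelationTuple = Tuple[Relation, Entity, Entity]        # r(e1, e2)
Extractor = Callable[[List[str]], Set[RelationTuple]]  # the function f

def toy_extractor(documents: List[str]) -> Set[RelationTuple]:
    """Trivial hand-written extractor for the NATO example in the text."""
    out: Set[RelationTuple] = set()
    for doc in documents:
        if "NATO" in doc and "coalition" in doc:
            out.add(("join", "NATO", "coalition"))
    return out

result = toy_extractor(["NATO is to join the anti-Islamic State coalition."])
# result holds the single tuple join(NATO, coalition)
```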


Chapter 3

Methods

I examined two distant supervision approaches to relation extraction: one based on the DeepDive framework, and the other based on the multi-instance multi-label method.

3.1 DeepDive

The DeepDive framework, proposed by Niu et al. [2], extracts relations from large numbers of web pages. DeepDive's architecture consists of three phases: feature extraction, probabilistic engineering, and statistical inference and learning. DeepDive obtains linguistic features by using tools such as a named-entity recognizer and a dependency path finder. These features are then used to discover correlations between linguistic patterns and the relations defined by the user. Using a Markov logic program enriched with additional domain knowledge, a statistical model is trained and the knowledge base is populated with entities and relationships.
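To give a feel for the feature extraction phase: DeepDive itself relies on rich linguistic features such as dependency paths, but a simplified surface stand-in (my own illustration, not DeepDive code) is the word sequence between two entity mentions, a classic relation extraction feature.

```python
# Simplified stand-in for one linguistic feature used in relation
# extraction: the tokens appearing between two entity mentions.

def words_between(sentence, e1, e2):
    """Return the tokens between the two entity mentions (order-agnostic)."""
    tokens = sentence.split()
    i, j = tokens.index(e1), tokens.index(e2)
    if i > j:
        i, j = j, i
    return tokens[i + 1:j]

feature = words_between("Jobs co-founded Apple in 1976.", "Jobs", "Apple")
# the extracted pattern "co-founded" can then be correlated, during
# learning, with the Co-Founder relation
```

A real pipeline would use a dependency parser rather than whitespace tokenization, but the role of the feature in the statistical model is the same.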

Figure 3.1 – DeepDive's architecture

One of the most important design ideas of DeepDive is to make KBC systems easier to debug and improve. DeepDive uses a set of labeled data to produce a calibration plot, which summarizes the overall quality of the results. From this plot, the user can get an idea of the next step needed to improve the system and can efficiently handle uncertainty in the prediction histogram. Another key challenge of the DeepDive approach is scalability, i.e., how to process terabytes of unstructured data efficiently (web-scale relation extraction). Extracting linguistic features such as dependency paths is highly CPU-intensive, and the statistical learning and inference phase also needs to scale. These problems are solved with the Condor infrastructure and the Bismarck system, respectively. Condor, a high-throughput batch computing system, runs on hundreds of workstations and shared cluster machines. The Bismarck system integrates many machine learning techniques into an RDBMS. DeepDive demonstrates that a promising approach is to integrate various large data sources and best-of-breed algorithms via statistical learning and inference. The quality of a DeepDive system also depends heavily on its features, rules, and pre-processing pipelines.



3.2 Multi-instance multi-label relation extraction

Contrary to traditional methods, in multi-instance multi-label learning a problem object is represented by a set of instances and associated with multiple labels [4]. For clarification, consider the example of a movie. A movie can be assigned to several classes depending on the purpose, e.g., drama, romance, or Robert Zemeckis's directing (multi-label). On the other hand, a real-world label prediction model needs various instances. For example, multiple sections can be extracted from the movie, so the movie can be represented by a set of instances and assigned its multiple labels on the basis of those instances.

DB = {Co-Founder(Steve Jobs, Apple)}

Sentence                                                                  Label
Jobs, Wozniak, and Ronald Wayne formed Apple Computer in the
garage of Jobs's Los Altos home on Crist Drive.                           Co-Founder
On September 17, 1985, Jobs submitted a letter of resignation
to the Apple Board.                                                       Leave
In 2001, Jobs was granted stock options in the amount of 7.5
million shares of Apple with an exercise price of $18.30.                 -

Aligning a database of facts with text introduces challenges. Sometimes the same entity pair may carry different labels in different sentences. For example, in the table above, the tuple (Steve Jobs, Apple) has two valid labels, Co-Founder and Leave, and each label can be mentioned by different sentences. The multi-instance multi-label relation extraction method, proposed by Surdeanu et al. for the Stanford KBP system, brought a solution to this problem with a novel graphical model [3].

The multi-instance multi-label (MIML) relation extraction model assumes that each relation mention of an entity pair has one of the pre-specified relation labels or an additional NIL label, and the model allows the pair to have multiple mentions. A latent variable z represents the actual relation label of a mention aligned with the knowledge base, and each classifier yj decides whether relation j holds for the given entity tuple, using the output of the z classifier as input. The MIML model is trained with a hard discriminative Expectation Maximization (EM) method. Since EM has trouble finding the global maximum, a local logistic regression classifier is used to initialize the values of z.
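The two-layer decision can be illustrated with a deliberately simplified sketch (my own toy, not the Stanford KBP implementation): the mention-level classifier z assigns each sentence of an entity pair a latent label, and the pair-level y layer decides which relations hold. Here the yj classifiers are replaced by a bare "at least one mention" (OR) aggregation over the latent labels, whereas the real model learns these decisions jointly.

```python
# Toy sketch of MIML's two layers: latent mention labels z feed a
# pair-level decision. The y layer is simplified to OR-aggregation.

def pair_labels(z_labels):
    """Aggregate mention-level latent labels into entity-pair relations."""
    return {z for z in z_labels if z != "NIL"}

# Hypothetical latent labels predicted for the three
# (Steve Jobs, Apple) mentions from the table above:
z = ["Co-Founder", "Leave", "NIL"]
relations = pair_labels(z)
# the pair correctly keeps both valid labels, Co-Founder and Leave,
# while the NIL mention contributes nothing
```

This is exactly what the flat distant-supervision heuristic cannot do: it forces one label per pair, while MIML lets different mentions support different relations for the same tuple.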

Chapter 4

Conclusion and perspectives

So far, we have reviewed two different approaches to the entity-relation extraction problem using distant supervision algorithms. DeepDive is the best performing system in the TAC-KBP 2014 Slot Filling challenge, with an F1 score of 0.3672. For the DeepDive framework, the grounding phase can be a serious bottleneck if one does not use scalable relational technology; moreover, in some cases, e.g., poor POS tagging results, it produces wrong extractions. The relation extraction problem can be modeled differently and can represent several viewpoints. The DeepDive and MIML approaches both assume that each relation holds between only two entities (binary relations). Extracting n-ary relations requires handling more than binary relations and single sentences. Furthermore, the papers focus on discovering only static relationships between entities. Dynamic and n-ary relations might therefore constitute an interesting area for future research.


Bibliography

[1] Gabor Angeli, Sonal Gupta, Melvin Jose, Christopher D. Manning, Christopher Ré, Julie Tibshirani, Jean Y. Wu, Sen Wu, and Ce Zhang. Stanford's 2014 slot filling systems. In TAC, 2014.
[2] Feng Niu, Ce Zhang, Christopher Ré, and Jude W. Shavlik. DeepDive: Web-scale knowledge-base construction using statistical learning and inference. In VLDS, pages 25–28, 2012.
[3] Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. Multi-instance multi-label learning for relation extraction. In EMNLP, 2012.
[4] Zhi-Hua Zhou, Min-Ling Zhang, Sheng-Jun Huang, and Yu-Feng Li. Multi-instance multi-label learning. Artificial Intelligence, 176(1), 2012.
[5] Feng Niu, Christopher Ré, AnHai Doan, and Jude Shavlik. Tuffy: Scaling up statistical inference in Markov logic networks using an RDBMS. arXiv, 2011.
[6] Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. Distant supervision for relation extraction without labeled data. In ACL, pages 1003–1011, 2009.
