Evolutionary Learning of Syntax Patterns for Genic ... - Google Sites
Recommend Documents
classroom use is granted without fee provided that copies are not made or distributed for profit or commercial ... GECCO
Jul 15, 2015 - ABSTRACT. There is an increasing interest in the development of tech- niques for automatic relation extra
(Unaccusative) Root Small Clauses in Serbian. 73. 2.4. Small Clause .... reflected the lack of innovation (of more complex syntactic structures) at that point, or both. .... timeline. This monograph draws directly upon the field of theoretical syntax
Dec 14, 2018 - ... of: (1) full sentences (FullS; e.g., The case is closed); (2) Small Clauses (SC; e.g., Case closed); (3) ... in its full complexity, and in its hierarchical/recursive form. ... of syntax, and more in line with Jackendoff and Pinker
Data made available by the courtesy of Microsoft .... Part-of-Speech mapping template: whether the ..... clude that PSDI
IE can be framed as a machine learning task: given information in unstructured ... of applying Gleaner (Goad- rich et al., 2004) to the Learning Language in Logic.
Molnar RI, Witte H, Dinkelacker I, Villate L, Sommer RJ. 2012. Tandem- repeat patterns ... model organism Pristionchus pacificus. G3 2:1027â1034. Mooers AO ...
such as the Xavante in Brazil, the Wodaabe in Niger, or the Gabra in Kenya, very little intermixing of cultures occurs (Maybury-Lewis, 1992). In such cases, most ...
questions they address, and the techniques used to check the validity of current ... spring up in the future to explore
Learning to Extract Genic Interactions Using Gleaner ... ing genic interactions. Gleaner is a two-stage ..... predicates which try to capture both the structure and.
the meeting with his old girlfriend) form separate phonological phrases (68, 69) ... is the ungrammaticality of sentences such as *Peter smokes, I think, cigars (ex.
Engineering, National Taiwan University, Taipei, Taiwan ( e-mail: ..... Int. Conf. on Robotics, Automation and Mechatron
Excerpt from 2015 OUP book, Evolutionary Syntax, Ljiljana Progovac. âFirst of all, it is maintained in e.g. Hurford (1990) and Fitch (2008, and references cited ...
in this paper we show that the learner is able to infer a recursive and ambiguous .... are tried by recursively dividing all the edges in a sentence into two parts and ...
Learning Efficiency: This style of teaching focuses on life long learning by drawing on ...... Frank Thomas, Ollie Johnston, Disney Animation: The Illusion of Life ...
Apr 26, 2008 - tative potasiu m tran sporter. Oryza sa tiva. 1.00E. -4. 9. S tg 4. S tgnhsbm. 4. 6. C. X. 6. 11411. (G. A. )1. 6. GT. TC. TT. GGAGTGGTC. C. A. TC.
Nov 26, 2010 - on a GA that searches for optimal premise structure and input membership ..... re¯ ects human demand or preference in model construc- tion. &.
(Whitley et al, 1993; Moriarty and Miikkulainen, 1996; Stanley and Miikkulainen, ...... Deb K (2001) Multi-objective optimization using evolutionary algorithms. ... Heidrich-Meisner V, Igel C (2009a) Hoeffding and Bernstein races for selecting ...
7777, Metro Manila, Philippines; 4Institute of Biotechnology, College of Agriculture, Rajendranagar, Acharya ... locus was mapped onto chromosome 11 using RM21 and RM224, ... Cytoplasmic-genetic male sterility (CGMS)-based three-line.
Nov 19, 2009 - Such a process could account ..... This latter test was made by checking whether the correlogram con- ...
Oct 14, 2008 - tion (in this case of C); the structure that results is an adjunction structure. ... VP. V. NP. Det N'. N
Certain verbs permit a zero complementizer and, in fact, se does not allow an overt complementizer. (55) Olu se yu /:J k
Oct 14, 2008 - In Government-and-Binding theory, the name given to the abstract underlying level .... of individuals, ta
Oct 14, 2008 - 1 Course information ... Webpage: http://user.uni-frankfurt.de/â¼castrovi ... The syntax-semantics inter
Evolutionary Learning of Syntax Patterns for Genic ... - Google Sites
Evolutionary Learning of Syntax Patterns for Genic Interaction Extraction
Alberto Bartoli, Andrea De Lorenzo, Eric Medvet, Fabiano Tarlao, Marco Virgolin
UNIVERSITÀ DEGLI STUDI DI TRIESTE DIPARTIMENTO DI INGEGNERIA E ARCHITETTURA
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Problem
➔ Identifying sentences that contain interactions between genes and proteins ◆ from biomedical literature ➔ Available data: ◆ dictionary of genes, proteins and interactors ◆ example sentences
2
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Why? ➔ Biomedical literature is: ◆ vast ◆ rapidly growing
➔ Challenging problem: automatic extraction of knowledge from a text in natural language ◆ informations are “diluted” in the text ◆ very challenging problem: discover relations between entities
3
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Goal ➔ Generation of a classifier C in order to identify sentences containing interactions between genes and proteins ◆ automatically ◆ based on recurring syntactic patterns
4
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Our approach ➔ Classifier C is a set of regular expressions (regex)
C={r1,r2,...} ➔ Each regex is a sentence classifier (“accepts” or “does not accept”) ◆ C accepts sentences accepted by at least one regex ➔ Regex applied on a semantical representation of the text
5
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Our approach (II) ➔ Regex generated automatically ◆ by means of Genetic Programming (GP) ◆ starting from examples ● strings which must be accepted ● strings which must not be accepted
6
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Sentences preprocessing Mapping of a sentence s in a ɸ-string x a. substitution of words in s with “annotations” i. gene, protein, interactor or ii. Part-Of-Speech b. mapping of annotations in Unicode characters c. concatenation
7
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Sentences preprocessing (II) Example: s = YfhP may act as a negative regulator for the transcription of yfhQ ↓ [YfhP] [may] [act] [as] [a] [negative] [regulator] [for] [the] [transcription] [of] [yfhQ]
Generation of C: GP ➔ We used a Tree-based GP ➔ In this work candidate solution = regex
9
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Key aspects ➔ Multi-objective fitness: ◆ f=(Accuracy, FPR, Regex length) ◆ we purposefully avoided to include any problemspecific knowledge (gene/protein/…)
➔ Problem handled by mean of separate-andconquer ➔ Final output: set of regular expressions C={r1, r2,...} 10
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Separate-and-conquer ➔ Each regex ri ∈ C makes an independent and parallel classification ➔ Each regex is tailored for a sub-problem ◆ the problem is solved “step-by-step”
➔ Final output = logic OR of classifications
11
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Separate-and-conquer ● C=∅ ● we execute a GP search over the examples obtaining r* ● if FPR < threshold ○ C = C ∪ {r*} ● else ○ terminate
● remove from the positive examples those which were classified correctly by r* 12
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Classifier example C = {r1, r2} r1 = GENEPTN[ˆRB][^NNS VBN GENEPTN]++ r2 = . INOUN IN GENEPTN . [ˆDT NN]
13
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Experimental evaluation: the data ➔ Dataset: 456 sentences from biomedical papers ◆ ½ with interactions e ½ without ◆ manually labelled by experts
➔ Dataset splitted in Learning e Testing ◆ ≈80% examples in Learning ◆ ≈20% examples in Testing
➔ 5 fold randomly generated ◆ with Testingi≠Testingj 14
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Baseline 1, 2: problem specific knowledge ➔ Annotations-Co-Occurrence ◆ it is tightly tailored to this specific problem ◆ sentence is positive if contains ● at least 2 genes/proteins ● at least 1 interactor
➔ Annotations-LLL05-Patterns ◆ 10 pattern generated in “LLL'05 Challenge: Genic Interaction Extraction with Alignments and Finite State Automata”
- J. Hakenberg et alia ◆ built over >90% of the dataset (also testing!)
15
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Baseline 3: ɸ-SSLEA ➔ Based on Smart State Labeling Algorithm ◆ algorithm for DFA learning ◆ works well in presence of noise
➔ Hill-Climbing ➔ Generates DFA which accepts or refuse a ɸstring x ◆ if x accepted ⇒ x contains an interaction between gene/protein ◆ otherwise, no 16
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Baseline 4, 5: Words-NaiveBayes e Words-SVM ➔ Standard for text classification ◆ Supervised Machine Learning methods
➔ Feature based on word occurrences ➔ Preprocessing ◆ stemming ◆ features selection
17
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Results Averaged over the 5 folds Classifier
Accuracy
FPR
FNR
Annotations-Co-Occurrence
77.8
40.0
4.5
Annotations-LLL05-Patterns
82.3
25.0
10.5
Words-NaiveBayes
51.3
25.0
95.0
Words-SVM
73.8
29.0
23.5
ɸ-SSLEA
59.8
44.0
33.5
C
73.7
23.5
22.5 18
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Results (II) ➔ C performs as well as Word-SVM and better than other learning approaches ➔ accuracies of C and Annotations-Co-Occurrence (which exploits domain knowledge of an expert) are very close ◆ Pro: C is composed by patterns (regex) readable ◆ Con: time to generate C (hours) ≫ time to generate other methods (minutes) ● but ≈ time taken for classifying (seconds)
19
Evolutionary Learning of Syntax Patterns
DIA - UniTs
Conclusions We proposed: ➔ a method for the automatic synthesis of a classifier for natural language sentences ◆ based on syntactic pattern ◆ by mean of GP ◆ separate-and-conquer ➔ results are highly promising