DCU-SEManiacs at SemEval-2016 Task 1: Synthetic Paragram Embeddings for Semantic Textual Similarity

Chris Hokamp, Piyush Arora
ADAPT Centre, School of Computing
Dublin City University, Dublin, Ireland
{chokamp,parora}@computing.dcu.ie

Abstract
We experiment with learning word representations designed to be combined into sentence-level semantic representations, using an objective function which does not directly make use of the supervised scores provided with the training data, instead opting for a simpler objective which encourages similar phrases to be close together in the embedding space. This simple objective lets us start with high-quality embeddings trained using the Paraphrase Database (PPDB) (Wieting et al., 2015; Ganitkevitch et al., 2013), and then tune these embeddings using the official STS task training data, as well as synthetic paraphrases for each test dataset, obtained by pivoting through machine translation.

Our submissions include runs which only compare the similarity of phrases in the embedding space, directly using the similarity score to produce predictions, as well as a run which uses vector similarity in addition to a suite of features we investigated for our 2015 SemEval submission. For the cross-lingual task, we simply translate the Spanish sentences to English and use the same system we designed for the monolingual task.

1 Introduction

We describe the work carried out by the DCU-SEManiacs team on the Semantic Textual Similarity (STS) task at SemEval-2016 (Agirre et al., 2013; Agirre et al., 2014; Nakov et al., 2015). The main ideas we investigate in our systems are:

1. Using a margin-based objective function to train high-quality sentence embeddings without using supervised scores
2. Creating new synthetic training data using machine translation to generate artificial paraphrases
3. Using ensemble models to combine features generated by our embedding networks with features obtained from other sources

1.1 Task Description

The SemEval Semantic Textual Similarity (STS) task provides participants with training data consisting of pairs of sentences annotated with gold-standard semantic similarity scores. The crowd-sourced similarity scores are given on a scale from 0 (no relation) to 5 (semantic equivalence). Thus, our aim is to use the training data to learn a model which predicts a score between 0 and 5 for unseen input pairs (Nakov et al., 2015). The monolingual STS task has been organized each year since 2012, and most approaches have viewed the learning task as a regression problem, where real-valued model output is clipped to be 0
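As a rough illustration of this setup (a minimal sketch, not the authors' actual system), the following code averages word vectors into sentence embeddings, takes their cosine similarity, and maps it onto the task's 0-5 scale with clipping. The toy `embeddings` dictionary and the linear rescaling are assumptions for illustration; a real system would use trained Paragram-style vectors and a learned regression model.

```python
import numpy as np

# Toy word vectors standing in for trained Paragram-style embeddings
# (hypothetical values, for illustration only).
embeddings = {
    "a":    np.array([0.1, 0.3, 0.5]),
    "man":  np.array([0.9, 0.2, 0.4]),
    "guy":  np.array([0.8, 0.3, 0.4]),
    "runs": np.array([0.2, 0.7, 0.1]),
    "jogs": np.array([0.3, 0.6, 0.2]),
}

def sentence_vector(tokens):
    """Average the word vectors of the in-vocabulary tokens."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0)

def sts_score(tokens_a, tokens_b):
    """Cosine similarity of the two sentence vectors, linearly
    rescaled to the 0-5 STS range and clipped so predictions
    stay inside the valid interval."""
    va, vb = sentence_vector(tokens_a), sentence_vector(tokens_b)
    cos = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
    return float(np.clip(5.0 * cos, 0.0, 5.0))

score = sts_score(["a", "man", "runs"], ["a", "guy", "jogs"])
assert 0.0 <= score <= 5.0
```

Identical sentences receive a cosine of 1 and thus the maximum score of 5, while the clipping mirrors the common practice, noted above, of constraining a regressor's real-valued output to the annotated 0-5 range.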