DCU-SEManiacs at SemEval-2016 Task 1: Synthetic Paragram Embeddings for Semantic Textual Similarity

Chris Hokamp, Piyush Arora
ADAPT Centre, School of Computing
Dublin City University, Dublin, Ireland
{chokamp,parora}@computing.dcu.ie

Abstract
We experiment with learning word representations designed to be combined into sentence-level semantic representations, using an objective function which does not directly make use of the supervised scores provided with the training data, instead opting for a simpler objective which encourages similar phrases to be close together in the embedding space. This simple objective lets us start with high-quality embeddings trained using the Paraphrase Database (PPDB) (Wieting et al., 2015; Ganitkevitch et al., 2013), and then tune these embeddings using the official STS task training data, as well as synthetic paraphrases for each test dataset, obtained by pivoting through machine translation.

Our submissions include runs which only compare the similarity of phrases in the embedding space, directly using the similarity score to produce predictions, as well as a run which uses vector similarity in addition to a suite of features we investigated for our 2015 SemEval submission. For the cross-lingual task, we simply translate the Spanish sentences to English and use the same system we designed for the monolingual task.

1 Introduction

We describe the work carried out by the DCU-SEManiacs team on the Semantic Textual Similarity (STS) task at SemEval-2016 (Agirre et al., 2013; Agirre et al., 2014; Nakov et al., 2015). The main ideas we investigate in our systems are:

1. Using a margin-based objective function to train high-quality sentence embeddings without using supervised scores
2. Creating new synthetic training data using machine translation to generate artificial paraphrases
3. Using ensemble models to combine features generated by our embedding networks with features obtained from other sources

1.1 Task Description

The SemEval Semantic Textual Similarity (STS) task provides participants with training data consisting of pairs of sentences annotated with gold-standard semantic similarity scores. The crowd-sourced similarity scores are given on a scale from 0 (no relation) to 5 (semantic equivalence). Thus, our aim is to use the training data to learn a model which predicts a score between 0 and 5 for unseen input pairs (Nakov et al., 2015). The monolingual STS task has been organized each year since 2012, and most approaches have viewed the learning task as a regression problem, where real-valued model output is clipped to be 0
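As a rough illustration of this setup (a minimal sketch, not the authors' actual system), the following code averages word vectors into sentence embeddings, takes their cosine similarity, and maps it onto the task's 0-5 scale with clipping. The toy `embeddings` dictionary and the linear rescaling are assumptions for illustration; a real system would use trained Paragram-style vectors and a learned regression model.

```python
import numpy as np

# Toy word vectors standing in for trained Paragram-style embeddings
# (hypothetical values, for illustration only).
embeddings = {
    "a":    np.array([0.1, 0.3, 0.5]),
    "man":  np.array([0.9, 0.2, 0.4]),
    "guy":  np.array([0.8, 0.3, 0.4]),
    "runs": np.array([0.2, 0.7, 0.1]),
    "jogs": np.array([0.3, 0.6, 0.2]),
}

def sentence_vector(tokens):
    """Average the word vectors of the in-vocabulary tokens."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0)

def sts_score(tokens_a, tokens_b):
    """Cosine similarity of the two sentence vectors, linearly
    rescaled to the 0-5 STS range and clipped so predictions
    stay inside the valid interval."""
    va, vb = sentence_vector(tokens_a), sentence_vector(tokens_b)
    cos = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
    return float(np.clip(5.0 * cos, 0.0, 5.0))

score = sts_score(["a", "man", "runs"], ["a", "guy", "jogs"])
assert 0.0 <= score <= 5.0
```

Identical sentences receive a cosine of 1 and thus the maximum score of 5, while the clipping mirrors the common practice, noted above, of constraining a regressor's real-valued output to the annotated 0-5 range.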