UNITOR-CORE TYPED: Combining Text Similarity and Semantic Filters through SV Regression
Danilo Croce, Valerio Storch and Roberto Basili (University of Roma, Tor Vergata)
*SEM - Atlanta, June 2013
Outline
- Modeling STS through kernel-based regression
  - Similarity functions
- Semantic constraints for the Typed STS
- STS challenge results
  - Core-STS
  - Typed-STS
Textual similarity as SV regression
- STS is modeled as a Support Vector (SV) regression problem [Smola and Scholkopf, 2004]
- The semantic relatedness between two texts is first redundantly modeled through a set of independent similarity functions
  - each function reflects a specific semantic perspective, e.g. syntactic or lexical similarity
- A Support Vector regressor learns the proper combination of the different functions, which are acquired in an unsupervised fashion
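The regression step above can be sketched as follows; this is a minimal illustration with scikit-learn's SVR, where the feature values and the two hypothetical similarity functions are made up for the example and do not reproduce the paper's actual feature set.

```python
# Sketch: combining independent similarity scores with SV regression.
# Feature values below are illustrative, not the paper's real data.
from sklearn.svm import SVR

# Each row: scores for one sentence pair from two hypothetical similarity
# functions (e.g. a lexical overlap and a word-space cosine).
# Target: a gold similarity judgment on the usual [0, 5] STS scale.
X_train = [
    [0.80, 0.75],
    [0.10, 0.20],
    [0.55, 0.60],
    [0.30, 0.25],
]
y_train = [4.5, 0.5, 3.0, 1.5]

regressor = SVR(kernel="linear")
regressor.fit(X_train, y_train)

# Predict a combined similarity score for an unseen pair.
score = float(regressor.predict([[0.70, 0.65]])[0])
print(round(score, 2))
```

The regressor plays the role of the learned combination: each basic function contributes one (or more) feature dimensions, and training fits their relative weights against the gold similarity judgments.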
STS functions: lexical information
- Lexical Overlap (LO) considers the lexical similarity between sentences
  - LO is the Jaccard Similarity score between the word sets
- Lexical information is generalized through a Word Space model
  - each word is a vector in a space where distance reflects semantic relations
- Linear Combination (SUM)
  - a sentence is modeled as the linear combination of its word vectors
  - similarity is the cosine similarity between the resulting representations
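The two lexical functions can be sketched as below; the tiny hand-built word space is purely illustrative, standing in for a word space acquired distributionally from a large corpus.

```python
# Sketch of the two lexical similarity functions described above.
# The toy word vectors are made up for illustration only.
import math

def lexical_overlap(s1, s2):
    """LO: Jaccard similarity between the word sets of two sentences."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / len(w1 | w2)

def sum_similarity(s1, s2, space):
    """SUM: cosine similarity between sentence vectors built as the
    sum of their word vectors (words missing from the space are skipped)."""
    def sent_vec(s):
        vecs = [space[w] for w in s.lower().split() if w in space]
        return [sum(dims) for dims in zip(*vecs)]
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    return cosine(sent_vec(s1), sent_vec(s2))

# Toy word space: 2-dimensional vectors where nearby words get nearby vectors.
space = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "sleeps": [0.1, 1.0]}
lo = lexical_overlap("the cat sleeps", "the dog sleeps")
print(round(lo, 2))  # 2 shared words out of 4 distinct -> 0.5
```

Note how SUM can assign a high score to the pair even where LO is only 0.5, because "cat" and "dog" have nearby vectors: this is the generalization the Word Space provides.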
STS functions: introducing syntax
- A Distributional Compositional Semantics operator
  - a sentence is treated as a set of syntactically restricted compounds (e.g. V-SBJ)
  - syntactic bi-gram similarity is modeled as a projection into lexically-driven subspaces [Annesi et al., 2012]
  - the Syntactic Soft Cardinality (SSC) operator combines the contributions of specific compounds through the soft cardinality function [Jimenez et al., 2012]
- A semantically Smoothed Partial Tree Kernel (SPTK) operator [Croce et al., 2011]
  - a Convolution Kernel jointly modeling syntactic and lexical semantic similarity in both sentences
  - it extends the similarity between tree structures with a node similarity function (here derived from the Word Space)
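The soft cardinality idea underlying SSC can be sketched as follows; the character-overlap similarity and the example items are illustrative stand-ins for the paper's distributional similarity over syntactic compounds.

```python
# Sketch of the soft cardinality function [Jimenez et al., 2012] that SSC
# builds on: the "size" of a set shrinks when its elements resemble each
# other, so near-duplicate compounds do not count twice.
def char_sim(a, b):
    """Toy element similarity: Jaccard overlap of character sets
    (a stand-in for a distributional similarity over compounds)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def soft_cardinality(items, sim, p=1.0):
    """Each element contributes 1 / (sum of its similarities to all
    elements): identical elements collapse, dissimilar ones count fully."""
    return sum(1.0 / sum(sim(a, b) ** p for b in items) for a in items)

# With a crisp 0/1 similarity, soft cardinality reduces to plain cardinality.
crisp = soft_cardinality(["cat", "dog", "bird"],
                         lambda a, b: 1.0 if a == b else 0.0)
print(crisp)  # -> 3.0
```

With a graded similarity the count drops below the crisp cardinality, which is what lets SSC weigh partially redundant syntactic compounds smoothly instead of all-or-nothing.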
Semantic constraints for the Typed STS
"The chemist R.S. Hudson began manufacturing soap in the back of his small shop in West Bomich in 1837"
- PERSON (e.g. R.S. Hudson): people involved
- Verbs (e.g. began manufacturing): event
- LOCATION (e.g. West Bomich): location
- TIME (e.g. 1837): time
Semantic constraints for the Typed STS
- Given a semi-structured source, fields are filtered according to their type
  - when a time-based similarity is targeted, the dcDate field should be considered, while dcCreator may be neglected
  - other fields may contain useful information, but it should be filtered
- An information extraction system extracts the useful information, e.g. temporal expressions
- Similarity functions are applied only to the selected fields, where specific phrases have been extracted
- Example: the time similarity
  - the dcDate field is fully considered
  - only phrases expressing temporal information are considered within dcTitle, dcSubject and dcDescription
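The time-similarity filtering can be sketched as below; the regex-based year extractor and the helper name `time_view` are illustrative simplifications of the information extraction system used in the paper.

```python
# Sketch of the typed filtering step for the "time" similarity.
# A simple year regex stands in for the real temporal-expression extractor;
# field names follow the Dublin Core fields mentioned above.
import re

YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def time_view(record):
    """Keep dcDate in full; from the other textual fields keep only
    phrases expressing temporal information (here: year mentions)."""
    parts = [record.get("dcDate", "")]
    for field in ("dcTitle", "dcSubject", "dcDescription"):
        parts.extend(YEAR.findall(record.get(field, "")))
    return " ".join(p for p in parts if p)

record = {
    "dcTitle": "The Octagon and Pavilions, Buxton, c 1875",
    "dcCreator": "R.S. Hudson",   # neglected for the time similarity
    "dcDate": "1875",
}
print(time_view(record))
```

The similarity functions (e.g. LO and SUM) are then applied to these filtered views of the two records rather than to the full field contents.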
UNITOR-CORE: Experimental Setup
- A regressor has been trained in a 13-dimensional feature space
  - 5 scores from LO
  - 5 scores from SUM
  - 1 score from SSC
  - 2 scores from SPTK
- The word space model is derived from the distributional analysis of the UkWaC corpus
- Three runs, differing in the training set definition
  - Run1: training datasets are heuristically selected, i.e. one regressor per test dataset; a linear kernel is employed
  - Run2: all data are used to train a single regressor
  - Run3: same training sets as Run1; a gaussian kernel is employed
UNITOR-Core: Results (official rank in parentheses)

            Run1       Run2       Run3       Run1*
headlines   .635 (50)  .651 (39)  .603 (58)  .671 (30)
OnWN        .574 (33)  .561 (36)  .549 (40)  .637 (25)
FNWN        .352 (35)  .358 (32)  .327 (44)  .459 (07)
SMT         .328 (39)  .310 (49)  .319 (44)  .348 (21)
Mean        .494 (37)  .490 (42)  .472 (52)  .537 (19)

- Selecting the wrong training dataset causes a performance drop
  - Run1*: a better selection of the training material, i.e. the datasets maximizing performance
  - improvement from the 37th to the 19th position
UNITOR-Typed: Experimental Setup
- The same SV regressor-based schema is applied
  - a specific regressor is learned for each target similarity
  - texts are not sentential, so the LO and SUM functions are employed
  - the Stanford NER system extracts phrases referring to the classes PERSON, TIME and LOCATION

Field filtering for the time similarity (*: field fully considered; DATE: only temporal phrases considered; -: field neglected):

        dcTitle  dcSubject  dcDesc.  dcCreator  dcDate  dcSource
time    DATE     DATE       DATE     -          *       -
UNITOR-Typed: Results
- Two runs
  - Run1: a linear kernel is used
  - Run2: a gaussian kernel is used

           Run1    Run2
general    .7981   .7564
author     .8158   .8076
peop.inv.  .6922   .6758
time       .7471   .7090
location   .7723   .7351
event      .6835   .6623
subject    .7875   .7520
descr.     .7996   .7745
mean       .7620   .7341

- Error analysis: some scores are overestimated due to "coincidences", e.g.
  "The Octagon and Pavilions, Pavilion Garden, Buxton, c 1875"
  vs. "The Beatles, The Octagon, Pavillion Gardens, St John's Road, Buxton, 1963"
Conclusion
- We modeled STS as a SV regression problem
  - a SV regressor learns how to combine basic similarity measures
  - simple but effective semantic filters are applied to emphasize specific information
  - no hand-coded resource is used
- Future work
  - improve the domain adaptation of the proposed similarity functions and combination approach
  - provide a method to properly select the training material

Thank you for your attention…