Instance-Based Ontology Matching By Instance Enrichment Balthasar A.C. Schopman – supervisors: Antoine Isaac Shenghui Wang Stefan Schlobach Vrije Universiteit Amsterdam
June 29, 2009
Outline

1. Ontology matching
2. Instance-based OM
3. IBOMbIE
4. Experiments
5. Comparison other OM
6. Conclusions
Research questions

General research questions:
- How do different algorithm design options of IBOMbIE influence the final result?
- How does the performance of IBOMbIE relate to other OM algorithms?
Questions from the audience
Crucial questions: please interrupt me. Other questions: please hold them until after the presentation.
Introduction: Ontology

Definition of an ontology (by Euzenat and Shvaiko): an ontology typically (1) defines a vocabulary relevant in a certain domain of interest, (2) specifies the meaning of terms and (3) specifies relations between terms.

Ontologies:
- controlled vocabulary
- thesaurus
- database schema
- canonical semantic web ontology: a set of typed, interrelated concepts defined in a formal language
Introduction: Ontology Matching (OM)

Ontologies ...
- facilitate interoperability between parties
- do not solve the heterogeneity problem, but raise it to a higher level: the OM level

Elementary OM techniques:
- terminological
- structure-based
- semantic-based
- instance-based
Introduction: Instance-based OM (IBOM)

Variants of IBOM:
1. use dually annotated instances (DAI)
2. create DAI
3. use the extension of concepts (DAI not required)

General pros and cons:
- Con: does not deduce specific relations
- Con: suitable instances are rarely available
- Pro: focuses on the active part of the ontology
- Pro: able to deal with ambiguous linguistic phenomena: synonymy, homonymy
Intro: Definitions of the 'instance of' relation

Example definitions:
- Canonical semantic web definition: e.g. the resource someone:Peter has rdf:type foaf:Person, with the properties foaf:name "Peter" and foaf:knows someone:Nate.
- Library definition: an object (e.g. object o1 or o2) is annotated with concepts c1, c2, c3, ... from an ontology / vocabulary.
Intro: Application

Two library scenarios: KB and TEL
- match controlled vocabularies
- data-sets: book catalogs
- multi-lingual
IBOM: measuring similarity

(figure: the overlap between the extensions of concepts c1 and c2)
IBOM: Jaccard coefficient

Jaccard coefficient:

J(c1, c2) = |i1 ∩ i2| / |i1 ∪ i2|

where i1 and i2 are the sets of instances annotated with c1 and c2. It quantifies the overlap of the extensions of concepts → relatedness between concepts.
Con: no multi-sets.
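The Jaccard coefficient as defined here fits in a few lines of Python (a minimal sketch; the book identifiers are invented for illustration):

```python
def jaccard(extension1, extension2):
    """Jaccard coefficient between two concept extensions (sets of instance ids)."""
    union = extension1 | extension2
    if not union:
        return 0.0  # both extensions empty: no evidence of relatedness
    return len(extension1 & extension2) / len(union)

# Two concepts sharing 2 of 4 distinct instances:
c1 = {"book1", "book2", "book3"}
c2 = {"book2", "book3", "book4"}
print(jaccard(c1, c2))  # → 0.5
```

Because plain sets are used, an instance annotated twice with the same concept counts only once; this is exactly the "no multi-sets" limitation noted above.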
IBOM: Creating dually annotated instances (DAI)

Jaccard needs DAI. If DAI are unavailable:
- exact instance matching → merge annotations
- approximate instance matching → enrich instances
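The exact-matching route can be sketched as follows (a toy illustration, assuming instances share a unique key such as an ISBN; the key names and annotations are invented):

```python
def merge_annotations(dataset1, dataset2):
    """Exact instance matching: instances with the same key (e.g. an ISBN)
    are treated as identical, and their annotations from the two
    vocabularies are paired into one dually annotated instance (DAI)."""
    dai = {}
    for key in dataset1.keys() & dataset2.keys():
        dai[key] = (dataset1[key], dataset2[key])
    return dai

d1 = {"isbn-1": {"c1", "c2"}, "isbn-2": {"c3"}}
d2 = {"isbn-1": {"C9"}, "isbn-3": {"C7"}}
print(merge_annotations(d1, d2))  # only isbn-1 is dually annotated
```

The weakness is visible in the example: instances present in only one catalog contribute nothing, which is why approximate matching (enrichment) is attractive when the overlap is small.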
Instance matching: Approximate instance matching

Instance similarity measures:
- Lucene
- vector space model (VSM)
Enriching instances: Basic instance enrichment (IE)

(figure: instance i1 in data-set D1, annotated with concepts a and b, is matched to instance i2 in data-set D2, annotated with A and B; the annotations A and B are copied onto i1)
Enriching instances: IE parameter topN

(figure: instance i1 in D1, annotated with a and b, is enriched with the annotations of its topN best-matching instances in D2: 1st match i2 {A, B}, 2nd match i3 {D}, 3rd match i4 {A, C})
Enriching instances: IE parameter similarity threshold (ST)

(figure: instance i1 in D1 is enriched only with the annotations of instances in D2 whose similarity to i1 reaches the threshold; sim(i1,i2) = 0.8, sim(i1,i3) = 0.4, sim(i1,i4) = 0.2)
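The enrichment step with its two parameters can be sketched as below (a minimal illustration; the function name, toy similarity scores and annotations are invented, mirroring the figures above):

```python
def enrich_instance(instance, candidates, similarity, top_n=1, threshold=None):
    """Enrich one instance with the annotations of its best matches.

    candidates: list of (instance, annotations) pairs from the other data-set.
    similarity: function mapping two instances to a score.
    top_n: keep at most N best matches; threshold (ST): additionally
    discard matches whose similarity falls below it.
    """
    ranked = sorted(candidates, key=lambda c: similarity(instance, c[0]),
                    reverse=True)
    enriched = set()
    for other, annotations in ranked[:top_n]:
        if threshold is None or similarity(instance, other) >= threshold:
            enriched |= set(annotations)
    return enriched

sims = {"i2": 0.8, "i3": 0.4, "i4": 0.2}
sim = lambda a, b: sims[b]
cands = [("i2", {"A", "B"}), ("i3", {"D"}), ("i4", {"A", "C"})]
# topN=3 with ST=0.4: i2 and i3 qualify, i4 falls below the threshold
print(enrich_instance("i1", cands, sim, top_n=3, threshold=0.4))
```

With topN=1 and no threshold this reduces to the baseline: only the single best match (i2) contributes its annotations.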
Experimental questions

- instance similarity measure
- topN parameter
- ST parameter
- combining the topN + ST parameters
- performance as compared to other OM algorithms
Evaluation: Alignment evaluation

Methods:
- Gold standard := good alignment
- Reindexing

Measures: precision, recall, f-measure
Results of experiments

Results: instance similarity measure - quality

(figures: precision, recall and f-measure of VSM and Lucene against mapping rank; (a) Gold standard, (b) Reindex)

Virtually equal.
Results: instance similarity measure - quality (continued)

(figures: precision of VSM vs. Lucene against mapping rank; (c) Overlap, (d) Manual Evaluation)

Edge to VSM.
Results: instance similarity measure - run-time

Time to enrich 100K instances (hrs:min):

indexed instances   Lucene   VSM
524K                 1:04    0:17
1,457K               7:20    0:22
2,506K              26:15    0:32

(figure: run-time increase against the number of indexed documents (* 100K); Lucene's run-time grows much faster than VSM's)

VSM optimizations:
- pre-calculate the weights of indexed documents
- purge insignificant weights (35% + 50%)
- word-centered indexing approach
Results: topN parameter (TEL)

As N increases, the quality of the mappings decreases.

(figures: f-measure against mapping rank for top1 (baseline) through top6; (i) Gold standard, (j) Reindex)
Results: similarity threshold parameter (KB)

Best performance with an ST: ST = µ. Best performance overall: the baseline (topN=1, ST=∞).

(figures: f-measure against mapping rank for the baseline and thresholds T = µ ± {0.5, 1, 1.5}·σ; (k) Gold standard, (l) Reindex)
Results: combining parameters

Using both parameters performs well in TEL, but not in KB. Possible cause: a more selective IBOMbIE pays off in TEL because the vocabularies and instance annotations differ more there than in the KB scenario.

(figures: f-measure against mapping rank for the baseline and combinations of topN ∈ {1, 2, 3} with ST ∈ {µ-0.5σ, µ, µ+0.5σ}; (m) KB, (n) TEL; evaluation method: reindexing)
OAEI: Ontology Alignment Evaluation Initiative

matcher    terminological   structure-based   semantic-based   instance-based
DSSim      yes              yes               yes              no
Lily       yes              yes               yes              no
TaxoMap    yes              yes               yes              no
IBOMbIE    no               no                no               yes

DSSim, Lily and TaxoMap:
- consider the KB ontologies "huge"
- feature functionality to deal with large ontologies
OAEI: Performance comparison - quality

(figure: precision and recall against mapping rank for IBOMbIE (topN=1), DSSim, Lily and TaxoMap)
OAEI: Performance comparison - resources + coverage

matcher    run-time   number of mappings
DSSim      12:00      2,930
Lily       ?          2,797
TaxoMap    2:40       1,851
IBOMbIE    1:54       7,000+

(Number of lexically equal concepts in the KB vocabularies = 2,895)
Conclusions + discussion

The IBOMbIE algorithm is quite promising:
- relatively low run-time
- able to deal with large ontologies
- amount + quality of mappings
- pro of IBOM: able to align ontologies using disjoint data-sets

Basic instance enrichment appears to be the best performing method. Possible cause: the Jaccard coefficient does not support multi-sets.
Fin

Thank you... any questions?
Vocabularies

scenario   vocabulary   size
KB         GTT          35K
KB         Brinkman     5K
TEL        LCSH         340K
TEL        Rameau       155K
TEL        SWD          805K
IE parameter: similarity threshold (ST)

Standard ST: µ; step-size: ½σ

scenario   D1 annotated with   D2 annotated with   µ       σ
KB         O1                  O2                  0.297   0.106
KB         O2                  O1                  0.279   0.101
TEL        O1                  O2                  0.260   0.097
TEL        O2                  O1                  0.232   0.084
VSM

Weights are the components of the vectors: term frequency - inverse document frequency (TF-IDF), e.g. over audiovisual features:

tfidf_{w,d} = tf_{w,d} * idf_w
tf_{w,d} = √(n_{w,d}) / |d|
idf_w = log( |D| / |{d ∈ D : w ∈ d}| )

VSM cosine similarity:

cosine_sim(d1, d2) = (d⃗1 · d⃗2) / (|d⃗1| |d⃗2|) = Σ_i w_{i,d1} w_{i,d2} / ( √(Σ_i w²_{i,d1}) √(Σ_i w²_{i,d2}) )
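The TF-IDF weighting and cosine similarity defined above can be sketched like this (a minimal illustration; documents are represented as token lists and vectors as sparse dicts, which is an implementation choice, not from the thesis):

```python
import math
from collections import Counter

def tfidf_vector(doc, corpus):
    """TF-IDF weights as on the slide:
    tf = sqrt(n_{w,d}) / |d|,  idf = log(|D| / |{d in D : w in d}|)."""
    counts = Counter(doc)
    n_docs = len(corpus)
    vec = {}
    for word, n in counts.items():
        df = sum(1 for d in corpus if word in d)  # document frequency
        vec[word] = (math.sqrt(n) / len(doc)) * math.log(n_docs / df)
    return vec

def cosine_sim(v1, v2):
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)

corpus = [["ontology", "matching"], ["matching", "instances"]]
v1 = tfidf_vector(corpus[0], corpus)
v2 = tfidf_vector(corpus[1], corpus)
print(cosine_sim(v1, v2))
```

Note that a word occurring in every document gets idf = log(1) = 0, so ubiquitous words contribute nothing to the similarity; this is what the "purge insignificant weights" optimization exploits.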
Evaluation method: gold standard

Gold standard := good alignment

P = precision = |{reference} ∩ {retrieved}| / |{retrieved}|
R = recall    = |{reference} ∩ {retrieved}| / |{reference}|
F = f-measure = 2 * (P * R) / (P + R)
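These three measures translate directly into code (a short sketch; the example mappings are invented):

```python
def precision_recall_f(reference, retrieved):
    """Precision, recall and f-measure of a retrieved alignment against
    a gold-standard reference alignment (both sets of concept mappings)."""
    correct = len(reference & retrieved)
    p = correct / len(retrieved) if retrieved else 0.0
    r = correct / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# 2 of 3 retrieved mappings are correct; 2 of 4 reference mappings are found:
ref = {("c1", "C1"), ("c2", "C2"), ("c3", "C3"), ("c4", "C4")}
ret = {("c1", "C1"), ("c2", "C2"), ("c5", "C5")}
print(precision_recall_f(ref, ret))  # precision 2/3, recall 1/2
```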
Evaluation method: reindexing

(figure: ontologies o_1 = {a, b, c} and o_2 = {x, y, z}; a dually annotated instance i_dual with annotations {a, b} and {x} is reindexed, translating {a, b} into concepts of o_2, e.g. {x, z})

P = ( Σ over dually annotated instances of |{reference} ∩ {retrieved}| / |{retrieved}| ) / |{reindexed instances}|
R = ( Σ over dually annotated instances of |{reference} ∩ {retrieved}| / |{reference}| ) / |{reindexed instances}|
IBOMbIE algorithm overview

Whole algorithm. Start: two data-sets Dx and Dy.

1. Enrich the instances of Dx with the annotations of instances of Dy. For every instance a:
   1. find the N best matching instances {b} in Dy
   2. add the annotations of {b} to a
2. Enrich vice versa.
3. Merge the data-sets into one dually annotated data-set.
4. Apply the Jaccard measure.
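The four steps above can be sketched end to end as follows (a toy sketch under stated assumptions: data-sets are dicts mapping instance ids to annotation sets, the similarity function is supplied by the caller, and no similarity threshold is applied):

```python
def ibombie(dx, dy, similarity, top_n=1):
    """Sketch of the whole pipeline: enrich Dx with Dy's annotations,
    enrich Dy vice versa, merge into dually annotated instances (DAI),
    then score every concept pair with the Jaccard measure."""

    def best_matches(inst, other):
        # Find the top_n best matching instance ids in the other data-set.
        ranked = sorted(other, key=lambda o: similarity(inst, o), reverse=True)
        return ranked[:top_n]

    # Steps 1-3: enrichment in both directions yields the merged DAI set,
    # each entry a pair (annotations from Ox, annotations from Oy).
    dai = []
    for inst, own in dx.items():
        foreign = set()
        for match in best_matches(inst, dy):
            foreign |= dy[match]
        dai.append((own, foreign))
    for inst, own in dy.items():
        foreign = set()
        for match in best_matches(inst, dx):
            foreign |= dx[match]
        dai.append((foreign, own))

    # Step 4: Jaccard coefficient per concept pair over instance extensions.
    concepts_x = {c for anns, _ in dai for c in anns}
    concepts_y = {c for _, anns in dai for c in anns}
    scores = {}
    for cx in concepts_x:
        ext_x = {i for i, (anns, _) in enumerate(dai) if cx in anns}
        for cy in concepts_y:
            ext_y = {i for i, (_, anns) in enumerate(dai) if cy in anns}
            union = ext_x | ext_y
            scores[(cx, cy)] = len(ext_x & ext_y) / len(union) if union else 0.0
    return scores
```

Ranking the resulting concept pairs by score gives the mapping ranks used throughout the evaluation; note that the quadratic loop over concept pairs is written for clarity, not for the vocabulary sizes of the KB and TEL scenarios.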