Ontology Evolution in Data Integration: Query Rewriting to the Rescue

Haridimos Kondylakis, Dimitris Plexousakis
Computer Science Department, University of Crete
Information Systems Laboratory, FORTH-ICS

Problem Statement

[Figure: Query → Data Integration System (Mappings) → Sub-queries → DB, DB, DB]

Past Approaches

• Mapping Adaptation
  [Figure: labels S, O, O1, O2, M1, M2, M3; changes shown: move elem, add elem, delete constraint]

• Mapping Composition
  [Figure: labels S, M, O, E, O’]
  Can use schema mapping tools to construct E.
  M’ = M ∘ E

“Everything should be as simple as it is, but not simpler”
-Albert Einstein

RDF/S ontology as global schema

• DI system independent
• Modular
• More intuitive
• Mappings created only once
• Mappings do not change
• Can be easily updated

[Figure: SPARQL → Data Integration System → Mappings → DB, DB, DB]

“Everything that exists, it is only change”

-Heraclitus, 535 BCE

Definition (High-level Change Operation)
A change operation u over O is any tuple (δa, δd) where δa ∩ O = ∅ and δd ⊆ O.
A change operation u from O1 to O2 is a change operation over O1 such that δa ⊆ O2 \ O1 and δd ⊆ O1 \ O2.

We use high-level change operations because:
• Most important: they contain a smaller number of individual low-level deletions/additions (explained later)
• The produced change log is smaller
• Explanations of failures are more intuitive and concise

Requirement: EO1,O2 should be complete, non-ambiguous and unique.
Composition and inversion are desirable but not obligatory properties.
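The definition above can be sketched in Python. This is only an illustration: the class name `ChangeOperation`, the encoding of an ontology as a set of triples, and the toy ontologies `O1` and `O2` are assumptions made for the example, not part of the slides.

```python
from dataclasses import dataclass

# Assumption: an ontology is encoded as a frozenset of (subject, predicate, object) triples.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class ChangeOperation:
    """A change operation u = (delta_a, delta_d): triples to add and triples to delete."""
    name: str
    delta_a: frozenset
    delta_d: frozenset

    def is_over(self, ontology: frozenset) -> bool:
        # u is over O when delta_a is disjoint from O and delta_d is a subset of O.
        return self.delta_a.isdisjoint(ontology) and self.delta_d <= ontology

    def is_from_to(self, o1: frozenset, o2: frozenset) -> bool:
        # u goes from O1 to O2 when it is over O1, adds only triples of O2 \ O1
        # and deletes only triples of O1 \ O2.
        return (self.is_over(o1)
                and self.delta_a <= (o2 - o1)
                and self.delta_d <= (o1 - o2))


# Hypothetical toy ontologies that differ only in the name of one property.
O1 = frozenset({("name", "domain", "Person"), ("name", "range", "Literal")})
O2 = frozenset({("fullname", "domain", "Person"), ("fullname", "range", "Literal")})

rename = ChangeOperation("Rename_Property(name, fullname)",
                         delta_a=O2 - O1, delta_d=O1 - O2)
```

Here `rename.is_from_to(O1, O2)` holds, while `rename.is_over(O2)` does not, because the added triples are already present in `O2`.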

Example Evolution

[Figure: the example ontology before and after evolution, with classes Person, Actor and Cont. Point, properties name/fullname, ssn, gender, has_cont_point, street, city and address, and Literal values]

• u1 = Delete_Property(gender, ø, ø, ø, ø, Person, xsd:String, ø, ø)
• u2 = Merge_Properties({street, city}, address)
• u3 = Rename_Property(name, fullname)

Example Evolution Continued

EO1,O2 and the corresponding GAV mappings
• u1: Delete_Property(gender, ø, ø, ø, ø, Person, xsd:String, ø, ø)
  GAV mapping: ????
• u2: Merge_Properties({street, city}, address)
  GAV mapping: ∀x,y, address(x, y) → ∃y1,y2, street(x, y1) ∧ city(x, y2) ∧ concat(y, {y1, y2})
• u3: Rename_Property(name, fullname)
  GAV mapping: ∀x,y, fullname(x, y) → name(x, y)

EO2,O1
• inv(u3): Rename_Property(fullname, name)
• inv(u2): Split_Property(address, {street, city})
• inv(u1): Add_Property(gender, ø, ø, ø, ø, Person, xsd:String, ø, ø)
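The inverse log EO2,O1 listed above can be derived mechanically from EO1,O2. The following is a minimal sketch that assumes each operation is encoded as a `(kind, args)` tuple with `None` standing for ø. The function names `invert` and `invert_log` are hypothetical, and only the three operation kinds of the example are covered.

```python
def invert(op):
    """Return the inverse of one high-level change operation."""
    kind, args = op
    if kind == "Rename_Property":
        old, new = args
        return ("Rename_Property", (new, old))
    if kind == "Merge_Properties":
        parts, merged = args
        return ("Split_Property", (merged, parts))
    if kind == "Delete_Property":
        # The inverse re-adds the property with the same arguments.
        return ("Add_Property", args)
    raise ValueError(f"no inverse known for {kind}")


def invert_log(log):
    """Invert every operation and reverse the order of the log."""
    return [invert(op) for op in reversed(log)]


# E_O1_O2 from the example; None stands for the empty argument (ø).
E_O1_O2 = [
    ("Delete_Property", ("gender", None, None, None, None,
                         "Person", "xsd:String", None, None)),
    ("Merge_Properties", (("street", "city"), "address")),
    ("Rename_Property", ("name", "fullname")),
]

E_O2_O1 = invert_log(E_O1_O2)
```

The result starts with `Rename_Property(fullname, name)` and ends with `Add_Property(gender, …)`, matching the order inv(u3), inv(u2), inv(u1).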

Evolving Data Integration System (EDI)

Definition (Evolving Data Integration System)
An evolving data integration system EDI is a tuple of the form ((O1, S1, M1), …, (Om, Sm, Mm)) where
• Oi is a version of the ontology (1 ≤ i ≤ m).
• Si is a set of local sources (1 ≤ i ≤ m).
• Mi is the mapping between Si and Oi (1 ≤ i ≤ m).

[Figure: ontology versions O1, …, Oi, …, Om, each with its mapping (M1, …, Mi, …, Mm) to its local sources (S11, S12, S13, …, Si1, …, Sm1)]

Query Processing

• Queries to EDI are posed in terms of Om.
• In this work we do not consider the OPT and FILTER operations.
• The remaining fragment corresponds to unions of conjunctive queries (Perez & Arenas, 2009).

select ?SSN ?NAME ?ADDRESS
where {
  ?X type Person; ssn ?SSN; fullname ?NAME; has_contact_point ?Y.
  ?Y type Cont.Point; address ?ADDRESS.
}

π?SSN,?NAME,?ADDRESS (
  (?X, type, Person) AND
  (?X, ssn, ?SSN) AND
  (?X, fullname, ?NAME) AND
  (?X, has_contact_point, ?Y) AND
  (?Y, type, Cont.Point) AND
  (?Y, address, ?ADDRESS) )

Query Expansion

• The QuOnto reasoner (Poggi, 2008) is used for query expansion.
• It automatically identifies constraints in RDF/S ontologies.
• Time complexity: O((m*(s+1)^2)^s), where m = # of ontology elements and s = # of query subgoals.

[Figure: q → Expander → exp(q); classes Person and Actor]

exp(q):

π?SSN,?NAME,?ADDRESS (
  (?X, type, Person) AND
  (?X, ssn, ?SSN) AND
  (?X, fullname, ?NAME) AND
  (?X, has_contact_point, ?Y) AND
  (?Y, type, Cont.Point) AND
  (?Y, address, ?ADDRESS) )
UNION
π?SSN,?NAME,?ADDRESS (
  (?X, type, Actor) AND
  (?X, ssn, ?SSN) AND
  (?X, fullname, ?NAME) AND
  (?X, has_contact_point, ?Y) AND
  (?Y, type, Cont.Point) AND
  (?Y, address, ?ADDRESS) )
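The expansion step can be illustrated with a deliberately simplified sketch. It is not the QuOnto algorithm: it only handles subclass constraints, and it assumes that Actor is declared a subclass of Person, which is inferred from the expanded query of the example. The names `subclasses`, `expand` and `SUBCLASS_OF` are hypothetical.

```python
from itertools import product

# Assumption inferred from the example: Actor is a subclass of Person.
SUBCLASS_OF = [("Actor", "Person")]  # (subclass, superclass) pairs


def subclasses(cls, subclass_of):
    """Return cls followed by every class that is (transitively) a subclass of it."""
    found = [cls]
    for current in found:  # the list grows while we walk it
        for sub, sup in subclass_of:
            if sup == current and sub not in found:
                found.append(sub)
    return found


def expand(query, subclass_of):
    """Expand a conjunctive query (list of triple patterns) into a union of queries."""
    alternatives = []
    for s, p, o in query:
        if p == "type":
            alternatives.append([(s, p, c) for c in subclasses(o, subclass_of)])
        else:
            alternatives.append([(s, p, o)])
    # One conjunctive query per combination of alternatives.
    return [list(combo) for combo in product(*alternatives)]


q = [
    ("?X", "type", "Person"),
    ("?X", "ssn", "?SSN"),
    ("?X", "fullname", "?NAME"),
    ("?X", "has_contact_point", "?Y"),
    ("?Y", "type", "Cont.Point"),
    ("?Y", "address", "?ADDRESS"),
]

expanded = expand(q, SUBCLASS_OF)
```

For the example query this yields two conjunctive queries, one over Person and one over Actor, mirroring the UNION above.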

Valid Rewriting

• Since GAV mappings are used, rewriting is performed using unfolding.
• Time complexity: O(q*n*s), where q = # expanded queries, n = # change operations, s = # subgoals in q.

Mappings used:
∀x,y, fullname(x, y) → name(x, y)
∀x,y, address(x, y) → ∃a,b, street(x, a) ∧ city(x, b) ∧ concat(y, {a, b})

Unfolding fullname and address in exp(q) gives:

π?SSN,?NAME,?STREET,?CITY (
  (?X, type, Person) AND
  (?X, ssn, ?SSN) AND
  (?X, name, ?NAME) AND
  (?X, has_contact_point, ?Y) AND
  (?Y, type, Cont.Point) AND
  (?Y, street, ?STREET) AND
  (?Y, city, ?CITY) )
UNION
π?SSN,?NAME,?STREET,?CITY (
  (?X, type, Actor) AND
  (?X, ssn, ?SSN) AND
  (?X, name, ?NAME) AND
  (?X, has_contact_point, ?Y) AND
  (?Y, type, Cont.Point) AND
  (?Y, street, ?STREET) AND
  (?Y, city, ?CITY) )

followed by concat(?ADDRESS, {?CITY, ?STREET})
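Unfolding with the two mappings of the example can be sketched as follows. The encoding is an assumption made for illustration: each mapping body uses `x` and `y` for the head variables, any other name is treated as an existential variable and replaced by a fresh query variable, and concat is stored as a pseudo triple pattern. The names `MAPPINGS`, `substitute` and `unfold` are hypothetical.

```python
from itertools import count

# Head predicate -> body atoms. "x" and "y" are the head variables;
# "a" and "b" are existential variables of the mapping.
MAPPINGS = {
    "fullname": [("x", "name", "y")],
    "address": [("x", "street", "a"), ("x", "city", "b"),
                ("y", "concat", ("a", "b"))],
}


def substitute(term, binding, fresh):
    """Replace mapping variables by query terms, inventing fresh variables when needed."""
    if isinstance(term, tuple):
        return tuple(substitute(t, binding, fresh) for t in term)
    if term not in binding:
        binding[term] = f"?V{next(fresh)}"
    return binding[term]


def unfold(query, mappings):
    """Replace every pattern whose predicate has a mapping by the mapping body."""
    fresh = count(1)
    result = []
    for s, p, o in query:
        if p not in mappings:
            result.append((s, p, o))
            continue
        binding = {"x": s, "y": o}
        for bs, bp, bo in mappings[p]:
            result.append((substitute(bs, binding, fresh), bp,
                           substitute(bo, binding, fresh)))
    return result


q = [
    ("?X", "type", "Person"),
    ("?X", "ssn", "?SSN"),
    ("?X", "fullname", "?NAME"),
    ("?X", "has_contact_point", "?Y"),
    ("?Y", "type", "Cont.Point"),
    ("?Y", "address", "?ADDRESS"),
]

rewritten = unfold(q, MAPPINGS)
```

Each subgoal is visited once per query, so running this over every expanded query is in line with the O(q*n*s) bound when the mapping lookup is replaced by a scan over the n change operations.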

Problems & Solutions

π?NAME,?GENDER (
  (?X, type, Actor) AND
  (?X, name, ?NAME) AND
  (?X, gender, ?GENDER) )

With u3 the name pattern becomes fullname, but gender is deleted by u1 and cannot be rewritten:

π?NAME,?GENDER (
  (?X, type, Actor) AND
  (?X, fullname, ?NAME) AND
  (?X, gender, ?GENDER) )

u1: Delete_Property(gender, ø, ø, ø, ø, Person, xsd:String, ø, ø)
u2: Merge_Properties({street, city}, address)
u3: Rename_Property(name, fullname)

Definition (Affecting change operation)
A change operation u affects the conjunctive query q expressed using terms from O1 (u ◊ q) iff
• δa(u) = ∅
• there exists a triple pattern t ∈ q that can be unified with a triple of δd(u).
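The affecting test (u ◊ q) can be sketched as below. The slides do not show which triples δd(u1) contains, so the encoding of the deleted property as the single triple `(Person, gender, xsd:String)` is an assumption, as are the function names `unifies` and `affects`.

```python
def unifies(pattern, triple):
    """True if the triple pattern (variables start with '?') matches the ground triple."""
    binding = {}
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable must be bound to the same term everywhere in the pattern.
            if binding.setdefault(p_term, t_term) != t_term:
                return False
        elif p_term != t_term:
            return False
    return True


def affects(delta_a, delta_d, query):
    """u affects q iff delta_a(u) is empty and some pattern of q unifies with a triple of delta_d(u)."""
    return not delta_a and any(unifies(t, d) for t in query for d in delta_d)


q = [
    ("?X", "type", "Actor"),
    ("?X", "name", "?NAME"),
    ("?X", "gender", "?GENDER"),
]

# Assumed encoding of u1 = Delete_Property(gender, ...): nothing added, one triple deleted.
u1_added = set()
u1_deleted = {("Person", "gender", "xsd:String")}

# Assumed encoding of u3 = Rename_Property(name, fullname): it adds as well as deletes.
u3_added = {("Person", "fullname", "Literal")}
u3_deleted = {("Person", "name", "Literal")}
```

Under this encoding `affects(u1_added, u1_deleted, q)` is true, while the rename u3 does not affect q because its δa is not empty.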

Minimally-Containing Rewritings

Definition (Minimally-Containing Rewriting)
A query q′ is a minimally-containing rewriting of a conjunctive query q using a set of mappings (views) M if and only if
(1) q′ is a containing rewriting of q (q ⊆ q′), and
(2) there exists no containing rewriting q″ of q using M such that the expansion of q′ properly contains the expansion of q″.

[Figure: the answers to q shown inside the answers to qcontaining]

Proposition
The minimally-containing rewriting of a conjunctive query q over EO1,O2 can be computed in O(n*s) (n = # change operations, s = # query subgoals).

Minimally-Generalized Rewritings

[Figure: the example ontology with classes Person, Actor and Cont. Point, properties name, ssn, gender, personal_info, has_cont_point, street and city, and Literal values; EO1,O2 is applied to it]

u1: Delete_Property(gender, ø, ø, ø, ø, Person, xsd:String, ø, ø)
u2: Merge_Properties({street, city}, address)
u3: Rename_Property(name, fullname)

π?NAME,?GENDER (
  (?X, type, Actor) AND
  (?X, name, ?NAME) AND
  (?X, gender, ?GENDER) )

becomes

π?NAME,?GENDER (
  (?X, type, Actor) AND
  (?X, fullname, ?NAME) AND
  (?X, personal_info, ?GENDER) )

Minimally-Generalized Rewritings

Definition (Generalized Query)
Let q be expressed using O1. qGEN is a generalized query of q over EO1,O2 iff:
• q ⊆ qGEN
• there does not exist u in EO1,O2 such that u ◊ qGEN.

Definition (Minimally-Generalized Query)
A generalized query qGEN of q over EO1,O2 is called minimal if there is no qGEN′ such that q ⊆ qGEN′ and qGEN′ ⊂ qGEN.

Proposition
A minimally-generalized query of q over EO1,O2 can be computed in O(a*n*s) (a = # affecting change operations, n = # change operations, s = # query subgoals).
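One way to picture generalization is to replace each deleted property with its closest surviving super-property, as in the gender to personal_info example. The sketch below is an illustration under that assumption and is not claimed to be the authors' algorithm. The super-property relation `SUPER_PROPERTY` and the function name `generalize` are hypothetical.

```python
# Assumption taken from the example: personal_info is a super-property of gender.
SUPER_PROPERTY = {"gender": "personal_info"}


def generalize(query, deleted, super_property):
    """Replace every deleted property by its nearest super-property that still exists."""
    result = []
    for s, p, o in query:
        while p in deleted:
            if p not in super_property:
                raise ValueError(f"no surviving super-property for {p}")
            p = super_property[p]  # climb one level in the property hierarchy
        result.append((s, p, o))
    return result


# The query after the rename name -> fullname has been applied.
q = [
    ("?X", "type", "Actor"),
    ("?X", "fullname", "?NAME"),
    ("?X", "gender", "?GENDER"),
]

q_gen = generalize(q, deleted={"gender"}, super_property=SUPER_PROPERTY)
```

Because every gender triple is also a personal_info triple under the assumed hierarchy, the answers to the original query are contained in the answers to `q_gen`, and no pattern of `q_gen` mentions a deleted property.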

Conclusions

• Ontology evolution is a reality and data integration systems should be aware of it.
• We show how to answer queries over evolving ontologies without mapping redefinition:
  1. We use high-level changes to model ontology evolution.
  2. We interpret them as sound GAV mappings.
  3. We expand queries to consider constraints from the ontology.
  4. We rewrite queries (valid rewriting via unfolding) to consider the GAV mappings.
  5. When equivalent rewritings cannot be produced:
     1. We provide the best “over-approximations”:
        • Minimally-Containing rewritings
        • Minimally-Generalized rewritings
     2. We guide query redefinition.
• To the best of our knowledge, no other system today is capable of query answering over multiple ontology versions.

Questions?
