Ontology Evolution in Data Integration: Query Rewriting to the Rescue. Haridimos Kondylakis, Dimitris Plexousakis. Computer Science Department, University of ...
Ontology Evolution in Data Integration: Query Rewriting to the Rescue Haridimos Kondylakis, Dimitris Plexousakis Computer Science Department, University of Crete Information Systems Laboratory, FORTH-ICS
Problem Statement
Query
Mappings
Data Integration System
Sub-queries
2 of 37
DB
DB
DB
2 /17
Past Approaches Mapping Adaptation S
move elem
O M1
O1 M2
add elem O2
M3
delete constraint
Mapping Composition S
M
Can use schema mapping tools to construct E.
O E
M’ = M ° E 3 of 37
O’
3 /17
“Everything should be as simple as it is, but not simpler” -Albert Einstein Ontology RDF/S as Ontology global schema
DI System Independent
Modular
More Intuitive
Mappings created only once
Mappings do not change SpaRQL
Data Integration System
Mappings
4 of 37
DB
Can be easily updated SpaRQL
DB
DB
DB
Mappings
DB
DB
4 /17
“Everything that exists, it is only change”
-Heraclitus 535 BCE Definition (High-level Change Operation) A change operation u over O, is any tuple (δa, δd) where δa O = ø δd O A change operation u from O1 to O2 is a change operation over O1 such that: δa O2\ O1 δd O1\ O2
High-level change operations because: Most Important: They contain a smaller number of individual low-level deletions/additions (explained later) The produced change log has smaller size Explanations on failures, are more intuitive and concise. Requirement: EO1,O2 should be complete, non-ambiguous and
unique 5 of 37 Composition and inversion are desirable but not obligatory properties
5 /17
Example Evolution Literal
fullname name
Literal address
Person
Literal ssn Literal
Literal Literal
street has_cont_point
gender Actor
city
Cont. Point
u1 = Delete_Property(gender, ø, ø, ø, ø, Person, xsd:String, ø, ø) u2 = Merge_Properties( {street, city}, address) u3 = Rename_Property(name, fullname) 6 of 37
6 /17
Example Evolution Continued EO1, O2 GAV mappings u1 : Delete_Property(gender, ø, ø, ø, ø, Person, xsd:String, ø, ø) x,y, fullname(x, y) name(x ,y) u2 : Merge_Properties( {street, city}, address) x,y, address(x, y) y1, y2, street(x, y1) ˄ city(x, y2) ˄ concat(y, {y1, y2}) u3 : Rename_Property(name, fullname) ????
EO2, O1 inv(u3):Rename_Property(fullname, name) inv(u2):Split_Property(address, {street, city})
inv(u1):Add_Property(gender, ø, ø, ø, ø, Person, xsd:String, ø, ø)
7 /17
Evolving Data Integration System (EDI) Definition (Evolving Data Integration System)
An evolving data integration system EDI is a tuple of the form (O1, S1, M1), …, (Om, Sm, Mm) where •Oi is a version of the ontology (1≤ i ≤m). •Si is a set of the local sources (1≤ i ≤m). •Mi is the mapping between Si and Oi (1≤ i ≤m). Om
O1 M1 S11
… S12
Om
Oi S13
… Mm
Mi Si1
Sm1
Sm1
8 /17
Query Processing Queries to EDI are posed in terms of the Om In this work we do not consider OPT and FILTER operations. The remaining fragment corresponds to union of conjunctive queries (Perez &
Arenas, 2009). select ?SSN ?NAME ?ADDRESS
where { ?X
?Y }
type Person; ssn ?SSN; fullname ?NAME; has_contact_point ?Y. type Cont.Point; address ?ADDRESS.
π?SSN,?NAME,?ADDRESS ( (?X, type, Person) AND (?X, ssn, ?SSN) AND (?X, fullname, ?NAME) AND (?X, has_contact_point, ?Y) AND (?Y, type, Cont.Point) AND (?Y, address, ?ADDRESS) )
9 /17
Person
Query Expansion QuOnto Reasoner
(Poggi, 2008) is used for query expansion. Automatically identifies constraints
in RDF/S ontologies.
Time complexity:
O(m*(s+1)2)s m=#of ontology elements s=#of query subgoals
q Expander
exp(q)
Actor
π?SSN,?NAME,?ADDRESS ( (?X, type, Person) AND (?X, ssn, ?SSN) AND (?X, fullname, ?NAME) AND (?X, has_contact_point, ?Y) AND (?Y, type, Cont.Point) AND (?Y, address, ?ADDRESS)) UNION π?SSN,?NAME,?ADDRESS ( (?X, type, Actor) AND (?X, ssn, ?SSN) AND (?X, fullname, ?NAME) AND (?X, has_contact_point, ?Y) AND (?Y, type, Cont.Point) AND (?Y, address, ?ADDRESS)) 10 /17
Valid Rewriting Since GAV mappings are
used, rewriting is performed using unfolding Time Complexity: O( q*n*s) q = #expanded queries n = #change operations s = #subgoals in q
x, y, fullname(x, y) name(x , y)
x, y, address(x, y) a,b, street(x, a) ˄ city(x, b) ˄ concat(y, {a,b})
( π?SSN,?NAME,?ADDRESS ( ?SSN,?NAME,?ADDRESS ππ?SSN,?NAME, ?STREET, ?CITY ( (?X, type, Person) AND type, Person) AND (?X, (?X, type, Person) AND (?X, ssn, ?SSN) AND ?SSN) AND (?X, (?X, ssn,ssn, ?SSN) AND (?X, name, ?NAME) AND fullname, ?NAME) (?X, name, ?NAME) ANDAND (?X, name, ?NAME) AND (?X, has_contact_point, ?Y)AND AND has_contact_point, AND (?X, has_cont.point, ?Y) (?X, has_contact_point, ?Y)?Y) AND (?Y, type, Cont.Point) AND type, Cont.Point) AND AND (?Y,(?Y, type, Cont.Point) AND (?Y, street, ?STREET) AND address, ?ADDRESS)) (?Y,(?Y,street, ?STREET) AND (?Y, city, ?CITY) AND (?Y,UNION city, ?CITY)) πconcat(?ADDRESS, ( {?CITY, ?STREET}) ?SSN,?NAME,?ADDRESS π?SSN,?NAME,?STREET, ( ?CITY π?SSN,?NAME,?ADDRESS (?X, type,(Actor) AND (?X, type, Actor) AND (?X, AND (?X,type, ssn,Actor) ?SSN) AND (?X, ssn, ?SSN) (?X, ?SSN)AND AND fullname, ?NAME) AND (?X,ssn, name, ?NAME) AND (?X, name, ?NAME) AND (?X, ?NAME) AND?Y) AND (?X,name, has_contact_point, (?X, has_contact_point, ?Y) (?X, ?Y)AND AND (?Y,has_contact_point, type, Cont.Point) AND (?Y, type, Cont.Point) AND (?Y, Cont.Point) AND (?Y,type, address, ?ADDRESS)) (?Y, street, ?STREET) AND (?Y, street, ?STREET) AND (?Y, (?Y,city, city, ?CITY)) ?CITY) AND
concat(?ADDRESS, {?CITY, ?STREET})
11 /17
Problems & Solutions π ?NAME,?GENDER (
(?X, type, Actor) AND name, ?NAME) AND (?X, fullname, ?NAME) AND (?X, gender, ?GENDER))
u1 : Delete_Property(gender, ø, ø, ø, ø, Person , xsd:String, ø, ø) u2 : Merge_Properties( {street, city}, address) u3 : Rename_Property(name, fullname)
Definition ( Affecting change operation )
A change operation u affects the conjunctive query q expressed using terms from O1, (u ◊ q) iff • δa(u)=ø • triple pattern t q that can be unified with a triple of δd(u).
12 /17
Minimally-Containing Rewritings Definition ( Minimally-Containing Rewriting) A query q˄ is a minimally-containing rewriting of a conjunctive query q using a set of mappings (views) M if and only if (1) q˄ is a containing rewriting of q (q q˄) and (1) there exists no containing rewriting q˄˄ of q using M, such that the expansion of q˄˄ contains the expansion of q˄. qcontaining Answers to q
Proposition The minimally-containing rewriting of a conjunctive query q over EO1, O2 can be computed in O(n*s) (n=#change operations, s=#query subgoals ). 13 /17
Minimally-Generalized Rewritings Literal
name
EO1, O2
Literal Person
Literal
ssn
street
personal_info
has_cont_point gender
Literal
Literal
Actor
Literal city
Cont. Point
u1 : Delete_Property(gender, ø, ø, ø, ø, Person , xsd:String, ø, ø) u2 : Merge_Properties( {street, city}, address) u3 : Rename_Property(name, fullname)
ππ?NAME,?GENDER ?NAME,?GENDER((
(?X, (?X,type, type,Actor) Actor)AND AND (?X, (?X, name, fullname, fullname, ?NAME) ?NAME) ?NAME) AND AND AND (?X, (?X,gender, personal_info, ?GENDER)) ?GENDER)) 14 /17
Minimally-Generalized Rewritings Definition ( Generalized Query )
Let q expressed using O1, qGEN is a generalized query of q over EO1, O2 iff: • q qGEN •does not exist u in EO1, O2 such that u ◊ qGEN. Definition ( Minimally-Generalized Query ) A generalized query qGEN of q over EO1, O2 is called minimal if there is not qGEN˄ such that q qGEN˄ and qGEN˄ qGEN Proposition A minimally-generalized query of q over EO1, O2 can be computed in O(a*n*s) ( a = #affecting change operations, n = #change operations, s =#query subgoals).
15 /17
Conclusions Ontology evolution is reality and data integration systems should be aware of this We show how to answer queries over evolving ontologies without mapping
1. 2. 3. 4. 5.
16 of 37
redefinition We use high-level changes to model ontology evolution We interpret them as sound GAV mappings We expand queries to consider constraints from ontology We valid rewrite (via unfolding) queries to consider GAV mappings When equivalent rewritings cannot be produced 1. We provide best “over-approximations” Minimally-Containing rewritings Minimally-Generalized rewritings 2. Guiding query redefinition To the best of our knowledge no other system today is capable of query answering over multiple ontology versionss
16 /17
Questions?