Digital Enterprise Research Institute

www.deri.ie

Linked Data and Live Querying for Enabling Support Platforms for Web Dataspaces

Jürgen Umbrich (1), Marcel Karnstedt (1), Josiane Xavier Parreira (1), Axel Polleres (2), Manfred Hauswirth (1)

(1) DERI, National University of Ireland, Galway, Ireland
(2) Siemens AG Österreich, Vienna, Austria

© Copyright 2010 Digital Enterprise Research Institute. All rights reserved.

Outline

• Web as a set of interlinked Web dataspaces
• Enabling DSSP for Web dataspaces
  • Linked Data
  • Missing components
  • Challenges
• Efficient query processing
  • Challenges
  • Index consistency study
  • Hybrid query processing mechanism

The World Wide Web

[Figure: Web of Documents. HTML, CSV, CSS, PNG and PDF documents connected by href, rel and img links over HTTP.]

The World Wide Web

[Figure: Web of Documents. HTML, CSV, CSS, PNG and PDF documents connected by href, rel and img links over HTTP.]

• Unstructured
• Heterogeneous
• Data integration mostly manual

The World Wide Web

[Figure: Web of Data. The HTML, CSV, CSS, PNG and PDF documents are joined by RDF documents, connected by href, rel and img links over HTTP.]

The World Wide Web

[Figure: Web of Data. The HTML, CSV, CSS, PNG and PDF documents are joined by RDF documents, connected by href, rel and img links over HTTP.]

• Standards
• URIs as identifiers
• Typed links
• Web-scale heterogeneous distributed database

Dataspace Support Platforms

[Franklin 2005]

Dataspace Support Platforms

• Data management for small-scale, loosely connected, heterogeneous sources
• Services hide the complexity of data management

[Franklin 2005]

Two directions, similar goals

[Figure: the Web of Data (HTML, CSV, CSS, PNG, PDF and RDF documents) shown next to a dataspace.]

• Web of Data: web-scale heterogeneous distributed database
• Dataspace: data management for small-scale, loosely connected, heterogeneous sources

Proposed Solution

[Figure: the Web of Data (HTML, CSV, CSS, PNG, PDF and RDF documents) combined with a dataspace.]

Linked Data for enabling support platforms for Web dataspaces

Web Dataspaces and support platforms

[Figure: a Web dataspace and its support platform. Labels recoverable from the figure, grouped here by kind:
  standards: RDF, RDFa, SPARQL, OWL, RIF, SKOS, GRDDL
  sources and access: RDF, CSV, HTML, PDF, HTTP, REST API
  services: search, query, catalogs, indexes, discovery, enhancement
  properties: no guarantees, no central control, dynamic, incomplete knowledge, heterogeneous administration]

DSSP -> Linked Data

• Participants/relationships -> resources/links
• XML for interchanging data -> RDF
• Standardised access method, common query language -> HTTP/SPARQL
• Global keys -> URIs
• Discovery -> crawling/reasoning
• Integration of other dataspaces -> entity recognition, ontologies
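
As a rough illustration of the first and fourth mappings (participants and relationships become resources and typed links, global keys become URIs), here is a minimal sketch in plain Python. The `example.org` URIs, the `Triple` type and the `links_of` helper are hypothetical and only serve the example; the FOAF namespace is the real one.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # URI acting as the global key of a participant (resource)
    predicate: str  # URI naming the type of the relationship (typed link)
    obj: str        # URI of another resource, or a literal value

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical example data.
TRIPLES = [
    Triple("http://example.org/people/alice", FOAF + "knows", "http://example.org/people/bob"),
    Triple("http://example.org/people/alice", FOAF + "name", "Alice"),
    Triple("http://example.org/people/bob", FOAF + "name", "Bob"),
]

def links_of(subject, triples):
    """Return the (predicate, object) pairs of `subject` whose object is itself a URI."""
    return [
        (t.predicate, t.obj)
        for t in triples
        if t.subject == subject and t.obj.startswith("http")
    ]
```

Calling `links_of("http://example.org/people/alice", TRIPLES)` returns only the `knows` link to Bob, because the name is a literal rather than a link to another resource.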

Open Challenges

• Graph-based data model to scale to the size of the Web
  • Efficient processing methods (index, query)
• Search and query
  • Structured queries with keyword search
  • Ranking (different levels, typed links, trust, etc.)
• Guarantees
  • Full guarantees are not possible; an assessment of which guarantees are possible is needed

Query Processing

• Catalogs for query planning/processing
  • Key component of a DSSP
  • Linked Data: vocabularies and metadata descriptions as catalogs
  • Complete Web catalogs are not feasible: scale and dynamics
  • Indexing is also affected by dynamics
• Distributed query processing approaches
  • Work for a small number of large repositories
  • Web of Data: a large number of small repositories

Query Processing

• Alternative approach: "live" querying
  • Link traversal query approaches
  • Exploit Linked Data principles (dereferenceable URIs)
  • Guarantee "live" results
  • Query time in the range of seconds
• Our vision: hybrid query processing
  • Combine offline (static) and online (dynamic) processing
  • Trade-off between performance, completeness and freshness
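
The link traversal idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' engine: the Web is replaced by a hypothetical in-memory dictionary (`FAKE_WEB`), `dereference` stands in for an HTTP lookup of a URI, and `live_query` answers a single-predicate query by following every URI it discovers.

```python
from collections import deque

# Hypothetical stand-in for the Web: dereferencing a URI yields the triples published there.
FAKE_WEB = {
    "http://example.org/alice": [
        ("http://example.org/alice", "knows", "http://example.org/bob"),
        ("http://example.org/alice", "name", "Alice"),
    ],
    "http://example.org/bob": [
        ("http://example.org/bob", "knows", "http://example.org/carol"),
        ("http://example.org/bob", "name", "Bob"),
    ],
    "http://example.org/carol": [
        ("http://example.org/carol", "name", "Carol"),
    ],
}

def dereference(uri):
    """Simulate an HTTP lookup of a URI; unknown URIs yield no data."""
    return FAKE_WEB.get(uri, [])

def live_query(seed, predicate, max_lookups=10):
    """Collect (subject, object) pairs for `predicate`, discovering sources by following links."""
    seen, queue, results = {seed}, deque([seed]), []
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1
        for s, p, o in dereference(uri):
            if p == predicate:
                results.append((s, o))
            # Any URI in object position is a candidate source to traverse next.
            if o.startswith("http://") and o not in seen:
                seen.add(o)
                queue.append(o)
    return results
```

Starting from Alice, `live_query("http://example.org/alice", "name")` reaches Bob and Carol through the `knows` links. The `max_lookups` bound is one simple way to cap query time at the cost of completeness.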

Index Consistency Study

• Two Linked Data Web indexes (SPARQL endpoints)
  • Sindice (RDF, RDFa, Microformats; ~20 billion triples)
  • Openlink (LOD cache; ~20 billion triples)
• 16,616 distinct entity queries
  • Sampled from the BTC 2011 dataset
• Number of entities found and execution time

                   Web        Sindice    Openlink
Entities found     16616      5007       13096
Avg. query time    3261 ms    136 ms     86 ms
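
To put the table in relative terms, the short sketch below computes two derived figures from the numbers above: the share of entities each endpoint returned, and how much faster it answered than the Web. It assumes the Web column is the live-lookup baseline; the function names are made up for this example.

```python
# Figures copied from the table above.
ENTITIES_FOUND = {"Web": 16616, "Sindice": 5007, "Openlink": 13096}
AVG_QUERY_TIME_MS = {"Web": 3261, "Sindice": 136, "Openlink": 86}

def coverage(source):
    """Fraction of the entities found on the Web that the source also returned."""
    return ENTITIES_FOUND[source] / ENTITIES_FOUND["Web"]

def speedup(source):
    """How many times faster the source answered, on average, than the Web baseline."""
    return AVG_QUERY_TIME_MS["Web"] / AVG_QUERY_TIME_MS[source]

for name in ("Sindice", "Openlink"):
    print(f"{name}: coverage {coverage(name):.1%}, speed-up {speedup(name):.1f}x")
```

This gives roughly 30% coverage and a 24x speed-up for Sindice, and roughly 79% coverage and a 38x speed-up for Openlink. Coverage here only counts entities found; it is not the same as the Web recall discussed on the following slides.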

Index Consistency Study

• Web recall: percentage of Web results found in the endpoints

Index Consistency Study

• Web recall: percentage of Web results found in the endpoints
• Openlink: consistent information for 50% of the entities

Index Consistency Study

• Web recall: percentage of Web results found in the endpoints
• Sindice: consistent information for 30% of the entities

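
The Web recall measure used on these slides can be written down as a small function. This is a sketch of the definition as stated (share of Web results also found in an endpoint), assuming results are compared as exact triples; the handling of an empty Web result and the sample triples are assumptions made for the example.

```python
def web_recall(web_results, endpoint_results):
    """Fraction of the results obtained from the Web that the endpoint also returned."""
    web = set(web_results)
    if not web:
        return 1.0  # assumption: with nothing to find, recall is treated as complete
    return len(web & set(endpoint_results)) / len(web)

# Hypothetical results for one entity query.
web = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:alice", "ex:location", "Galway"),
]
endpoint = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "ex:location", "Dublin"),  # stale value, does not match the Web
]

print(web_recall(web, endpoint))
```

In this toy case the endpoint holds two of the four triples currently on the Web, so the recall is 0.5. The stale location triple shows how an outdated index lowers recall even when the entity itself is found.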

Hybrid Query Model

[Figure: hybrid query engine. A SPARQL query enters the query planner, which sends (sub)queries to two interfaces and combines their results into the query results. The live query interface runs over the Linked Data Web and guarantees fresh results. The index interface runs over repositories and provides fast query times. Query planning is guided by knowledge of dynamics.]

Query planning

• Knowledge of dynamics
  • Mining and statistical approaches
• Query planner
  • Incorporate dynamics as a cost factor
  • Latency and availability of sources?
  • Selectivity based on statistics or rules?
• Query execution
  • Split the query into static and dynamic parts
  • Only update potentially outdated results
  • Consider user requirements (freshness vs. speed)
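
One possible reading of "split the query into static and dynamic parts" is sketched below. It is an assumption-laden illustration, not the planner from the talk: knowledge of dynamics is reduced to a hypothetical per-predicate change rate (`CHANGE_RATE`), and the user requirement is a single `freshness_threshold`. Patterns whose predicate changes more often than the threshold go to the live interface, the rest to the index.

```python
from dataclasses import dataclass

# Hypothetical knowledge of dynamics: estimated probability that data for a
# predicate has changed since it was indexed.
CHANGE_RATE = {"foaf:name": 0.01, "ex:stockPrice": 0.9, "ex:location": 0.4}

@dataclass(frozen=True)
class TriplePattern:
    subject: str
    predicate: str
    obj: str

def split_query(patterns, freshness_threshold, default_rate=1.0):
    """Split patterns into a static part (index) and a dynamic part (live lookups)."""
    static, dynamic = [], []
    for pattern in patterns:
        # Predicates without statistics are treated as fully dynamic.
        rate = CHANGE_RATE.get(pattern.predicate, default_rate)
        (dynamic if rate > freshness_threshold else static).append(pattern)
    return static, dynamic

query = [
    TriplePattern("?person", "foaf:name", "?name"),
    TriplePattern("?person", "ex:stockPrice", "?price"),
    TriplePattern("?person", "ex:location", "?place"),
]
static_part, dynamic_part = split_query(query, freshness_threshold=0.5)
```

With a threshold of 0.5 only the stock price pattern is evaluated live. Lowering the threshold to 0.1 also moves the location pattern to the live interface, trading speed for freshness.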

Conclusion

• Enabling DSSP for Web dataspaces via Linked Data
  • Common data representation
  • Standard access methods, global keys
  • Still open challenges (e.g. search and query)
• Study shows that repositories lack completeness and freshness
• Hybrid query processing
  • Combine offline (static) and online (dynamic) processing
  • Trade-off between performance, completeness and freshness
