Introduction Linked Data Iterative and Incremental Linked ... - Catecbol

ToC n Introduction n Linked Data n Iterative and Incremental Linked Data Life cycle n Q&A

INTERNAL USE ONLY

Copyright 2017 FUJITSU LIMITED

Personal Introduction
http://boris.villazon.terrazas.name/
▪ Undergraduate University: Universidad Católica Boliviana
▪ Postgraduate University: Universidad Politécnica de Madrid
▪ Research stays:
  • National Research Council, Rome, Italy
  • Digital Enterprise Research Institute, NUI, Galway, Ireland
  • The University of Aberdeen, Scotland, UK
▪ Ontology Engineering Group
▪ iSOCO, Intelligent Software Components
▪ Expert System
▪ Fujitsu Laboratories of Europe
▪ Tetuan Valley Startup School

Main references
▪ Pan, Jeff et al. (Eds.), Exploiting Linked Data and Knowledge Graphs in Large Organisations
  • Knowledge Graph Foundations. Boris Villazón Terrazas, Nuria García-Santa, Yuan Ren
  • Construction of Enterprise Knowledge Graphs. Boris Villazón Terrazas, Nuria García-Santa, Yuan Ren
▪ Best Practices for Publishing Linked Data, W3C Working Group Note. Bernadette Hyland, Ghislain Atemezing, Boris Villazón Terrazas
▪ Wood, David (Ed.), Linking Government Data
  • Methodological Guidelines for Publishing Government Linked Data. Boris Villazón Terrazas, Luis M. Vilches, Oscar Corcho, Asunción Gómez-Pérez

Introduction

Knowledge Graphs
▪ Knowledge in a graph form
▪ Captures entities, attributes, and relationships
▪ Nodes are entities
▪ Nodes are labeled with attributes
▪ Typed edges between two nodes represent a relationship between entities

[Figure: example graph with the nodes "Edmundo Paz-Soldán", "Iris" and "Cochabamba", and an edge labeled bornIn]
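
A minimal sketch of the structure described on this slide: nodes that carry attributes, and typed edges between them. The class name, the `kind` attribute and the `wrote` edge type are assumptions made for illustration; only the three node names and the `bornIn` label come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Tiny in-memory graph: attributed nodes plus typed, directed edges."""
    nodes: dict = field(default_factory=dict)   # entity -> {attribute: value}
    edges: set = field(default_factory=set)     # (source, edge_type, target)

    def add_node(self, entity, **attributes):
        self.nodes.setdefault(entity, {}).update(attributes)

    def add_edge(self, source, edge_type, target):
        # Make sure both endpoints exist as nodes before linking them.
        self.add_node(source)
        self.add_node(target)
        self.edges.add((source, edge_type, target))

    def neighbours(self, source, edge_type):
        """Targets reachable from `source` through edges of `edge_type`."""
        return sorted(t for s, e, t in self.edges if s == source and e == edge_type)


kg = KnowledgeGraph()
kg.add_node("Edmundo Paz-Soldán", kind="Person")   # attribute values are assumed
kg.add_node("Cochabamba", kind="City")
kg.add_node("Iris", kind="Book")
kg.add_edge("Edmundo Paz-Soldán", "bornIn", "Cochabamba")
kg.add_edge("Edmundo Paz-Soldán", "wrote", "Iris")  # hypothetical edge type

print(kg.neighbours("Edmundo Paz-Soldán", "bornIn"))
```

Storing edges as (source, type, target) tuples keeps the sketch close to the triple model introduced in the Linked Data section.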

Google n The Knowledge Graph is a knowledge base to enhance its search engine's search results with semantic-search information gathered from a wide variety of sources.

https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html INTERNAL USE ONLY

6

Copyright 2017 FUJITSU LIMITED

Microsoft Concept Graph
▪ What we need in order to act like a human is nothing more than knowledge about concepts (e.g., persons and animals) and the ability to conceptualize (e.g., cats are animals). Concepts are the glue that holds our mental world together.

https://concept.research.microsoft.com/

IBM Watson n Watson is a question answering computer system capable of answering questions posed in natural language. There is a knowledge graph under the hood.

https://www.ibm.com/watson/ INTERNAL USE ONLY

8

Copyright 2017 FUJITSU LIMITED

and more n Amazon Product Graph n Facebook Graph API n LinkedIn Knowledge Graph n …

INTERNAL USE ONLY

9

Copyright 2017 FUJITSU LIMITED

Linked Data

Knowledge Representation
▪ Decades of research into knowledge representation
▪ Some knowledge graph implementations use RDF triples, the foundation of Linked Data.
▪ Linked Data is one of the core concepts and pillars of the Semantic Web. The Semantic Web is all about making links between datasets understandable not only to humans but also to machines.
▪ Standards for defining and exchanging knowledge
  • RDF, RDFa, JSON-LD, schema.org
  • RDFS, OWL, SKOS, FOAF

Linked Data enables a Web of Data
Global Identifier: URI (Uniform Resource Identifier), a string of characters used to identify a name or a resource on the Internet.
Data Model: RDF (Resource Description Framework), a standard model for data interchange on the Web.
Access Mechanism: HTTP
Connection: Typed Links

[Figure: the MovieDB resource http://imdb.../TLLuvia has http://.../name "Even the Rain" and http://.../filming_location http://cia.../Bolivia; the CIA World FactBook resource http://cia.../Bolivia has http://.../population 8000000]

© Slide adapted from "5 min Introduction to Linked Data", Olaf Hartig

LOD Cloud

"Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/"

Resource Description Framework
▪ RDF is a basic Knowledge Representation language based on semantic networks.
  • Useful to represent metadata and describe any type of information in a machine-accessible way (i.e., a data model)
  • Resources are described in terms of properties and property values using RDF statements
  • Statements are represented as triples, consisting of a subject, predicate, and object [S,P,O]

[Figure: a statement links a Subject to an Object through a property]
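
As a rough illustration of the [S,P,O] model, the sketch below stores statements as three-field tuples and gathers the properties of one resource. The `ex:` names and the `describe` helper are hypothetical and not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def describe(resource, triples):
    """Collect the property -> values map of one resource."""
    description = {}
    for t in triples:
        if t.subject == resource:
            description.setdefault(t.predicate, []).append(t.object)
    return description


# Hypothetical resource and property names, for illustration only.
statements = [
    Triple("ex:Report1", "ex:title", "Annual report"),
    Triple("ex:Report1", "ex:author", "ex:Alice"),
    Triple("ex:Alice", "ex:name", "Alice"),
]

print(describe("ex:Report1", statements))
```

Note that the object of one statement (`ex:Alice`) is the subject of another, which is what turns a list of statements into a graph.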

SPARQL n Language developed to allow accessing datasets expressed in RDF.

Application

Application SPARQL queries

SQL queries

Relational DB

INTERNAL USE ONLY

RDF

15

Copyright 2017 FUJITSU LIMITED

RDF Example

[Figure: RDF graph in which http://uem.es/Esteban hasName "Esteban García", http://uem.es/Esteban hasColleague http://uem.es/Enrique, and http://uem.es/Diego hasColleague http://uem.es/Enrique; a hasSex edge points to "Male"]

RDF Example n For practical purposes, specially if handwritten, URIs are shortened using XML namespaces n xmlns:ue=“http://uem.es/” n ue:Esteban is equivalent to http://uem.es/Esteban n RDF serializations: XML, N3, N-Triple, JSON-LD

“Esteban García” person:hasName

ue:Esteban

person:hasColleague

ue:Enrique

person:hasColleague ue:Diego

INTERNAL USE ONLY

“Male” person:hasSex

17

Copyright 2017 FUJITSU LIMITED
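
The shortening of URIs with namespaces can be sketched as a pair of helper functions. This is an illustrative sketch only: the `ue` namespace comes from the slide, while the namespace bound to `person` and the function names are assumptions.

```python
PREFIXES = {
    "ue": "http://uem.es/",
    # Assumed namespace: the slides do not give the IRI behind "person:".
    "person": "http://example.org/person#",
}


def expand(curie, prefixes=PREFIXES):
    """Turn a prefixed name such as 'ue:Esteban' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def shorten(uri, prefixes=PREFIXES):
    """Inverse of expand(): use the longest matching namespace, if any."""
    matches = [(ns, p) for p, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        return uri
    ns, p = max(matches, key=lambda m: len(m[0]))
    return f"{p}:{uri[len(ns):]}"


print(expand("ue:Esteban"))            # http://uem.es/Esteban
print(shorten("http://uem.es/Diego"))  # ue:Diego
```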

RDF - SPARQL

[Figure: RDF graph with prefixed names: ue:Esteban person:hasName "Esteban García", ue:Esteban person:hasColleague ue:Enrique, and ue:Diego person:hasColleague ue:Enrique; a person:hasSex edge points to "Male"]

Query: "Which people have Enrique as a colleague?"
Pattern: ? person:hasColleague ue:Enrique
Result: ue:Esteban and ue:Diego

SPARQL is the query language for RDF (W3C Recommendation):
SELECT ?s WHERE { ?s person:hasColleague ue:Enrique . }
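
To make the idea of a triple pattern concrete, here is a small sketch that matches a single pattern against an in-memory list of triples. It is not a SPARQL engine: the `match` function is hypothetical, and the graph contains only the three statements that can be read from the example (the hasSex statement is left out).

```python
def match(pattern, triples):
    """Return one binding dict per triple that matches the pattern.

    Pattern terms starting with '?' are variables; anything else must be equal.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


graph = [
    ("ue:Esteban", "person:hasName", '"Esteban García"'),
    ("ue:Esteban", "person:hasColleague", "ue:Enrique"),
    ("ue:Diego", "person:hasColleague", "ue:Enrique"),
]

# SELECT ?s WHERE { ?s person:hasColleague ue:Enrique . }
rows = match(("?s", "person:hasColleague", "ue:Enrique"), graph)
print([row["?s"] for row in rows])  # ['ue:Esteban', 'ue:Diego']
```

A real query processor joins the bindings of several patterns; this sketch covers only the single-pattern case used in the example.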

Hands on - http://data.semanticweb.org/snorql/
▪ Information about people
  SELECT DISTINCT ?person ?p ?o
  WHERE {
    ?person a <…> ;
            ?p ?o
  }
  GROUP BY ?person ?p ?o
  LIMIT 30

▪ Information about a particular person
  SELECT DISTINCT ?person ?p ?o
  WHERE {
    ?person a <…> ;
            foaf:firstName "Boris" ;
            foaf:lastName "Villazon-Terrazas" ;
            ?p ?o
  }
  GROUP BY ?person ?p ?o

▪ Information about people and companies
  SELECT DISTINCT ?affiliation ?member
  WHERE {
    <…> swrc:affiliation ?affiliation .
    ?affiliation foaf:member ?member
  }

Hands on - https://query.wikidata.org/
▪ Whose birthday is today?
  SELECT ?entityLabel (YEAR(?date) AS ?year)
  WHERE {
    BIND(MONTH(NOW()) AS ?nowMonth)
    BIND(DAY(NOW()) AS ?nowDay)
    ?entity wdt:P569 ?date .
    FILTER (MONTH(?date) = ?nowMonth && DAY(?date) = ?nowDay)
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  }
  LIMIT 10

Hands on - http://es-la.dbpedia.org/
▪ Soccer players from Cochabamba
  SELECT ?label ?goals
  WHERE {
    ?player a <…> .
    ?player <…> <…> .
    ?player <…> ?goals .
    ?player rdfs:label ?label
  }
  ORDER BY DESC(?goals)

Iterative and Incremental Linked Data Life cycle

Publishing Linked Data

A process that involves a large number of steps, design decisions and technologies.

Specification n Identification and analysis of the data sources n URI design n Definition of the license

INTERNAL USE ONLY

26

Copyright 2017 FUJITSU LIMITED

Identification and analysis of the data sources
After we have identified and selected the data sources:
▪ Search and compile all the available data and documentation about those resources
▪ Identify the schema of those resources, including conceptual components and their relationships
▪ Identify the items in the domain, i.e., things whose properties and relations are described in the data sources

URI Design n Use meaningful URIs, instead of opaque URIs, when possible n Separate TBox (ontology model) from ABox (instances) URIs n Base URI •  http://data.gov.bo/ •  http://health.data.gov.bo/

n TBox URIs •  http://data.gov.bo/ontology/{class|property}

n ABox URIs •  http://data.gov.bo/resource/ •  http://data.gov.bo/resource/province/Tiraque

Definition of the License
▪ Data licensing is not trivial
▪ Requires technological knowledge
▪ Depends on business strategy and models
▪ Data is an intellectual asset and can be protected by intellectual property rights
▪ Licenses create a secure business environment
▪ Creative Commons and Open Data Commons

Cáceres - Example

[Figure: data sources for Restaurants and Hotels]

Cáceres - Example n  Base URI http://opendata.caceres.es/

n  TBox URIs http://opendata.caceres.es/def/ontomunicipio#{class|property} http://opendata.caceres.es/def/ontomunicipio#Albergue

n  ABox URIs http://data.gov.bo/resource/{domain}/{type}/{resourcename} http://opendata.caceres.es/recurso/turismo/alojamiento/Albergue/2-las-veletas

INTERNAL USE ONLY

31

Copyright 2017 FUJITSU LIMITED

Cáceres - Example
License for the datasets
All datasets offered by the Cáceres City Council (Ayuntamiento de Cáceres) are published, unless otherwise indicated, under the terms of the Creative Commons Attribution license (CC BY 3.0).

Modelling
Search for suitable vocabularies
Are there suitable vocabularies?
  • Yes: build the vocabulary by reusing available vocabularies
  • No: search for suitable non-ontological resources

Modelling
Search for suitable non-ontological resources
  • Highly reliable Web sites
  • Domain-related sites
  • Government catalogs
Are there suitable resources?
  • Yes: build the vocabulary by transforming available resources *
  • No: build the vocabulary from scratch

* Boris Villazón-Terrazas, A Method for Reusing and Re-Engineering Non-Ontological Resources for Building Ontologies. IOS Press, 2012.

Cáceres - Example n Protégé, free an open source ontology editor and framework for building intelligent systems

INTERNAL USE ONLY

38

Copyright 2017 FUJITSU LIMITED

Generation n Transformation n Data cleansing / curation n Linking

INTERNAL USE ONLY

40

Copyright 2017 FUJITSU LIMITED

Transformation n Take the data sources selected in the specification activity and transform them to RDF according to the vocabulary created in the modelling activity n Some tools n CSV and spreadsheets •  Open Refine, XLWrap, RDF123

n RDB •  R2RML

n XML •  GRDDL, ReDeFer

n RML

Transformation – RDB2RDF
▪ A majority of dynamic Web content is backed by relational databases (RDB), and so are many enterprise systems.
▪ W3C RDB2RDF Working Group
  • R2RML: RDB to RDF Mapping Language - http://www.w3.org/TR/r2rml/
  • Direct Mapping - http://www.w3.org/TR/rdb-direct-mapping/
  • R2RML and Direct Mapping Test Cases - https://www.w3.org/TR/rdb2rdf-test-cases/
  • RDB2RDF Implementation Report - http://www.w3.org/TR/rdb2rdf-testcases/
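
The following sketch shows the general shape of a direct mapping from a relational table to triples, using an in-memory SQLite database. It is a simplification written for illustration, not an implementation of the W3C Direct Mapping: every value is emitted as a plain string literal without escaping, foreign keys are ignored, and the base IRI, table and function name are assumptions.

```python
import sqlite3

BASE = "http://example.com/base/"  # hypothetical base IRI


def direct_mapping(conn, table, pk):
    """Yield simplified direct-mapping triples for every row of `table`."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        # Row IRI: table name plus primary-key column and value.
        subject = f"<{BASE}{table}/{pk}={record[pk]}>"
        yield f"{subject} <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <{BASE}{table}> ."
        for column, value in record.items():
            if value is not None:  # NULL cells produce no triple
                yield f'{subject} <{BASE}{table}#{column}> "{value}" .'


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (ID INTEGER PRIMARY KEY, fname TEXT, city TEXT)")
conn.execute("INSERT INTO People VALUES (7, 'Bob', NULL)")

for triple in direct_mapping(conn, "People", "ID"):
    print(triple)
```

R2RML covers the cases where this automatic shape is not enough, by letting the mapping author choose the target vocabulary explicitly.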

Transformation – Geospatial to RDF
▪ Tool for generating RDF from geospatial information
▪ The geometry could be available in GML or WKT

https://github.com/boricles/geometry2rdf

Transformation – MARC21 to RDF
▪ A MARC mappings and RDF generator

Linking
▪ Look for similar datasets in https://datahub.io/
▪ Identify suitable datasets as linking targets
▪ Discover relationships between data items
  • LIMES - http://aksw.org/Projects/limes
  • Silk Framework - http://silkframework.org/
▪ Validate the relationships discovered
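
The discovery step can be approximated by comparing labels, as in the hedged sketch below. It uses plain string similarity from the Python standard library and is not how LIMES or Silk work internally. The URIs, labels, threshold and function names are hypothetical, and the target dataset is assumed to be non-empty.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def discover_links(source, target, threshold=0.9):
    """Propose owl:sameAs links between resources whose labels are similar.

    `source` and `target` map resource URIs to labels. Each proposed link
    carries its score so that a person can validate it afterwards.
    """
    links = []
    for s_uri, s_label in source.items():
        # Keep only the best-scoring candidate in the target dataset.
        t_uri, t_label = max(target.items(), key=lambda item: similarity(s_label, item[1]))
        score = similarity(s_label, t_label)
        if score >= threshold:
            links.append((s_uri, OWL_SAME_AS, t_uri, round(score, 2)))
    return links


# Hypothetical URIs and labels.
local = {"http://example.com/resource/province/Tiraque": "Tiraque"}
remote = {
    "http://example.org/place/1": "tiraque",
    "http://example.org/place/2": "Tarija",
}
print(discover_links(local, remote))
```

Returning the score with each candidate link supports the final validation step, where doubtful matches are reviewed before publication.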

Cáceres - Example

Restaurants

Publication n Dataset publication n Metadata publication

INTERNAL USE ONLY

48

Copyright 2017 FUJITSU LIMITED

Dataset publication n Tools for storing RDF/SPARQL endpoint/Linked Data frontend n Virtuoso Universal Server, Jena Fuseki, Sesame, 4Store, YARS, Apache Marmotta, Pubby, Linked Data API, Linked Data Platform

n Store the RDF data in different graphs n http://example.com/graph/ontology n http://example.com/graph/dataset n http://example.com/graph/links

Metadata publication n  VoID allows to express metadata about RDF datasets

n  The PROV Ontology

http://www.w3.org/TR/void/ http://www.w3.org/TR/prov-o/ INTERNAL USE ONLY
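
A dataset description using a few VoID and Dublin Core terms might be produced as follows. This is a minimal sketch: the dataset URI, endpoint, title and triple count are made-up values, the function name is hypothetical, and the title is assumed to contain no double quotes.

```python
def void_description(dataset_uri, title, sparql_endpoint, triples, license_uri):
    """Build a minimal VoID description of a dataset as a Turtle string."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    dcterms:license <{license_uri}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:triples {int(triples)} .",
    ])


# Hypothetical dataset values, for illustration only.
ttl = void_description(
    dataset_uri="http://example.com/dataset",
    title="Example dataset",
    sparql_endpoint="http://example.com/sparql",
    triples=1000,
    license_uri="https://creativecommons.org/licenses/by/3.0/",
)
print(ttl)
```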

Cáceres - Example

http://localhost:3030/

https://jena.apache.org/documentation/serving_data/

Exploitation

Data Analytics

Exploitation n Faceted browser interface. n Geospatial visualization using Google Maps and Open Street Maps. n Visualization of geometries (LineStrings, Polygons, etc) when using the GeoLinkedData data model.

SPARQL

map4rdf

Triplestore http://oeg-dev.dia.fi.upm.es/map4rdf/ INTERNAL USE ONLY

54

Copyright 2017 FUJITSU LIMITED

Use cases http://linkeddata.es/

http://geo.linkeddata.es/

Maintenance

http://aemet.linkeddata.es/

http://datos.bne.es

Questions

http://boris.villazon.terrazas.name/ n @boricles n https://www.linkedin.com/in/boris-villazon-terrazas-01a6015 n http://slideshare.net/boricles n https://github.com/boricles n https://scholar.google.com/citations? hl=en&user=N0NWnc0AAAAJ&view_op=list_works&sortby=pu bdate

INTERNAL USE ONLY

62

Copyright 2017 FUJITSU LIMITED