ToC
■ Introduction
■ Linked Data
■ Iterative and Incremental Linked Data Life cycle
■ Q&A
INTERNAL USE ONLY
Copyright 2017 FUJITSU LIMITED
Personal Introduction
http://boris.villazon.terrazas.name/
■ Undergraduate University: Universidad Católica Boliviana
■ Postgraduate University: Universidad Politécnica de Madrid
■ Research stays:
  • National Research Council, Rome, Italy
  • Digital Enterprise Research Institute, NUI, Galway, Ireland
  • The University of Aberdeen, Scotland, UK
■ Ontology Engineering Group
■ iSOCO, Intelligent Software Components
■ Expert System
■ Fujitsu Laboratories of Europe
■ Tetuan Valley Startup School
Main references
■ Pan, Jeff et al. (Ed.), Exploiting Linked Data and Knowledge Graphs in Large Organisations
  • Knowledge Graph Foundations. Boris Villazón Terrazas, Nuria García-Santa, Yuan Ren
  • Construction of Enterprise Knowledge Graphs. Boris Villazón Terrazas, Nuria García-Santa, Yuan Ren
■ Best Practices for Publishing Linked Data, W3C Working Group Note. Bernadette Hyland, Ghislain Atemezing, Boris Villazón Terrazas
■ Wood, David (Ed.), Linking Government Data
  • Methodological Guidelines for Publishing Government Linked Data. Boris Villazón Terrazas, Luis M. Vilches, Oscar Corcho, Asunción Gómez-Pérez
Introduction
Knowledge Graphs
■ Knowledge in graph form
■ Captures entities, attributes, and relationships
■ Nodes are entities
■ Nodes are labeled with attributes
■ Typed edges between two nodes represent a relationship between entities
[Figure: example graph with the nodes "Edmundo Paz-Soldán", "Iris" and "Cochabamba" and the edge label bornIn]
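The bullet points above can be sketched as a small data structure: nodes carry attributes, and edges are typed and directed. This is only an illustration. The class names, the `type` attribute values, and the direction of the `bornIn` edge (person to city) are assumptions, not something the slide spells out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, labeled with attributes."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Edge:
    """A typed, directed relationship between two entities."""
    source: str
    relation: str
    target: str


class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = set()

    def add_node(self, name, **attributes):
        self.nodes[name] = Node(name, dict(attributes))

    def add_edge(self, source, relation, target):
        # Both endpoints must already exist as entities.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add(Edge(source, relation, target))

    def neighbours(self, source, relation):
        """Targets reachable from `source` over edges of the given type."""
        return sorted(
            e.target for e in self.edges
            if e.source == source and e.relation == relation
        )


kg = KnowledgeGraph()
kg.add_node("Edmundo Paz-Soldán", type="Person")  # attribute value is assumed
kg.add_node("Cochabamba", type="City")            # attribute value is assumed
kg.add_edge("Edmundo Paz-Soldán", "bornIn", "Cochabamba")
print(kg.neighbours("Edmundo Paz-Soldán", "bornIn"))
```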
Google
■ The Knowledge Graph is a knowledge base that Google uses to enhance its search engine's results with semantic-search information gathered from a wide variety of sources.
https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html
Microsoft Concept Graph
■ What we need in order to act like a human is nothing more than knowledge about concepts (e.g., persons and animals) and the ability to conceptualize (e.g., cats are animals). Concepts are the glue that holds our mental world together.
https://concept.research.microsoft.com/
IBM Watson
■ Watson is a question answering computer system capable of answering questions posed in natural language. There is a knowledge graph under the hood.
https://www.ibm.com/watson/
And more
■ Amazon Product Graph
■ Facebook Graph API
■ LinkedIn Knowledge Graph
■ …
Linked Data
Knowledge Representation
■ Decades of research into knowledge representation
■ Some knowledge graph implementations use RDF triples, the foundations of Linked Data.
■ Linked Data is one of the core concepts and pillars of the Semantic Web. The Semantic Web is all about making links between datasets understandable not only to humans but also to machines.
■ Standards for defining and exchanging knowledge
  • RDF, RDFa, JSON-LD, schema.org
  • RDFS, OWL, SKOS, FOAF
Linked Data enables a Web of Data
■ Global Identifier: URI (Uniform Resource Identifier), a string of characters used to identify a name or a resource on the Internet.
■ Data Model: RDF (Resource Description Framework), a standard model for data interchange on the Web.
■ Access Mechanism: HTTP
■ Connection: Typed Links
[Figure: two linked datasets. MovieDB: http://imdb.../TLLuvia has http://.../name “Even the Rain” and http://.../filming_location http://cia.../Bolivia. CIA World FactBook: http://cia.../Bolivia has http://.../population 8000000.]
© Slide adapted from “5 min Introduction to Linked Data”, Olaf Hartig
LOD Cloud
"Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak http://lod-cloud.net/"
Resource Description Framework
■ RDF is a basic knowledge representation language based on semantic networks.
  • Useful to represent metadata and describe any type of information in a machine-accessible way (i.e., it serves as a data model)
  • Resources are described in terms of properties and property values using RDF statements
  • Statements are represented as triples, consisting of a subject, a predicate, and an object [S, P, O]
[Figure: a statement drawn as Subject, property, Object]
SPARQL
■ Language developed to allow accessing datasets expressed in RDF.
[Figure: an application sends SQL queries to a relational DB; in the same way, an application sends SPARQL queries to RDF data]
RDF Example
[Figure: RDF graph over the resources http://uem.es/Esteban, http://uem.es/Enrique and http://uem.es/Diego and the literals “Esteban García” and “Male”; edge labels: hasName, hasColleague (twice), hasSex]
RDF Example
■ For practical purposes, especially when written by hand, URIs are shortened using XML namespaces
  • xmlns:ue=“http://uem.es/”
  • ue:Esteban is equivalent to http://uem.es/Esteban
■ RDF serializations: RDF/XML, N3, N-Triples, JSON-LD
[Figure: the same RDF graph with prefixed names: ue:Esteban, ue:Enrique, ue:Diego, the literals “Esteban García” and “Male”, and the edge labels person:hasName, person:hasColleague (twice), person:hasSex]
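As a rough sketch of how prefixed names relate to full URIs, the snippet below expands `ue:Esteban` to `http://uem.es/Esteban` and writes two statements as N-Triples. The `ue` namespace is taken from the slide. The namespace bound to `person` is a hypothetical placeholder, since the slide does not give it, and the helper names are illustrative only.

```python
from dataclasses import dataclass

PREFIXES = {
    "ue": "http://uem.es/",                   # from the slide
    "person": "http://example.org/person#",   # hypothetical namespace
}


@dataclass(frozen=True)
class Literal:
    """A plain string literal (no language tag or datatype in this sketch)."""
    value: str


def expand(prefixed_name):
    """Turn 'prefix:local' into a full URI using PREFIXES."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


def term(t):
    """Serialize one term in N-Triples syntax."""
    if isinstance(t, Literal):
        escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{expand(t)}>"


def to_ntriples(triples):
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


TRIPLES = [
    ("ue:Esteban", "person:hasName", Literal("Esteban García")),
    ("ue:Esteban", "person:hasColleague", "ue:Enrique"),
]

print(to_ntriples(TRIPLES))
```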
RDF - SPARQL
[Figure: the RDF graph from the previous example, with prefixed names]
Query: “Who has Enrique as a colleague?”
Graph pattern: ? person:hasColleague ue:Enrique
Result: ue:Esteban and ue:Diego
SPARQL is the query language for RDF (W3C Recommendation):
    SELECT ?s WHERE { ?s person:hasColleague ue:Enrique . }
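To make the idea of a graph pattern concrete, here is a minimal sketch of matching a single triple pattern against a list of triples. It is not how a SPARQL engine is implemented. The triples are an assumed reading of the slide's figure, in which both ue:Esteban and ue:Diego point to ue:Enrique, consistent with the stated result.

```python
def is_variable(term):
    return term.startswith("?")


def match(pattern, triples):
    """Return one variable binding per triple that matches the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for pattern_term, triple_term in zip(pattern, triple):
            if is_variable(pattern_term):
                # A variable used twice must bind to the same value.
                if binding.get(pattern_term, triple_term) != triple_term:
                    break
                binding[pattern_term] = triple_term
            elif pattern_term != triple_term:
                break
        else:
            results.append(binding)
    return results


# Assumed reading of the figure; literals are written with their quotes.
TRIPLES = [
    ("ue:Esteban", "person:hasName", '"Esteban García"'),
    ("ue:Esteban", "person:hasColleague", "ue:Enrique"),
    ("ue:Diego", "person:hasColleague", "ue:Enrique"),
]

# Counterpart of: SELECT ?s WHERE { ?s person:hasColleague ue:Enrique . }
answers = [b["?s"] for b in match(("?s", "person:hasColleague", "ue:Enrique"), TRIPLES)]
print(answers)
```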
Hands on - http://data.semanticweb.org/snorql/
■ Information about people
    SELECT DISTINCT ?person ?p ?o
    WHERE {
      ?person a <…> ;
              ?p ?o
    }
    GROUP BY ?person ?p ?o
    LIMIT 30
■ Information about a particular person
    SELECT DISTINCT ?person ?p ?o
    WHERE {
      ?person a <…> ;
              foaf:firstName "Boris" ;
              foaf:lastName "Villazon-Terrazas" ;
              ?p ?o
    }
    GROUP BY ?person ?p ?o
■ Information about people and companies
    SELECT DISTINCT ?affiliation ?member
    WHERE {
      <…> swrc:affiliation ?affiliation .
      ?affiliation foaf:member ?member
    }
Hands on - https://query.wikidata.org/
■ Whose birthday is today?
    SELECT ?entityLabel (YEAR(?date) AS ?year)
    WHERE {
      BIND(MONTH(NOW()) AS ?nowMonth)
      BIND(DAY(NOW()) AS ?nowDay)
      ?entity wdt:P569 ?date .
      FILTER (MONTH(?date) = ?nowMonth && DAY(?date) = ?nowDay)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 10
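A query like the one above can also be sent programmatically. The sketch below uses only the Python standard library and assumes the public endpoint at https://query.wikidata.org/sparql with the `query` and `format` parameters. The function names and the User-Agent string are made up for illustration. `run_query` needs network access, so it is defined here but not called.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as an HTTP GET request URL."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{endpoint}?{params}"


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the list of result bindings."""
    request = urllib.request.Request(
        build_url(query, endpoint),
        headers={"User-Agent": "linked-data-tutorial-example/0.1"},  # made-up name
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        # SPARQL JSON results keep the rows under results.bindings.
        return json.load(response)["results"]["bindings"]


QUERY = "SELECT ?s WHERE { ?s wdt:P569 ?date } LIMIT 1"
print(build_url(QUERY))
```

Each binding maps a variable name to a small object whose `value` field holds the result, so a caller would typically read something like `row["entityLabel"]["value"]`.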
Hands on - http://es-la.dbpedia.org/
■ Soccer players from Cochabamba
    SELECT ?label ?goals
    WHERE {
      ?player a <…> .
      ?player <…> <…> .
      ?player <…> ?goals .
      ?player rdfs:label ?label
    }
    ORDER BY DESC(?goals)
Iterative and Incremental Linked Data Life cycle
Publishing Linked Data
■ A process that involves a large number of steps, design decisions and technologies.
Specification
■ Identification and analysis of the data sources
■ URI design
■ Definition of the license
Identification and analysis of the data sources
After we have identified and selected the data sources:
■ Search and compile all the available data and documentation about those resources
■ Identify the schema of those resources, including conceptual components and their relationships
■ Identify the items in the domain, i.e., things whose properties and relations are described in the data sources
URI Design
■ Use meaningful URIs, instead of opaque URIs, when possible
■ Separate TBox (ontology model) from ABox (instances) URIs
■ Base URI
  • http://data.gov.bo/
  • http://health.data.gov.bo/
■ TBox URIs
  • http://data.gov.bo/ontology/{class|property}
■ ABox URIs
  • http://data.gov.bo/resource/
  • http://data.gov.bo/resource/province/Tiraque
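The URI patterns above can be turned into a pair of small helper functions. This is a sketch under assumptions: the function names and the slug rule (whitespace becomes an underscore, everything else is percent-encoded) are illustrative choices, not part of the guidelines. "Gran Chaco" is just a sample input with a space in it.

```python
import re
from urllib.parse import quote

BASE = "http://data.gov.bo/"


def slug(label):
    """Keep the label readable: collapse whitespace to '_' and percent-encode the rest."""
    return quote(re.sub(r"\s+", "_", label.strip()), safe="")


def tbox_uri(term, base=BASE):
    """URI for a class or property of the ontology model (TBox)."""
    return f"{base}ontology/{slug(term)}"


def abox_uri(resource_type, name, base=BASE):
    """URI for an instance (ABox), kept apart from the TBox URIs."""
    return f"{base}resource/{slug(resource_type)}/{slug(name)}"


print(tbox_uri("Province"))
print(abox_uri("province", "Tiraque"))
print(abox_uri("province", "Gran Chaco"))
```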
Definition of the License
■ Data licensing is not trivial
■ Requires technological knowledge
■ Depends on business strategy and models
■ Data is an intellectual asset and can be protected by intellectual property rights
■ Licenses create a secure business environment
■ Creative Commons and Open Data Commons
Cáceres - Example
[Figure: data on restaurants and hotels]
Cáceres - Example
■ Base URI
  • http://opendata.caceres.es/
■ TBox URIs
  • http://opendata.caceres.es/def/ontomunicipio#{class|property}
  • http://opendata.caceres.es/def/ontomunicipio#Albergue
■ ABox URIs
  • http://opendata.caceres.es/recurso/{domain}/{type}/{resourcename}
  • http://opendata.caceres.es/recurso/turismo/alojamiento/Albergue/2-las-veletas
Cáceres - Example
License for the datasets: all datasets offered by the Ayuntamiento de Cáceres (Cáceres City Council) are published, unless otherwise indicated, under the terms of the Creative Commons Attribution license (CC-by 3.0).
Modelling
Search for suitable vocabularies. Are there suitable vocabularies?
■ Yes: build the vocabulary by reusing available vocabularies
■ No: search for suitable non-ontological resources
Modelling
Search for suitable non-ontological resources (highly reliable Web sites, domain-related sites, government catalogs). Are there suitable resources?
■ Yes: build the vocabulary by transforming available resources *
■ No: build the vocabulary from scratch
* Boris Villazón-Terrazas, A Method for Reusing and Re-Engineering Non-Ontological Resources for Building Ontologies. IOS Press, 2012
Cáceres - Example
■ Protégé, a free, open-source ontology editor and framework for building intelligent systems
Generation
■ Transformation
■ Data cleansing / curation
■ Linking
Transformation
■ Take the data sources selected in the specification activity and transform them to RDF according to the vocabulary created in the modelling activity
■ Some tools
  • CSV and spreadsheets: OpenRefine, XLWrap, RDF123
  • RDB: R2RML
  • XML: GRDDL, ReDeFer
  • RML
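For a feel of what the transformation step does, the sketch below converts CSV rows into N-Triples using only the Python standard library. It is not a stand-in for the tools listed above. The base URI, column names and sample rows are hypothetical, and every value is emitted as a plain string literal.

```python
import csv
import io
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical base URI


def literal(value):
    """Serialize a string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def csv_to_ntriples(text, class_name, key_column, base=BASE):
    """One subject per row, one triple per non-empty cell."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        key = quote(row[key_column], safe="")
        subject = f"<{base}resource/{class_name.lower()}/{key}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{base}ontology/{class_name}> .")
        for column, value in row.items():
            if column != key_column and value:
                predicate = f"<{base}ontology/{quote(column, safe='')}>"
                lines.append(f"{subject} {predicate} {literal(value)} .")
    return lines


# Hypothetical sample data.
SAMPLE_CSV = "id,name,phone\n1,Restaurant A,\n2,Restaurant B,555-0100\n"

for line in csv_to_ntriples(SAMPLE_CSV, "Restaurant", "id"):
    print(line)
```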
Transformation – RDB2RDF
■ A majority of dynamic Web content is backed by relational databases (RDB), and so are many enterprise systems.
■ W3C RDB2RDF Working Group
  • R2RML: RDB to RDF Mapping Language - http://www.w3.org/TR/r2rml/
  • Direct Mapping - http://www.w3.org/TR/rdb-direct-mapping/
  • R2RML and Direct Mapping Test Cases - https://www.w3.org/TR/rdb2rdftest-cases/
  • RDB2RDF Implementation Report - http://www.w3.org/TR/rdb2rdf-testcases/
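The idea behind the Direct Mapping can be sketched in a few lines: each row becomes a subject whose URI is built from the table name and primary key, each column becomes a predicate, and NULL values produce no triple. The code below is a simplified illustration using an in-memory SQLite table. It assumes a single-column primary key and ignores foreign keys, datatypes and tables without keys, all of which the W3C specification does cover. The table and base URI are sample values.

```python
import sqlite3
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map(connection, table, primary_key, base):
    """Map every row of `table` to (subject, predicate, object) triples."""
    cursor = connection.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        key_value = quote(str(record[primary_key]))
        subject = f"{base}{quote(table)}/{quote(primary_key)}={key_value}"
        # The table acts as the class of the row.
        triples.append((subject, RDF_TYPE, base + quote(table)))
        for column, value in record.items():
            if value is not None:  # NULLs are simply left out
                triples.append((subject, f"{base}{quote(table)}#{quote(column)}", value))
    return triples


# Sample table and base URI, chosen for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE People (ID INTEGER PRIMARY KEY, fname TEXT, addr INTEGER)")
db.execute("INSERT INTO People VALUES (7, 'Bob', NULL)")

triples = direct_map(db, "People", "ID", "http://foo.example/DB/")
for triple in triples:
    print(triple)
```

R2RML goes further than this: it lets the author choose the URI templates and target vocabulary instead of deriving them mechanically from the schema.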
Transformation – Geospatial to RDF
■ Tool for generating RDF from geospatial information
■ The geometry could be available in GML or WKT
https://github.com/boricles/geometry2rdf
Transformation – MARC21 to RDF
■ A MARC mappings and RDF generator
Linking
■ Identify suitable datasets as linking targets
  • Look for similar datasets in https://datahub.io/
■ Discover relationships between data items
  • LIMES: http://aksw.org/Projects/limes
  • Silk Framework: http://silkframework.org/
■ Validate the relationships discovered
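The discovery step can be illustrated with a naive label comparison. This sketch is not the algorithm used by LIMES or Silk. It simply compares normalised labels with `difflib` from the standard library and proposes `owl:sameAs` candidates above a threshold, which would then still need validation. The URIs, labels and threshold are hypothetical.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalise(label):
    """Lower-case the label and collapse whitespace."""
    return " ".join(label.lower().split())


def similarity(a, b):
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def discover_links(source, target, threshold=0.9):
    """Compare every source label with every target label.

    `source` and `target` map resource URIs to labels. Returns candidate
    links as (source URI, owl:sameAs, target URI, score) tuples.
    """
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score >= threshold:
                links.append((s_uri, OWL_SAME_AS, t_uri, round(score, 2)))
    return links


# Hypothetical datasets.
SOURCE = {"http://example.org/a/Cochabamba": "Cochabamba"}
TARGET = {
    "http://example.org/b/1": "COCHABAMBA ",
    "http://example.org/b/2": "La Paz",
}

links = discover_links(SOURCE, TARGET)
print(links)
```

Comparing all pairs is quadratic in the number of resources, which is one reason dedicated link discovery frameworks exist.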
Cáceres - Example
Restaurants
Publication
■ Dataset publication
■ Metadata publication
Dataset publication
■ Tools for storing RDF / SPARQL endpoint / Linked Data frontend
  • Virtuoso Universal Server, Jena Fuseki, Sesame, 4Store, YARS, Apache Marmotta, Pubby, Linked Data API, Linked Data Platform
■ Store the RDF data in different graphs
  • http://example.com/graph/ontology
  • http://example.com/graph/dataset
  • http://example.com/graph/links
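One way to picture the separation into graphs is the N-Quads format, where each statement carries the URI of its graph as a fourth term. The sketch below uses the three graph URIs from the slide. The triples placed in them are hypothetical, and only URI objects are handled.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Graph URIs from the slide; the triples are hypothetical.
GRAPHS = {
    "http://example.com/graph/ontology": [
        ("http://example.com/ontology/Restaurant", RDFS_SUBCLASS_OF,
         "http://example.com/ontology/Place"),
    ],
    "http://example.com/graph/dataset": [
        ("http://example.com/resource/restaurant/1", RDF_TYPE,
         "http://example.com/ontology/Restaurant"),
    ],
    "http://example.com/graph/links": [
        ("http://example.com/resource/restaurant/1", OWL_SAME_AS,
         "http://example.org/other/restaurant/42"),
    ],
}


def to_nquads(graphs):
    """Serialize {graph URI: [(s, p, o), ...]} as N-Quads lines."""
    lines = []
    for graph_uri, triples in graphs.items():
        for s, p, o in triples:
            lines.append(f"<{s}> <{p}> <{o}> <{graph_uri}> .")
    return lines


nquads = to_nquads(GRAPHS)
for line in nquads:
    print(line)
```

Keeping the ontology, the instance data and the links apart makes it possible to reload or drop one of them without touching the others.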
Metadata publication
■ VoID (Vocabulary of Interlinked Datasets) makes it possible to express metadata about RDF datasets: http://www.w3.org/TR/void/
■ The PROV Ontology (PROV-O) expresses provenance information: http://www.w3.org/TR/prov-o/
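As an illustration of what a VoID description might contain, the sketch below counts triples and distinct subjects and prints a small Turtle document. The properties used (`void:triples`, `void:distinctSubjects`, `void:sparqlEndpoint`, `dcterms:title`, `dcterms:license`) come from the VoID and Dublin Core vocabularies. The dataset URI, endpoint, title and sample triples are hypothetical, and the title is not escaped.

```python
def void_description(dataset_uri, title, triples, sparql_endpoint, license_uri):
    """Build a minimal VoID description in Turtle for a list of triples."""
    subjects = {s for s, _, _ in triples}
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    dcterms:license <{license_uri}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:triples {len(triples)} ;",
        f"    void:distinctSubjects {len(subjects)} .",
    ])


# Hypothetical dataset content.
SAMPLE_TRIPLES = [
    ("http://example.com/resource/restaurant/1", "http://example.com/ontology/name", "Restaurant A"),
    ("http://example.com/resource/restaurant/1", "http://example.com/ontology/phone", "555-0100"),
    ("http://example.com/resource/restaurant/2", "http://example.com/ontology/name", "Restaurant B"),
]

description = void_description(
    dataset_uri="http://example.com/dataset/restaurants",
    title="Restaurants",
    triples=SAMPLE_TRIPLES,
    sparql_endpoint="http://example.com/sparql",
    license_uri="http://creativecommons.org/licenses/by/3.0/",
)
print(description)
```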
Cáceres - Example
http://localhost:3030/
https://jena.apache.org/documentation/serving_data/
Exploitation
Data Analytics
Exploitation
■ Faceted browser interface.
■ Geospatial visualization using Google Maps and OpenStreetMap.
■ Visualization of geometries (LineStrings, Polygons, etc.) when using the GeoLinkedData data model.
[Figure: map4rdf accesses the triplestore through SPARQL]
http://oeg-dev.dia.fi.upm.es/map4rdf/
Use cases http://linkeddata.es/
http://geo.linkeddata.es/
Maintenance
http://aemet.linkeddata.es/
http://datos.bne.es
Questions
http://boris.villazon.terrazas.name/
■ @boricles
■ https://www.linkedin.com/in/boris-villazon-terrazas-01a6015
■ http://slideshare.net/boricles
■ https://github.com/boricles
■ https://scholar.google.com/citations?hl=en&user=N0NWnc0AAAAJ&view_op=list_works&sortby=pubdate