Cartography in Linked Data Cloud
Otakar Čerba
Department of Geomatics, University of West Bohemia, Plzeň, Czech Republic
Linked Data Cloud
Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
Linked Data Cloud
- Interconnected datasets published on the Web according to principles defined by Tim Berners-Lee (2006)
- Principles (5-star ranking system)
  - Open license
  - Structured data
  - Open and non-proprietary format
  - RDF format / URIs as identifiers
  - Links
Linked Data Cloud
- Two types of datasets
  - Data (exchange format) – GeoNames.org, LinkedGeoData
  - Semantic resources (vocabularies, ontologies, ...) – GEMET, EuroVoc
- Combination of data and vocabularies – DBpedia, Wikidata, AGROVOC
Research questions
- What is the position of the concept “cartography” in the LOD space?
- Is “cartography” integrated into the LOD space?
- Do LOD resources provide relevant information on cartography?
- Are instances (aliases) of “cartography” interconnected?
Research method
- Automated search for resources containing the concept “cartography”
  - Finding aliases of the concept “cartography” – “Follow Your Nose” approach (exploration of identity links)
- Generating the Data Network
  - Nodes – LOD resources
  - Edges – identity links connecting aliases of one term
- Testing
  - descriptive properties → semantic information
  - graph properties → information on interlinking
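
The “Follow Your Nose” step can be sketched as a breadth-first walk over identity links. This is only a minimal illustration, not the tooling used in the study: the URIs and the `IDENTITY_LINKS` mapping are hypothetical toy data, and the sketch assumes that identity links (for example `owl:sameAs` or `skos:exactMatch` statements) have already been extracted from each resource.

```python
from collections import deque

# Hypothetical toy data: each alias URI maps to the URIs it declares as
# identical (e.g. via owl:sameAs or skos:exactMatch). Not real LOD content.
IDENTITY_LINKS = {
    "ex:A/cartography": ["ex:B/cartography", "ex:C/cartography"],
    "ex:B/cartography": ["ex:A/cartography"],
    "ex:C/cartography": ["ex:D/cartography"],
    "ex:D/cartography": [],
}


def follow_your_nose(seed, links):
    """Explore identity links breadth-first, starting from one alias.

    Returns (nodes, edges): nodes in discovery order and directed
    edges as (source, target) pairs.
    """
    nodes = [seed]
    seen = {seed}
    edges = []
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        # A resource missing from `links` yields no outgoing edges.
        for target in links.get(current, []):
            edges.append((current, target))
            if target not in seen:
                seen.add(target)
                nodes.append(target)
                queue.append(target)
    return nodes, edges


nodes, edges = follow_your_nose("ex:A/cartography", IDENTITY_LINKS)
```

The resulting `nodes` and `edges` lists form the directed data network on which the descriptive and graph properties can then be tested.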
“Cartography” Data Network
[Figure: data network diagram; node labels: AV, ST, EN, GS, DN, FB, EA, DB, GM, WD, CY, AA, ND, LA, NL, UM, BC, BF, CT, EV]
Node colours: Green – OK; Yellow – syntactic error; Red – technical error
“Cartography” Data Network – Statistics
- 20 data resources
- 8 (40%) datasets without any error
- 6 (30%) datasets with a syntactic error – sink nodes in the data network
- 6 (30%) resources with a technical error (data is not used in the following analysis) – isolated nodes in the data network
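
The node roles named above can be derived from the edge list alone. The sketch below assumes one reading of the terms: a sink node receives identity links but provides none, and an isolated node has no links at all. The node names and edges are hypothetical toy data, not the study's network.

```python
def classify_nodes(nodes, edges):
    """Label each node as 'isolated', 'sink' or 'linking' in a directed graph."""
    out_deg = {n: 0 for n in nodes}
    in_deg = {n: 0 for n in nodes}
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1

    roles = {}
    for n in nodes:
        if out_deg[n] == 0 and in_deg[n] == 0:
            roles[n] = "isolated"   # no identity links in either direction
        elif out_deg[n] == 0:
            roles[n] = "sink"       # referenced by others, links to nothing
        else:
            roles[n] = "linking"    # provides at least one outgoing link
    return roles


# Hypothetical toy network
roles = classify_nodes(
    ["A", "B", "C", "D"],
    [("A", "B"), ("B", "A"), ("A", "C")],
)
```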
“Cartography” – Explicit semantics
- Resources with a definition or a description – 8 (40%)
  - No identical or similar texts
  - Non-professional definitions
- Broader terms of “Cartography”
  - “Geography”, “Earth sciences”, “Mathematical geography”
  - Dependency on national conventions
- Narrower terms
  - Types of maps
  - Cartographic disciplines
  - “Scale”
“Cartography” – Explicit semantics
- Related terms
  - Same as narrower terms
  - “GIS”, “Geography”, “Topography”
  - “Map”, “Atlas”, ...
- Resources with the richest explicit semantics – NL (National Agricultural Library Thesaurus), LA (Library of Congress) and BF (Bibliothèque nationale de France) → library institutions at the national level
Analysis of interconnections – Metrics (Quantitative parameters)
- Best resource – node properties
  - Degree centrality – number of direct connections between a resource and other resources
  - Betweenness centrality – the resource acts as a “bridge” between other resources
  - Authority score – other resources reference the dataset
  - Hub score – the dataset provides many links to other resources
  - PageRank – advantageous position in the network
  - Multiple-criteria decision analysis (weighted sum)
- Overall view – graph properties
  - Density – ratio of the number of edges to the maximum possible number of edges
  - Reciprocity – share of reciprocal edges (links)
  - Clustering coefficient – share of connected triples of nodes that form closed triangles
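
The three graph properties can be computed directly from a directed edge list. The following is a minimal sketch using common textbook definitions, which may differ in detail from the software used in the study. Density is taken for a directed graph without self-loops, reciprocity as the share of edges whose reverse edge also exists, and the clustering coefficient as global transitivity on the undirected version of the graph. The toy graph is hypothetical.

```python
from itertools import combinations


def density(nodes, edges):
    """Edges divided by the maximum n * (n - 1) edges of a directed graph."""
    n = len(nodes)
    return len(set(edges)) / (n * (n - 1)) if n > 1 else 0.0


def reciprocity(edges):
    """Share of directed edges whose reverse edge also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (s, t) in edge_set if (t, s) in edge_set)
    return mutual / len(edge_set)


def clustering_coefficient(nodes, edges):
    """Global transitivity: closed triples / all connected triples."""
    neighbours = {n: set() for n in nodes}
    for s, t in edges:
        if s != t:  # ignore self-loops, treat links as undirected
            neighbours[s].add(t)
            neighbours[t].add(s)
    closed = 0
    triples = 0
    for n in nodes:
        # Every pair of neighbours of n forms a connected triple centred on n.
        for a, b in combinations(sorted(neighbours[n]), 2):
            triples += 1
            if b in neighbours[a]:
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical toy network
NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "D")]

d = density(NODES, EDGES)                 # 5 of 12 possible edges
r = reciprocity(EDGES)                    # 2 of 5 edges are reciprocated
c = clustering_coefficient(NODES, EDGES)  # 3 of 5 triples are closed
```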
Data Network Analyses – Nodes (Resources) Metrics

Metric                    Top 3 resources
Degree centrality         AV (1), EA (0.73), BC (0.55)
Betweenness centrality    WD (1), AV (0.79), DB (0.72)
Authority score           AV (1), GM (0.93), EV (0.93)
Hub score                 EA (1), AV (0.56), BC (0.43)
PageRank                  LA (1), WD (0.94), DN (0.75)

All values are normalized.
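
Since the best resource scores 1 for each metric, the normalization appears to divide every value by the maximum. This is an assumption, as the slides do not state the method. The sketch below applies that scaling to degree centrality on a hypothetical toy network. The function names are illustrative only.

```python
def degree_centrality(nodes, edges):
    """Count incoming plus outgoing links for every node."""
    degree = {n: 0 for n in nodes}
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def normalize_by_max(scores):
    """Scale scores so that the best node gets 1 (assumed normalization)."""
    top = max(scores.values(), default=0)
    if top == 0:
        return {k: 0.0 for k in scores}
    return {k: v / top for k, v in scores.items()}


def top_k(scores, k=3):
    """Return the k best (node, score) pairs, ties broken by node name."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy network
NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "A"), ("C", "D")]

normalized = normalize_by_max(degree_centrality(NODES, EDGES))
best = top_k(normalized)
```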
“Cartography” Data Network – Authority
[Figure: data network visualized by authority score; node labels: ND, AA, WD, LA, DN, BC, DB, ST, AV, NL, EA, GM, EV, BF]
Best Resources Containing the Term “Cartography”
- From the view of interconnection to other resources
- Multiple-criteria decision analysis
- Weighted sum of previous metrics (+ closeness centrality)
- Saaty method for weights

Resource                                    Weighted sum
AGROVOC                                     0.33
DBpedia                                     0.24
Wikidata                                    0.22
Biblioteca Nazionale Centrale di Firenze    0.21
Environmental Applications Reference THe.   0.18
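
The weighted-sum ranking can be sketched as follows. The Saaty method derives weights from a pairwise comparison matrix of the criteria. Here the weights are approximated by normalized row geometric means, a common shortcut for the principal eigenvector. The comparison matrix, the three criteria and the resource scores are all hypothetical, since the slides do not give the values actually used.

```python
import math


def saaty_weights(pairwise):
    """Approximate Saaty (AHP) weights by normalized row geometric means."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def weighted_sum(scores, weights):
    """Combine normalized metric scores into a single value."""
    return sum(s * w for s, w in zip(scores, weights))


# Hypothetical reciprocal comparison matrix for three criteria:
# entry [i][j] states how much more important criterion i is than j.
PAIRWISE = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

# Hypothetical normalized metric scores per resource (one value per criterion).
SCORES = {
    "R1": [1.0, 0.5, 0.2],
    "R2": [0.6, 1.0, 1.0],
}

weights = saaty_weights(PAIRWISE)
ranking = sorted(
    ((name, weighted_sum(values, weights)) for name, values in SCORES.items()),
    key=lambda item: -item[1],
)
```

In a full application the consistency of the comparison matrix would also be checked before the weights are used.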
Data Network Analyses – Term “Cartography”
- Density – 0.16 → aliases are very poorly interconnected
- Reciprocity – 0.14 → in most cases aliases are not connected by reciprocal links
- Clustering coefficient – 0.28 → aliases do not form distinct closed subsets of resources
“Cartography” Data Network – Clusters
[Figure: clusters in the data network; node labels: NL, GM, EV, EA, AV, ST, DN, DB, BC, WD, ND, BF, LA, AA]
Clusters are interconnected quite well (usually by two or more edges).
Conclusions
- “Cartography” is a stable concept (term), published by important LOD resources based on crowdsourcing (DBpedia, Wikidata) as well as by LOD resources provided by public organizations (libraries)
- The interconnection of aliases is not satisfactory (low density and reciprocity) – the data network resembles a tree structure (instead of a robust network)
- Explicit semantics should be improved (but this is a common problem of LOD)
Conclusions
- Future research – comparison with similar scientific disciplines, study of cartographic terms
- Using LOD = Opportunity
  - for cartographers and the ICA to promote cartography and influence the perception of cartography (through LOD resources edited by volunteers)
  - to see the diversity of cartography (through the social role of identity links)
Thank you for your attention and questions.
This publication was supported by the project LO1506 of the Czech Ministry of Education, Youth and Sports.