Cartography in Linked Data Cloud

Otakar Čerba
Department of Geomatics, University of West Bohemia, Plzeň, Czech Republic

Linked Data Cloud

Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/

Linked Data Cloud

- Interconnected datasets published on the Web according to principles defined by Tim Berners-Lee (2006)
- Principles (5-star ranking system)
  - Open license
  - Structured data
  - Open and non-proprietary format
  - RDF format / URI as identifiers
  - Links

Linked Data Cloud

- Two types of datasets
  - Data (exchange format) – GeoNames.org, LinkedGeoData
  - Semantic resources (vocabularies, ontologies...) – GEMET, EuroVoc
- Combination of data and vocabularies – DBpedia, Wikidata, AGROVOC

Research questions

- What is the position of the concept “cartography” in the LOD space?
- Is “cartography” integrated into LOD space?
- Do LOD resources provide relevant information on cartography?
- Are instances (aliases) of “cartography” interconnected?

Research method

- Automated search for resources containing the concept “cartography”
  - Finding aliases of the concept “cartography” – “Follow Your Nose” approach (exploration of identity links)
- Generating the data network
  - Nodes – LOD resources
  - Edges – identity links connecting aliases of one term
- Testing
  - descriptive properties → semantic information
  - graph properties → information on interlinking
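The “Follow Your Nose” step can be sketched as a breadth-first walk over identity links. This is only an illustration under stated assumptions: the URIs and the `IDENTITY_LINKS` table are hypothetical stand-ins for dereferencing real resources and reading properties such as owl:sameAs or skos:exactMatch, and the slides do not say which properties or tools were actually used.

```python
from collections import deque

# Hypothetical identity links keyed by resource URI (example.org placeholders).
# A real crawler would dereference each URI and parse the returned RDF.
IDENTITY_LINKS = {
    "http://example.org/a/cartography": ["http://example.org/b/cartography"],
    "http://example.org/b/cartography": [
        "http://example.org/a/cartography",
        "http://example.org/c/cartography",
    ],
    "http://example.org/c/cartography": [],
}


def lookup(uri):
    """Stand-in for dereferencing a URI; unknown resources yield no links."""
    return IDENTITY_LINKS.get(uri, [])


def follow_your_nose(seed, get_links):
    """Breadth-first exploration of identity links starting from one alias.

    Returns (nodes, edges): the discovered aliases and the directed
    identity links between them.
    """
    nodes = {seed}
    edges = set()
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for target in get_links(current):
            edges.add((current, target))
            if target not in nodes:
                nodes.add(target)
                queue.append(target)
    return nodes, edges


nodes, edges = follow_your_nose("http://example.org/a/cartography", lookup)
print(len(nodes), "aliases,", len(edges), "identity links")
```

The returned nodes and edges correspond to the data network described above: each alias is a node and each identity link is a directed edge.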

“Cartography” Data Network

[Figure: data network of the aliases of “cartography”. Nodes: AV, ST, EN, GS, DN, FB, EA, DB, GM, WD, CY, AA, ND, LA, NL, UM, BC, BF, CT, EV. Legend: green – OK; yellow – syntactic error; red – technical error.]

“Cartography” Data Network – Statistics

- 20 data resources
- 8 (40%) datasets without any error
- 6 (30%) datasets with a syntactic error – sink nodes in the data network
- 6 (30%) resources with a technical error (data is not used in the following analysis) – isolated nodes in the data network
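A minimal sketch of how the shares and the node roles above can be derived. The counts come from the slide; the four-node network is a hypothetical toy, and the definitions used (sink = incoming links only, isolated = no links at all) are the usual graph-theoretic ones rather than something the slides spell out.

```python
def classify_nodes(nodes, edges):
    """Split nodes into isolated (no links), sinks (incoming only) and others."""
    has_out = {source for source, _ in edges}
    has_in = {target for _, target in edges}
    isolated = {n for n in nodes if n not in has_out and n not in has_in}
    sinks = {n for n in nodes if n in has_in and n not in has_out}
    return isolated, sinks, set(nodes) - isolated - sinks


# Counts reported on the slide.
counts = {"no error": 8, "syntactic error": 6, "technical error": 6}
total = sum(counts.values())
shares = {label: round(100 * value / total) for label, value in counts.items()}
print(shares)

# Hypothetical toy network (not the measured one).
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("A", "B"), ("B", "A"), ("A", "C")]
isolated, sinks, others = classify_nodes(toy_nodes, toy_edges)
print("isolated:", isolated, "sinks:", sinks, "others:", others)
```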

“Cartography” – Explicit semantics

- Resources with a definition or a description – 8 (40%)
  - Any identical or similar texts
  - Non-professional definitions
- Broader terms of “Cartography”
  - “Geography”, “Earth sciences”, “Mathematical geography”
  - Dependency on national conventions
- Narrower terms
  - Types of maps
  - Cartographic disciplines
  - “Scale”

“Cartography” – Explicit semantics

- Related terms
  - Same as narrower terms
  - “GIS”, “Geography”, “Topography”
  - “Map”, “Atlas”...
- Resources with the richest explicit semantics – NL (National Agricultural Library Thesaurus), LA (Library of Congress) and BF (Bibliothèque nationale de France) → library institutions at the national level

Analysis of interconnections – Metrics (Quantitative parameters)

- Best resource – node properties
  - Degree centrality – how many times a resource is directly connected to other resources
  - Betweenness centrality – the resource acts as a “bridge”
  - Authority score – other resources refer to the dataset
  - Hub score – a dataset provides many links to other resources
  - Page Rank – advantageous position in the network
  - Multiple-criteria decision analysis (weighted sum)
- Overall view – graph properties
  - Density – ratio of the number of edges to the maximum possible number of edges
  - Reciprocity – share of reciprocal edges (links)
  - Clustering coefficient – number of closed triangles in the graph
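The three graph properties can be sketched in plain Python as follows. The exact variants are assumptions, since the slides do not name the software or formulas used: density is taken over a directed graph without self-loops, reciprocity is the share of edges whose reverse edge also exists, and the clustering coefficient is computed as global transitivity on the undirected projection. The toy edges are hypothetical.

```python
from itertools import combinations


def density(nodes, edges):
    """Directed density: edges divided by the n * (n - 1) possible edges."""
    n = len(nodes)
    return len(set(edges)) / (n * (n - 1)) if n > 1 else 0.0


def reciprocity(edges):
    """Share of directed edges whose reverse edge is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def transitivity(nodes, edges):
    """Closed connected triples divided by all connected triples,
    computed on the undirected projection of the graph."""
    neighbours = {n: set() for n in nodes}
    for u, v in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    closed = triples = 0
    for centre in nodes:
        for a, b in combinations(neighbours[centre], 2):
            triples += 1
            if b in neighbours[a]:
                closed += 1
    return closed / triples if triples else 0.0


# Hypothetical toy network (not the measured one).
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "D")]

d = density(toy_nodes, toy_edges)
r = reciprocity(toy_edges)
t = transitivity(toy_nodes, toy_edges)
print(round(d, 2), round(r, 2), round(t, 2))
```

In the toy network 5 of the 12 possible directed edges exist, 2 of the 5 edges are reciprocated, and 3 of the 5 connected triples are closed.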

Data Network Analyses – Nodes (Resources) Metrics

Metric                    Top 3 resources
Degree centrality         AV (1), EA (0.73), BC (0.55)
Betweenness centrality    WD (1), AV (0.79), DB (0.72)
Authority score           AV (1), GM (0.93), EV (0.93)
Hub score                 EA (1), AV (0.56), BC (0.43)
Page Rank                 LA (1), WD (0.94), DN (0.75)

All values are normalized.
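As an illustration of normalized node metrics, the sketch below computes degree and a plain power-iteration PageRank and scales each so that the best node scores 1. That scaling is an assumption inferred from the top value of 1 in every row; the slides do not state the normalization or the PageRank settings (the damping factor 0.85 is a common default, not the author's). The toy edges are hypothetical.

```python
def degree(nodes, edges):
    """Total degree: incoming plus outgoing identity links per node."""
    counts = {n: 0 for n in nodes}
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; dangling nodes spread their rank uniformly."""
    n = len(nodes)
    out = {node: [] for node in nodes}
    for u, v in edges:
        out[u].append(v)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out[node])
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def normalise(scores):
    """Scale scores so that the best node gets 1."""
    top = max(scores.values())
    return {k: (v / top if top else 0.0) for k, v in scores.items()}


# Hypothetical toy network (not the measured one).
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("A", "B"), ("B", "A"), ("A", "C"), ("D", "A")]

norm_degree = normalise(degree(toy_nodes, toy_edges))
ranks = pagerank(toy_nodes, toy_edges)
norm_rank = normalise(ranks)
print(norm_degree)
```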

“Cartography” Data Network – Authority

[Figure: “Cartography” data network with authority scores. Nodes shown: ND, AA, WD, LA, DN, BC, DB, ST, AV, NL, EA, GM, EV, BF.]

Best Resources Containing the Term “Cartography”

- From the viewpoint of interconnection with other resources
- Multiple-criteria decision analysis
- Weighted sum of previous metrics (+ closeness centrality)
- Saaty method for weights

Resource                                     Weighted sum
AGROVOC                                      0.33
DBpedia                                      0.24
Wikidata                                     0.22
Biblioteca Nazionale Centrale di Firenze     0.21
Environmental Applications Reference THe.    0.18
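A weighted sum with Saaty-style weights can be sketched as below. Everything numeric here is hypothetical: the pairwise comparison matrix, the three criteria and the resource scores are invented for illustration, and the weights are approximated by normalized row geometric means, a common shortcut for the principal eigenvector that the Saaty method formally uses. The slides do not give the author's comparison matrix or weights.

```python
from math import prod


def saaty_weights(matrix):
    """Approximate priority weights by normalised row geometric means."""
    size = len(matrix)
    means = [prod(row) ** (1.0 / size) for row in matrix]
    total = sum(means)
    return [m / total for m in means]


def weighted_sum(scores, weights):
    """Combine normalised metric scores into one value per resource."""
    return sum(s * w for s, w in zip(scores, weights))


# Hypothetical pairwise comparisons of three criteria:
# entry [i][j] says how much more important criterion i is than criterion j.
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = saaty_weights(comparisons)

# Hypothetical normalised metric scores per resource, one value per criterion.
resources = {"X": [1.0, 0.5, 0.0], "Y": [0.4, 1.0, 1.0]}
totals = {name: weighted_sum(scores, weights) for name, scores in resources.items()}
ranking = sorted(totals, key=totals.get, reverse=True)
print(ranking, {name: round(value, 2) for name, value in totals.items()})
```

Because the example matrix is perfectly consistent, the geometric-mean shortcut gives the same weights as the eigenvector method (4/7, 2/7, 1/7); for inconsistent judgments the two can differ slightly.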

Data Network Analyses – Term “Cartography”

- Density – 0.16 → aliases are very poorly interconnected
- Reciprocity – 0.14 → in most cases, aliases are not connected by reciprocal links
- Clustering coefficient – 0.28 → aliases do not constitute distinct closed subsets of resources

“Cartography” Data Network – Clusters

[Figure: clusters in the “Cartography” data network. Nodes shown: NL, GM, EV, EA, AV, ST, DN, DB, BC, WD, ND, BF, LA, AA.]

Clusters are interconnected quite well (usually by two or more edges).

Conclusions

- “Cartography” represents a stable concept (term), which is published by important LOD resources based on crowdsourcing (DBpedia, Wikidata) as well as by LOD resources provided by public organizations (libraries)
- The interconnection of aliases is not satisfactory (low density and reciprocity) – the data network resembles a tree structure (instead of a robust network)
- Explicit semantics should be improved (but this is a common problem of LOD)

Conclusions

- Follow-up research – comparison with similar scientific disciplines, study of cartographic terms
- Using LOD = Opportunity
  - for cartographers and ICA to promote cartography and to influence the perception of cartography (through LOD resources edited by volunteers)
  - to see the diversity of cartography (through the social role of identity links)

Thank you for your attention and questions.

This publication was supported by the project LO1506 of the Czech Ministry of Education, Youth and Sports.
