Example of Data Cube Vocabulary

Publishing Statistical Data file to RDF with Google Refine

George Papastefanatos Irene Petrou

Institute for the Management of Information Systems RC “Athena”

Open Data Day 2013 @ Athens

Introduction

Why use Google Refine?
o For unifying and cleaning data
o For converting data to RDF

Step 1 – Download Data
o Download the Excel file with the data from:
http://www.statistics.gr/portal/page/portal/ESYE/BUCKET/General/resident_population_census2011.xls

Step 2 – Download & Install Google Refine
o Download the Google Refine zip file from:
http://code.google.com/p/google-refine/wiki/Downloads?tm=2
o Extract the zip file
o Download RDF Refine Extension, a Google Refine extension for exporting RDF, from:
http://refine.deri.ie/
o Extract the zip file to the folder /webapp/extensions

Step 3 – Create Project
o Double-click on the google-refine application
o A new tab will open in your browser with the Google Refine workspace
o Click on Create Project
o Click on Get Data From This Computer and choose the Excel file “resident_population_census2011.xls”
o Click Next

CLEAN MESSY DATA

Step 4 – Unifying and Cleaning Data
o Click on the worksheets you wish to import (in this example there is only one)
o To remove unwanted empty rows and cells:
  ◦ Uncheck Store blank rows
  ◦ Uncheck Store blank cells as nulls
o Type ignore first 1 line(s) at the beginning of file to remove the unwanted first row
o Type parse next 1 line(s) as column headers to indicate the headers of our data columns
o Click on Create Project


o We can see that 20995 rows were imported into our project
o Click on Show 50 rows to view more data rows
o We still need to remove some unwanted columns to get the raw data we need. Remove the columns:
  ◦ Επίπεδο Διοικητικής Διαίρεσης (level of administrative division)
  ◦ α/α (serial number)
  ◦ Columns 6, 7 and 8
o To remove a column, click on the column -> Edit column -> Remove this column

o We also wish to remove the first 4 rows, which contain totals rather than data
o Click on the star of each row we wish to remove
o Click All and then choose Facet -> Facet by star to filter our dataset
o On the left side you can now see two options:
  ◦ True
  ◦ False
o Click on True -> All -> Edit rows -> Remove all matching rows
o Click on False to see the remaining data
o Our data are now unified and cleaned
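The Step 4 clean-up can also be sketched outside Google Refine, which makes each import option explicit. Below is a minimal Python sketch, assuming the worksheet has already been read into a list of rows of strings; the function `clean_rows`, its parameters and the `SAMPLE` rows are hypothetical, and the sample values are made up. It removes total rows by position, so it only matches the star-and-remove procedure when the total rows really are the first data rows.

```python
# HYPOTHETICAL sample worksheet: a title line, a header line, one total row,
# one blank row and two data rows. All values are made up for illustration.
SAMPLE = [
    ["Title line of the worksheet", "", "", ""],
    ["α/α", "Γεωγραφικός κωδικός Καλλικράτη", "Περιγραφή", "Μόνιμος Πληθυσμός"],
    ["1", "G0", "Total", "100"],
    ["", "", "", ""],
    ["2", "G1", "Place A", "60"],
    ["3", "G2", "Place B", "40"],
]


def clean_rows(raw_rows, skip_lines=1, drop_columns=(), drop_first_data_rows=0):
    """Mimic the Step 4 import options and clean-up (a sketch, not Refine itself).

    skip_lines           -> "ignore first N line(s) at beginning of file"
    next line            -> "parse next 1 line(s) as column headers"
    blank-row filter     -> unchecking "Store blank rows"
    drop_columns         -> Edit column -> Remove this column
    drop_first_data_rows -> starring the total rows and removing them
    """
    rows = raw_rows[skip_lines:]
    header, data = rows[0], rows[1:]
    # Drop rows whose cells are all empty; empty cells stay as "" (not None),
    # similar to unchecking "Store blank cells as nulls".
    data = [row for row in data if any(cell.strip() for cell in row)]
    # Keep the remaining columns in their original order.
    keep = [i for i, name in enumerate(header) if name not in drop_columns]
    header = [header[i] for i in keep]
    data = [[row[i] for i in keep] for row in data]
    # Remove the leading total rows by position.
    return header, data[drop_first_data_rows:]


header, data = clean_rows(SAMPLE, drop_columns={"α/α"}, drop_first_data_rows=1)
print(header)
print(data)
```

For the real census sheet the call would use `drop_first_data_rows=4` and also list columns 6, 7 and 8 in `drop_columns`, under whatever names the import gives them.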

BUILD RDF SKELETON

Step 5 – Reset RDF Skeleton
o To view and edit the RDF skeleton, click on the RDF extension and then click on Edit RDF Skeleton…
o You can see a basic skeleton generated automatically by the RDF extension
o However, it is not a properly designed structure, so click OK, then click on the RDF extension again, but this time click on Reset RDF Skeleton… and then OK

Step 6 – Add prefixes
o Since we wish to work with the Data Cube Vocabulary, we need to add its prefix
o Click on RDF -> Edit RDF Skeleton
o Click on add prefix, type qb in the prefix field and click anywhere in the window
o The URI field will be updated automatically
o Click Save
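A prefix is only a shorthand: `qb:DimensionProperty` stands for the full URI formed by appending the local name to the namespace. The minimal Python sketch below illustrates this, assuming the URI filled in for `qb` is the W3C Data Cube namespace `http://purl.org/linked-data/cube#`; the names `PREFIXES`, `expand` and `turtle_prefix_block` are hypothetical helpers, not part of the RDF extension.

```python
# Namespaces used in this tutorial: rdf and rdfs are standard W3C namespaces,
# qb is the W3C RDF Data Cube namespace (assumed to be what the extension fills in).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "qb": "http://purl.org/linked-data/cube#",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'qb:DataSet' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def turtle_prefix_block(prefixes=PREFIXES):
    """Render the prefixes as Turtle @prefix declarations."""
    return "\n".join(f"@prefix {p}: <{uri}> ." for p, uri in sorted(prefixes.items()))


print(expand("qb:DimensionProperty"))
print(turtle_prefix_block())
```

The `@prefix` lines are what a Turtle export would normally start with, so the `qb:` names used in the skeleton resolve to the Data Cube terms.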

Step 7 – Base URI and URI patterns
o Click edit next to the base URI and type http://linkedstatistics.gr
o Before you start building the skeleton, it is important to have already decided on the URI patterns that you will give to your structure and data
o The URIs we are going to use are the following:
  ◦ Dataset: http://linkedstatistics.gr/data/{table_name}
  ◦ Schema, i.e. structural components (DSD, component specifications): http://linkedstatistics.gr/schema#{name}
  ◦ Vocabulary (dimensions, measures, attributes): http://linkedstatistics.gr/dic/{id}
  ◦ Vocabulary values: http://linkedstatistics.gr/dic/{id}#{value}
  ◦ Observation values: http://linkedstatistics.gr/data/{NameofDataset}#{values of dimensions separated with commas}
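Writing the URI patterns down as small functions is one way to keep them consistent across the skeleton. The minimal Python sketch below assumes that values are percent-encoded with `urllib.parse.quote`; this is a stand-in chosen for illustration, since the tutorial itself uses the extension's `urlify()` in GREL. All function names are hypothetical. Note that Steps 9b and 9c type schema/… (slash) while the pattern above uses schema#…, so one form should be chosen and used throughout.

```python
from urllib.parse import quote

BASE = "http://linkedstatistics.gr"


def _enc(value):
    # Assumption: percent-encode every reserved character, including commas.
    return quote(str(value), safe="")


def dataset_uri(table_name):
    return f"{BASE}/data/{table_name}"


def schema_uri(name):
    return f"{BASE}/schema#{name}"


def vocab_uri(id_):
    return f"{BASE}/dic/{id_}"


def vocab_value_uri(id_, value):
    return f"{BASE}/dic/{id_}#{_enc(value)}"


def observation_uri(dataset, dimension_values):
    # Dimension values are joined with literal commas after each is encoded,
    # so a comma inside a value cannot be confused with the separator.
    return f"{BASE}/data/{dataset}#" + ",".join(_enc(v) for v in dimension_values)


print(dataset_uri("PopulationPerGeocodeCensus2011"))
print(vocab_value_uri("geocode", "G1"))
print(observation_uri("PopulationPerGeocodeCensus2011", ["G1"]))
```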

Start Building the Skeleton Based on the Data Cube Vocabulary

Step 8 – Define Components
o Dimension(s)
  ◦ geocode: Γεωγραφικός Κωδικός Καλλικράτη (Kallikratis geographic code)
o Measure(s)
  ◦ population: Μόνιμος Πληθυσμός (resident population)
o Attribute(s)
  ◦ Unit of measure: Number of people

(8a) Define geocode
o Click on add rdf:type and type qb:DimensionProperty
o Click on (row index) URI
o Choose The cell's content is used ... as a URI
o Choose Constant Value and type the URI for this dimension (no need to repeat the base URI): dic/geocode
o Click on add:property -> property? -> rdfs:label
o Click on Configure
o Choose The cell's content is used ... as language-tagged text: el
o Choose Constant Value and type: Γεωγραφικός Κωδικός Καλλικράτη
o Click Save
o Click on RDF Preview

(8b) Define population
o Click add another root node to create a new node for your graph structure
o Click on add rdf:type and type qb:MeasureProperty
o Click on (row index) URI
o Choose The cell's content is used ... as a URI
o Choose Constant Value and type: dic/population
o Click on add:property -> property? -> rdfs:label
o Click on Configure
o Choose The cell's content is used ... as language-tagged text: el
o Choose Constant Value and type: Μόνιμος Πληθυσμός
o Click Save

(8c) Define UnitOfMeasure
o Click add another root node to create a new node for your graph structure
o Click on add rdf:type and type qb:AttributeProperty
o Click on (row index) URI
o Choose The cell's content is used ... as a URI
o Choose Constant Value and type: dic/UnitOfMeasure
o Click on add:property -> property? -> rdfs:label
o Click on Configure
o Choose The cell's content is used ... as text
o Choose Constant Value and type: Unit of measure
o Click Save
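After Step 8 the RDF Preview should contain three component properties, each with a type and a label. The minimal Python sketch below prints roughly equivalent Turtle, assuming `@prefix` declarations for `qb` and `rdfs` are in scope and that relative URIs resolve against http://linkedstatistics.gr/; the extension's actual output formatting may differ, and `COMPONENTS`, `literal` and `component_turtle` are hypothetical names.

```python
BASE = "http://linkedstatistics.gr/"

# (relative URI, Data Cube class, label, language tag or None)
COMPONENTS = [
    ("dic/geocode", "qb:DimensionProperty", "Γεωγραφικός Κωδικός Καλλικράτη", "el"),
    ("dic/population", "qb:MeasureProperty", "Μόνιμος Πληθυσμός", "el"),
    ("dic/UnitOfMeasure", "qb:AttributeProperty", "Unit of measure", None),
]


def literal(text, lang=None):
    """Quote a Turtle string literal, optionally language-tagged."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def component_turtle(components=COMPONENTS, base=BASE):
    lines = []
    for rel_uri, qb_class, label, lang in components:
        lines.append(f"<{base}{rel_uri}> a {qb_class} ;")
        lines.append(f"    rdfs:label {literal(label, lang)} .")
    return "\n".join(lines)


print(component_turtle())
```

The language tag `el` is attached only to the Greek labels, matching the choice of language-tagged text in 8a and 8b and plain text in 8c.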

Step 9 – Define DataSet, DSD and Component Specifications

(9a) Define DataSet
o Click add another root node to create a new node for your graph structure
o Click on add rdf:type and type qb:DataSet
o Click on (row index) URI
o Choose The cell's content is used ... as a URI
o Choose Constant Value and type the URI for this dataset (no need to repeat the base URI): data/PopulationPerGeocodeCensus2011
o Click on add:property -> property? -> rdfs:comment
o Click on Configure
o Choose The cell's content is used ... as language-tagged text: el
o Choose Constant Value and type: Απογραφή Πληθυσμού Κατοικιών 2011. ΜΟΝΙΜΟΣ Πληθυσμός (2011 Population and Housing Census. Resident population)
o Click Save

(9b) Define Data Structure Definition
o Click on add:property -> property? -> qb:structure
o Click on Configure
o Choose The cell's content is used ... as a blank node to create a new node for the DSD and click OK
o Click on add rdf:type and type qb:DataStructureDefinition
o Click on blank (cell)
o Choose The cell's content is used ... as a URI
o Choose Constant Value and type: schema/PopulationPerGeocodeCensus2011
o Click Save
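The dataset node and its Data Structure Definition are linked by `qb:structure`. The minimal Python sketch below prints the corresponding Turtle, assuming `@prefix` declarations for `qb` and `rdfs` and the URIs typed in 9a and 9b resolved against http://linkedstatistics.gr/; `dataset_turtle` is a hypothetical helper and the exact serialization produced by the extension may differ.

```python
BASE = "http://linkedstatistics.gr/"


def dataset_turtle(
    dataset="PopulationPerGeocodeCensus2011",
    comment="Απογραφή Πληθυσμού Κατοικιών 2011. ΜΟΝΙΜΟΣ Πληθυσμός",
    base=BASE,
):
    """Return Turtle for the qb:DataSet node and the DSD it points to."""
    ds = f"<{base}data/{dataset}>"
    dsd = f"<{base}schema/{dataset}>"
    escaped = comment.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        f"{ds} a qb:DataSet ;",
        f'    rdfs:comment "{escaped}"@el ;',
        f"    qb:structure {dsd} .",
        "",
        f"{dsd} a qb:DataStructureDefinition .",
    ])


print(dataset_turtle())
```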

(9c) Define Component Specifications with dimensions, measures and attributes
o Click on add:property -> property? -> qb:component (repeat 3 times)
o Click on Configure on each of the three new components
o Choose The cell's content is used ... as a blank node to create a new node for the Component Specification and click OK
o Click on add rdf:type and type: qb:ComponentSpecification
o Click on blank (cell)
o Choose The cell's content is used ... as a URI
o Choose Constant Value and type:
  1. schema/geocode
  2. schema/population
  3. schema/UnitOfMeasure
o For each component specification, click on:
  1. add:property -> property? -> qb:dimension
  2. add:property -> property? -> qb:measure
  3. add:property -> property? -> qb:attribute
o For each component, click on Configure
o Choose The cell's content is used ... as a URI
o Choose Constant Value and type:
  1. dic/geocode
  2. dic/population
  3. dic/UnitOfMeasure
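Each component specification ties one property to its role in the DSD: `qb:dimension`, `qb:measure` or `qb:attribute`. The minimal Python sketch below prints this part of the graph as Turtle, assuming `@prefix qb:` is declared and the relative URIs from 9b and 9c resolve against http://linkedstatistics.gr/; `SPECS` and `dsd_components_turtle` are hypothetical names.

```python
BASE = "http://linkedstatistics.gr/"

# (component specification URI, role property, component property URI)
SPECS = [
    ("schema/geocode", "qb:dimension", "dic/geocode"),
    ("schema/population", "qb:measure", "dic/population"),
    ("schema/UnitOfMeasure", "qb:attribute", "dic/UnitOfMeasure"),
]


def dsd_components_turtle(dsd="schema/PopulationPerGeocodeCensus2011", specs=SPECS, base=BASE):
    """Return Turtle linking the DSD to its three component specifications."""
    spec_refs = ", ".join(f"<{base}{spec}>" for spec, _, _ in specs)
    lines = [f"<{base}{dsd}> qb:component {spec_refs} .", ""]
    for spec, role, prop in specs:
        lines.append(f"<{base}{spec}> a qb:ComponentSpecification ;")
        lines.append(f"    {role} <{base}{prop}> .")
    return "\n".join(lines)


print(dsd_components_turtle())
```

Listing the three specifications after one `qb:component` with commas is just Turtle shorthand for three separate `qb:component` triples, which is what repeating the property three times in the skeleton produces.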

Step 10 – Create URIs for the dimension values
o Click add another root node to create a new node for your graph structure
o Click on (row index) URI
o Choose The cell's content is used ... as a URI
o Choose ‘Γεωγραφικός κωδικός Καλλικράτη’ (Kallikratis geographic code)
o Click on use custom expression -> preview/edit and type 'http://linkedstatistics.gr/dic/geocode#'+value.urlify() or 'dic/geocode#'+value.urlify()
o Click on add:property -> property? -> rdfs:label
o Click on Configure
o Choose The cell's content is used ... as language-tagged text: el
o Choose ‘Περιγραφή’ (description)
o Click Save
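Step 10 turns every geocode in the data into its own URI under dic/geocode# and labels it with the description column. The minimal Python sketch below mirrors this for a list of (geocode, description) pairs. The helper `urlify_like` is a hypothetical stand-in that trims and percent-encodes; the extension's real `urlify()` may normalize values differently. It assumes `@prefix rdfs:` is declared, and the sample pairs are made up.

```python
from urllib.parse import quote

BASE = "http://linkedstatistics.gr/"


def urlify_like(value):
    # HYPOTHETICAL stand-in for GREL urlify(): trim, then percent-encode.
    return quote(str(value).strip(), safe="")


def geocode_value_turtle(rows, base=BASE):
    """rows: iterable of (geocode, description) pairs from the two columns."""
    lines = []
    for code, description in rows:
        uri = f"<{base}dic/geocode#{urlify_like(code)}>"
        desc = description.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{uri} rdfs:label "{desc}"@el .')
    return "\n".join(lines)


# Made-up sample values for illustration.
print(geocode_value_turtle([("G1", "Place A"), ("G 2", "Place B")]))
```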

Step 11 – Define observations
o Click add another root node to create a new node for your graph structure
o Click on (row index) URI
o Choose The cell's content is used ... as a URI
o Choose ‘Μόνιμος Πληθυσμός’ (resident population)
o Click on use custom expression -> preview/edit and type 'data/PopulationPerGeocodeCensus2011#'+cells["Γεωγραφικός κωδικός Καλλικράτη"].value.urlify()
o Click on:
  1. add:property -> property? -> dic/geocode
  2. add:property -> property? -> dic/population
  3. add:property -> property? -> dic/UnitOfMeasure
o Click on Configure to give values to each component
o Choose The cell's content is used ... as a URI for the geocode and the UnitOfMeasure, and as integer for the population
o Use content from cell …
  1. Γεωγραφικός κωδικός Καλλικράτη, then use custom expression -> preview/edit and type 'dic/geocode#'+value.urlify()
  2. Μόνιμος πληθυσμός
  3. Constant Value, and type the URI: http://dbpedia.org/resource/People

o Click Save
o Click OK -> Export -> RDF as Turtle to view the RDF file
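Each data row becomes one observation carrying the geocode, the population as an integer and the unit of measure. The minimal Python sketch below prints roughly what the Turtle export should contain for such rows. Two lines go beyond the slides and are labeled as such: `a qb:Observation` and `qb:dataSet`, which the W3C Data Cube specification uses to type an observation and attach it to its dataset (pass `typed=False` to omit them). `urlify_like` is a hypothetical stand-in for the extension's `urlify()`, the `@prefix` declarations for `qb` and `xsd` are assumed, and the sample row is made up.

```python
from urllib.parse import quote

BASE = "http://linkedstatistics.gr/"
DATASET = "PopulationPerGeocodeCensus2011"
UNIT = "http://dbpedia.org/resource/People"


def urlify_like(value):
    # HYPOTHETICAL stand-in for GREL urlify(): trim, then percent-encode.
    return quote(str(value).strip(), safe="")


def observation_turtle(rows, base=BASE, dataset=DATASET, unit=UNIT, typed=True):
    """rows: iterable of (geocode, population) pairs."""
    blocks = []
    for code, population in rows:
        code_part = urlify_like(code)
        props = [
            f"<{base}dic/geocode> <{base}dic/geocode#{code_part}>",
            f'<{base}dic/population> "{int(population)}"^^xsd:integer',
            f"<{base}dic/UnitOfMeasure> <{unit}>",
        ]
        if typed:
            # Not in the slides: type and dataset link from the W3C Data Cube spec.
            props = ["a qb:Observation", f"qb:dataSet <{base}data/{dataset}>"] + props
        subject = f"<{base}data/{dataset}#{code_part}>"
        blocks.append(subject + " " + " ;\n    ".join(props) + " .")
    return "\n\n".join(blocks)


# Made-up sample row for illustration.
print(observation_turtle([("G1", "60")]))
```

Converting the population with `int()` fails loudly on a non-numeric cell, which is a cheap way to catch rows that slipped through the cleaning in Step 4.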

THANK YOU