The ESS and Big Data

Conference by STATEC and EUROSTAT

Savoir pour agir: la statistique publique au service des citoyens (Knowledge for action: official statistics at the service of citizens)

Big data in the European Statistical System
Michail SKALIOTIS – EUROSTAT, Head of Task Force 'Big Data'
Eurostat

Datafication

Digital footprint



Proclamation of Pope Benedict

Proclamation of Pope Francis, 2013

African proverb: “When the music changes, so does the dance.”
If we fail to listen, we will be out of step! (Denise Lievesley)


Big data @ ESS – key points

ESS (European Statistical System)

Scheveningen Memorandum, September 2013
• Examine the potential of big data sources for official statistics
• Official statistics big data strategy as part of a wider government strategy
• Address privacy and data protection
• Collaboration at European and global level
• Address the need for skills
• Partnerships between different stakeholders (government, academics, private sector)
• Developments in methodology, quality assessment and IT
• Adopt an action plan and roadmap for the ESS

Big data @ ESS – key points

ESS (European Statistical System)
• Scheveningen Memorandum, Sept. 2013
• Task Force Big Data
• Big Data Action Plan and Roadmap 1.0, Sept. 2014
• ESS Pilots 2016 – 2019

Implementation of ESS Vision 2020: the Big Data project is an integral part of the portfolio

European Commission Communication
• "Towards a thriving data-driven economy"

Public Private Partnership on big data

International cooperation (UNSD, UNECE, etc.)

Areas in the Big data roadmap
• Policy
• Quality framework
• Experience sharing
• Ethics / Communication
• IT Infrastructures


Challenges
▫ cooperation, sharing of know-how
▫ development of a sound methodology ("from design-based to model-based approach")
▫ exploration & tentative implementation

Action (example)
▫ Pilot projects, carried out by the Member States (ESSnet)
  • 2015 – 2019 (FPA / SGA construction)
  • Exploring different big data sources (but also IT architecture, partnerships), developing generic guidelines and frameworks
  • Enable the ESS to gradually integrate big data sources into the production of European and national statistics?
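The move "from design-based to model-based approach" named under Challenges can be illustrated with a small sketch. It is not taken from the ESS pilots: the population, the coverage rule and the helper names (`horvitz_thompson_total`, `ratio_model_total`) are all made up for illustration. A design-based estimate relies on known inclusion probabilities from a probability sample, while a model-based estimate relies on a fitted relationship to correct a source whose selection mechanism is unknown.

```python
import random

random.seed(42)

# Hypothetical population: x is an auxiliary value known for every unit
# (e.g. from a register); y is the target value, unknown in practice.
N = 1000
x = [random.uniform(1.0, 10.0) for _ in range(N)]
y = [2.0 * xi + random.gauss(0.0, 1.0) for xi in x]
true_total = sum(y)


def horvitz_thompson_total(sample_y, inclusion_probs):
    """Design-based: weight each sampled value by 1 / inclusion probability."""
    return sum(yi / pi for yi, pi in zip(sample_y, inclusion_probs))


def ratio_model_total(obs_x, obs_y, all_x):
    """Model-based: fit y = beta * x on observed units, predict the total."""
    beta = sum(obs_y) / sum(obs_x)
    return beta * sum(all_x)


# Design-based: simple random sample of n units, so pi = n / N for each unit.
n = 100
srs = random.sample(range(N), n)
ht = horvitz_thompson_total([y[i] for i in srs], [n / N] * n)

# Big-data-like source: covers only units with large x, and no inclusion
# probabilities are known (assumed coverage rule, purely illustrative).
covered = [i for i in range(N) if x[i] > 6.0]
naive = N * sum(y[i] for i in covered) / len(covered)  # ignores selection
mb = ratio_model_total([x[i] for i in covered], [y[i] for i in covered], x)

print(f"true total             {true_total:10.1f}")
print(f"design-based (HT)      {ht:10.1f}")
print(f"naive scale-up         {naive:10.1f}")
print(f"model-based (ratio)    {mb:10.1f}")
```

In this toy setting the naive scale-up overshoots because the source is not representative, while the model-based estimate is only as good as the assumed relationship between `x` and `y`.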

Challenges
▫ new skills for NSI staff: statisticians vs. data scientists?
▫ computing capacity, hardware?
▫ analytical tools, software?
▫ storage?

Action (example)
▫ Training programme for European statisticians (ESTP)
  • In the coming years: dedicated courses on big data
  • Focus on big data sources and on big data tools
  • Acquiring the skills needed to assess sources and their quality, to use tools and to explore big data sources

Challenges
▫ integrating official statistics in big data strategies
▫ getting access to data & continuity of access
▫ data security & privacy concerns
▫ pay for data?

Action (example)
▫ Project on the analysis of legislation and strategy (but also ethics and communication)
  • 2015-2017 (22 months)
  • Analysis for the EU and for Member States at national level
▫ See also the feasibility study on the use of mobile positioning data for tourism statistics (report on feasibility of access)

Challenges
▫ transversal challenges to all big data activities: quality and ethics & communication
▫ big data vs. statistics: "goodness of fit" (concepts, representativeness, …)
▫ impact of privacy and security concerns on public opinion?

Action (example)
▫ Cooperation with the UN (lead) on a quality framework for big data
▫ Project on the analysis of ethics and communication (but also legislation and strategy)
  • 2015-2017 (22 months)
  • Analysis for the EU and for Member States at national level

Big data = Multiple sources & Multiple outputs

[Diagram: big data sources linked to statistical outputs]
Sources: Mobile phone data, Satellite images, VGI websites, Smart meters
Outputs: Tourism statistics, Commuting statistics, Traffic statistics, Population statistics, Migration statistics

Statistical domains
• Balance of payments
• Regional and GIS
• ICT usage
• Prices and inflation
• Land use

National initiatives as a driver
• CBS Netherlands
• CSO Ireland
• Statistics Finland
• SURS Slovenia


Insights for world heritage sites from Wikipedia use

• Source
  ▫ Hourly page views for each Wikipedia article
  ▫ Content of Wikipedia articles
  ▫ High timeliness, temporal detail and transparency; no geographical information

• Processing
  ▫ Big Data Sandbox: computer cluster with 4 nodes
  ▫ Tools: Pig, Map-Reduce, Python, R
  ▫ Association of Wikipedia articles to specific World Heritage Sites (WHS)

• Output
  ▫ Exposure of world heritage via Wikipedia
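The processing chain above (hourly page views, association of articles to sites, aggregated exposure) can be sketched in a few lines of plain Python. This is only an illustration of the map and reduce steps, not the code run on the Big Data Sandbox: the records, the `article_to_site` lookup and the function names are invented for the example.

```python
from collections import defaultdict

# Made-up hourly records in the shape (article title, hour stamp, views).
hourly_views = [
    ("Stonehenge", "2015-06-21T04", 120),
    ("Stonehenge", "2015-06-21T05", 310),
    ("Avebury", "2015-06-21T05", 40),
    ("Acropolis of Athens", "2015-06-21T05", 95),
    ("Main Page", "2015-06-21T05", 5000),  # not linked to any site
]

# Hypothetical article-to-site lookup; several articles can map to one site.
article_to_site = {
    "Stonehenge": "Stonehenge, Avebury and Associated Sites",
    "Avebury": "Stonehenge, Avebury and Associated Sites",
    "Acropolis of Athens": "Acropolis, Athens",
}


def map_record(record):
    """Map step: emit ((site, day), views) for articles linked to a site."""
    article, hour, views = record
    site = article_to_site.get(article)
    if site is not None:
        yield (site, hour[:10]), views  # hour[:10] keeps the YYYY-MM-DD part


def reduce_views(pairs):
    """Reduce step: sum the views emitted for each (site, day) key."""
    totals = defaultdict(int)
    for key, views in pairs:
        totals[key] += views
    return dict(totals)


daily_site_views = reduce_views(
    pair for record in hourly_views for pair in map_record(record)
)

for (site, day), views in sorted(daily_site_views.items()):
    print(day, site, views)
```

Keeping the map and reduce steps as separate functions mirrors how the same logic would be split across nodes in a cluster, where the lookup table would be shipped to each mapper.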

Insights for world heritage sites from Wikipedia use

Page views of English Wikipedia articles related to World Heritage Sites

Nowcasting Unemployment

• Source
  ▫ Google Trends (others to be explored)
  ▫ High timeliness, geo info available, low transparency

• Processing
  ▫ Low computing power required
  ▫ Time-series modelling (machine learning to be explored)
  ▫ Tools: R

• Output
  ▫ Nowcasting of unemployment: from a 1-month lag to the current time
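The idea of closing the one-month publication lag can be sketched with a very simple bridge regression. The slides mention time-series modelling in R; the sketch below uses Python and ordinary least squares instead, and every number in it is invented. It assumes a search-interest index is available for the current month while the official unemployment rate is only published up to the previous month.

```python
from statistics import mean

# Invented monthly search-interest index; the last value is the current month.
search_index = [50, 52, 55, 53, 58, 60, 63, 61, 66]

# Invented official unemployment rates (%), published with a one-month lag,
# so there is no value yet for the current month.
unemployment = [7.0, 7.1, 7.4, 7.2, 7.7, 7.9, 8.2, 8.1]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Fit on the months where both series are available.
history = search_index[: len(unemployment)]
intercept, slope = fit_line(history, unemployment)

# Nowcast the current month from the search index alone.
nowcast = intercept + slope * search_index[-1]
print(f"slope={slope:.4f} intercept={intercept:.3f} nowcast={nowcast:.2f}%")
```

A production model would normally add the series' own lags and seasonality and be checked against later official releases; the single-regressor form is kept here only to show the mechanics.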

The statistical office of the future
• Data flows in addition to surveys and censuses
• Embedded in data flow – statistics 'everywhere'
• Product designers in addition to data collection designers
• Statistical modelling will be a major activity
• From descriptive indicators to nowcasting (and forecasting)
• Trust and quality will be key
• New role in teaching digital literacy
• Accreditation and certification instead of pure production
• Address issues linked to quality & transparency, privacy & confidentiality, access to third-party data sources & data sharing, scientific standards & methodology, professional ethics, skills, …

The NSI of the future: Official Statistics in a full-fledged IoT world

Svein Nordbotten: Use of electronically observed data in official statistics

Thank you for your attention!
