Ss. Cyril and Methodius University in Skopje School of Doctoral Studies Computer Science and Engineering
OPEN DATA: THE CURRENT STATE AND CHALLENGES IN MACEDONIA - PhD Report -
Candidate: Milos Jovanovik Mentor: Prof. Dimitar Trajanov, PhD
March 2013 Skopje
OPEN DATA: THE CURRENT STATE AND CHALLENGES IN MACEDONIA Milos Jovanovik Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University Skopje, Republic of Macedonia
ABSTRACT
The concept of Open Data represents the idea that certain data should be freely available to the public for use, reuse, republishing and redistribution, with few or no restrictions. The goal is to make data of a public nature available in an open manner, in a raw and machine-readable format, so that it can be used for building useful applications which leverage its value, allow insight, provide access to government services, and support transparency. Such data can contribute to the overall development of society, both by boosting the ICT business sector with new business value and by allowing citizens a deeper insight into the work of their government. The recent rise in interest in Open Data has introduced the need for efficient mechanisms which enable easy transformation, publishing, management, consumption and visualization of the data. In our ongoing research in the field of Open Data and Linked Data, we try different approaches which could provide effective and efficient methods for creating an Open Data ecosystem in the Republic of Macedonia.

HYPOTHESIS
The hypothesis of the research states: Open Data and Open Government Data in Macedonia provide more transparency for the Government and its institutions, enable a higher quality of life for the citizens, and add new business value in the ICT industry.

I. INTRODUCTION
The intensive development of e-technologies has made room for a further broadening of the concept of openness: a large number of countries are proactively opening their government documents and data to the public, in a raw, electronic format. The governments of the USA, UK, Germany, Spain, Finland, Denmark, Austria, Australia, Canada, Italy, Kenya, etc., as well as local authorities from London, Edmonton, Zaragoza, New York, etc., have created their own Open Government Data portals. These portals provide government data in machine-readable formats, which allows the data to be easily integrated into applications that leverage its value and provide citizens with useful information, insight and analysis in various contexts. This approach provides more transparency in the work of the governments and their institutions and, at the same time, improves the quality of everyday life of the citizens. Additionally, the ICT industry is given the opportunity to develop innovative applications over the open government data, which can potentially create new business models and new revenues for ICT companies [1].

These are the main motivations for our ongoing research in the fields of Open Data and Linked Data. Our main goal is to create an Open Data eco-system in the Republic of Macedonia, and we are trying to achieve this by testing and developing different approaches which could provide effective and efficient methods for reaching that goal.

II. GOVERNMENT ACTIVITIES
The Macedonian Government started its initiative for Open Government in 2011. On 24 August 2011, the Ministry of Foreign Affairs sent an official letter [2] to the Open Government Partnership, in which it expressed the intention of the Government of the Republic of Macedonia to join the Open Government Partnership initiative (footnote 1). The Macedonian Government officially joined the global Open Government Partnership (OGP) in April 2012, and took part in the annual meeting in Brazil, held from 16 to 18 April 2012. After this meeting, Macedonia made a commitment [3] to OGP with a list of activities which are planned as part of the official Action Plan [4] of the Government of Macedonia.

The Action Plan has nine different sets of goals. The list of tasks includes establishing an Open Government Data portal in Macedonia, developing new business models from open data, prioritizing the opening of data requested and identified by the stakeholders and the citizens, using the interoperable services as guidance in opening data, and analyzing the legal framework that underpins the concept of open data in order to determine whether changes are needed.

After a series of meetings with the stakeholders, in which our research team took part as representatives of academia, the Macedonian Government and the Ministry of Information Society and Administration, which was appointed as responsible for the Open Government and Open Data initiatives, started working on the tasks from the Action Plan. At first, a beta version of the Open Government Data portal was published (footnote 2). After a few months of gathering data from the ministries and the government institutions, the Macedonian Government published the official Macedonian Open Government portal (footnote 3). The portal currently holds open data from twelve ministries, six government institutions and three independent institutions.
1 http://www.opengovpartnership.org
2 http://opendata.mioa.gov.mk/
3 http://opendata.gov.mk/
The other tasks from the Action Plan are currently ongoing and, as stated in the plan itself, are planned to be finished by 2014 and 2015.

III. PROJECTS AT THE FACULTY OF COMPUTER SCIENCE AND ENGINEERING
As part of our research in the fields of Open Data, Linked Data and the Semantic Web, we have worked on a number of academic and industry projects. The purpose of these projects was to identify the technical difficulties in creating, serving and using open data, to provide proof-of-concept solutions, and to provide example use-cases of the benefits which data published in an open format can bring to the different stakeholders. These projects and ideas were presented to Macedonian Government officials at a series of meetings organized by the World Bank and the Government. Our goal was to provide the other stakeholders in the Open Data eco-system with the technical details about developing, sustaining and using data in machine-readable formats.

The projects done by our research group at the Faculty of Computer Science and Engineering in Skopje can be categorized as finished, ongoing or future projects.

A. Finished Projects
The projects which were finished during this research include the Open University Data project, the development of the Crime Map for Macedonia, the development of an Open Data portal based on Semantic Web technologies, the development of mobile and web applications based on the open data from the Health Insurance Fund of Macedonia, and the development of a System for Disaster and Crisis Management.

1) Open University Data
The Open University Data project was one of our first activities in the field of Open Data. The aim of the project was for the Faculty to join the trend of publishing public data in an open format on the web, and making it available for everyone to use and reuse [5]. As part of the project, we developed a system for mapping relational data from databases into data represented in semantic web formats (N3 and RDF). The system also provided options for editing and querying the semantic data through a SPARQL endpoint. Besides the semantic data dynamically created from the relational databases of the Faculty, we also added some static data for other faculties of the Ss. Cyril and Methodius University in Skopje.
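The idea of mapping relational rows to RDF can be illustrated with a minimal sketch. It is not the D2R Server mapping used in the project: the base URI, the property URIs and the column names below are hypothetical, and the output is plain N-Triples built by hand. The set of skipped columns mirrors the anonymization step that the report describes in the section on common pitfalls.

```python
from urllib.parse import quote

# Hypothetical base URI, chosen only for this illustration.
BASE = "http://example.org/university/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Columns treated as personal data and never exported (assumed column names).
PERSONAL_COLUMNS = {"name", "surname", "index_number", "citizen_number"}


def literal(value):
    """Serialize a Python value as an N-Triples plain literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def row_to_triples(table, key, row):
    """Map one relational row to N-Triples lines, skipping personal columns."""
    subject = f"<{BASE}{quote(table)}/{quote(str(row[key]))}>"
    # One rdf:type triple derived from the table name (an assumed convention).
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}ontology/{quote(table.capitalize())}> ."]
    for column, value in row.items():
        if column == key or column in PERSONAL_COLUMNS or value is None:
            continue
        lines.append(f"{subject} <{BASE}ontology/{quote(column)}> {literal(value)} .")
    return lines


# Hypothetical row from a "student" table.
EXAMPLE_ROW = {
    "id": 42,
    "name": "Ana",
    "index_number": "123/2010",
    "study_program": "Computer Science",
    "enrollment_year": 2010,
}

if __name__ == "__main__":
    for triple in row_to_triples("student", "id", EXAMPLE_ROW):
        print(triple)
```

In a real deployment the property URIs would come from the reused ontologies rather than from column names; the sketch only shows the row-to-triple step.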
The system developed for the project consists of five parts: a relational database, a D2R Server, a Mapping Tool, an Ontology Repository and RDF Documents, and a SPARQL endpoint (Figure 1).

Figure 1. Open University Data system architecture.

For the purpose of the project, we created an ontology. Following the best practices for ontology development, we reused existing ontologies for this task (Figure 2). Our University ontology uses the Friend of a Friend (FOAF) Ontology for representation of the employees and the students at the Faculty, and the Academic Institution Internal Structure Ontology and the University ontology for describing the internal organizational structure of the academic environment. Additionally, we use the GeoNames Ontology, the Timeline ontology and the Dublin Core Metadata Vocabulary.

Figure 2. The ontology used for the Open University Data project.

In order to automate the process of mapping the relational database tables to ontologies, we developed a Mapping Tool (Figure 3). It is a web application which uses the D2R Server functionalities to provide an easier and simpler way to connect the relational data with existing ontologies.
Figure 3. The Mapping Tool.

The SPARQL endpoint (Figure 4) accepts SPARQL queries passed via a query string, which allows browsing of the data. The results of the queries can be returned in JSON, XML and XML+XSL format, which means that they can be used by both web and mobile applications.

One of the aims of the project, and of publishing the Faculty data in an open, machine-readable format, was to encourage developers to use the data to create applications which can be useful for our students and staff. We believe that by opening its data, the Faculty could provide a better understanding of its structure and operations. Giving access to the data can be beneficial to the Faculty, the students and the developers.
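Passing a SPARQL query via a query string can be sketched as follows. The endpoint address is a placeholder, since the report does not give the real one, and the `output` parameter is an assumption: the SPARQL protocol defines the `query` parameter, while the way a result format is requested differs between servers.

```python
from urllib.parse import urlencode

# Placeholder endpoint address; the report does not state the real one.
ENDPOINT = "http://example.org/sparql"

# A generic query that lists ten arbitrary triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_query_url(endpoint, query, output="json"):
    """Return a GET URL that carries the SPARQL query in the query string."""
    # 'query' is the parameter name defined by the SPARQL protocol;
    # 'output' is an assumed, server-specific way to select the result format.
    return endpoint + "?" + urlencode({"query": query.strip(), "output": output})


if __name__ == "__main__":
    print(build_query_url(ENDPOINT, QUERY))
```

A web or mobile client would fetch the resulting URL and parse the JSON or XML response with its usual tooling.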
Figure 4. An example SPARQL query over the open data from the Faculty repository.

2) Crime Map for Macedonia
Another project was the development of a Crime Map for the Republic of Macedonia [6]. We started the project from the idea that one of the fields which could benefit from the use of open data is crime analysis. Crime analysis is a law enforcement function which involves systematic analysis for the purpose of identifying and analyzing patterns and trends in crime and social disorder. One key component of crime analysis is crime mapping, which provides a visual summary of the criminal events which occurred in a certain geographic area, along with information about the location and severity of the crimes.

In June 2011, the Ministry of Internal Affairs of the Republic of Macedonia (MOI) started issuing a bulletin on its official website, in which it publishes a selection of the criminal events which happened the previous day (or days). This was one step closer to the idea of opening this type of data to the general public, while taking into consideration the privacy laws and the protection of the identity of the persons involved. Using the official bulletin from MOI, we managed to create a crime map of the Republic of Macedonia. Given that the official MOI bulletins are written in plain, natural language, we used natural language processing techniques to automate the process of identifying the details about a criminal event and geo-locating it on a map.

The Crime Map project is a web application, developed with PHP as the server-side programming language, MySQL as the database engine, and HTML and JavaScript for the presentation layer. The web application automatically updates the database with new information from MOI's website on a daily basis.
Figure 5. The crime map for the entire country.

We developed an interactive user interface which allows the user to filter the events by category of crime, location, time of day, day of week, etc. This type of filtering of the crime events provides the end users with a powerful tool for detailed analysis of the crime patterns in a specific geographic location, such as a neighborhood. This is the key benefit of the Crime Map: the citizens can take into consideration the crime data from the map for their region of interest, whether that is the part of town in which they plan to buy their future home, where their children will go to school, or where their workplace is or will be situated.
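The filtering described above, together with the area-based aggregation behind the heatmap view that the report discusses in the section on common pitfalls, can be sketched in a few lines. This is an illustration only: the Crime Map itself is a PHP and MySQL application, and the event records, field names and grid cell size below are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event records; the field names are assumptions, not the project's schema.
EVENTS = [
    {"category": "theft", "lat": 41.9981, "lon": 21.4254, "time": datetime(2012, 5, 4, 23, 10)},
    {"category": "theft", "lat": 41.9987, "lon": 21.4259, "time": datetime(2012, 5, 5, 2, 40)},
    {"category": "assault", "lat": 41.9990, "lon": 21.4301, "time": datetime(2012, 5, 5, 14, 0)},
    {"category": "theft", "lat": 41.0345, "lon": 21.3412, "time": datetime(2012, 5, 6, 9, 30)},
]


def filter_events(events, category=None, weekday=None, hours=None):
    """Keep the events that match every filter that is not None."""
    selected = []
    for event in events:
        if category is not None and event["category"] != category:
            continue
        # datetime.weekday(): Monday is 0, Sunday is 6.
        if weekday is not None and event["time"].weekday() != weekday:
            continue
        if hours is not None and event["time"].hour not in hours:
            continue
        selected.append(event)
    return selected


def heatmap_cells(events, cell_size=0.01):
    """Count events per grid cell, so that no exact point has to be drawn."""
    counts = Counter()
    for event in events:
        cell = (int(event["lat"] // cell_size), int(event["lon"] // cell_size))
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    night_thefts = filter_events(EVENTS, category="theft", hours=range(0, 6))
    print(len(night_thefts), "night-time thefts")
    print(heatmap_cells(EVENTS).most_common(1))
```

Aggregating to cells trades location precision for privacy: a larger `cell_size` hides more detail about where an individual event happened.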
Figure 6. The crime map for the neighborhood near our Faculty.

3) Open Data Portal based on Semantic Web Technologies
The rise in interest in Open Data introduced the need for efficient mechanisms for publishing, managing and consuming such data. Therefore, we developed an Open Data Portal using Semantic Web technologies. Its intention is to allow users to publish, manage and consume data in machine-readable formats, interlink their data with data published elsewhere on the Web, publish applications built on top of the data, and interact with other users [7]. The platform provides mechanisms which enable innovation in both the application and the use of open data, by either independent developers or ICT companies. It has a modular architecture and consists of six modules: a module for displaying the user datasets and publications intended for public use; a module for publishing and presenting applications built by the users, which use the data available on the portal; a module for sharing ideas; a module for forum discussion; a module for blogging; and a module for displaying the published linked data and the semantically annotated (FOAF) profiles of the users and their relationships.

One of the main motives for the development of the Open Data Portal is to enable innovative users and businesses to leverage the value of open data which is provided by other users, companies or governments. The development of software applications and services over open data provides numerous benefits for the citizens and society in general, such as greater transparency, communication, trust, innovation, and business value.
The Open Data principle used on this platform brings significant benefits for the end-users and citizens. The system offers great social potential for the citizens and the governments, as well as economic potential which the ICT business sector can use to build and promote services and applications based on the open data from the Portal. In this way, the end-users participate directly or indirectly in the use and consumption of the data, and can use the experience to gain better insight into and understanding of the ways their government works. This experience can then be fed back to the government bodies, in order to provide better services for the citizens. The solution also promotes transparency, innovation and openness for all users, and helps in arriving at better solutions and ideas. The system and the application we designed and developed are intended to demonstrate the advantages which semantic web technologies provide when building an Open Data Portal. The platform should not be viewed as one which only offers an end product, but as a place which will be constantly upgraded and will hopefully become a base offering opportunities for data and application creation, management, sharing and consumption.
Our intention with the Open Data Portal was to build a strong community by removing the artificial and unfortunate separation of information caused by the fact that different government organizations keep their operating details and data in different online and offline locations.

Figure 7. System Architecture.

Figure 8. Flow of information in the Open Data Portal.

4) Health Insurance Fund Open Data Applications
In cooperation with the Health Insurance Fund of Macedonia (HIF), we developed two mobile and web applications, with the purpose of demonstrating the benefit of using the public data available from HIF. For the project, we used HIF data already available on their website in Excel format, and transformed it into more suitable representation formats, such as XML. The applications were developed using the principles of responsive interface design, which means that they can be used from both web browsers and mobile phones. We focused the project on two applications: one for pharmacies, and another for drugs.

The first one, called HIF Pharmacies, provides the users with location and contact information of pharmacies, based on their current location or on their search preference (Figure 9). This helps the citizens locate the nearest pharmacy and save time.

Figure 9. The HIF Pharmacies Application.

The second application, called HIF Drugs, provides the users with information about the nominal price of drugs (Figure 10), as defined by the Ministry of Health.
Figure 10. The HIF Drugs Application.

This information can be used by the citizens when buying a certain drug, to check its nominal price and see whether the pharmacy is selling the drug at this price, or at a lower or higher one. The citizens can add comments in the app for a given drug, and inform the other users about the pharmacies which sell the drug at a lower or higher price. This form of crowd-sourcing adds to the value of the application and of the open data it uses.
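The transformation of the HIF spreadsheet data into XML, mentioned at the start of this section, can be sketched as below. The sketch assumes that the Excel sheet has first been exported to CSV, because reading Excel files directly needs a third-party library. The column names, element names and sample rows are hypothetical placeholders, not the actual HIF data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV export of the spreadsheet; columns and values are placeholders.
CSV_EXPORT = """name,city,address,phone
Pharmacy A,Skopje,Street 1,02/000-000
Pharmacy B,Bitola,Street 2,047/000-000
"""


def csv_to_xml(csv_text, root_tag="pharmacies", item_tag="pharmacy"):
    """Turn each CSV row into one XML element with a child element per column."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root, item_tag)
        for column, value in row.items():
            ET.SubElement(item, column).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(csv_to_xml(CSV_EXPORT))
```

Column headers are used directly as element names here, so they must be valid XML names; real spreadsheet headers would likely need cleaning first.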
5) Disaster and Crisis Management System
The Disaster and Crisis Management System is a web-based system for managing disasters and crises, created for the Crisis Management Center of the Republic of Macedonia. The system is used for collecting, processing and sharing events for which the Center is in charge. It is also used for generating reports that are presented to other government institutions. The solution includes smartphone applications, which have the purpose of sharing these events in real time with the general public [8].

Employees at the Crisis Management Center are responsible for adding and modifying events. There are two types of events: private events, which are visible only to government employees, and public events, which are visible to everyone. The system offers several web services which can be used by other applications for retrieving information about events of interest for the citizens. The web services offer various types of filters, which allow the users to select and download only the data they are interested in.

Figure 11. Web-based Application.

Along with the web-based application, three mobile applications were developed, for the iOS, Android and Windows Phone smartphone platforms. The mobile applications provide a map view and a list view of the events, links for publishing the event details on social media, user preferences, and information about what to do in case of an emergency.

Figure 12. Mobile Applications.

Additionally, the mobile applications enable the users to contribute their own data, both to the government bodies and to other users. Users can report an event that they believe may be of interest to other users by filling in a very simple form. Most of the data (such as coordinates, altitude, address, etc.) are determined automatically by the application, using services such as Google Maps. Users can also take pictures and include them in the event description which they share with others through the application.

The main purpose of the system is to open up government data and provide ways in which the data can be used to improve the lives of the citizens. Moreover, the system allows the end-users, i.e. the citizens, to crowd-source additional relevant information and data, further broadening the positive effect of this type of open data.

B. Ongoing and Future Projects
Some of the currently ongoing projects include the analysis of open data and applications, and building a local open data catalog at the Faculty of Computer Science and Engineering.
1) Analysis of Open Data and Applications
The idea behind the first project is to make an in-depth analysis of the currently published open data, and of the applications built upon them, from the largest Government and non-Government Open Data portals in the world, in order to determine the categories they belong to. The categorization of both the data and the applications is made from different aspects: categorization based on the type of data (finance, traffic, education, climate conditions and weather, crime data, etc.), and categorization based on the area of impact (transparency, everyday life, general quality of life, etc.).

The outcome of this analysis should point out the level of discrepancy between what the governments and their institutions believe is useful (and therefore publish), and what is actually useful for the citizens, developers and business entities (what types of applications are developed and used). This should then serve as feedback for the governments and their institutions about which types and categories of data are most interesting and have the most impact on the citizens, and on the developers and companies involved in Open Data application development.

The same analysis should give a good overview of the categories of data published in different countries. This could point out the main differences in priorities between countries, or the state of readiness of a government to publish data from a certain category.

We started with an analysis of the data published on the UK Open Government Data Portal (footnote 4). We gathered the metadata for all of the 8,900 datasets available on the portal, and then performed feature extraction, using tf-idf (term frequency-inverse document frequency) to identify the important terms within the metadata of the datasets. After this, we used hierarchical agglomerative clustering in order to group the datasets by their description and tags. For the similarity between datasets we used cosine similarity, and for the similarity between clusters we used mean (average) linkage.

4 http://data.gov.uk
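The pipeline described above (tf-idf weighting, cosine similarity between datasets, and agglomerative clustering with mean linkage) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the code used in the analysis: the metadata strings are invented, tokenization is a simple whitespace split, and the similarity threshold that stops the merging is a hypothetical parameter.

```python
import math
from collections import Counter

# Invented metadata strings standing in for dataset descriptions and tags.
METADATA = [
    "council spending budget finance",
    "department finance spending report",
    "hospital health waiting times",
    "health hospital admissions statistics",
]


def tfidf_vectors(documents):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_linkage(a, b, sim):
    """Mean pairwise similarity between the members of clusters a and b."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def cluster(documents, threshold=0.1):
    """Repeatedly merge the two most similar clusters until none reaches the threshold."""
    vectors = tfidf_vectors(documents)
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = mean_linkage(clusters[x], clusters[y], sim)
                if score > best:
                    best, pair = score, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    print(cluster(METADATA))
```

On the toy input, the two finance-related descriptions end up in one cluster and the two health-related ones in another. For thousands of datasets, a library implementation with an efficient linkage algorithm would be the practical choice, since this sketch recomputes all cluster pairs on every merge.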
Figure 13. Results from the UK Open Government Data Analysis.

The preliminary results of the analysis show that most of the data published by the UK Government fall into the categories of Finance, Transparency and Health (Figure 13). The next step is to analyze the applications which are developed based on the data from the UK Open Government Data Portal. After that, we believe we will have enough data to make a comparison and draw certain conclusions about the data published by the UK Government. The step after that is to apply the same analysis algorithm to other large open data portals.

2) Open Data Catalog
After working on different projects which generated a fair amount of open data, we decided to set up an Open Data catalog (footnote 5) at the Faculty. The catalog uses the CKAN (footnote 6) data management software as a platform. We chose CKAN because it is used by more than 50 data hubs around the world, including the official government open data portals of the UK, Germany, the Netherlands, Austria, Brazil, Norway, Uruguay, etc. In January 2013, the USA also announced that it will move its existing open government data to a CKAN catalog.

Our plan, within this activity, is to publish the datasets we have generated and used as part of the previous projects, and make them available from a single point on the web. The main purpose of the project is to show the Government and the Ministry of Information Society and Administration that CKAN is the best choice of platform for open government data, and possibly to assist them in the process of moving their existing portal onto a CKAN catalog.

5 http://data.finki.ukim.mk
6 http://ckan.org

IV. COMMON PITFALLS
When we talk about the concept of Open Government Data, and Open Data in general, a couple of concerns are raised regarding privacy and security: can this open access somehow harm the citizen, the society, or even the government and the country? What if some of the data have a negative impact on the integrity and reputation of a particular citizen? What if the open access to data helps criminals, or other individuals and groups with anti-social behavior, to easily make plans and assessments for their next action?

One common pitfall when opening up data and developing applications over open data is the invasion of the privacy of the citizens [9]. Privacy can be invaded when an individual can be unambiguously identified using only the dataset or the application. In order to avoid this, we implemented a process of anonymization in the Open University Data solution. In the step which transforms the data from a relational database into an RDF triple collection, we remove all of the personal data for students and employees, such as name, surname, student index number, unique master citizen number, etc.

Another pitfall occurs when open data applications use satellite maps to point to events [9]. The level of detail on satellite maps nowadays is so high that it becomes very easy to identify the neighborhood, the street, the buildings and houses, their luxury, etc., which makes it easier to identify the person or people who took part in the event geo-located at the given location. This is especially problematic for crime maps, because it can lead to the conclusion, rightful or wrongful, that an individual was directly involved in a crime, whether as an offender or as a victim. In order to avoid this issue with the Crime Map of Macedonia, we provided a heatmap view, which does not put exact points on the map, but shows different colors and intensities based on the number of crimes in an area.

V. CONCLUSION
In this report we saw that the recent rise in interest in Open Data and Open Government Data introduces a new need for efficient mechanisms which enable easy transformation, publishing, management, consumption and visualization of the data. The Government of the Republic of Macedonia joined the global Open Government Partnership last year, and started working on the tasks from its Action Plan, in order to provide a functional eco-system for Open Government Data in the country. Following the best practices from countries which have successfully built their own Open Data ecosystems, our research team at the Faculty of Computer Science and Engineering is working on a number of projects which aim to show the stakeholders in the Open Data area in Macedonia methods, techniques, technologies, and examples of good use-cases and applications for Open Government Data.
The opening of data by the governments can be beneficial for all of the involved sides: the governments and their institutions become more transparent in their work; the ICT business sector gains access to an entirely new business area and is able to create new business value and profit; and the citizens, through the various applications built over open data, gain insight into the way the governments and institutions work, and are provided with applications which help them increase their quality of life.

The Action Plan of the Macedonian Government has tasks which will take another two years to complete. Our goal in our ongoing research is to try different approaches which could provide effective and efficient methods for creating an Open Data eco-system in the Republic of Macedonia.

REFERENCES
[1] V. Kundra, “Digital Fuel of the 21st Century: Innovation through Open Data and the Network Effect”, Joan Shorenstein Center on the Press, Politics and Public Policy, Harvard College, January 2012: http://shorensteincenter.org/2012/01/digital-fuel-of-the-21st-century-innovation-through-open-data-and-the-network-effect/
[2] Open Government Partnership, Participating Countries, Macedonia: http://www.opengovpartnership.org/node/%2077
[3] OGP Country Commitment, Macedonia: http://www.opengovpartnership.org/commitments/open-data-2
[4] The Official Macedonian Government Action Plan for Open Government and Open Data: http://www.opengovpartnership.org/sites/www.opengovpartnership.org/files/country_action_plans/Macedonia_OGP_AP_1.pdf
[5] M. Mitrevski, M. Jovanovik, R. Stojanov, D. Trajanov, “Open University Data”, in Proceedings of the 9th Conference for Informatics and Information Technology, 2012.
[6] D. Temelkovski, M. Jovanovik, I. Mishkovski, D. Trajanov, “Towards Open Data in Macedonia: Crime Map based on Ministry of Internal Affairs’ Bulletins”, in Proceedings of the 9th Conference for Informatics and Information Technology, 2012.
[7] M. Kostovski, M. Jovanovik, D. Trajanov, “Open Data Portal based on Semantic Web Technologies”, in Proceedings of the 7th South East European Doctoral Student Conference, 2012.
[8] I. Mishkovski, R. Stojanov, B. Kostadinov, “Web-based Disaster and Crisis Management System”, in Proceedings of the 10th Conference for Informatics and Information Technology, 2013 (to be published).
[9] H. Graux, “Open Government Data: reconciling PSI reuse rights and privacy concerns”, European Public Sector Information Platform, Topic Report No. 2011/3, October 2011: http://epsiplatform.eu/sites/default/files/Topic_Report_Privacy.pdf