Citizen participation in scientific works: Issues, methods, and tools

Dong-Po Deng1*, Te-En Lin2, Gaun-Shuo Mai3

1* Institute of Information Science, Academia Sinica, 128 Sec. 2, Academia Rd., Nangang, Taipei City 115, Taiwan
2 Endemic Species Research Institute, Council of Agriculture, 1 Mingsheng East Road, Jiji 552, Nantou County, Taiwan
3 Biodiversity Research Center, Academia Sinica, 128 Sec. 2, Academia Rd., Nangang, Taipei City 115, Taiwan
Email: [email protected]

Summary. Reptile Road Mortality (in Chinese, 路殺社) is a citizen science project that uses Facebook to collect reports of dead animals that have been struck and/or killed by motor vehicles. Using Facebook makes it easy for citizens to contribute their observations. However, the crowdsourced information contributed by citizens through social media is often in unstructured formats such as text and images. Processing such unstructured data collections for scientific purposes is a challenge. In order to engage social media with citizen science, we developed novel methods to transform unstructured data into structured data for scientific purposes.

Keywords. Citizen science, Facebook, crowdsourcing, volunteered geographic information, semantic web.

1. Introduction

Social media increasingly influences the production of scientific work. Large numbers of social media users contribute in situ information on the Web. They are often regarded as human sensors who actively report what is happening in their surroundings. Voluntary participation has become an important part of citizen science. The emergence of social media offers new opportunities to recruit more participants to citizen science projects. Engaging a large number of citizens through social media can improve data collection over a large geographic region and a long time span. However, social media applications and services are designed to facilitate social interaction, not scientific activities and data analysis. The crowdsourced information contributed by citizens through social media is often in unstructured formats such as text and images. Processing such unstructured data collections for scientific purposes is a challenge. In order to engage social media with citizen science, there is a need to develop novel methods to transform unstructured data into structured data.

2. Reptile Road Mortality: A Citizen Science Project

This citizen science project is hosted by the Endemic Species Research Institute, Council of Agriculture, Taiwan. It aims to collect reports of dead animals that have been struck and/or killed by motor vehicles through the use of a Facebook group. Facebook was chosen as the crowdsourced data collection platform because of its large user base in the Taiwanese population. Figure 1 illustrates a roadkill observation posted in the Facebook group Reptile Road Mortality. Chuang Yu-Ta took a photo of a killed animal on the road and posted the photo, with a description of the location and time, to the Facebook group. When Joyce read the post, she identified the species in Chuang Yu-Ta’s photo and left the species name as a comment. Thus, the roadkill observation was composed of a photo, a description of location and time, and an identification of the species.

Figure 2. A post on the Facebook group Reptile Road Mortality, as well as biodiversity observation information embedded in the post.

Because of privacy and security issues, Facebook strips metadata (EXIF) from the photos. Without EXIF data, a photo from Facebook is just an image; the photo cannot in itself indicate the date and location at which it was taken. The text messages accompanying the photos are therefore the main sources for extracting biodiversity information about the species in the photos.

3. Issues, Methods, and Tools for Citizen participation

To deal with this crowdsourced information, we propose an approach comprising three steps, described in the following sections.

3.1 Information Extraction

Properly separating the words in a sentence is an important step in Chinese NLP tasks. Using a lexicon as a resource for segmentation is simple and efficient. However, most Chinese NLP tools are developed for general purposes, and their lexicons rarely contain a rich set of species and place names. To efficiently extract species and place names from Facebook threads, we compiled a place-name lexicon from the Taiwan Geographic Names database and a species-name lexicon from the Taiwan Catalogue of Life (TaiCOL) database. Note, however, that species names and place names found in Facebook posts and comments are not always in these lexicons.

3.2 Information Formalization

Before transforming the crowdsourced content to RDF, we first develop an ontology that both expresses the notion of “Citizens as Sensors” and formalizes the extracted named entities, e.g. species and geospatial names. To make the linked data interoperable, the ontology reuses suitable vocabularies from existing ontologies as much as possible. To frame social media content, we use Semantically Interlinked Online Communities (SIOC) and Friend of a Friend (FOAF). To represent the concept of a sensor, we use the vocabulary of the W3C Semantic Sensor Network (SSN) ontology. To semantically encode geospatial data, we use the Open Geospatial Consortium (OGC) GeoSPARQL vocabulary. To align with existing biodiversity data, Darwin-SW is used to represent species names.

3.3 Information Reuse

The databases of formalized crowdsourced information can be used to help construct better content management systems (CMSs). These CMSs can provide ecologists with tools to explore ecological observations via a taxonomy of species names and via maps. Moreover, NLP toolkits and formalized names can be used to improve the input of crowdsourced information. Using JavaScript, a semantic annotation plug-in has been developed for disambiguating place names and species names.
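The lexicon-based extraction described in Section 3.1 can be illustrated with a minimal Python sketch. The text does not say which matching strategy was used; forward maximum matching is assumed here because it is a common lexicon-driven technique for Chinese text. The lexicon entries, the sample post, and the function names `build_lexicon` and `extract_entities` are hypothetical stand-ins, not the project's data or code.

```python
from typing import Dict, List, Tuple

# Tiny stand-in lexicons (hypothetical samples). The real lexicons would be
# compiled from the Taiwan Geographic Names database and from TaiCOL.
PLACE_LEXICON = {"集集", "南投", "南港"}
SPECIES_LEXICON = {"赤尾青竹絲", "黑眶蟾蜍"}


def build_lexicon() -> Dict[str, str]:
    """Merge both lexicons into one term -> entity-type mapping."""
    lexicon = {term: "place" for term in PLACE_LEXICON}
    lexicon.update({term: "species" for term in SPECIES_LEXICON})
    return lexicon


def extract_entities(text: str, lexicon: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Scan left to right, always taking the longest lexicon match.

    Returns (term, entity_type, start_offset) tuples. Characters that do not
    start a lexicon entry are skipped, so names missing from the lexicons
    are simply not found.
    """
    max_len = max((len(term) for term in lexicon), default=0)
    found = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                match = candidate
                break
        if match:
            found.append((match, lexicon[match], i))
            i += len(match)
        else:
            i += 1
    return found


# Made-up post text in the style of a roadkill report.
post = "今天下午在南投集集的路上發現一隻赤尾青竹絲"
entities = extract_entities(post, build_lexicon())
for term, kind, offset in entities:
    print(kind, term, offset)
```

Because only lexicon entries can match, the sketch shows the limitation noted in Section 3.1: a species or place name that is absent from the lexicons is silently missed and would need a separate fallback.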
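The formalization step described in Section 3.2 can be sketched in the same spirit. The project's ontology is not given in the text, so the class and property choices below are assumptions made for illustration: the `http://example.org/roadkill/` base IRI and the `vocab#observedTaxon` and `vocab#observedAt` properties are hypothetical placeholders for project-specific terms, the SSN namespace is assumed to be the pre-2017 one, the species name is attached with a Darwin Core term (Darwin-SW builds on Darwin Core), and the coordinates are made up. The sketch serializes one extracted observation as N-Triples using only the Python standard library.

```python
from typing import List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
SSN = "http://purl.oclc.org/NET/ssnx/ssn#"   # assumption: pre-2017 SSN namespace
GEO = "http://www.opengis.net/ont/geosparql#"
DWC = "http://rs.tdwg.org/dwc/terms/"        # Darwin Core terms
EX = "http://example.org/roadkill/"          # hypothetical base IRI


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value: str, datatype: str = "") -> str:
    """Quote and escape a string literal, optionally with a datatype IRI."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    suffix = "^^<" + datatype + ">" if datatype else ""
    return '"' + escaped + '"' + suffix


def formalize(post_id: str, author_id: str, author_name: str, text: str,
              species_name: str, place_name: str, wkt: str) -> List[str]:
    """Return N-Triples lines describing one roadkill observation."""
    post = iri(EX + "post/" + post_id)
    person = iri(EX + "person/" + author_id)
    obs = iri(EX + "observation/" + post_id)
    taxon = iri(EX + "taxon/" + post_id)
    place = iri(EX + "place/" + post_id)
    geom = iri(EX + "geometry/" + post_id)
    a = iri(RDF_TYPE)
    triples = [
        # The social media post and its author (SIOC and FOAF).
        (post, a, iri(SIOC + "Post")),
        (post, iri(SIOC + "content"), literal(text)),
        (post, iri(FOAF + "maker"), person),
        (post, iri(SIOC + "about"), obs),
        (person, a, iri(FOAF + "Person")),
        (person, iri(FOAF + "name"), literal(author_name)),
        # The citizen acting as a sensor (SSN).
        (obs, a, iri(SSN + "Observation")),
        (obs, iri(SSN + "observedBy"), person),
        # Hypothetical project-specific links to taxon and place.
        (obs, iri(EX + "vocab#observedTaxon"), taxon),
        (obs, iri(EX + "vocab#observedAt"), place),
        (taxon, iri(DWC + "vernacularName"), literal(species_name)),
        # The place and its geometry (GeoSPARQL).
        (place, a, iri(GEO + "Feature")),
        (place, iri(RDFS_LABEL), literal(place_name)),
        (place, iri(GEO + "hasGeometry"), geom),
        (geom, iri(GEO + "asWKT"), literal(wkt, GEO + "wktLiteral")),
    ]
    return [s + " " + p + " " + o + " ." for s, p, o in triples]


ntriples = formalize(
    post_id="1",
    author_id="u1",
    author_name="Example User",                      # made-up author
    text="今天下午在南投集集的路上發現一隻赤尾青竹絲",  # made-up post text
    species_name="赤尾青竹絲",
    place_name="集集",
    wkt="POINT(120.78 23.83)",                       # made-up coordinates
)
print("\n".join(ntriples))
```

Emitting plain N-Triples keeps the sketch free of dependencies; a real pipeline would more likely use an RDF library and load the result into a triple store.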

4. Conclusions

Social media brings new opportunities to citizen science. Information crowdsourced from social media is considered valuable for scientific work. This study proposed an approach for transforming unstructured crowdsourced information into structured data for scientific purposes. This approach has been successfully implemented to facilitate social-media-based citizen science projects. We believe it has broader application in user-generated content management as well, and it promises to be a good start in solving important design problems in citizen science projects on the Web.