Citizen participation in scientific works: Issues, methods, and tools
Dong-Po Deng1*, Te-En Lin2, Gaun-Shuo Mai3
1* Institute of Information Science, Academia Sinica, 128 Sec. 2, Academia Rd., Nangang, 115, Taipei City, Taiwan
2 Endemic Species Research Institute, Council of Agriculture, 1 Mingsheng East Road, Jiji 552, Nantou County, Taiwan
3 Biodiversity Research Center, Academia Sinica, 128 Sec. 2, Academia Rd., Nangang, 115, Taipei City, Taiwan
Email:
[email protected]
Summary. Reptile Road Mortality (in Chinese, 路殺社) is a citizen science project that uses Facebook to collect reports of animals that have been struck and/or killed by motor vehicles. Facebook makes it easy for citizens to contribute their observations. However, the crowdsourced information contributed by citizens through social media is often in unstructured formats such as text and images, and processing unstructured data collections for scientific purposes is a challenge. To engage social media with citizen science, we developed novel methods to transform unstructured data into structured data for scientific purposes.
Keywords. Citizen science, Facebook, crowdsourcing, volunteered geographic information, semantic web.
1. Introduction
Social media is increasingly influencing the production of scientific works. A large number of social media users contribute in situ information on the Web. They are often regarded as human sensors who actively report what is happening in their surroundings. Voluntary participation has become an important part of citizen science. The emergence of social media offers new opportunities to recruit more participants to citizen science projects. Using social media to engage a large number of citizens can improve data collection over a large geographic region and a long time span. However, social media applications and services are designed to facilitate social interaction, not scientific activities and data analysis. The crowdsourced information contributed by citizens through social media is often in unstructured formats such as text and images. Processing unstructured data collections for scientific purposes is a challenge. To engage social media with citizen science, there is a need to develop novel methods to transform unstructured data into structured data.
2. Reptile Road Mortality: A Citizen Science Project
This citizen science project is hosted by the Endemic Species Research Institute, Council of Agriculture, Taiwan. It aims to collect reports of animals that have been struck and/or killed by motor vehicles through the use of a Facebook group. Facebook was chosen as the crowdsourced data collection platform because of its large user base in Taiwan. Figure 1 illustrates a roadkill observation posted in the Facebook group Reptile Road Mortality. Chuang Yu-Ta took a photo of a killed animal on the road and posted it to the group with a description of the location and time. When Joyce read the post, she identified the species in Chuang Yu-Ta’s photo and left the species name as a comment. Thus, the roadkill observation was composed of a photo, a description of location and time, and an identification of the species.

Figure 2. A post on the Facebook group Reptile Road Mortality, as well as biodiversity observation information embedded in the post.

Because of privacy and security issues, Facebook strips metadata (EXIF) from photos. Without EXIF data, a photo from Facebook is just an image; it cannot by itself indicate the date and location at which it was taken. The text messages accompanying the photos are therefore the main sources for extracting biodiversity information about the species in the photos.

3. Issues, Methods, and Tools for Citizen participation
To deal with this crowdsourced information, we propose an approach comprising three steps, described in the following sections.

3.1 Information Extraction
Properly separating the words in a sentence is an important step in Chinese NLP tasks. Using a lexicon as a resource to conduct the segmentation is simple and efficient. However, most Chinese NLP tools are developed for general purposes, and their lexicons rarely contain a rich set of species and place names. To efficiently extract species and place names from Facebook threads, we compiled a place-name lexicon from the Taiwan Geographic Names database and a species-name lexicon from the Taiwan Catalogue of Life (TaiCOL) database. Note, however, that species names and place names found in Facebook posts and comments are not always in these lexicons.

3.2 Information Formalization
Before transforming the crowdsourced content to RDF, we first develop an ontology that not only expresses the notion of “Citizens as Sensors” but also formalizes the extracted named entities, e.g. species and geospatial names. To make the linked data interoperable, the ontology reuses suitable vocabularies from existing ontologies as much as possible. To frame social media content, we use Semantically Interlinked Online Communities (SIOC) and Friend of a Friend (FOAF). To represent the concept of a sensor, we use the vocabulary of the W3C Semantic Sensor Network (SSN) ontology. To semantically encode geospatial data, we use the vocabulary of Open Geospatial Consortium (OGC) GeoSPARQL. To align with biodiversity data standards, Darwin-SW is used to represent species names.

3.3 Information Reuse
The databases of formalized crowdsourced information can be used to help construct better content management systems (CMSs). These CMSs can provide ecologists with tools to explore ecological observations via a taxonomy of species names and maps. Moreover, NLP toolkits and formalized names can be used to improve the input of crowdsourced information. Using JavaScript, a semantic annotation plug-in was developed for disambiguating place names and species names.
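
The lexicon-based extraction described in Section 3.1 can be illustrated with a minimal sketch. It assumes forward maximum matching, a common lexicon-driven segmentation technique; the abstract does not state which algorithm the project actually uses. The lexicon entries, the sample post, and the function names build_lexicon and extract_entities are hypothetical and chosen only for illustration; in practice the lexicons would be loaded from the Taiwan Geographic Names database and TaiCOL.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample entries, NOT taken from the project's real lexicons.
PLACE_LEXICON = {"南投縣", "集集鎮"}
SPECIES_LEXICON = {"雨傘節", "赤尾青竹絲"}


def build_lexicon(places: Iterable[str], species: Iterable[str]) -> Dict[str, str]:
    """Merge both lexicons into a single term -> label mapping."""
    lexicon = {term: "PLACE" for term in places}
    lexicon.update({term: "SPECIES" for term in species})
    return lexicon


def extract_entities(text: str, lexicon: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Forward maximum matching: at each position, keep the longest lexicon term.

    Returns (term, label, start_offset) tuples in order of appearance.
    Characters that start no lexicon term are skipped one at a time.
    """
    max_len = max((len(term) for term in lexicon), default=0)
    found: List[Tuple[str, str, int]] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest possible candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                match = candidate
                break
        if match is not None:
            found.append((match, lexicon[match], i))
            i += len(match)
        else:
            i += 1
    return found


LEXICON = build_lexicon(PLACE_LEXICON, SPECIES_LEXICON)
# Hypothetical post: "This morning I found a banded krait on a road in Jiji, Nantou County."
post = "今天早上在南投縣集集鎮的路上發現一隻雨傘節"
entities = extract_entities(post, LEXICON)
for term, label, offset in entities:
    print(label, term, offset)
```

As Section 3.1 notes, names that are missing from the lexicons are not found by this kind of matching, so a sketch like this would only cover posts that use the listed names verbatim.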
4. Conclusions
Social media brings new opportunities to citizen science. Information crowdsourced from social media is considered valuable for scientific works. This study proposed an approach for transforming unstructured crowdsourced information into structured data for scientific purposes. The approach has been successfully implemented to facilitate social-media-based citizen science projects. We believe it has broader application in user-generated content management as well, and it promises to be a good start in solving important design problems in citizen science projects on the Web.