Migrating Researcher from Local to Global: Using ORCID to Develop the TLIS VIVO with CLISA and Scopus

Chao-chen Chen, Mike W. Ko, and Vincent Tsung-yeh Lee

Graduate Institute of Library and Information Studies, National Taiwan Normal University, Taipei, Taiwan
[email protected] {mikgtk,bigyes}@gmail.com

Abstract. This paper presents a prototype of TLIS VIVO, a researcher networking system for the Library and Information Science field in Taiwan, built with ORCID, VIVO, and Linked Open Data technologies. It extends VIVO with the author identifier system ORCID and integrates data harvested from Chinese Library & Information Science Abstracts (CLISA) and Scopus. The study demonstrates a practical approach to increasing the visibility of, and collaboration between, local and global researchers.

Keywords: VIVO, Research Networking, ORCID, Library & Information Science, Linked Open Data.

1 Introduction

Contemporary research is increasingly complex: it is team-based, interdisciplinary, and cross-institutional. The universities involved aim for openness, which is the most important trend in scholarly communication in the digital age. To break down the academic segregation imposed by disciplines and institutions, scholars need to turn to the Web for discovery and communication at a distance rather than only in proximity. Interdisciplinary and international e-Science cooperation has to tackle the problem of distributed data in different formats and with subject-specific terms. This leads to the Semantic Web, which strives for standardized knowledge representation so that data can be linked and integrated in a more meaningful way. The Linked Open Data project is one result of Semantic Web technology, and VIVO can be seen as an application of that technology in the academic sector [3]. Maintaining a VIVO profile can nevertheless be difficult for scholars, especially the publication list. The problem this study addresses is how to bring local and international publications together with ease, in a standard open format for the Internet.

2 Purpose of Research

Modeled on the NTNU VIVO experience and spirit [2], this study tries to build a cooperative open scholar network of data and research. Focusing on local Library and Information Science (LIS) scholars, it employs open source technology to build a Taiwan LIS (TLIS) VIVO. Local Chinese-language articles are covered by Chinese Library and Information Science Abstracts (CLISA), and English-language articles by Scopus. With these sources, 95% or more of local LIS publications can be discovered. By using the global author identifier Open Researcher and Contributor ID (ORCID), resources can be integrated with the issue of name disambiguation settled. VIVO and ORCID are interoperable [6], which provides a working basis for the study.

S.R. Urs, J.-C. Na, and G. Buchanan (Eds.): ICADL 2013, LNCS 8279, pp. 113–116, 2013. © Springer International Publishing Switzerland 2013

3 Implementation of the Prototype

The implementation shows that it is feasible to link related resource data to the subject-specific TLIS VIVO.

3.1 CLISA and Scopus to ORCID

Transliteration causes duplication and ambiguity in both the Chinese and the English names of Chinese researchers. To illustrate the scale of the problem, two-thirds of the six million authors in MEDLINE share a last name and first initial with at least one other author, and an ambiguous name refers to about eight persons on average [7]. ORCID addresses this problem by providing a central registry of unique identifiers for researchers, together with support for Chinese character encoding and multiple name forms [5]. Importing data from Scopus to ORCID is straightforward. With the name fields in both sources mapped, the author first identifies his or her personal Scopus Author profile, selects the appropriate items, and then exports them into the ORCID system. For CLISA, this study uses a dedicated data harvester which parses each record, distinguishes the data fields, and then uses the identifier attribute for Author to single out the field value. Through the ORCID API, each CLISA record can be transcribed into the ORCID Messages Schema XML format. An HTTP POST request issued with cURL exports CLISA data to ORCID, and a PUT request updates the related data.

Fig. 1. Workflow of exporting data from Scopus, CLISA to ORCID
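
The transcription step for CLISA records can be sketched roughly as follows. This is a minimal illustration, not the harvester used in the study: the record fields (`title`, `journal`, `year`, `authors`), the sample values, and the XML element names are all assumptions made for the example, and the element names are simplified placeholders rather than the actual ORCID Messages Schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical harvested CLISA record; field names and values are illustrative.
SAMPLE_RECORD = {
    "title": "圖書資訊學研究範例",
    "journal": "Sample LIS Journal",
    "year": 2012,
    "authors": ["陳小明", "Chen, Hsiao-ming"],
}


def clisa_record_to_xml(record):
    """Serialize one harvested record as a simplified, ORCID-style XML message.

    The element names below are placeholders chosen for readability; a real
    submission would have to follow the ORCID Messages Schema.
    """
    root = ET.Element("orcid-message")
    work = ET.SubElement(root, "orcid-work")
    ET.SubElement(work, "title").text = record["title"]
    ET.SubElement(work, "journal-title").text = record["journal"]
    ET.SubElement(work, "publication-year").text = str(record["year"])

    # Keep every name form, so Chinese and romanized variants travel together.
    contributors = ET.SubElement(work, "contributors")
    for name in record["authors"]:
        ET.SubElement(contributors, "contributor").text = name

    # encoding="unicode" returns a str and preserves Chinese characters as-is.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(clisa_record_to_xml(SAMPLE_RECORD))
```

The resulting XML string is the kind of payload that would then be sent to ORCID with a POST request, or with a PUT request when an existing record is updated.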

3.2 ORCID to TLIS VIVO

Using the ORCID API and GET requests issued with cURL, three kinds of ORCID data are obtained: Profile, Bio, and Works, all in ORCID Messages Schema XML. These XML documents cannot be imported into TLIS VIVO directly, because VIVO stores records in an RDF triplestore, where every data entry takes the form subject-predicate-object, as required by the ontologies in use and by the publication requirements of Linked Data. In this study, the XML exported from ORCID is therefore parsed and mapped to a corresponding CSV (comma-separated values) file. Following VIVO's Data Ingest Guide [1], a local ontology and a workspace model are created, the CSV file is converted to RDF, and the tabular data are mapped onto the ontology. Next, SPARQL queries are used to construct the ingested entities. Fig. 2 shows these steps. To complete the conversion, the ORCID data are loaded into a Web repository. For vocabularies, popular ontologies such as BIBO, FOAF, and SKOS are used. Furthermore, to make TLIS VIVO a node in the LOD Cloud Diagram, the entire dataset must be accessible via RDF crawling, an RDF dump, or a SPARQL endpoint.

Fig. 2. The process from ORCID to TLIS VIVO
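
The XML-to-CSV-to-RDF chain can be illustrated with a small sketch using only the Python standard library. It does not reproduce the VIVO ingest tools or the ontology mapping used in the study. The XML element names, the `http://example.org/tlis/` namespace, the CSV column names, and the use of Dublin Core terms for title and creator are assumptions for illustration; the BIBO and FOAF terms are taken from the vocabularies named above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Simplified stand-in for an exported record; not the real ORCID schema.
SAMPLE_XML = """<orcid-message>
  <orcid-work>
    <title>Sample article title</title>
    <publication-year>2012</publication-year>
    <contributor>Sample Author</contributor>
  </orcid-work>
</orcid-message>"""

BASE = "http://example.org/tlis/"  # hypothetical local namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BIBO_ARTICLE = "http://purl.org/ontology/bibo/AcademicArticle"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
DCT_TITLE = "http://purl.org/dc/terms/title"      # assumed mapping
DCT_CREATOR = "http://purl.org/dc/terms/creator"  # assumed mapping
FIELDS = ["work_id", "title", "year", "author"]


def xml_to_rows(xml_text):
    """Flatten each work element into one tabular row."""
    root = ET.fromstring(xml_text)
    rows = []
    for i, work in enumerate(root.iter("orcid-work"), start=1):
        rows.append({
            "work_id": f"work{i}",
            "title": work.findtext("title", default=""),
            "year": work.findtext("publication-year", default=""),
            "author": work.findtext("contributor", default=""),
        })
    return rows


def rows_to_csv(rows):
    """Write the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def literal(text):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(csv_text):
    """Turn each CSV row into subject-predicate-object statements."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        work = f"<{BASE}{row['work_id']}>"
        slug = row["author"].strip().lower().replace(" ", "-")
        person = f"<{BASE}person/{slug}>"
        triples.append(f"{work} <{RDF_TYPE}> <{BIBO_ARTICLE}> .")
        triples.append(f"{work} <{DCT_TITLE}> {literal(row['title'])} .")
        triples.append(f"{work} <{DCT_CREATOR}> {person} .")
        triples.append(f"{person} <{RDF_TYPE}> <{FOAF_PERSON}> .")
        triples.append(f"{person} <{FOAF_NAME}> {literal(row['author'])} .")
    return triples


if __name__ == "__main__":
    csv_text = rows_to_csv(xml_to_rows(SAMPLE_XML))
    print("\n".join(csv_to_ntriples(csv_text)))
```

Deriving the person URI from the author's name is only a shortcut for this sketch. Since name ambiguity is the problem ORCID is meant to solve, a real mapping would more plausibly key the person on the ORCID identifier.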


The implementation posed several challenges. It requires knowledge of Semantic Web technologies, including RDF and ontologies, as well as skills in query languages such as SPARQL. Thorisson [6] notes that getting data into and out of VIVO triplestores is a challenge. Knoblock et al. [4] also remark that “mapping existing legacy data to the VIVO ontology and generating the corresponding RDF data that can then be loaded into VIVO can be very challenging”, which led to the development of Karma, an open-source information integration tool that allows a user to quickly map legacy data sources into RDF for loading into VIVO.

4 Conclusion

Our study provides a practical approach to building a prototype of TLIS VIVO. We use ORCID to integrate the inflow of data from CLISA and Scopus. Because VIVO publishes Linked Data, TLIS VIVO can enhance the visibility, discovery, and collaboration of local LIS scholars. During the study, we drew on existing work by others, such as the data integration tool Karma. We also learned that ORCID currently supports Profile, Biography, and Works data, while Affiliations, Grants, and Patents are still under development. At present, the ORCID Import Tools can only import data from Scopus and CrossRef; other APIs are expected to be provided in the near future. This study shows that combining VIVO and ORCID technologies makes it possible to integrate locally and globally based datasets. We hope that further related studies will show the broader prospects of using these technologies to support discovery and use in LIS and other subject fields.

References

1. Blake, J.: VIVO data ingest guide. VIVO release 1.2 (January 26, 2013), https://wiki.duraspace.org/display/VIVO/VIVO+1.2+Data+Ingest+Guide
2. Chen, C.-C., Chen, C.-H., Lai, C.-C., Lu, C.-H., Yu, C.-Y.: Implementation of Open Scholar Platform and Integration of Open Resources in National Taiwan Normal University (NTNU). In: Chen, H.-H., Chowdhury, G. (eds.) ICADL 2012. LNCS, vol. 7634, pp. 344–346. Springer, Heidelberg (2012)
3. Devare, M., et al.: VIVO: connecting people, creating a virtual life sciences community. D-Lib Magazine 13(7/8) (2007)
4. Knoblock, C.A., et al.: Mapping existing data sources into VIVO (2012), http://isi.edu/integration/karma/other-materials/vivo2012/Karma-Abstract-v2.pdf
5. ORCID: Our mission (2013), http://orcid.org/content/mission-statement
6. Thorisson, G.: The VIVO platform and ORCID in the scholarly identity ecosystem. In: VIVO Conference (August 2011), http://www.slideshare.net/gthorisson/vivo-conference-aug-2011-the-vivo-platform-and-orcid-in-the-scholarly-identity-ecosystem
7. Torvik, V.I., Smalheiser, N.R.: Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data 3(3), 11 (2009)