Developing a Web-based SKOS Editor

5 downloads 94232 Views 1MB Size Report
development of a web-based, free SKOS editor suitable for the ... AJAX calls to controller. Controller. APP. Technologies: Java JSR 286 Portlet. Web App.
Developing a Web-based SKOS Editor Fariba Fana San Diego Supercomputer Center La Jolla CA, U NITED S TATES [email protected]

Mike Conway Dept. Family and Preventive Medicine University of California San Diego La Jolla CA, U NITED S TATES [email protected]

Melissa Tharp and William Scuba and Wendy Chapman Dept. Biomedical Informatics University of Utah Salt Lake City UT, U NITED S TATES {melissa.tharp|william.scuba|wendy.chapman}@utah.edu Simon Jupp European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI) Wellcome Trust Genome Campus Cambridge, U NITED K INGDOM [email protected] Abstract

right, and as a means for sharing resources on the semantic web. In this paper, we describe the development of a web-based, free SKOS editor suitable for the creation and curation of knowledge organization systems in general, and health-related, linguistically-oriented thesauri designed to support health-related Natural Language Processing (NLP) in particular. SKOS is a flexible standard designed to represent and encode a wide number of different types of knowledge organization systems. The standard is widely used3 by governments (e.g. United Kingdom Public Sector Vocabularies), scientific bodies (e.g. NASA vocabularies), and non-governmental organizations (e.g. UNESCO Thesaurus). In contrast to its sibling World Wide Web Consortium semantic web standard, the Web Ontology Language (OWL), SKOS follows the principal of “minimal ontological commitment” (Baker et al., 2013). That is, SKOS concepts and relations are lightly specified, using thesaurus-style relations like broader rather than logically formalized relations commonly used in OWL or the Basic Formal Ontology (e.g. IS-A). SKOS models consist of concept schemes which serve as containers for concepts. Concepts can be related together in various ways to create

The Simple Knowledge Organization System (SKOS) was introduced to the wider research community by a 2005 World Wide Web Consortium (W3C) working draft, and further developed and refined in a 2009 W3C recommendation. Since then, SKOS has become the de facto standard for representing and sharing thesauri, lexica, vocabularies, taxonomies, and classification schemes. In this paper, we describe the development of a web-based, free SKOS editor built for the development, curation, and management of small to medium sized lexicons for health-related Natural Language Processing.

1

Introduction & Background

The Simple Knowledge Organization System (SKOS) standard was introduced to the wider community by a 2005 World Wide Web Consortium (W3C) working draft1 and further developed and refined in a 2009 W3C recommendation2 (Lacasta et al., 2010). Since then, SKOS has become the de facto standard for representing and sharing thesauri, lexica, vocabularies, taxonomies, and classification schemes, both as a useful data format in its own 1 2

www.webcitation.org/6QmPKaUaP www.webcitation.org/6QmPUa0m8

3

105

www.webcitation.org/6QmoFFjdc

Figure 1: Screenshot of the web interface

a hierarchical structure. The most important of these semantic relations are, skos:broader (which can be read as “has broader concept”) and skos:narrower (which can be read as “has narrower concept”). Further, each concept can be associated with a number of lexical labels, including skos:prefLabel (a preferred label provides a mechanism to link a canonical label to a concept), skos:altLabel (an alternative label provides a mechanism to specify synonyms for the concept), and skos:hiddenLabel (a hidden label provides a mechanism to specify non-standard synonyms like colloquialisms or misspellings). Given its lightweight semantics, SKOS is particularly suitable as a basis for the development and sharing of vocabularies to support NLP tasks. A key part of the workflow in developing some NLP systems – in particular NLP systems designed to process health-related text – is the development of custom lexicons, including common abbreviations, synonyms (including slang terms), and truncations (Wu et al., 2012; Liu et al., 2013; Wilson et al., 2010; Mysl´ın et al., 2013). Since its inception in 2005, significant effort has been expended on the development of software tools for the SKOS standard, in particular in editing and viewing SKOS vocabularies. Notable examples in-

clude a SKOS Application Programming Interface (API) and editing module (Jupp et al., 2009) for Prot´eg´e 44 (the Prot´eg´e SKOS Editor), PoolParty, an online SKOS editing and manipulation tool (Schandl and Blumauer, 2010), and SKOS functionality built into the TopBraid Composer RDF editing platform5 , all of which facilitate the creation, development, and utilization of SKOS vocabularies. However, to the best of our knowledge, until now no free web-based SKOS editor has been available to the research community (note that PoolParty, although web-based, is a commercial product). In this paper we present a web-based SKOS editing tool that is suitable for developing and modifying the health-related lexicons necessary for large-scale information extraction from clinical notes and other health-related text, yet is also general purpose enough for any small-tomedium sized SKOS vocabulary development or curation project.

2

Implementation

A key advantage of using a web-based editor, is that it can be used anywhere, on any machine, without complex user installation. Given that our target users are clinicians and domain experts — i.e. those with 4 5

106

www.webcitation.org/6QmsQg41G www.webcitation.org/6QmsXXNCc

SKOS Editor Architecture Fariba Fana, NLP Group @ UCSD Dept. of BiomedicaI Information Division

Client

Liferay Server (Running on Tomcat) user authorization & authentication and user & user groups management

Controller

View JSON AJAX Calls

XML

-- Upload SKOS file from URL or local machine -- Create SKOS file from scratch -- Download own work in RDFXML format

APP

Model Response Request

Technologies: JSP/JavaScript/ YUI/Dojo/CSS/HTML

Technologies: Java JSR 286 Portlet Web App

Technologies: Java SKOS and OWL 2 API

HTML forms rendered by dialogs as user interacts with different parts of the application and send AJAX calls to controller

Portlet processAction and serveResource methods to answer AJAX calls from the view JSP

Methods to call SKOS and OWL2 API to do various operations

Functionalities / Operations

File / IO -- Upload SKOS file from URL -- Upload SKOS file from local machine -- Create SKOS file from scratch and download -- Download own work in RDFXML format

Schema & Concept

Object Properties

Data Properties

-- Create and rename schema -- Display concepts in hierarchical view/tree -- Add top concept and child child concept -- Remove concept -- Rename concept -- Move concept within the hierarchy

-- Display object properties organized in a table -- Add new object properties -- Remove object properties -- Edit object properties -- Object properties are grouped according to SKOS standard object property types

-- Display data properties organized in a table -- Add new data properties -- Remove data properties -- Edit data properties -- Add new type for properties -- Data properties are grouped according to SKOS standard

Figure 2: System functionality

little or no experiences of semantic web languages — rather than informatics professionals, ease of use is an important requirement. We took the decision to simplify the editor’s user interface as much as possible, hiding much of the general OWL/RDF functionality available in tools like Prot´eg´e and TopBraid Composer. A screenshot of the system is shown in Figure 1. The screenshot shows a SKOS thesaurus designed to drive a NLP system for the automatic identification of biosurveillance-relevant symptoms from Electronic Health Records (Conway et al., 2011). After some experimentation, we adopted an interface that consists of three panels, from left to right: • C ONCEPT PANE: An editable taxonomic hierarchy of SKOS concepts representing skos:broader and skos:narrower relations, which the user can click on to expand and collapse the tree • R ELATIONS PANE: An editable list of relations between concepts, particularly the skos:related, skos:broader, and skos:narrower relations

• L INGUISTICS PANE: An editable list of lexical items related to each SKOS concept (e.g. skos:prefLabel, skos:altLabel, skos:hiddenLabel) We identified six core functions necessary for the editor, partially based on the requirements identified by Jupp et al., 2009: • Create, edit and delete SKOS entities • Assert SKOS relationships between SKOS concepts (e.g. broader/narrower) • Assert and edit skos:prefLabel, skos:altLabel, and skos:hiddenLabel data properties • Visualize broader and narrower relationships in a browsable hierarchical tree • Support for SKOS documentation properties and Dublin Core • Provide alternative renderings (e.g. multilingual prefLabels) within the editor In building our web-based SKOS Editor, we relied heavily on existing OWL, SKOS and RDF tooling, in particular, the SKOS API (Jupp et al., 2009) (developed by author SJ) and the OWL API (Horridge

107

and Bechhofer, 2011). The system is a Liferay Portlet application (see Figure 2).

3

International Workshop on Health Document Text Mining and Information Analysis, Slovenia (LOUHI 2011), pages 75–82. Matthew Horridge and Sean Bechhofer. 2011. The OWL API: A Java API for OWL ontologies. Semantic Web, 2(1):11–21, January. Simon Jupp, Sean Bechhofer, and Robert Stevens. 2009. A flexible API and editor for SKOS. In Proceedings of the 6th European Semantic Web Conference on The Semantic Web: Research and Applications, ESWC 2009 Heraklion, pages 506–520. Springer-Verlag, Berlin. Javier Lacasta, Javier Nogueras-Iso, and Francisco Javier Zarazaga-Soria. 2010. Terminological Ontologies: Design, Management and Practical Applications. Springer-Verlag, Berlin.. Vincent Liu, Mark P Clark, Mark Mendoza, Ramin Saket, Marla N Gardner, Benjamin J Turk, and Gabriel J Escobar. 2013. Automated identification of pneumonia in chest radiograph reports in critically ill patients. BMC Med Inform Decis Mak, 13:90. Christian Mader, Bernhard Haslhofer, and Antoine Isaac. 2012. Finding quality issues in SKOS vocabularies. In Panayiotis Zaphiris, George Buchanan, Edie Rasmussen, and Fernando Loizides, editors, Theory and Practice of Digital Libraries, volume 7489, pages 222–233. Springer-Verlag, Berlin, . Nor Azlinayati Abdul Manaf, Sean Bechhofer, and Robert Stevens. 2012. The current state of SKOS vocabularies on the web. In Elena Simperl, Philipp ´ Cimiano, Axel Polleres, Oscar Corcho, and Valentina Presutti, editors, The Semantic Web: Research and Applications, volume 7295, pages 270–284. SpringerVerlag, Berlin. Mark Mysl´ın, Shu-Hong Zhu, Wendy Chapman, and Mike Conway. 2013. Using Twitter to examine smoking behavior and perceptions of emerging tobacco products. J Med Internet Res, 15(8):e174. Thomas Schandl and Andreas Blumauer. 2010. PoolParty: SKOS thesaurus management utilizing linked data. In The Semantic Web: Research and Applications, pages 421–425. Springer-Verlag, Berlin. Richard A Wilson, Wendy W Chapman, Shawn J Defries, Michael J Becich, and Brian E Chapman. 2010. Automated ancillary cancer history classification for mesothelioma patients from free-text clinical reports. J Pathol Inform, 1:24. Yonghui Wu, Joshua C Denny, S Trent Rosenbloom, Randolph A Miller, Dario A Giuse, and Hua Xu. 2012. A comparative study of current clinical natural language processing systems on handling abbreviations in discharge summaries. AMIA Annu Symp Proc, 2012:997– 1003.

Current Limitations & Future Directions

While the SKOS editor is suitable for building and curating special purpose SKOS vocabularies to run bespoke health-related NLP systems, it does have several limitations. First, the system is not suitable for editing large SKOS taxonomies (i.e. > 1000 concepts). Second, although there have been recent efforts in developing best practice guidelines for building SKOS vocabularies (Mader et al., 2012; Manaf et al., 2012) there may still be situations in which particular language features may not work as expected. Our long term goal is to integrate the SKOS editor as a lexicon development and management module within a comprehensive platform for developing clinical NLP algorithms. As part of this long term goal — and informed by the comments and suggestions of our early users — we are currently implementing several system enhancements, including adding version control tools, developing “wizards” to support the rapid creation of concept schemes, and most importantly, building multi-user functionality and collaborative editing.

4

Acknowledgements

We would like to thank Drs Tania Tudorache, Natasha Noy, and Matthew Horridge of Stanford University’s Department of Biomedical Informatics for their valuable guidance in using the OWL API. We would also like to thank Artem Khojoyan for his help with Liferay development. This work was funded by grants from the United States Veterans Administration (VA HIR 08-204), and the United States National Library of Medicine (NLM 1R01LM010964).

References Thomas Baker, Sean Bechhofer, Antoine Isaac, Alistair Miles, Guus Schreiber, and Ed Summers. 2013. Key choices in the design of Simple Knowledge Organization System (SKOS). Journal of Web Semantics, 20:35–49. Mike Conway, John Dowling, and Wendy Chapman. 2011. Developing an application ontology for mining free text clinical reports: The extended syndromic surveillance ontology. In Proceedings of the Third

108

Suggest Documents