Semantic Infrastructure to Enable Collaboration in Ontology Development

Paul R. Alexander, Csongor Nyulas, Tania Tudorache, Patricia L. Whetzel, Natalya F. Noy, Mark A. Musen
Stanford Center for Biomedical Informatics Research, Stanford University, US
{palexander,csongor.nyulas,tudorache,whetzel,noy,musen}@stanford.edu

ABSTRACT
In many scientific disciplines, and in biomedicine in particular, researchers rely on ontologies to enable them to annotate and integrate their data. These ontologies are living and constantly evolving artifacts and the ontology authors must rely on their user community to ensure that the coverage of the ontologies is sufficient for annotations and other tasks for which users deploy the ontologies. We have developed a distributed collaborative mechanism to enable users to provide feedback to ontology authors, to request new terms, and to use provisional terms in their applications. The ontology authors can use the same infrastructure to explore this feedback in their ontology-editing environment, to update the ontology, to record their decisions on the users' requests, and to publish both the updated ontology and the information on how they acted on the requested changes. Specifically, we present the Notes ontology that we use to represent the different types of user feedback and change requests, the service-oriented Notes API to access the information that conforms to this ontology, and the two ontology editing and publishing environments, WebProtégé and NCBO BioPortal, that use this API to provide services for their users.
KEYWORDS: Ontology development, Collaborative software, Semantic Web, Web services

1. USER FEEDBACK AND ONTOLOGY DEVELOPMENT
In many scientific disciplines, and in life sciences in particular, ontologies have become a staple of data management. Researchers actively use ontologies to tag (annotate) their data, to perform natural-language processing, or to describe the structure of their data [14, 5]. Usually, ontology authors develop an ontology and publish successive versions of it, and community members then use the ontologies for their data-management tasks [17, 9, 13]. Critically, the scientists then try to contribute back their comments on the ontologies and request new terms that the ontology does not yet have. Ontologies become living and actively evolving artifacts, with the community actively contributing to them. In this paper, we report on a number of use cases where such a vision of ontology evolution has become a reality. We then discuss the tools that we have developed to facilitate and to streamline this process.
We perform our research for supporting the ontology lifecycle in the context of two tools:
• WebProtégé is a web-based ontology-editing environment, which supports collaboration, enabling users to edit an ontology simultaneously, to carry out discussions, and to add comments to the terms, providing their rationale for changes [12].
• BioPortal is an ontology library, which enables users to publish their ontologies on the Web and enables community members to provide feedback on the ontologies [2].
Both WebProtégé and BioPortal focus on the idea of community collaboration in ontology development. The two tools support complementary stages in the ontology lifecycle: ontology publishing and ontology editing [10]. Indeed, we are working on providing seamless integration between these two stages, enabling an easy flow of information from the publishing tool to the editing one and vice versa.
This paper makes the following contributions:
• We identify the requirements for representing contributions from ontology users to ontology authors, based on a large number of use cases.
• We define an ontology that represents different types of notes and change requests for ontology evolution.
• We present a service-oriented architecture and a Notes API that enables both ontology editing and ontology publishing tools to share the same infrastructure for change requests.
• We describe the integration of the Notes API into two production tools: WebProtégé and BioPortal.
• We define the workflow for creating provisional ids to enable users' applications to use the requested terms immediately, before they get integrated into specific ontologies.
2. USE CASES FOR COMMUNITY FEEDBACK
In this section we describe several use cases that we collected through our interactions with users of BioPortal and WebProtégé. These use cases guide our requirements (Section 3) and development.
Gene Ontology: The process of community contributions to the ontology has been ongoing for many years as part of the evolution of the Gene Ontology (GO) [8]. GO is arguably one of the most widely used ontologies, and one of the largest: it contains more than 33,000 classes. Over the past decade, a large number of researchers have contributed to GO in various ways. By the account of the GO developers, several dozen people routinely submit term requests to the GO issue tracker, which the GO developers maintain on SourceForge. In 2010, for example, users made more than 1,200 such requests. Essentially, any time a curator comes across a term in the literature that she thinks should be in GO but is not, she submits a term request, and 2-3 people act on these requests. New versions of GO are published daily. After GO developers add a term and update the tracker item, the curators can go back and create the annotation that they wanted to create in the first place, using the new term.
Biomedical Resource Ontology: The Biomedical Resource Ontology (BRO) is an ontology developed as part of a large collaboration effort to enable semantic annotation and discovery of biomedical resources published in biositemaps¹ [16]. BRO provides the shared vocabulary that authors of biositemaps can use to describe the resources that their laboratories provide. Search tools can then use the data in biositemaps to enable search for specific types of resources. Researchers at several institutions contribute directly to the development of BRO, suggesting not only new terms, but also synonyms of these terms, their position in the class hierarchy, and definitions. They also actively comment on existing terms, trying to reach consensus on the names, presentation, and so on.
MIAMExpress: Publishing microarray data often requires submission of the data to a public repository.
The MIAMExpress data submission tool is a Web-based tool that facilitates data submission to the ArrayExpress repository [7]. Users must follow the MIAME guidelines [6, 1] and put the information in required fields. MIAME guidelines also constrain the sets of allowed values for many of these fields to value sets from the MGED Ontology (MO). For example, values for experiment design type and experimental factor must come from MO. When a user cannot find an appropriate term in the value set, he can select "other" and enter the new value in a text field. In order for this new term to be used effectively in search and in linking to other related submissions in the repository, an MO curator should add it to the ontology. However, our observation is that in practice this happens very rarely, and the term remains a text string in the "other" category. Providing a layer that would submit such a term automatically to the MO curators for inclusion in the ontology would enable the use of this term in further annotations (see Section 7 on enabling such use immediately after the request).
ODIE: Researchers at the University of Pittsburgh Department of Biomedical Informatics are exploring the use of Natural Language Processing techniques for extracting data from pathology reports and using this information to enrich ontologies. They designed the Ontology Development and Information Extraction (ODIE) toolkit² with the following goals: 1) to annotate document sets with ontology concepts; 2) to visualize the relationships between documents and these ontology concepts; and 3) to suggest new concept additions to ontologies. For concept suggestion, ODIE uses both symbolic and statistical methods (lexico-syntactic patterns, similarity and mutual information); users may also include their own UIMA engines as additional methods. The ODIE toolkit must be able to submit the suggested new concepts directly for review by the ontology curators through an API.
Phenoscape: The Phenoscape project uses phenotype ontologies to describe evolutionary and mutant phenotypes of different species, and to link the phenotypes to genetic variations. The researchers use the Phenex annotation tool [3] to enable scientists to create the annotations with ontology terms. When the scientists encounter a term that does not exist in any of the ontologies that they are using, they must be able to (1) request that the new term be added to an ontology; and (2) create a provisional annotation with this requested term. The Phenex tool can then automatically update the annotation when the term is added to an ontology. This project needs the ability to post term requests programmatically, ideally through a web service. The term request must contain not only the term name but also, potentially, synonyms and possible definitions, since the scientists might want to provide more information about the requested term. Furthermore, in the case of Phenoscape, the terms can potentially go into any of several ontologies. Usually, there is a group of ontologies that the project uses routinely, and the curator is satisfied if the term is included in any of these ontologies. The submitters of the request may specify the set of ontologies into which the term may go.
¹ http://biositemaps.org/
² http://www.bioontology.org/ODIE-project
We can take the idea of requesting the term and specifying the set of ontologies where the term might go to its logical next step and develop a "term marketplace."³ In this case, users simply declare in some shared space that they want a term with a particular name and particular description, and they do not care at all which ontology author chooses to include the term. The ontology authors come to this marketplace and "grab" the terms that they believe should be in their ontology. The ontology authors can also indicate that the term already exists in their ontology, and that one should simply use the existing term rather than create a duplicate elsewhere. The ability to use an existing term in lieu of the newly requested term is key to avoiding proliferation of similar terms across ontologies.
The use cases that we just described cover a variety of scenarios: the use of comments and notes as a way to reach consensus and to record the decision-making process in ontology development (BRO); requests for new terms resulting from manual annotation of literature (GO); the need for provisional term ids when such requests are created (Phenex); requests for new terms that result from filling out web forms with pre-defined value sets for fields (MIAME); and requests for new terms that result from an automatic ontology-learning process (ODIE). In our surveys of Protégé and BioPortal users, the need for creating notes on classes and term requests has come up numerous times in different projects. These use cases provide a set of requirements for the support for community feedback in ontology editing and publishing. We describe these requirements in Section 3.
³ The term "term marketplace" was coined by Maryann Martone, personal communication.
3. REQUIREMENTS FOR NOTES SUPPORT
As a result of studying the use cases and discussing the requirements and missing features in today's tools with representatives of the projects that we described in Section 2, we have identified the following requirements for an infrastructure that supports notes and change requests.
Structured term requests: One of the most common types of feedback, at least for biomedical ontologies, is a request for new terms to be added to an ontology. These requests may specify not only the preferred name for the requested term, but also its synonyms, definition, and other fields.
Read and write API access: Users must be able to provide feedback not only through a form on the web, but also through an API. Such a capability would enable developers to integrate access to feedback functionality in their own tools. For instance, a tool for tagging texts with ontology terms interactively can include both a search facility that lets users find an existing term in an ontology library and an interface to request a term if it does not exist. An API would allow the tool to post such a request back to the ontology library.
Mechanism for archiving: After ontology authors react to the user request (by updating an ontology, by commenting on the request, or by deciding not to act on it), there must be a clear indication of that status. For instance, a request may be archived, indicating that it should no longer be visible in the basic interface but is still available for specialized searches. Our users have indicated very strongly that these requests should never be deleted, but rather kept permanently, to provide provenance information for the terms.
Metadata about change requests: Different ontology projects have different workflows for making changes and creating new terms [15]. Thus, the term proposals may undergo different stages of review, and the information about these stages must be accessible as part of the metadata of the change requests. For example, the stages might be "under review," "approved by X," and so on. Similarly, the metadata can store project-management information, such as who is assigned to work on the issue in the request. For each stage, we want to store additional provenance information, such as the name of the user who changed the status and the date.
Support for proposing a hierarchy of terms: In some cases, users may want to propose not just a single term, but a whole hierarchy of terms. Thus, the proposal includes not only a set of single terms with the corresponding proposed preferred names, synonyms, and so on, but also the hierarchical structure that should link these terms.
Provisional ids that are available immediately: For many annotation applications, users need to use the new terms immediately after they request them. In other words, they do not want to wait until ontology authors act on their requests and include a proper term in the ontology. Thus, immediately after creating a term request, users should be able to get a provisional id for the new term. When the term is incorporated into an ontology, or ontology authors determine that an existing term should be used instead, this information must be accessible through the provisional id. The users must either be redirected to the new term when accessing the provisional id, or get the information on how to access the term and the status of the provisional term. We discuss more detailed requirements and considerations for provisional ids in Section 7.
Domain-specific types of structured notes: In many domains, there are specific types of structured user feedback that are relevant only for that domain. The mechanism for representing notes must be extensible to accommodate domain-specific note types. For instance, for biomedical ontologies, users may need to be able to add a Semantic Type from the Unified Medical Language System (UMLS) [4] to better characterize terms in their ontology. In another example, the World Health Organization publishes guidelines for using the terms from its terminology, the International Classification of Diseases (ICD) [18], for coding and reporting.⁴ These guidelines can be another type of note associated with the corresponding terms.
Figure 1. ChAO Design. Classes in ChAO are used to represent user notes and requests. [Boxes in the figure: Annotation (title, author, created, modified); Example; Comment; Explanation; Change Proposal (status, archived, comment); New Term Proposal (Provisional ID, Preferred name, Synonym, Definition, Superclass, Ontology); New Attribute Value Proposal (Attribute, New value, Replaces current value); New Relationship Proposal (Relationship type, Relationship source, Relationship target).]
⁴ http://www.eicd.com/Guidelines/Default.htm
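The archiving and metadata requirements above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: the class names, field names, the user names, and the initial "submitted" status are all hypothetical. It shows a request whose status changes are appended with provenance (who, when) and whose archiving hides it from the basic view without deleting anything.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class StatusChange:
    """One review stage together with its provenance."""
    status: str            # e.g. "under review"
    changed_by: str        # user who changed the status
    changed_at: datetime   # when the change happened


@dataclass
class TermRequest:
    """Hypothetical structured term request; all names are illustrative."""
    preferred_name: str
    synonyms: List[str] = field(default_factory=list)
    archived: bool = False
    history: List[StatusChange] = field(default_factory=list)

    def set_status(self, status: str, user: str) -> None:
        # Append instead of overwriting, so every stage keeps its provenance.
        self.history.append(StatusChange(status, user, datetime.now(timezone.utc)))

    @property
    def status(self) -> str:
        # "submitted" is an assumed initial status, not one named in the paper.
        return self.history[-1].status if self.history else "submitted"

    def archive(self) -> None:
        # Archiving only sets a flag; the request itself is never deleted.
        self.archived = True


def visible_requests(requests: List[TermRequest]) -> List[TermRequest]:
    """Requests shown in the basic interface; archived ones are skipped."""
    return [r for r in requests if not r.archived]


request = TermRequest("Exercise Study Facility")
request.set_status("under review", "curator_a")
request.set_status("approved", "curator_b")
request.archive()
```

Keeping the full history rather than a single status field is one way to satisfy the requirement that each stage carries the name of the user who changed the status and the date.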
4. TYPES OF NOTES AND CHANGE REQUESTS
As a result of our use case analysis (Section 2), we have defined the note types described in the following list and presented in Figure 1. We represent different types of notes as classes in the Changes and Annotations Ontology (ChAO) [11]. ChAO provides the infrastructure to represent and store all the needed information about the notes. More specifically, we represent the different types of notes by instantiating subclasses of the Annotation class from ChAO, and we store all other information as property values attached to those instances. Using an ontology-driven infrastructure for storing notes gives us great flexibility when we need to add support in BioPortal for new note types, or for new information about existing note types.
Basic comment: a basic type of note that contains a comment from the user and the metadata about the comment, such as the user name, the timestamp, and the ontology term to which the comment is attached.
Change Proposal: a proposal for a change in the ontology. In addition to the same metadata as for basic comments, the proposal contains fields corresponding to the contents of the proposal, its status in the review process (e.g., under review), and its archival status. There are different types of change proposals:
New term proposal: a request for a new term in an ontology, usually a new class. It contains the following parameters (the list is not exhaustive):
Preferred Name: a proposed preferred name for the class;
Provisional ID: a provisional id for the term (see Section 7);
Synonym: proposed synonyms for the class;
Definition: a proposed textual definition;
Superclass: location in the class hierarchy;
Ontology: an ontology or a set of ontologies where the new term should be defined;
Comment: user comment on the term request.
New relationship proposal: a request to establish a link between two classes in an ontology. It contains the following parameters (the list is not exhaustive):
Relationship type: is-a, part-of, etc.; can be a URI of the relationship;
Relationship source: the class that is the source of the new relationship;
Relationship target: the class that is the target of the new relationship;
Comment: user comment on the request.
New attribute value proposal: a request to add a value to an attribute of a class. It contains the following parameters:
Attribute: for example, documentation, definition, synonyms, etc.; can be a URI of the attribute property;
New value: the new value for the property;
Replaces the current value: a flag that indicates whether the new value replaces the current value (and which one, in the case of multiple values) or is added to the current value(s).
We represent each note as an annotation attached to a specific class (in a specific ontology version) or to another annotation (if it is a response to a comment). A Notes knowledge base is an OWL ontology that imports ChAO and stores instances of the ChAO classes. Each instance in the Notes knowledge base corresponds to a user-provided annotation (a note) on an annotatable thing. Figure 2 gives an example of several instances of the classes in ChAO that refer to various classes in the ontologies in BioPortal.
Figure 2. Instances of Notes. The boxes at the top of the figure represent the ontologies and classes in them. The bottom of the figure shows the Notes knowledge base. Instances in this knowledge base refer to classes in different ontologies, if the note is on a class, or to other instances, if the note is a comment in a thread. [Labels recovered from the figure: ontologies Biomedical Resource Ontology (BRO) and NCI Thesaurus (NCIT); classes Resource, Physiology Facility, Activity, and Tobacco use; in the Notes Knowledge Base, Term request Note_X345 (Preferred name: Exercise Study Facility; Superclass: http://...#Physiology_Facility), Comment Note_X895, Basic comment Note_X565 (Comment: "Seems odd that Tobacco Use as a term is disconnected from the term Smoking Behavior."), and the comment text "Can you expand on what an exercise study is?"]
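To illustrate how note instances can point either at an ontology class or at another note, here is a minimal in-memory sketch. It is not ChAO or the Notes knowledge base itself: the Python class names only echo the note types above, the class URI and user names are made up, and pairing the "Can you expand..." comment with the term request is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Annotation:
    """Base note type, playing the role of the top-level note class."""
    note_id: str
    author: str
    # A note targets either an ontology class (by URI) or another note (by id).
    target_class: Optional[str] = None
    target_note: Optional[str] = None


@dataclass
class Comment(Annotation):
    text: str = ""


@dataclass
class NewTermProposal(Annotation):
    preferred_name: str = ""
    synonyms: List[str] = field(default_factory=list)
    superclass: Optional[str] = None


class NotesKnowledgeBase:
    """In-memory stand-in for a store of note instances."""

    def __init__(self) -> None:
        self._notes: Dict[str, Annotation] = {}

    def add(self, note: Annotation) -> None:
        self._notes[note.note_id] = note

    def notes_on_class(self, class_uri: str) -> List[Annotation]:
        """Notes attached directly to the given class."""
        return [n for n in self._notes.values() if n.target_class == class_uri]

    def replies_to(self, note_id: str) -> List[Annotation]:
        """Notes that respond to another note (one level of a thread)."""
        return [n for n in self._notes.values() if n.target_note == note_id]


# Hypothetical URI; the real class identifier is not given in the text.
PHYSIOLOGY_FACILITY = "http://example.org/bro#Physiology_Facility"

kb = NotesKnowledgeBase()
kb.add(NewTermProposal("Note_X345", "user_a", target_class=PHYSIOLOGY_FACILITY,
                       preferred_name="Exercise Study Facility",
                       superclass=PHYSIOLOGY_FACILITY))
kb.add(Comment("Note_X895", "user_b", target_note="Note_X345",
               text="Can you expand on what an exercise study is?"))
```

Because every note type shares the base fields, a thread can mix comments and proposals, and new note types only need to add their own fields.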
5. APIS FOR ONTOLOGY NOTES
We have implemented a service-oriented architecture to support user feedback through different stages of the ontology lifecycle. Specifically, our architecture enables different tools, such as WebProtégé and BioPortal, to share the user notes and feedback, and allows users of both tools, with appropriate permissions, to make changes to the notes. Figure 3 shows the main components of the architecture and the interaction between the tools. We provide two types of APIs for other applications to use in order to access the Notes: a Java Notes API and a service API, based on REST.
Figure 3. System Architecture. The client applications, such as WebProtégé and BioPortal, access the notes through the REST services. The REST services wrap around a Java API for Notes (Notes API), which, in turn, uses the Manchester OWL API to access the OWL knowledge base that stores the note instances. Java applications can also access the Notes API directly, bypassing the REST layer.
5.1 Java Notes API
We have developed a Notes API, which is a Java API that provides an abstraction layer to access the notes and the corresponding structure. We designed the main Java classes in the Notes API to match the classes in the ChAO ontology. For example, the top-level class in ChAO for representing notes is Annotation. We created a Java class Annotation that has methods for accessing the property values of the instances of the Annotation class, such as getAuthor(), getCreatedAt(), and getArchived().
Internally, the Notes API uses the OWL API⁵ to provide fast access to the Notes knowledge base. The Notes knowledge base can be stored either as OWL files or using the OWL API database backend. BioPortal uses the Notes API with the database backend to store the notes. The main entry point of the API is the class NotesManager, which provides methods for the following types of actions:
• create notes of different types that are initialized with different property values; because ChAO is extensible, the Notes API allows creation of notes of any user-defined type.
• update a note, for example, to modify a property value on a note instance.
• delete or archive notes and note threads.
• retrieve all notes defined in an ontology; filter the result by author, by date range, by a set of ontologies, or by any other attribute.
• retrieve the number of notes attached to an annotatable thing.
⁵ http://owlapi.sourceforge.net/
5.2 The REST API
In addition to the Java Notes API, we have developed a REST service API that wraps the Java API and exposes the same functionality as part of the REST services provided by BioPortal. BioPortal adds version-management capabilities to the API.
Because BioPortal stores multiple versions of each ontology, it must enable both access to notes that were created for a specific ontology version and access to all the notes on the ontology, regardless of the version. In many cases, even when users are viewing a newer version of the ontology, they still want to see notes created for an older version, because the vast majority of them are still valid. For each note, we record the specific version id for which the note was created. However, API users can access all the notes associated with the ontology, regardless of the version. The caller can also request to filter the notes for a specific version, or for the most recent version of the ontology. This feature allows ontology authors to aggregate feedback across ontology versions or to focus on a version of interest as needed.
The REST service that wraps the Notes API provides GET, POST, UPDATE, and DELETE calls for notes. For example, users can use the GET calls to query the notes knowledge base in several ways:
• Given an ontology id, get all notes for the ontology.
• Given a note id, get all information about a specific note.
• Given an ontology id in combination with a term id, get all notes associated with a particular term.
When a GET request is processed, the REST service returns the notes represented as XML, with a common set of elements shared between all types of notes in what we call a 'noteBean.'⁶ Each note also has an optional element called 'values' that contains information specific to that note type. For example, details about a new term proposal are stored as children of the 'values' element, while the note id and ontology id are stored as children of the noteBean. This structure allows us to have a single service endpoint for all of the note types and to provide for additional note types as user requirements evolve. A POST call creates a new note using information provided in the parameters, including what type of note is being created, who is creating it, and what the contents of the note are. UPDATE calls can be used to edit the contents of a note, archive a note, or change its status. When a DELETE call is made with the note id and corresponding ontology id, that note is permanently removed from the system. We provide the DELETE call only for administrative functions, as we do not expect common workflows to require note deletion (as opposed to archiving). Future additions to the REST services will enable searching across the entire notes corpus, regardless of ontology association. This feature will enable keyword searches, including partial and wildcard matching, and aggregation of notes based on author.
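As an illustration of the noteBean structure and of version filtering, the following sketch parses a made-up response and selects the notes created for one ontology version. Every element name other than 'noteBean' and 'values' (for example `notes`, `id`, `ontologyId`, `createdInVersion`, `type`, `preferredName`) and all sample values are assumptions for the example; the actual XML returned by the service may differ.

```python
import xml.etree.ElementTree as ET

# Made-up response; only 'noteBean' and 'values' are names taken from the text.
SAMPLE = """
<notes>
  <noteBean>
    <id>Note_1</id>
    <ontologyId>ONT_1</ontologyId>
    <createdInVersion>v1</createdInVersion>
    <type>Comment</type>
  </noteBean>
  <noteBean>
    <id>Note_2</id>
    <ontologyId>ONT_1</ontologyId>
    <createdInVersion>v2</createdInVersion>
    <type>NewTermProposal</type>
    <values>
      <preferredName>Exercise Study Facility</preferredName>
    </values>
  </noteBean>
</notes>
"""


def parse_note_beans(xml_text):
    """Return one dict per noteBean: shared fields plus a 'values' dict."""
    beans = []
    for bean in ET.fromstring(xml_text.strip()).iter("noteBean"):
        # Elements common to all note types are direct children of noteBean.
        note = {child.tag: child.text for child in bean if child.tag != "values"}
        values = bean.find("values")
        # 'values' is optional, so notes without it get an empty dict.
        note["values"] = (
            {child.tag: child.text for child in values} if values is not None else {}
        )
        beans.append(note)
    return beans


def for_version(beans, version_id):
    """Keep only the notes recorded against the given ontology version."""
    return [b for b in beans if b["createdInVersion"] == version_id]


beans = parse_note_beans(SAMPLE)
latest_only = for_version(beans, "v2")
```

Keeping type-specific data under a single optional element means a client like this one needs no per-type parsing code, which is the property that lets one endpoint serve every note type.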
⁶ Examples of the returned XML for each note type can be found at http://www.bioontology.org/wiki/index.php/Ontology_Notes
6. VALIDATION: USE OF NOTES IN WEBPROTÉGÉ AND BIOPORTAL
Both the BioPortal and WebProtégé user interfaces use the Notes REST service API (Section 5) to access the same set of notes provided by users. We describe this use by walking through a scenario that brings the two tools together. Consider, for example, the development of BRO, an ontology that we mentioned earlier as one of our use cases (Section 2). The BRO developers publish the ontology in BioPortal to enable the user community to discuss it and to contribute to it. The ontology has close to 500 classes and there are currently 66 notes on BRO in BioPortal. These notes range from comments on the existing classes to new term requests. When submitting a new term request, a user fills in a form, specifying the preferred name for the term, synonyms, a definition, and a reason for the change. Furthermore, if the request is added to a class in the existing ontology, we assume that the requested term should be a subclass of this class. We do not currently have a user interface to request terms without specifying a single ontology where the term should go. Such requests must be made through the REST API.
Our goal is to support the entire ontology development lifecycle by providing a seamless transition between the publishing and editing phases. In the publishing phase, the users add feedback and create change proposals in BioPortal. In the editing phase, a curator of the ontology, or users with editing privileges, open the working copy of the BioPortal ontology in WebProtégé to review the change proposals and to implement the changes in the ontology. When they edit the ontology in WebProtégé, the curators will also change the status of a proposal to reflect their action, and may document their decision by attaching comments or other note types to the proposal. The status and explanation of the decision will be reflected back into BioPortal.⁷ Once the curators approve or reject the change proposals, they may create a new version of the ontology that will be uploaded into BioPortal, and a new revision iteration begins.
To support this integration, we implemented a widget in WebProtégé that accesses the notes and proposals from BioPortal via the REST services and presents them to the user. Figure 4 shows the same new term request for an Exercise Study Facility class that a user made in BioPortal. When the user clicks on the Physiology Facility class in WebProtégé (Figure 4, part 1), she will see the BioPortal notes and proposals attached to that class in the BioPortal widget (Figure 4, part 3). The curator will also change the status of the proposal by using a drop-down list showing the allowable values.
Figure 4. Term Request in WebProtégé. A user has entered a term request in BioPortal. WebProtégé uses the REST services to access the information about the term request. Curators may browse and edit the class hierarchy (1) and the properties of a class (2), as well as review and implement the BioPortal change proposals (3).
7. DEFINING PROVISIONAL TERMS
In many cases, after a user has requested a term, she would like to use it immediately, before the term gets incorporated into an ontology. For example, suppose a user is tagging a scientific text with ontology terms and she needs a term that does not exist in the ontology. She requests the term and gets a provisional id, which she can use immediately. She annotates the text with this provisional id. Eventually, ontology authors incorporate this term into the ontology and the user may (or may not) replace the provisional id with the permanent id.
⁷ This feature is not yet released and is still work in progress.
7.1 Required services
Several services must exist to support provisional terms:
• Create a provisional id (in the form of an HTTP URI).
• Query to get the proposed information for the term, based on the provisional id (e.g., get the preferred name for the provisional id that was used as a tag for the text).
• Query to get the status of the provisional term (e.g., has it been incorporated into an ontology? has it been linked to an already existing term?).
• Given a provisional id, get an id for an ontology term that was created for it.
• Get all provisional terms requested for a particular ontology or a set of ontologies.
• Subscribe to notification when a provisional term gets linked to a permanent term.
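The services listed above can be sketched as a small in-memory registry. This is a hedged illustration rather than the NCBO implementation: the class `ProvisionalTermService`, its method names, the example.org base URI, and the sequential id scheme are all hypothetical, and a real deployment would expose these operations as web services.

```python
import itertools
from typing import Callable, Dict, List, Optional

# Hypothetical base URI; the real service generates ids on a purl server.
BASE_URI = "http://example.org/provisionalTerms/"


class ProvisionalTermService:
    """In-memory stand-in for the provisional-term services."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._terms: Dict[str, dict] = {}
        self._listeners: Dict[str, List[Callable[[str, str], None]]] = {}

    def create(self, preferred_name: str, ontologies: List[str]) -> str:
        """Create a provisional id in the form of an HTTP URI."""
        pid = f"{BASE_URI}{next(self._ids)}"
        self._terms[pid] = {
            "preferredName": preferred_name,
            "ontologies": list(ontologies),
            "status": "under review",
            "permanentId": None,
        }
        return pid

    def info(self, pid: str) -> dict:
        """Proposed information for the term (a copy, so callers cannot mutate it)."""
        return dict(self._terms[pid])

    def status(self, pid: str) -> str:
        return self._terms[pid]["status"]

    def permanent_id(self, pid: str) -> Optional[str]:
        """Id of the ontology term created for this provisional id, if any."""
        return self._terms[pid]["permanentId"]

    def requested_for(self, ontology: str) -> List[str]:
        """All provisional ids requested for a particular ontology."""
        return [p for p, t in self._terms.items() if ontology in t["ontologies"]]

    def subscribe(self, pid: str, callback: Callable[[str, str], None]) -> None:
        """Register a callback to run when the term is linked to a permanent term."""
        self._listeners.setdefault(pid, []).append(callback)

    def link(self, pid: str, permanent_id: str, status: str = "new term created") -> None:
        """Record the permanent term and notify subscribers."""
        term = self._terms[pid]
        term["status"] = status
        term["permanentId"] = permanent_id
        for callback in self._listeners.get(pid, []):
            callback(pid, permanent_id)


service = ProvisionalTermService()
pid = service.create("Exercise Study Facility", ["BRO"])
notified = []
service.subscribe(pid, lambda p, permanent: notified.append((p, permanent)))
service.link(pid, "http://example.org/bro#Exercise_Study_Facility")
```

An annotation tool could hold on to `pid` as the tag for a text and later call `permanent_id(pid)`, or rely on the subscription, to swap in the permanent term.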
7.2 Implementation of Provisional Terms
We represent provisional terms as instances in a special ontology.8 This ontology has only one class at the moment, ProvisionalTerm, and all provisional terms are instances of this class. Note that provisional terms are different from instances of the Proposal class: the former represent the new term itself (and will become equivalent to the actual term later), whereas the latter correspond to the request. Each instance has the following properties:
id: the provisional id for the term; we use the NCBO PURL server and generate ids of the form http://purl.bioontology.org/ontology/provisionalTerms/;
preferredName: the proposed preferred name for the term;
ontology: the ontology where the term should be defined (can be empty if the requestor does not care about the specific ontology, or can contain multiple ontologies);
request: a link to the Note that represents the request for the term; this link provides access to the discussion that followed the term request, if any, as well as additional metadata about the request;
status: the status of the term (e.g., under review, rejected, new term created, linked to existing term);
permanentId: the id of the permanent term that implemented the term request.
We can easily access all provisional terms in the system by querying for instances of this class. We can also limit the query by the value of the ontology field if we need only the terms requested for a particular ontology. When an ontology author creates a permanent term to correspond to the provisional term, we update the status field and add the id of the new term to the permanentId field. We also create an owl:sameAs statement to indicate that the provisional term is equivalent to the newly created term and can be used interchangeably with it.
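The update performed when a permanent term is created can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the dictionary layout, the function names, and all example.org identifiers are hypothetical. Only the owl:sameAs IRI is the standard OWL one. The equivalence statement is emitted as a single N-Triples line.

```python
from typing import Dict, List

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # standard OWL IRI


def requested_for(terms: List[Dict], ontology: str) -> List[Dict]:
    """Limit the set of provisional terms by the value of the ontology field."""
    return [t for t in terms if ontology in t["ontology"]]


def link_to_permanent(term: Dict, permanent_id: str) -> List[str]:
    """Update status and permanentId, and return the owl:sameAs triple."""
    term["status"] = "new term created"
    term["permanentId"] = permanent_id
    return [f"<{term['id']}> <{OWL_SAME_AS}> <{permanent_id}> ."]


# Hypothetical instance carrying the six properties described above.
term = {
    "id": "http://example.org/provisionalTerms/1",
    "preferredName": "example term",
    "ontology": ["ONT_A"],
    "request": "http://example.org/notes/42",
    "status": "under review",
    "permanentId": None,
}

triples = link_to_permanent(term, "http://example.org/ONT_A/0001")
```

Because the provisional id stays valid and is declared equivalent to the permanent one, annotations that were made with the provisional id do not need to be rewritten.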
8. DISCUSSION
WebProtégé and BioPortal have different sets of user accounts. However, for the integration to be truly seamless, users must be able to preserve their identity from one tool to the other. We are currently implementing the OpenID protocol for both tools, which will enable users to log in to both WebProtégé and BioPortal with the same credentials.
In most workflows, there will be at least two "active" versions of an ontology: a published version that users can view and comment on, and a "working" version to which authors are applying their changes. Thus, our mechanism for attaching notes to classes must be flexible enough for notes to show up in both cases. Notes must have both a reference to a class in a specific version (e.g., the published version, which was the one the user saw when she created a request) and a global ontology id, which enables the editing tool to access the note even when users are working with a different version. We use this approach in our Notes API.
We have validated the Notes API not only by accessing it through the REST services as described in Section 6, but also by accessing it directly through the Java API in WebProtégé (Figure 3). WebProtégé provides support for collaboration that allows users to attach notes and proposals directly in WebProtégé, without linking them to BioPortal. We compared this Java implementation of the Notes API to a previous implementation used in a production setting for WebProtégé.9 The Notes API implementation proved to be suitable and more flexible than the previous implementation.
We have made significant progress in supporting the integrated publishing and editing workflow, such as accessing the notes and proposals in both BioPortal and WebProtégé. To minimize the workload of the ontology curators, we would like to provide them with automation support for several steps of the workflow.
For example, most change proposals are structured, and we imagine that in a future release of WebProtégé, curators will be able to review a change proposal and then click an "Implement change" button that will automatically create the new terms or update the ontology based on the structured information in the proposal. Another step that we would like to automate is the upload of a new version of the ontology from WebProtégé directly to BioPortal, as is required at the end of a revision iteration.
We would like to emphasize that our solution is not limited to the biomedical domain. It so happens that our repository is a repository of biomedical ontologies. However, there is nothing in the Notes API itself or in our use of it that is specific to biomedicine: the software is domain-independent and can be used for an ontology repository in any domain or for a domain-independent one. Finally, the software is open-source. The Notes API is available in BioPortal and can be accessed through the BioPortal web services.10
8 At the time of this writing, the implementation is in progress. We expect the implementation to be in production by the time of the workshop.
9 http://protegewiki.stanford.edu/wiki/Changes_Tab
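The dual reference discussed in this section (a class in a specific version plus a global ontology id) can be sketched as a small data structure. This is an illustration under assumptions, not the Notes ontology itself: the field names, the helper functions, and the identifiers are hypothetical, and the sketch assumes that a class keeps the same id across versions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Note:
    ontology_id: str  # global id, shared by all versions of the ontology
    version_id: str   # the version the user was viewing when writing the note
    class_id: str     # the class the note is attached to
    body: str


def notes_for_class(notes: List[Note], ontology_id: str, class_id: str) -> List[Note]:
    """Lookup used by an editing tool: ignores the version, so notes written
    against the published version also appear in the working version."""
    return [n for n in notes
            if n.ontology_id == ontology_id and n.class_id == class_id]


def notes_for_version(notes: List[Note], version_id: str, class_id: str) -> List[Note]:
    """Lookup restricted to the exact version the note was created on."""
    return [n for n in notes
            if n.version_id == version_id and n.class_id == class_id]


notes = [
    Note("ONT_A", "ONT_A/v12", "ONT_A:0001", "Definition is too narrow."),
    Note("ONT_A", "ONT_A/v13-draft", "ONT_A:0001", "Broadened in the draft."),
    Note("ONT_B", "ONT_B/v3", "ONT_B:0007", "Unrelated note."),
]

all_versions = notes_for_class(notes, "ONT_A", "ONT_A:0001")
published_only = notes_for_version(notes, "ONT_A/v12", "ONT_A:0001")
```

Keeping both references lets the publishing tool show a note in the context where it was written, while the editing tool can still retrieve it from a different version.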
9. CONCLUSIONS
We have implemented an ontology-based architecture to support comments and term proposals for collaborative ontology development. Using an ontology to represent notes allowed us to easily expand the types of notes and term proposals that we support and to provide a flexible set of attributes for each note and proposal type. We have implemented an API that enables other clients to use the same Notes knowledge base that BioPortal and WebProtégé use. Indeed, we showed how the WebProtégé and BioPortal user interfaces act as clients in the service-oriented architecture for accessing notes. We are currently expanding the Notes capability to enable the creation and maintenance of provisional terms.
ACKNOWLEDGMENTS
This work was supported by the National Center for Biomedical Ontology, under roadmap-initiative grant U54 HG004028 from the National Institutes of Health, by NIH grant HL087706, and by NIGMS grant 1R01GM086587-01. Protégé is a national resource supported by grant LM007885 from NLM. We are grateful to Chuck Borromeo, Sherri de Coronado, Peter Lyster, Maryann Martone and Hilmar Lapp for their help in defining the requirements for provisional terms.
10 http://bioportal.bioontology.org
REFERENCES
[1] "Minimum Information About a Microarray Experiment MIAME," MIAME - Workgroups - FGED, 2009. Available: http://www.mged.org/Workgroups/MIAME/miame.html
[2] "BioPortal," Welcome to BioPortal, 2011. Available: http://bioportal.bioontology.org
[3] J. P. Balhoff, W. M. Dahdul, C. R. Kothari, H. Lapp, J. G. Lundberg, P. Mabee, P. E. Midford, M. Westerfield, and T. J. Vision. "Phenex: Ontological annotation of phenotypic diversity," PLOS One, 5(5):e10500, 2010.
[4] O. Bodenreider. "The Unified Medical Language System (UMLS): integrating biomedical terminology," Nucleic Acids Research, 32(suppl 1):D267-D270, 2004.
[5] O. Bodenreider and R. Stevens. "Bio-ontologies: current trends and future directions," Briefings in Bioinformatics, 7:256-274, 2006.
[6] A. Brazma, et al. "Minimum information about a microarray experiment (MIAME)-toward standards for microarray data," Nature Genetics, 29(4):365-71, 2001.
[7] A. Brazma, H. Parkinson, U. Sarkans, M. Shojatalab, J. Vilo, N. Abeygunawardena, E. Holloway, M. Kapushesky, P. Kemmeren, G. G. Lara, A. Oezcimen, P. Rocca-Serra, and S. A. Sansone. "ArrayExpress--a public repository for microarray gene expression data at the EBI," Nucleic Acids Res, 31(1):68-71, 2003.
[8] G. O. Consortium. "The Gene Ontology (GO) project in 2006," Nucleic Acids Research, 34(suppl 1):D322-D326, 2006.
[9] J. S. Madin, S. Bowers, M. P. Schildhauer, and M. B. Jones. "Advancing ecological research with ontologies," Trends in Ecology & Evolution, 23(3):159-168, 2008.
[10] N. Noy, T. Tudorache, C. I. Nyulas, and M. A. Musen. "The ontology life cycle: Integrated tools for editing, publishing, peer review, and evolution of ontologies," In AMIA Annual Symposium, Washington, DC, 2010.
[11] N. F. Noy, A. Chugh, W. Liu, and M. A. Musen. "A framework for ontology evolution in collaborative environments," Fifth International Semantic Web Conference, ISWC, volume LNCS 4273, Athens, GA, 2006. Springer.
[12] "Protégé," The Protégé Ontology Editor and Knowledge Acquisition System, 2011. Available: http://protege.stanford.edu/
[13] R. G. Raskin. "SWEET 2.1 Ontologies," AGU Fall Meeting Abstracts, pages B6+, Dec. 2010.
[14] D. L. Rubin, N. H. Shah, and N. F. Noy. "Biomedical ontologies: a functional perspective," Briefings in Bioinformatics, 9(1):75-90, 2008.
[15] A. Sebastian, N. F. Noy, T. Tudorache, and M. A. Musen. "A generic ontology for collaborative ontology-development workflows," 16th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2008), Catania, Italy, 2008. Springer.
[16] J. D. Tenenbaum, et al. "The biomedical resource ontology (BRO) to enable resource discovery in clinical and translational research," Journal of Biomedical Informatics, 44:137-145, 2011.
[17] T. Tudorache, N. F. Noy, S. Tu, and M. A. Musen. "Supporting collaborative ontology development in Protege," 7th International Semantic Web Conference (ISWC 2008), Karlsruhe, Germany, 2008.
[18] World Health Organization. "International classification of diseases; manual of the international statistical classification of diseases, injuries and causes of death: 10th revision," Technical report, 1993.