Harvest: A Web-Based Biomedical Data Discovery and Reporting Application Development Platform

Michael J. Italia¹, Jeffrey W. Pennington¹, Byron Ruth¹, Stacey Wrazien¹, Jennifer G. Loutrel¹, E. Bryan Crenshaw III⁴, Jeffrey Miller¹, Peter S. White¹,²,³

¹Center for Biomedical Informatics, The Children’s Hospital of Philadelphia, Philadelphia, PA
²Division of Oncology, The Children’s Hospital of Philadelphia, Philadelphia, PA
³Department of Pediatrics, University of Pennsylvania School of Medicine, Philadelphia, PA
⁴Mammalian Neurogenetics Group, Center for Childhood Communication, Children's Hospital of Philadelphia, Philadelphia, PA

Abstract

Biomedical researchers share a common challenge of making complex data understandable and accessible. This need is increasingly acute as investigators seek opportunities for discovery amidst an exponential growth in the volume and complexity of laboratory and clinical data. To address this need, we developed Harvest, an open source framework that provides a set of modular components to aid the rapid development and deployment of custom data discovery software applications. Harvest incorporates visual representations of multidimensional data types in an intuitive, web-based interface that promotes a real-time, iterative approach to exploring complex clinical and experimental data. The Harvest architecture capitalizes on standards-based, open source technologies to address multiple functional needs critical to a research and development environment, including domain-specific data modeling, abstraction of complex data models, and a customizable web client.

Introduction

Adoption of electronic health record (EHR) systems by academic medical centers has created significant potential for the re-use of data captured in these systems for clinical and translational research.
However, biomedical researchers often have difficulty navigating the large volumes of clinical and experimental data available from current medical and research information systems. Data sets useful to biomedical research are typically complex, high-dimensional, and temporal, with significant variation in granularity and sparsity across data dimensions. Research data likewise mix categorical and quantitative data types. The high volume of data generated by high-throughput experimental modalities such as next-generation sequencing further amplifies this complexity. We addressed these needs by developing the Harvest framework. Our primary design goal for Harvest was to author a series of small, complementary, reusable components that can be rapidly combined to build customized, purpose-built applications. These components give members of the biomedical informatics community a means to readily generate highly accessible data resources for biomedical researchers who have limited understanding of, and access to, informatics expertise.

Methods

Harvest provides a three-tier application architecture consisting of a relational database management system (RDBMS), a web application server, and a JavaScript/HTML5 web browser client. The web application server components are written in Python using the Django web framework. All JavaScript client interactions with the server use a Representational State Transfer (REST) Application Programming Interface (API) to send and receive data asynchronously, populating the user interface dynamically so that the end-user experience resembles that of a desktop application.
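The three-tier request flow described above can be illustrated with a small, self-contained sketch. This is not Harvest's implementation; it is a minimal example under stated assumptions: Python's standard-library sqlite3 stands in for the RDBMS, http.server stands in for the Django application server, and a urllib call stands in for the asynchronous JavaScript client. The endpoint path /api/patients/, the patient table, the FIELDS mapping (a toy version of the data-model abstraction the abstract mentions), and every function name are hypothetical.

```python
import json
import sqlite3
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical mapping from user-facing field names to physical columns:
# a toy stand-in for a data-model abstraction layer.
FIELDS = {"sex": "sex", "age": "age_years"}


def make_db():
    """Data tier: an in-memory SQLite database standing in for the RDBMS."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.executescript(
        """
        CREATE TABLE patient (id INTEGER PRIMARY KEY, sex TEXT, age_years INTEGER);
        INSERT INTO patient (sex, age_years) VALUES ('F', 4), ('M', 11), ('F', 15);
        """
    )
    return conn


def query(conn, filters):
    """Translate {field: value} filters into parameterized SQL and return dicts."""
    clauses, params = [], []
    for name, value in filters.items():
        clauses.append(f"{FIELDS[name]} = ?")  # raises KeyError for unknown fields
        params.append(value)  # values are bound, never formatted into the SQL
    sql = "SELECT id, sex, age_years FROM patient"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    rows = conn.execute(sql + " ORDER BY id", params).fetchall()
    return [{"id": i, "sex": s, "age": a} for i, s, a in rows]


class ApiHandler(BaseHTTPRequestHandler):
    """Application tier: serves query results as JSON from a REST-style GET endpoint."""

    conn = None  # assigned by serve()

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/api/patients/":
            self.send_error(404)
            return
        filters = {k: v[0] for k, v in parse_qs(url.query).items()}
        try:
            body = json.dumps(query(self.conn, filters)).encode("utf-8")
        except KeyError:
            self.send_error(400, "unknown field")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging in this sketch


def serve():
    """Start the server on a free local port in a background thread."""
    ApiHandler.conn = make_db()
    server = ThreadingHTTPServer(("127.0.0.1", 0), ApiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(base_url, **filters):
    """Client-tier stand-in: request JSON from the API and decode it."""
    url = f"{base_url}/api/patients/?{urlencode(filters)}"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server = serve()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    print(fetch(base, sex="F"))
    # prints [{'id': 1, 'sex': 'F', 'age': 4}, {'id': 3, 'sex': 'F', 'age': 15}]
    server.shutdown()
```

Because the server resolves user-facing field names to physical columns, the client never needs to know the underlying schema, and an unrecognized field produces an HTTP 400 response rather than a malformed query. In a browser client the fetch step would run asynchronously and update the page in place; the synchronous urllib call here only keeps the sketch runnable on its own.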
Results and Discussion

We have deployed Harvest instances to support several data-rich translational research projects, such as the NIDCD-supported AudGenDB (http://audgendb.chop.edu), the NHLBI Pediatric Cardiac Genomics Consortium Data Hub, and an internal data warehouse supporting an NHGRI-supported Clinical Sequencing Exploratory Research program. In the latter example, Harvest makes tens of millions of genomic variants accessible to researchers in real time, integrating them with patient phenotype information and public-domain gene annotation resources. All Harvest source code and documentation are available under a BSD license at http://harvest.research.chop.edu/. A fully functional demonstration is available at http://harvest.research.chop.edu/demo.
