COVER PAGE

AUTHORS:
Anuradha Rangarajan, Indiana State University
Mailing Address: Attention: College of Technology, Indiana State University, 101 North Sixth Street, Terre Haute, IN 47809
Phone: 312.423.7893

David Batts, East Carolina University
Mailing Address: Department of Technology Systems, College of Engineering and Technology, Science and Technology Building, Suite 201, Greenville, NC 27858-4353 USA
Phone: 252.328.9673

Small Steps to Big Data - An SOA-Driven Framework for Meaningful Healthcare Analytics Implementation

Anuradha Rangarajan, Indiana State University
David Batts, East Carolina University

Abstract

The quantity of healthcare-related data that is captured, analyzed and used in decision making has grown substantially over the years, commensurate with the growth of the healthcare industry as a whole. Hard copies of patient records and physician notes are being replaced by Electronic Health Records (EHRs). With the proliferation of social media, additional data is being generated through blogs, video-logs, Twitter feeds and instant messages. Traditional data warehouses cannot effectively handle data of this diversity and magnitude for analytics purposes. The advent of Big Data technology has created immense prospects to leverage such data to make more meaningful diagnoses, predict outcomes, and reduce the cost of care. However, not all healthcare organizations have the resources to migrate their existing data to a large-scale Big Data implementation overnight. To address this issue, we present a Service-Oriented Architecture (SOA) framework that integrates existing disparate data sources in a reusable manner for real-time use by Big Data technologies. Loosely coupled service components abstract the data complexity that lies underneath and expose the data in a consistent manner for other services to utilize. We discuss building an ecosystem that allows for an incremental approach to Big Data adoption, with a view to maximizing an organization’s analytics return on investment.

Introduction

Healthcare data continues to grow by leaps and bounds. Historically, this data came from sources like hard copies of patient records and physician notes, and from machine-generated data such as X-rays and blood pressure monitor readings. However, this is only a small portion of the data captured. With the proliferation of the internet and computer technologies, the quantity, quality and rate of data capture have increased exponentially, especially in the last five years. The Health Information Technology for Economic and Clinical Health Act (HITECH), enacted under the American Recovery and Reinvestment Act of 2009, lays out several regulations to promote the adoption and meaningful use of health information technology to improve health care quality, affordability, and outcomes (“HITECH Programs & Advisory Committees”, 2014). A key component of this act is the establishment of interoperable Health Information Exchanges (HIEs) capable of exchanging patient-centered information, and deriving use from it, in a secure manner (“Principles and Strategy for Accelerating Health Information Exchange”, 2013). This requires healthcare providers to enhance their current technology infrastructure to process volumes of data that may be orders of magnitude greater than what they handle today. The HITECH Act has established incentive programs for healthcare providers who adopt certified Electronic Health Record (EHR) technology. To be eligible for these incentives, providers must achieve specific objectives for meaningful use of EHRs over a five-year period in three stages (“EHR Incentives and Certification”, 2013). In Stage 1, healthcare providers must capture health information electronically in a standardized manner, use it to track patients’ clinical conditions, share it across healthcare providers, and use it to engage patients and their families. In Stage 2, providers must meet increased e-prescribing requirements and transmit patient care summaries electronically in several formats. In Stage 3, providers must use this information for decision support on nationwide high-priority conditions and provide self-management tools for patients, thus improving the quality and efficiency of care. All of the above places a significant burden on healthcare providers, not only to plan for storing large amounts of data, but also to quickly build sophisticated technology that draws inferences from this data to improve patient and population health.
A 2009 Department of Health and Human Services (HHS) ruling requires healthcare providers to convert to the 10th revision of the International Statistical Classification of Diseases (ICD-10) medical classification by October 1, 2015 (“ICD-10”, 2014). Its goal is to provide enhanced clinical documentation of patient conditions from clinical, operational, professional and financial perspectives (“How will my practice benefit”, 2014). Benefits range from improved data analysis and treatment efficacy to more efficient medical reimbursements (“ICD-10 Implementation for Healthcare Providers”, 2010). This impacts not just healthcare providers, but also payers and vendors. While this ruling has created opportunities to share newer healthcare payer and provider transaction data that was not previously captured, it requires significant Information Technology (IT) investment to migrate systems from the current ICD-9 format to ICD-10, making this yet another data integration challenge.

Social media content like blogs, video-logs, Twitter feeds, Facebook postings and instant messages provides yet another valuable data source. According to Squazzo (2010), patients are turning to online media not just to obtain healthcare information, but also to interact with other patients who have similar health issues and to exchange opinions about provider services. Compiling and analyzing this type of information can provide valuable insights to providers.

Literature Review and Discussion

Emerging Healthcare Industry Standards and Models. Several standards have emerged to help healthcare providers navigate this changing landscape. Health Level 7 International (HL7), a nonprofit ANSI-accredited standards development organization, has laid out several normative and informative standards that enable the retrieval and sharing of electronic health information (“About HL7”, 2014). For example, its EHR Profile standards provide constructs for managing EHRs across pharmacy, clinical research, emergency department, and individual settings. Its EHR Records Management and Evidentiary Support Functional Profile specifies requirements for managing unstructured health record information, such as a text message to a physician, a scanned image of an insurance card, or a voice recording of a physician’s dictated report (“HL7 EHR Records Management”, 2010). This highlights the need for healthcare providers to enhance their existing IT systems to adapt to such emerging standards. The Healthcare Information and Management Systems Society (HIMSS) is a global nonprofit organization that focuses on optimizing health engagements using information technology. HIMSS Analytics, a division of HIMSS, has published an Electronic Medical Record Adoption Model to facilitate a healthcare provider’s progress to a digital environment (“Structure and Stage Detail”, 2014). At Stage 7 of the model, mature providers use data warehouses to analyze clinical data and improve care delivery efficiency. As of the third quarter of 2014, 3.4% of the 5,453 healthcare providers surveyed (roughly 185 providers) had attained Stage 7 certification. This shows that only a small number of providers have already made the technology investment for advanced data analysis.

Healthcare Data Warehouses for Analytics. Healthcare data warehouses have traditionally been used as a means to practice evidence-based medicine (Sahama & Croll, 2007). Providers use them to obtain a proactive, longitudinal and integrated view of their data assets, enabling intelligent decision-making for improved clinical outcomes. This technology gained momentum in the late 1980s, when breakthroughs in computing technologies made it possible to collect large volumes of data in relational databases. Online Analytical Processing (OLAP) systems evolved to derive intelligence from this compiled data using statistical techniques. However, the implementation of data warehouses in healthcare has been fraught with challenges. Chen (2012) highlights scalability, lack of standardization, maintainability, data errors and the inherent complexity of healthcare data as the primary reasons. These impediments make it even harder for traditional database and analytic systems to handle the enormous and diverse data available today.

Big Data and Benefits to Healthcare. The advent of Big Data technology has created opportunities to employ large amounts of diverse data in making more meaningful diagnoses, predicting outcomes more accurately, and reducing the cost of care (Raghupathi & Raghupathi, 2014). Big Data technology also makes possible the detection of diseases at earlier stages, the analysis and prediction of disease patterns, the remote monitoring of in-home and in-hospital devices, and patient profile analytics (e.g., proactive care or lifestyle changes for patients at risk of developing a specific condition like diabetes). While no single accepted definition of Big Data has emerged, one has been proposed by the TechAmerica Foundation, a nonprofit that disseminates research in government and national research: “large volumes of high velocity, complex, and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management and analysis of the information” (“Demystifying Big Data”, 2012). The technologies underlying Big Data are, at present, mostly open source. Apache Hadoop, comprising the Hadoop Distributed File System (HDFS) and MapReduce, forms the heart of the platform, with additional utilities like Pig, Pig Latin, Hive and Mahout that add data assimilation, querying and machine-learning capabilities (Zikopoulos & Eaton, 2011). Deploying open source systems has its limitations. Because open source software is developed by the community of software developers at large, the warranty it comes with is limited in scope (and at times there is no warranty, express or implied). Organizations need to invest time and resources to train their IT personnel to support such technology. Well-established vendors like IBM, Cloudera and Amazon provide a combination of supported open source distributions and proprietary Big Data implementations. While these options provide indemnity, they are relatively expensive to deploy. Regardless of the path chosen, existing healthcare analytics investments cannot be displaced overnight, for several reasons. For one, providers are typically cost-constrained and cannot fund full-fledged implementations of newer technologies all at once. Even if the promise of Big Data justified such an investment, it would not be feasible to migrate vast repositories of data built over several decades to a Big Data-compatible format in a short period of time. This presents a problem for healthcare organizations that want to build their Big Data footprint incrementally while continuing to derive useful analytics knowledge from their current data assets.

Service-oriented Architecture. Service-Oriented Architecture (SOA) has been widely adopted as a framework that enables a component-based, reusable approach to an organization’s assets. It was born out of the need to solve integration challenges arising from organizational mergers and acquisitions, or simply from silos built over time, which resulted in disparate systems, network architectures and implementations. There was, and still is, value to be drawn from these legacy systems while they continue to be extended and adapted to changing business models. Many definitions of SOA have been proposed over time. The one most relevant to this discussion is “The policies, practices, frameworks that enable application functionality to be provided and consumed as sets of services published at a granularity relevant to the service consumer. Services can be invoked, published and discovered, and are abstracted away from the implementation using a single, standards-based form of interface” (Wilkes, 2004, para. 7). While SOA can be thought of as a pattern or framework, an Enterprise Service Bus (ESB) can be considered an enabler for realizing the SOA vision. A relevant definition of an ESB in this context is to “Provide a robust, manageable, distributed integration infrastructure with principles of SOA. Enable interaction through services that are defined by explicit implementation-independent interfaces, are loosely bound and invoked through communication protocols that stress location transparency and interoperability, and encapsulate reusable business function. Support service routing and substitution, protocol transformations, and other message processing” (Acharya et al., 2004, p. 100).
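To make the idea of a service “abstracted away from the implementation” more concrete, the following minimal Python sketch assumes two hypothetical internal sources, a legacy billing store and an EHR export, each wrapped behind the same service contract. The names `PatientRecord`, `PatientDataService`, `LegacyBillingService`, `EhrService` and `collect` are illustrative assumptions, not part of any real product or of the framework’s specification; a production SOA would expose such contracts over standards-based protocols rather than as in-process Python classes.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical canonical record shape that every service exposes to consumers."""
    patient_id: str
    diagnosis_code: str


class PatientDataService(Protocol):
    """Hypothetical implementation-independent service contract."""
    def fetch_records(self) -> list[PatientRecord]: ...


class LegacyBillingService:
    """Wraps a legacy store whose rows are (id, code) tuples; the row layout stays hidden."""
    def __init__(self, rows: list[tuple[str, str]]) -> None:
        self._rows = rows

    def fetch_records(self) -> list[PatientRecord]:
        return [PatientRecord(pid, code) for pid, code in self._rows]


class EhrService:
    """Wraps an EHR export whose entries use vendor-specific keys ("mrn", "dx")."""
    def __init__(self, entries: list[dict[str, str]]) -> None:
        self._entries = entries

    def fetch_records(self) -> list[PatientRecord]:
        return [PatientRecord(e["mrn"], e["dx"]) for e in self._entries]


def collect(services: list[PatientDataService]) -> list[PatientRecord]:
    """A consumer that depends only on the contract, never on a storage format."""
    records: list[PatientRecord] = []
    for service in services:
        records.extend(service.fetch_records())
    return records


if __name__ == "__main__":
    sources = [
        LegacyBillingService([("p1", "401.9")]),
        EhrService([{"mrn": "p2", "dx": "E11.9"}]),
    ]
    for record in collect(sources):
        print(record)
```

The consumer in `collect` never sees tuples or vendor-specific keys; adding a new source only requires another class that satisfies the same contract, which is the loose coupling the definitions above describe.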
In the next section, we discuss how a healthcare provider can combine SOA and ESB concepts towards a more meaningful analytics implementation.

Proposed SOA Framework for Healthcare Analytics

We present an SOA-driven conceptual framework that healthcare providers can use to migrate to a Big Data framework incrementally. This conceptual framework is depicted in Figure 1. A typical healthcare provider draws on multiple sources of data. External sources include data from Health Information Exchanges, social media content, and other industry-standard benchmarking data. EHR systems, legacy data sources, and relational databases are examples of internal data sources. Traditionally, healthcare providers have stored this disparate data in a data warehouse and derived business intelligence from it over time. Data was moved from these systems to the warehouse at periodic intervals, in batch mode, i.e., as an after-the-fact movement (Ponniah, 2011). This mechanism appears to have served its purpose well when the various sources generated lower volumes of data (Sahama & Croll, 2007). For the reasons cited earlier, the volume and diversity of healthcare data are likely to increase at an even faster rate in the future. Delayed or untimely use of this data limits a healthcare provider’s ability to derive meaningful insights, respond to the quickly changing landscape, and remain competitive. While migrating all of this data into the Big Data platform is an option, for most providers it is both highly expensive and time-consuming. Instead, a better approach is to position the data warehouse as another data source feeding the Big Data platform. To achieve this goal and perform real-time processing of incoming data, a mediation layer becomes critical. An SOA-driven platform provides a solution to this problem, as illustrated in Figure 1.

[Figure 1 diagram. Labels recoverable from the figure: Social Media Content; External Healthcare Benchmarking Data; EHR Source 1; EHR Source 2; Legacy Healthcare Payments Database; Physician’s Research Database; Health Information Exchange; Data Warehouse; Service-oriented Architecture-ESB Platform: Data Transformation, Re-mapping, Standardization in Real-Time; EHR Data Cleansing & Transformation; Big Data Healthcare Platform; Advanced Analytics, Dashboards, Reports; Real-Time Processing; Non-real-Time, Asynchronous or Batch Processing.]

Figure 1. An SOA-driven framework that assimilates data from various sources
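The mediation layer described above and shown in Figure 1 can be sketched in a few lines of Python. The toy bus below is a hypothetical illustration, not the API of any ESB product: each registered source gets its own transformation into a common message shape, real-time sources are forwarded to the Big Data platform (modeled here as a plain callback) immediately, and a batch source such as the data warehouse is buffered until it is flushed. The class and method names (`MiniServiceBus`, `register`, `publish`, `flush_batch`) and the message fields are assumptions made for this example.

```python
from collections import defaultdict
from typing import Callable

# A message is modeled as a flat string-to-string mapping (an assumption for brevity).
Message = dict[str, str]
Transform = Callable[[Message], Message]
Sink = Callable[[Message], None]


class MiniServiceBus:
    """Toy mediation layer (hypothetical): per-source transformation, then routing.

    Real-time sources are forwarded to the sink immediately; batch sources
    (e.g. the data warehouse) are buffered until flush_batch() is called.
    """

    def __init__(self, sink: Sink) -> None:
        self._sink = sink
        self._transforms: dict[str, Transform] = {}
        self._batch_sources: set[str] = set()
        self._buffer: dict[str, list[Message]] = defaultdict(list)

    def register(self, source: str, transform: Transform, batch: bool = False) -> None:
        """Attach a source-specific mapping into the common message shape."""
        self._transforms[source] = transform
        if batch:
            self._batch_sources.add(source)

    def publish(self, source: str, message: Message) -> None:
        """Standardize a message, then forward it now or hold it for a batch run."""
        standardized = self._transforms[source](message)
        if source in self._batch_sources:
            self._buffer[source].append(standardized)
        else:
            self._sink(standardized)

    def flush_batch(self, source: str) -> int:
        """Forward all buffered messages for one batch source; return how many were sent."""
        pending = self._buffer.pop(source, [])
        for message in pending:
            self._sink(message)
        return len(pending)


if __name__ == "__main__":
    received: list[Message] = []  # stands in for the Big Data platform
    bus = MiniServiceBus(sink=received.append)
    bus.register("ehr", lambda m: {"patient_id": m["mrn"], "code": m["dx"]})
    bus.register("warehouse", lambda m: {"patient_id": m["pid"], "code": m["icd"]}, batch=True)
    bus.publish("ehr", {"mrn": "p1", "dx": "I10"})           # forwarded immediately
    bus.publish("warehouse", {"pid": "p2", "icd": "E11.9"})  # buffered
    bus.flush_batch("warehouse")                             # now forwarded
    print(received)
```

Keeping the source-specific mapping inside the bus, rather than in each consumer, is what lets the data warehouse act as just another source: downstream analytics only ever see the standardized shape.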

An ESB at the heart of this framework performs the much-needed task of transforming, re-mapping, and standardizing data from both internal and external sources in real time. Doing so has multiple benefits. Numerous data anomalies are eliminated upfront. Because EHR data has historically been captured without standardization, there are many inconsistencies to contend with. Botsis, Hartvigsen, Chen and Weng (2010) conducted a survival analysis study for pancreatic cancer using data from the Columbia University Medical Center’s data warehouse. They extracted EHR elements for patient groups classified by disease sub-type and found numerous data quality issues, including incompleteness, inconsistency, and inaccuracy. Cleaning massive amounts of data before it is consumed by Big Data technologies is both time-consuming and resource-intensive. An alternative is to have the ESB platform call data cleansing routines at run time before forwarding the data, which reduces the Big Data computing resources needed to parse and clean it after ingestion. Because this transformation happens in real time, healthcare providers can incorporate holistic and relevant analytics feedback, based on current data and statistics, into their decision-making processes. Additionally, this path enables existing IT systems to migrate incrementally to the newer healthcare standards described in the previous sections. For example, if a provider’s healthcare payments database or legacy billing systems hold data in ICD-9 format, the ESB layer can re-map it to ICD-10 format at run time. Because this interaction takes place through well-defined service interfaces, the data format intrinsic to these databases is encapsulated and hence left unmodified. Another example is sharing internal data with an HIE.
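The run-time cleansing and ICD-9 to ICD-10 re-mapping described above can be sketched as a single transformation step. This is a minimal illustration under stated assumptions: the two-entry table `ICD9_TO_ICD10` is a tiny hypothetical excerpt standing in for the CMS General Equivalence Mappings (GEMs), which contain many thousands of entries and are not always one-to-one; the field names, the `Claim` type and the choice to reject unmappable or incomplete rows are illustrative, not part of the authors’ framework.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical two-entry excerpt; a real service would load the full CMS GEMs,
# and some ICD-9 codes map to several ICD-10 codes (or need clinical review).
ICD9_TO_ICD10 = {
    "250.00": "E11.9",  # type 2 diabetes mellitus without complications
    "401.9": "I10",     # essential (primary) hypertension
}


@dataclass(frozen=True)
class Claim:
    patient_id: str
    code: str
    code_system: str  # "ICD-9" or "ICD-10"


def cleanse(raw: dict[str, str]) -> Optional[Claim]:
    """Normalize formatting and reject incomplete or inconsistent rows."""
    patient_id = raw.get("patient_id", "").strip()
    code = raw.get("code", "").strip().upper()
    system = raw.get("code_system", "").strip().upper()
    if not patient_id or not code or system not in {"ICD-9", "ICD-10"}:
        return None  # a real ESB might route this to a review queue instead
    return Claim(patient_id, code, system)


def remap_to_icd10(claim: Claim) -> Optional[Claim]:
    """Re-map ICD-9 codes at run time; the source database is never modified."""
    if claim.code_system == "ICD-10":
        return claim
    target = ICD9_TO_ICD10.get(claim.code)
    if target is None:
        return None  # no mapping in this excerpt: flag for manual coding review
    return Claim(claim.patient_id, target, "ICD-10")


def transform(raw: dict[str, str]) -> Optional[Claim]:
    """Cleanse first, then re-map, as one ESB transformation step might."""
    claim = cleanse(raw)
    return remap_to_icd10(claim) if claim is not None else None


if __name__ == "__main__":
    print(transform({"patient_id": " p7 ", "code": "401.9", "code_system": "icd-9"}))
```

Because `transform` returns a new record rather than updating the source, the ICD-9 data stays in its original database, which matches the encapsulation point made in the paragraph above.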
The ESB layer can orchestrate data extraction from the data warehouse, the Big Data platform and other internal sources, and re-format the data to meet HL7 message formats before transmitting it to the HIE. Armed with such capabilities through SOA, a healthcare provider’s IT infrastructure is transformed into an ecosystem in which existing and new systems coexist while the organization continues to make incremental progress on its data analytics journey.

The framework proposed above does have some limitations. First, it does not consider the security and organizational policy aspects involved in such a setting, both of which are crucial to a holistic implementation. Second, there are a number of nuances in the Big Data setup, including, but not limited to, the choice between a vendor-supported and a fully open-source implementation, and the tools used to perform advanced analytics against the Big Data platform. Because this technology is still nascent, the numerous options available can make for a complex decision-making process. Depending on the path taken, the end goal of meaningful healthcare analytics can be greatly enhanced or diminished. These aspects need to be explored in detail in future work.

Concluding Comments

While Big Data technology’s potential in healthcare is indisputable, providers cannot overhaul their IT systems overnight to make their current digital assets compatible with Big Data. It is also important to realize that Big Data is only a means to an end. In this paper, we have presented an SOA-driven framework that will allow healthcare providers to gain meaningful insights from various data islands, even in the face of changing regulatory, technological and competitive environments. To obtain optimal returns, healthcare providers need to take a well-rounded approach to re-purposing their current IT investments. By incorporating the framework presented in this paper, they can position themselves well to achieve this goal.

References

Acharya, A., Bishop, S., Hopkins, A., Milinski, S., Nott, C., Robinson, R., & Verschueren, P. (2004). Patterns: Implementing an SOA using an enterprise service bus. IBM, International Technical Support Organization.

Botsis, T., Hartvigsen, G., Chen, F., & Weng, C. (2010). Secondary use of EHR: Data quality issues and informatics opportunities. AMIA Summits on Translational Science Proceedings, 2010, 1.

Centers for Medicare and Medicaid Services. (2014). How will my practice benefit? Retrieved from http://www.roadto10.org/icd-10-benefits/

Centers for Medicare and Medicaid Services. (2014, October 20). ICD-10. Retrieved from http://www.cms.gov/Medicare/Coding/ICD10/index.html?redirect=/icd10

Chen, E. T. (2012). Implementation Issues of Enterprise Data Warehousing and Business Intelligence in the Healthcare Industry. Communications of the IIMA, 12(2), 39.

HealthIT.gov. (2013, January 15). EHR Incentives and Certification. Retrieved from http://www.healthit.gov/providers-professionals/how-attain-meaningful-use

HealthIT.gov. (2014, February 24). HITECH Programs & Advisory Committees. Retrieved from http://www.healthit.gov/policy-researchers-implementers/hitech-programs-advisory-committees

HealthIT.gov. (2010). ICD-10 Implementation for Healthcare Providers. Retrieved from http://www.healthit.gov/facas/FACAS/sites/faca/files/icd10implementationforhealthcareproviders.pdf

Health Level 7 International. (2014). About HL7. Retrieved from http://www.hl7.org/about/index.cfm?ref=nav

Health Level 7 International. (2010, August 16). HL7 EHR Records Management and Evidentiary Support Functional Model, Release 1. Retrieved from http://www.hl7.org/documentcenter/private/standards_temp_88FFB80B-1C23-BA170CE0D27E3FAFC837/EHR/Functional_Profiles/EHR_RMES_FP_R1_2010AUG.pdf

HIMSS Analytics. (2014). Structure and Stage Detail. Retrieved from http://www.himssanalytics.org/emram/structure.aspx

Ponniah, P. (2011). Data warehousing fundamentals for IT professionals. John Wiley & Sons.

Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. Health Information Science and Systems, 2(1), 3.

Sahama, T. R., & Croll, P. R. (2007, January). A data warehouse architecture for clinical data warehousing. In Proceedings of the Fifth Australasian Symposium on ACSW Frontiers, Volume 68 (pp. 227-232). Australian Computer Society, Inc.

Squazzo, J. D. (2010). Best practices for applying social media in healthcare. Healthcare Executive, 25(3), 34-39.

TechAmerica Foundation. (2012, October 3). Demystifying Big Data. Retrieved from http://www.techamericafoundation.org/bigdata

Wilkes, L. (2004). Understanding Service-Oriented Architecture. MSDN. Retrieved from http://msdn.microsoft.com/en-us/library/aa480021.aspx

Zikopoulos, P., & Eaton, C. (2011). Understanding big data: Analytics for enterprise class Hadoop and streaming data. McGraw-Hill Osborne Media.