A Cloud Computing Based Platform for Sharing Healthcare Research Information Mu-Hsing Kuo, Andre Kushniruk, Elizabeth Borycki School of Health Information Science, University of Victoria, Victoria, BC, Canada e-mail:
[email protected],
[email protected],
[email protected]
Feipei Lai Graduate Institute of Biomedical Electronics & Bioinformatics National Taiwan University, Taipei, Taiwan e-mail:
[email protected]
Sarangerel Dorjgochoo, Erdenebaatar Altangerel Computer Science & Management School, Mongolian University of Science and Technology, Ulaanbaatar, Mongolia e-mail:
[email protected]
proposes to apply sequence clustering methods to discover better clinical pathways and establish standardized clinical guidelines for liver cancer treatment.
Abstract— The aim of this paper is to propose a cloud-based data mining (DM) platform for researchers in three different geographic locations (Canada, Taiwan and Mongolia) for sharing research data/results through the Internet while remaining cost effective, flexible, secure and privacy-preserved. In addition, the study evaluates the implementation challenges of the cloud based platform and provides potential solutions to handle the identified issues so that other similar research can use this study as a reference to determine whether (or how) to migrate from traditional to cloud-based services. Keywords — Data Mining; Architecture.
However, as indicated in the paper by Anderson et al. [1], data-handling problems, complexity, and expensive or unavailable computational solutions to research problems are major issues in biomedical research data management and analysis. Currently, the Taiwan-Mongolia research team uses traditional laboratory-hosted servers in a distributed architecture for data sharing and analysis. This architecture is expensive (e.g., high IT maintenance costs) and inefficient (e.g., difficulty integrating diverse software and hardware).
Cloud Computing; Healthcare Data Sharing; Collaborative Research; Service-Oriented
I. INTRODUCTION
Cloud computing refers to an on-demand self-service Internet infrastructure that enables the user to access computing resources anytime from anywhere. Research by Rosenthal et al. shows that the biomedical informatics community, especially consortiums that share data and applications, can take advantage of the new computing paradigm [2]. Several informatics innovations also have demonstrated that cloud computing has the potential to overcome health data management and analysis issues [3-10].
Since 2009, the National Taiwan University (NTU) and the Mongolian University of Science and Technology (MUST) have carried out a 3-year "Data Mining (DM) on Healthcare" joint research project. This year, researchers at the School of Health Information Science, University of Victoria (UVic) and the BC Cancer Agency plan to join the Taiwan-Mongolia data mining project as collaborators, forming a three-country research team (we plan to extend the study period to 5 years). Two of the principal investigator's graduate students have been involved in liver cancer data mining research. The main benefit of the collaboration is that researchers can apply DM algorithms to analyze diverse clinical diagnostic records contained in three distinct Electronic Health Record (EHR) systems to discover hidden knowledge related to liver cancer. The joint project expects to achieve the following two goals:
The main objective of this paper is to propose a cloud-based data mining platform that lets researchers in three different geographic locations share research data and results over the Internet while remaining cost-effective, flexible, secure and privacy-preserving. We also evaluate the opportunities and challenges of the cloud, provide potential solutions to the challenges, and develop implementation guidelines based on a practical system implementation.
(1) To provide early detection of liver cancer. Researchers will be able to infer relationships from a large number of medical records using association algorithms (e.g., the Apriori algorithm). This analysis produces association rules that indicate which combinations of demographics, geographic locations and patient characteristics are associated with liver cancer. As a consequence, the resulting system could provide early alerts to patients at high risk of liver cancer.
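The association-rule step described in goal (1) can be illustrated with a minimal, self-contained sketch of an Apriori-style search. It is not the project's implementation: the patient records, attribute names and thresholds below are made up for illustration, and a production study would more likely use an established data mining library.

```python
from itertools import combinations

# Hypothetical, made-up records: each set lists the attributes of one patient.
records = [
    {"hepatitis_b", "male", "liver_cancer"},
    {"hepatitis_b", "male", "liver_cancer"},
    {"hepatitis_b", "female"},
    {"male"},
    {"hepatitis_b", "female", "liver_cancer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-k candidates are built from frequent size-(k-1) sets."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), transactions) >= min_support]
    frequent = {}
    while current:
        for s in current:
            frequent[s] = support(s, transactions)
        size = len(current[0]) + 1
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Apriori pruning: keep a candidate only if all its (size-1)-subsets are frequent.
        current = [c for c in candidates
                   if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
                   and support(c, transactions) >= min_support]
    return frequent

def rules_for(target, frequent, min_confidence):
    """Rules 'antecedent -> target' with confidence = support(itemset) / support(antecedent)."""
    found = []
    for itemset, sup in frequent.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            confidence = sup / frequent[antecedent]
            if confidence >= min_confidence:
                found.append((sorted(antecedent), round(confidence, 2)))
    return sorted(found)

freq = frequent_itemsets(records, min_support=0.4)
rules = rules_for("liver_cancer", freq, min_confidence=0.7)
print(rules)
```

On this toy data the sketch reports two rules for `liver_cancer`: one with antecedent `hepatitis_b` and one with antecedent `hepatitis_b` plus `male`. A rule of this kind expresses co-occurrence in the records, not causation.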
II. LITERATURE REVIEW
Cloud computing is a new model of delivering computing resources, not a new technology. However, compared with conventional computing, this model provides three new advantages: massive computing resources available on demand, elimination of an up-front commitment by users and payment for use on a short-term basis as needed [11].
(2) To establish clinical pathways and guidelines. The study will collect medical records from the three countries' EHR systems, including admission/discharge diagnoses, chief complaints, physician orders, etc., and
978-1-4673-1382-7/12/$31.00 ©2012 IEEE
Chinburen Jigjidsuren Department of Hepatobiliarypancreatic Surgery, Mongolian National Cancer Center, Mongolia e-mail:
[email protected]
From a service point of view, cloud computing includes three archetypal models: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). For deploying cloud computing, the U.S. National Institute of Standards and Technology (NIST) lists four models: public cloud, private cloud, community cloud and hybrid cloud [12].
updating its clinical processes using cloud-based software from IBM Business Partners MedTrak Systems. The company now can provide faster and more accurate billing to individuals and insurance companies, shortening the average time to create a bill from 7 days to less than 24 hours, and reducing medical transcription costs by 80% [28]. In Europe, a consortium including IBM, Sirrix AG security technologies, Portuguese energy and solution providers Energias de Portugal and EFACEC, San Raffaele Hospital, and several European academic and corporate research organizations contracted Trustworthy Clouds - a patient-centered home health care service - to remotely monitor, diagnose, and assist patients outside of a hospital setting. The complete lifecycle, from prescription to delivery to intake to reimbursement, will be stored in the cloud and will be accessible to patients, doctors, and pharmacy staff [29].
In Kuo's paper [13], the author provided a comprehensive study of the opportunities and challenges of cloud computing for improving health care services. The main advantages of this new computing model are low cost, flexibility (i.e., rapid elasticity and ubiquitous access to computing resources), safety and the benefits of so-called green computing [14]. However, there are also several management, technology, security and legal issues to be addressed. For example, lack of trust in data security and privacy by users, organizational inertia, loss of governance, and uncertain provider compliance are the main management challenges [15-17]; resource exhaustion, unpredictability of performance, data lock-in, data transfer bottlenecks and bugs in large-scale distributed cloud systems are technical challenges related to the use of cloud computing [18-20]; separation failure, public management interfacing, poor encryption key management, and privilege abuse are security risks specific to cloud computing; and data jurisdiction and privacy issues are the major legal issues [21-25].
III. THE CLOUD BASED DATA MINING PLATFORM
In this research, we propose a cloud computing based Service-Oriented Architecture (SOA) in order to facilitate the development and integration of heterogeneous database and application systems (see Figure 1). If any research team wants to access remote resources (data or applications), it merely requests web services through the SOA by applying the standardized programming grammar [30].
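The request pattern described above can be sketched as a toy, in-process service registry. This is only an illustration under stated assumptions: the paper does not specify the message format, so JSON is assumed here, and the class `ServiceRegistry`, the service name `ntu.count_cases` and the case counts are all hypothetical.

```python
import json

class ServiceRegistry:
    """Toy stand-in for an SOA service layer: maps service names to handlers."""

    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        """Publish a handler under a service name."""
        self._services[name] = handler

    def request(self, message):
        """Accept a JSON request, dispatch it by service name, return a JSON response."""
        call = json.loads(message)
        handler = self._services.get(call["service"])
        if handler is None:
            return json.dumps({"status": "error", "reason": "unknown service"})
        result = handler(**call.get("params", {}))
        return json.dumps({"status": "ok", "result": result})

def count_cases(diagnosis):
    """Hypothetical service at one site; the counts are made up."""
    local_counts = {"liver_cancer": 42, "hepatitis_b": 117}
    return local_counts.get(diagnosis, 0)

registry = ServiceRegistry()
registry.register("ntu.count_cases", count_cases)

# A remote team only needs the service name and the agreed message format.
response = registry.request(json.dumps(
    {"service": "ntu.count_cases", "params": {"diagnosis": "liver_cancer"}}
))
print(response)
```

The point of the design is that callers depend only on the shared message format and the service name, not on the database or application behind each site.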
Despite several challenges associated with cloud computing applications, many previous studies have reported successful uses of cloud computing in bioinformatics research [3-9,26]. For example, Avila-Garcia et al. proposed a framework based on the cloud computing concept for colorectal cancer imaging analysis and research for clinical use [6]. Bateman and Wood used Amazon’s EC2 service with 100 nodes to assemble a full human genome with 140 million individual reads requiring alignment using a sequence search and alignment by hashing (SSAHA) algorithm [7]. Kudtarkar et al. also used EC2 to compute orthologous relationships for 245,323 genome-to-genome comparisons. The computation took just over 200 hours and cost US $8,000, approximately 40% less than expected [8]. Memon et al. applied cloud computing to evaluate the impact of G-quadruplexes on Affymetrix arrays [9]. The Laboratory for Personalized Medicine of the Center for Biomedical Informatics at Harvard Medical School used cloud computing to develop genetic testing models that manipulated enormous amounts of data in record time [26].
Figure 1. The Service-Oriented Architecture
Besides academic researchers, many world-class software companies have invested heavily in the cloud, extending their offerings for medical records services, such as Microsoft’s HealthVault, Oracle’s Exalogic Elastic Cloud, and Amazon Web Services (AWS), promising an explosion in the storage of personal health information online. The use of health cloud computing is also reported worldwide. For example, AWS hosts a collection of health care IT offerings, such as Salt Lake City-based Spearstone’s health care data storage application, and DiskAgent uses Amazon Simple Storage Service (Amazon S3) as its scalable storage infrastructure [27]. The American Occupational Network is improving patient care by digitizing health records and
The cloud algorithm virtualizes the server in the platform with VMware (see Figure 2). This allows researchers at different locations to open a virtualized server with VMware and conduct the same research study. Researchers at different sites can share and transform outcomes via the virtualized server while keeping their original patient data in a local database. We also borrowed the technique called Ambiguity, proposed by Wang [31], to protect both presence privacy and association privacy with low information loss. The benefit of this structure is that it preserves the security and privacy of the original data.
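The idea of sharing outcomes while keeping patient data local can be sketched as follows. This is a simplified illustration, not the Ambiguity technique of Wang [31]: it assumes each site shares only aggregate counts and suppresses small cells, and the function names, the `min_count` threshold and the rows are hypothetical.

```python
from collections import Counter

def local_summary(patient_rows, min_count=2):
    """Run at each site: count diagnoses locally and drop cells smaller than min_count.
    The patient rows themselves are never returned."""
    counts = Counter(row["diagnosis"] for row in patient_rows)
    return {dx: n for dx, n in counts.items() if n >= min_count}

def merge_summaries(summaries):
    """Run on the shared virtualized server: combine the per-site aggregates."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return dict(total)

# Made-up rows standing in for two sites' local databases.
site_a = [{"id": 1, "diagnosis": "liver_cancer"},
          {"id": 2, "diagnosis": "liver_cancer"},
          {"id": 3, "diagnosis": "liver_cancer"},
          {"id": 4, "diagnosis": "cirrhosis"}]
site_b = [{"id": 9, "diagnosis": "liver_cancer"},
          {"id": 10, "diagnosis": "liver_cancer"},
          {"id": 11, "diagnosis": "hepatitis_b"},
          {"id": 12, "diagnosis": "hepatitis_b"}]

shared = merge_summaries([local_summary(site_a), local_summary(site_b)])
print(shared)
```

Only the dictionaries returned by `local_summary` cross site boundaries; the single `cirrhosis` record at site A is suppressed before sharing.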
(1) Loss of data governance - In some cases, a service level agreement (SLA) may not commit the provider to allowing us to audit in-cloud data. The loss of data governance is the main concern when our sensitive data and mission-critical applications move to a cloud computing paradigm where providers cannot guarantee the effectiveness of their security.
(2) Data jurisdiction issues - Cloud computing is a shared-resource, multi-tenancy environment for capacity, storage and network. Physical storage could be widely distributed across multiple jurisdictions, and different jurisdictions may have different laws regarding data security, privacy, usage, and intellectual property. For example, the U.S. Health Insurance Portability and Accountability Act (HIPAA) restricts companies from disclosing personal health data to non-affiliated third parties unless specific contractual arrangements have been put in place. The Patriot Act has also deterred cloud adoption outside of the U.S. because the Act gives the U.S. government a right to demand data if it defines conditions as being an emergency or necessary to homeland security. The problem is that many main cloud providers, such as Microsoft, Google and Amazon, are U.S. based. This may cause legal issues for deployment of the proposed platform. For example, the Canadian Personal Information Protection and Electronic Documents Act (PIPEDA) limits the powers of organizations to collect, use, or disclose personal information in the course of commercial activities. However, the provider may, without notice to us, move our data from jurisdiction to jurisdiction.
Figure 2. Virtualization server workflows
The SOA and virtualization servers form a so-called "cloud architecture" which allows researchers to easily share study resources such as patient health data, data mining applications and storage space over the Internet (see Figure 3). In the cloud architecture, the data mining applications can be viewed as Software as a Service (SaaS) and the virtualization servers as Platform as a Service (PaaS). In other words, this is a very flexible model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service-provider interaction.
(3) Data interoperability issues - There are many issues associated with health data interoperability, such as functional, data instance and metadata interoperability issues [33]. Likewise, most cloud infrastructures provide very little capability for health data, application and service interoperability. This could be an issue when the study data/results migrate from one provider to another or move back to an in-house IT environment (i.e., data lock-in) [18].
Figure 3. The data mining platform cloud architecture
(4) Privacy issues - Cloud computing is a shared-resource, multi-tenancy environment for capacity, storage, and network. The privacy risks of this type of environment include the failure of mechanisms for separating storage, memory, routing, and even reputation between different tenants of the shared infrastructure. The centralized storage and shared tenancy of physical storage space mean that our sensitive data are at risk of disclosure to unwanted parties [20].
IV. IMPLEMENTING PROCEDURES
The implementation of the proposed platform will include the following five steps [32]:
Step 1. Identify the service requirement. This step analyzes the current status of the research data sharing process and identifies the fundamental objective of information service improvement. The analysis gives the proposed study a well-defined scope for the service problem being faced. In addition, this step will define service quality indicators and explain the purpose and use of each indicator.
Step 2. Evaluate and deal with cloud challenges. As with any innovation, cloud computing should be rigorously evaluated before its adoption. We identified four specific challenges to the proposed study as follows [13]:
Many references are available for handling technical issues [18, 34-36]. The main providers (e.g., Microsoft, Google, Amazon) have committed to developing best policies and practices to secure customers’ data and privacy. Some not-for-profit organizations, such as the Trusted Computing Group (http://www.trustedcomputinggroup.org/) and the Cloud Security Alliance [37], have developed comprehensive guidelines, hardware and software technologies to enable the construction of trustworthy cloud applications. In addition, most legal issues involved in cloud computing can usually be resolved through contract evaluation or negotiations [17, 38]. This study will evaluate the feasibility of those resources/guidelines for dealing with the challenges.
(2) Evaluate the opportunities and challenges of cloud computing applications to healthcare research data management and analysis through practical system implementation. The study can be used as a reference by other similar research to determine whether (or how) to migrate from traditional to cloud-based services.
Step 3. Compare different cloud providers. Choosing a proper cloud provider is the most important part of the cloud implementation plan. Different providers may offer different service models, pricing schemes, audit procedures, and privacy and security policies. The study will compare the different offerings and evaluate each provider’s reputation and performance. The chosen provider should also be able to provide assurances of quality of service and follow sound privacy, security, and legal practices and regulations.
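One simple way to make the comparison in Step 3 explicit is a weighted scoring matrix. The sketch below is an assumption about how such a comparison could be organized, not a method stated in the paper; the criteria, weights, provider names and 1-5 scores are all hypothetical.

```python
# Hypothetical criteria weights (sum to 1.0) reflecting the priorities named in Step 3.
WEIGHTS = {"pricing": 0.2, "security": 0.3, "privacy": 0.3, "audit": 0.2}

# Made-up 1-5 scores assigned by the evaluation team to each candidate provider.
providers = {
    "Provider A": {"pricing": 4, "security": 3, "privacy": 2, "audit": 3},
    "Provider B": {"pricing": 2, "security": 5, "privacy": 4, "audit": 4},
}

def weighted_score(scores, weights):
    """Sum of weight * score over all criteria."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)

totals = {name: weighted_score(scores, WEIGHTS) for name, scores in providers.items()}
ranking = sorted(totals, key=totals.get, reverse=True)

for name in ranking:
    print(f"{name}: {totals[name]:.2f}")
```

With these made-up numbers, the higher weights on security and privacy outweigh Provider A's pricing advantage. In practice the weights would need to be agreed by all three sites before any scores are assigned.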
(3) Provide a practical international collaboration between academic institutes and healthcare providers.
Step 4. Set up and test the new data mining platform. As indicated before, the NTU and MUST joint research project has already established its study framework. After the UVic and BC Cancer Agency research teams join the collaborative project, the new cloud-based data mining platform (a community cloud, Figure 3) must be adjusted and tested to meet all parties’ requirements. Each part of the platform will be examined by unit tests during the setup stage. After integrating all parts into a single unit, we will test the data sharing process.
Step 5. Develop a follow-up plan. The last step is to develop a follow-up plan that indicates when and how to measure the service improvements. We will set up performance indicators and targets beforehand, and the results of the new services will be measured against them to assess the magnitude of the improvement. If the new service does not meet its targets, we will review which factors influenced the achievement of the objectives.
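The follow-up check in Step 5 can be sketched as a comparison of measured values against targets fixed in advance. The indicator names, targets and measurements below are hypothetical examples and do not come from the study.

```python
from dataclasses import dataclass

@dataclass
class Indicator:
    name: str
    target: float
    measured: float
    higher_is_better: bool = True

    def met(self):
        """True when the measured value reaches the target in the right direction."""
        if self.higher_is_better:
            return self.measured >= self.target
        return self.measured <= self.target

# Made-up indicators; targets are set before the new service goes live.
indicators = [
    Indicator("platform availability (%)", target=99.0, measured=99.4),
    Indicator("median result-sharing time (s)", target=5.0, measured=7.5,
              higher_is_better=False),
]

unmet = [indicator.name for indicator in indicators if not indicator.met()]
print("Indicators needing review:", unmet)
```

Any indicator listed in `unmet` would trigger the review of influencing factors described in Step 5.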
V. CONCLUSION
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service-provider interaction [12]. Many managers, experts and studies have suggested that it has the potential to overcome health data management and analysis issues [2-10, 39-41]. However, few studies have systematically examined the impact of cloud computing on healthcare IT (i.e., its opportunities and challenges) based on a practical system implementation. In this paper, we propose a cloud-based data mining platform that lets researchers in three different geographic locations share health care research data and results over the Internet. The main contributions of this study are:
(1) Develop a cost-effective, flexible, secure and privacy-preserving cloud-based platform that allows researchers in different geographic locations to easily explore diverse medical data for liver cancer research over the Internet.
REFERENCES
[1] N.R. Anderson, E.S. Lee, J.S. Brockenbrough, M.E. Minie, S. Fuller, J. Brinkley, and P. Tarczy-Hornoch, "Issues in biomedical research data management and analysis: needs and barriers," Journal of the American Medical Informatics Association, vol. 14, pp. 478-88, 2007.
[2] A. Rosenthal, P. Mork, M.H. Li, J. Stanford, D. Koester, and P. Reynolds, "Cloud computing: A new business paradigm for biomedical information sharing," Journal of Biomedical Informatics, vol. 43, pp. 342-353, 2010.
[3] J.T. Dudley, and A.J. Butte, "In silico research in the era of cloud computing," Nature Biotechnology, vol. 28, no. 11, pp. 1181-1185, 2010.
[4] D.P. Wall, P. Kudtarkar, V.A. Fusaro, R. Pivovarov, P. Patil, and P.J. Tonellato, "Cloud computing for comparative genomics," BMC Bioinformatics, vol. 11, no. 259, pp. 1-12, 2010.
[5] M.C. Schatz, B. Langmead, and S.L. Salzberg, "Cloud computing and the DNA data race," Nature Biotechnology, vol. 28, pp. 691-693, 2010.
[6] M.S. Avila-Garcia, A.E. Trefethen, M. Brady, F. Gleeson, and D. Goodman, "Lowering the Barriers to Cancer Imaging," Proceedings of the 4th IEEE International Conference on eScience, pp. 63-70, 2008.
[7] A. Bateman, and M. Wood, "Cloud computing," Bioinformatics, vol. 25, no. 12, pp. 1475, 2009.
[8] P. Kudtarkar, T.F. DeLuca, V.A. Fusaro, P.J. Tonellato, and D.P. Wall, "Cost-Effective Cloud Computing: A Case Study Using the Comparative Genomics Tool, Roundup," Evolutionary Bioinformatics, vol. 6, pp. 197-203, 2010.
[9] F.N. Memon, A.M. Owen, O. Sanchez-Graillet, G.J.G. Upton, and A.P. Harrison, "Identifying the impact of G-Quadruplexes on Affymetrix 3' Arrays using Cloud Computing," Journal of Integrative Bioinformatics, vol. 7, no. 2, pp. 111, 2010.
[10] N. Botts, B. Thoms, A. Noamani, and T.A. Horan, "Cloud Computing Architectures for the Underserved: Public Health Cyber Infrastructures through a Network of HealthATMs," Proceedings of the 43rd Hawaii International Conference on System Sciences, pp. 1-10, 2010.
[11] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "A View of Cloud Computing," Communications of the ACM, vol. 53, no. 4, pp. 50-58, 2010.
[12] P. Mell, and T. Grance, "The NIST Definition of Cloud Computing," Communications of the ACM, vol. 53, no. 6, pp. 50, 2010.
[13] M.H. Kuo, "Opportunities and Challenges of Cloud Computing to Improve Health Care Services," Journal of Medical Internet Research (JMIR), vol. 13, no. 3, pp. e67, 2011.
[14] J. Baliga, R.W.A. Ayre, K. Hinton, and R.S. Tucker, "Green Cloud Computing: Balancing Energy in Processing, Storage, and Transport," Proceedings of the IEEE, vol. 99, no. 1, pp. 149-167, 2010.
[15] C. Everett, "Cloud computing - A question of trust," Computer Fraud & Security, pp. 5-7, June 2009.
[16] W. Jansen, and T. Grance, NIST Guidelines on Security and Privacy in Public Cloud Computing. National Institute of Standards and Technology, 2011.
[17] The European Network and Information Security Agency (ENISA), Cloud Computing - Benefits, risks and recommendations for information security, 2009.
[18] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R.H. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the clouds: A Berkeley view of cloud computing," Technical Report, No. UCB/EECS-2009-28, EECS Department, U.C. Berkeley, 2009.
[19] R. Zhang, and L. Liu, "Security Models and Requirements for Healthcare Application Clouds," Proceedings of the 3rd IEEE International Conference on Cloud, pp. 268-275, 2010.
[20] D. Durkee, "Why Cloud Computing Will Never Be Free," Communications of the ACM, vol. 55, no. 5, pp. 62-69, 2010.
[21] S. Pearson, "Taking Account of Privacy when Designing Cloud Computing Services," Proceedings of the IEEE First international workshop on software engineering challenges for Cloud Computing (ICSE CLOUD’09), Vancouver, BC, Canada, pp. 44-52, 2009.
[22] D. Svantesson, and R. Clarke, "Privacy and consumer risks in cloud computing," Computer Law & Security Review, vol. 26, pp. 391-397, 2010.
[23] Official Google Blog. An update on Google Health and Google PowerMeter. URL: http://googleblog.blogspot.com/2011/06/update-ongoogle-health-and-google.html [accessed 2012-01-10]
[24] C. Kuner, "Internet Jurisdiction and Data Protection Law: An International Legal Analysis," International Journal of Law and Information Technology, vol. 18, pp. 176-201, 2010.
[25] B.T. Ward, and J.C. Sipior, "The Internet Jurisdiction Risk of Cloud Computing," Information Systems Management, vol. 27, no. 4, pp. 334-339, 2010.
[26] AWS case study: Harvard Medical School. Amazon Web Services. URL: http://aws.amazon.com/solutions/case-studies/harvard/ [accessed 2012-01-5]
[27] DiskAgent Launches New Remote Backup and Loss Protection Software as a Service Offering. URL: http://www.thefreelibrary.com/DiskAgent(TM)+Launches+New+Remote+Backup+and+Loss+Protection+Software...-a0182194404 [accessed 2012-01-6]
[28] R. Strukhoff, M. O'Gara, N. Moon, P. Romanski, and E. White, "Healthcare Clients Adopt Electronic Health Records with Cloud-Based Services," Cloud Computing Expo 2009. URL: http://cloudcomputing.sys-con.com/node/886530 [accessed 2012-01-6]
[29] EU consortium launches advanced cloud computing project with hospital and smart power grid provider. URL: http://www03.ibm.com/press/us/en/pressrelease/33067.wss [accessed 2012-01-6]
[30] C.H. Shen, C. Jigjidsuren, S. Dorjgochoo, C.H. Chen, W.H. Chen, C.K. Hsu, J.M. Wu, C.W. Hsueh, M.S. Lai, C.T. Tan, E. Altangerel, and F. Lai, "A Data-mining Framework for Transnational Healthcare System," Journal of Medical Systems, May 2011.
[31] H. Wang, "Privacy-Preserving Data Sharing in Cloud Computing," Journal of Computer Science and Technology, vol. 25, no. 3, pp. 401-414, 2010.
[32] M.H. Kuo, "A Healthcare Cloud Computing Strategic Planning Model," Proceedings of the 3rd FTRA International Conference on Computer Science and its Applications (CSA-11), pp. 769-775, Jeju, Korea, December 2011.
[33] M.H. Kuo, A. Kushniruk, and E. Borycki, "A Comparison of National Health Data Interoperability Approaches in Taiwan, Denmark and Canada," Electronic Healthcare, vol. 10, no. 2, pp. 14-25, 2011.
[34] M.H. Kuo, A.W. Kushniruk, and E.M. Borycki, "Design and implementation of a health data interoperability mediator," Proceedings of the International Conference on Challenges of Interoperability and Patient Safety in Healthcare (STC2010), pp. 101-107, Iceland, 2010.
[35] R. Buyya, and R. Ranjan, "Special section: Federated resource management in grid and cloud computing systems," Future Generation Computer Systems, vol. 26, pp. 1189-1191, 2010.
[36] W. Jansen, and T. Grance, NIST Guidelines on Security and Privacy in Public Cloud Computing. National Institute of Standards and Technology, 2011.
[37] Cloud Security Alliance. Security Guidance for Critical Areas of Focus in Cloud Computing (V2.1). 2009. URL: http://www.cloudsecurityalliance.org/csaguide.pdf [accessed 2012-01-7]
[38] E.J. Schweitzer, "Reconciliation of the cloud computing model with US federal electronic health record regulations," Journal of the American Medical Informatics Association, vol. 24, no. 2, pp. 203-207, 2011.
[39] J. Wagener, O. Spjuth, E.L. Willighagen, and J.E.S. Wikberg, "XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services," BMC Bioinformatics, vol. 10, pp. 279, 2009.
[40] The Australian Academy of Technological Sciences and Engineering (ATSE). Cloud Computing: Opportunities and Challenges for Australia. URL: http://cloudinnovation.com.au/docs/ATSE_cloudcomputing.pdf [accessed 2012-01-10]
[41] R. Geambasu, S.D. Gribble, and H.M. Levy, "CloudViews: Communal Data Sharing in Public Clouds," Proceedings of the Workshop on Hot Topics in Cloud Computing (HotCloud '09), San Diego, CA, USA, 2009.