Future Generation Computer Systems 43–44 (2015) 61–73
Accessing medical image file with co-allocation HDFS in cloud✩

Chao-Tung Yang a,∗, Wen-Chung Shih b, Lung-Teng Chen a, Cheng-Ta Kuo a, Fuu-Cheng Jiang a, Fang-Yie Leu a

a Department of Computer Science, Tunghai University, Taichung, 40704, Taiwan ROC
b Department of Applied Informatics and Multimedia, Asia University, Taichung, 41354, Taiwan ROC
Highlights

• The motivation of this paper is to attempt to resolve the problems of storing and sharing electronic medical records and medical images between different hospitals.
• Specifically, this study develops a Medical Image File Accessing System (MIFAS) based on the HDFS of Hadoop in the cloud.
• The proposed system can improve medical imaging storage, transmission stability, and reliability while providing an easy-to-operate management interface.
• This paper focuses on cloud storage virtualization technology to achieve high-availability services.
• The experimental results show that high-reliability data storage clustering and fault tolerance capabilities can be achieved.
Article history: Received 31 December 2013; Received in revised form 31 July 2014; Accepted 15 August 2014; Available online 28 September 2014.

Keywords: EMR; PACS; Hadoop; HDFS; Co-allocation; Cloud computing

Abstract

Patient privacy has recently become a key issue for the World Health Organization (WHO) and in the United States and Europe. However, inter-hospital medical information is currently shared using paper-based operations, so the complete and immediate exchange of electronic medical records, which avoids duplicate prescriptions and procedures, is an important research issue. An electronic medical record (EMR) is a computerized medical record created by a care-giving organization, such as a hospital or doctor's surgery. Using electronic medical records can improve patients' privacy and health care efficiency. Although there are many advantages to electronic medical records, the problem of exchanging and sharing medical images remains to be solved. The motivation of this paper is to attempt to resolve the problems of storing and sharing electronic medical records and medical images between different hospitals. Cloud computing is enabled by existing parallel and distributed technology, and provides computing, storage, and software services to users. Specifically, this study develops a Medical Image File Accessing System (MIFAS) based on the HDFS of Hadoop in the cloud. The proposed system can improve medical imaging storage, transmission stability, and reliability while providing an easy-to-operate management interface. This paper focuses on cloud storage virtualization technology to achieve high-availability services. We have designed and implemented a medical imaging system with a distributed file system. The experimental results show that high-reliability data storage clustering and fault tolerance capabilities can be achieved.
1. Introduction
✩ This work is sponsored by Tunghai University under the U-Care ICT Integration Platform for the Elderly, No. 103GREEnS004-2, Aug. 2014. This work was supported in part by the Ministry of Science and Technology, Taiwan ROC, under grant numbers MOST 101-2218-E-029-004 and MOST 102-2218-E-029-002.
∗ Corresponding author.
E-mail addresses: [email protected], [email protected] (C.-T. Yang), [email protected] (W.-C. Shih), [email protected] (L.-T. Chen), [email protected] (C.-T. Kuo), [email protected] (F.-C. Jiang), [email protected] (F.-Y. Leu).
http://dx.doi.org/10.1016/j.future.2014.08.008
Medical records are important documents storing patient health care data and information, and every medical institution performing clinical work produces such records. Although most hospitals have established computerized medical record systems, they still keep written records stored on paper. This practice creates many problems, including the cost of space management, time-consuming human access and transmission, difficulty in backing up data, and increasing paper costs and waste.
An Electronic Medical Records (EMR) system refers to a paperless, digital, and computerized system of maintaining patient data. An EMR system is designed to increase efficiency and reduce documentation errors by streamlining the process. Implementing an EMR system is a complex, expensive investment that has created a demand for Healthcare IT professionals and accounts for a growing segment of the healthcare workforce [1]. According to the Medical Records Institute, an EMR system includes Automated Medical Records, a Computerized Medical Record Provider, Electronic Medical Records, Electronic Patient Records, and Electronic Health Records [1,2]. Using EMR can protect patients' privacy and reduce both paper usage and the problems of managing medical records. This helps improve the efficiency and financial management of medical institutions, integrates their resources, provides mutual support for management and decision-making systems, and ensures the cross-organizational sharing of resources. As a result, EMR is becoming increasingly popular.

It is worth noting that while the storage cost per terabyte is declining, the overall cost of managing storage is growing. The common misperception is that storage is inexpensive, but the reality is that storage volumes continue to grow faster than hardware prices decline. Cloud computing promises reduced costs, high scalability, availability, and disaster recoverability, and can solve the long-standing problem of medical image archiving [3,1,4–6]. Cloud computing is built on large-scale parallel computing clusters that are growing to thousands of processors [7,8,3,6,9–21]. With such large numbers of compute nodes, faults become commonplace. Current virtualization fault-tolerance schemes focus on recovery from failure and usually rely on a checkpoint/restart mechanism. However, in today's systems, node failures can often be predicted by detecting the deterioration of health conditions [22–33].

The need for medical image retrieval and replication is increasing, as hospitals must handle a growing number of medical photographs and images. However, the increase in high-quality imaging devices is currently outpacing the related infrastructure. Current picture archiving and communication systems (PACS) are unable to provide efficient query response services; it is difficult to sustain huge volumes of queries and file retrievals with limited bandwidth. Conventional access strategies and bandwidth restrict the quality of communication in a Web PACS network when exchanging and downloading a large number of images. To enhance the quality of medical treatment, medical imaging requires an efficient file transfer strategy to achieve high-speed access.

This paper tries to solve the EMR issues of exchanging, storing, and sharing medical images. Based on the ''Medical Images Exchange'' concept of EMR, this study presents a medical image file accessing system (MIFAS) built on the Hadoop platform [34] to solve the problems of exchanging, storing, and sharing medical images. This study also presents a new strategy for accessing medical images that involves a co-allocation mechanism in a cloud environment. The Hadoop platform and the proposed co-allocation mechanism establish a cloud environment for MIFAS. MIFAS helps users retrieve, share, and store medical images between different hospitals.

The remainder of this paper is organized as follows. Section 2 presents a background review. Section 3 introduces the system architecture.
Section 4 presents experimental results. Finally, Section 5 concludes the article.

2. Background

2.1. Challenges in medical image exchanging

For over a decade, most hospitals and private radiology practices have transformed from film-based image management systems to fully digital (filmless and paperless) environments, a transition subtly dissimilar (in concept only) to converting from a paper medical chart to an electronic health record (EHR). Film and film libraries have given way to modern picture archiving and communication systems (PACS). These systems offer highly redundant archives that tightly integrate with historical patient metadata derived from radiology information systems. These systems are more efficient than film and paper, and are more secure because they incorporate safeguards to limit access and sophisticated auditing systems to track scanned data. Although radiologists prefer efficient access to the comprehensive imaging records of their patients, there are no reliable methods to discover or obtain access to similar records that might be stored elsewhere [31,32,35].

A literature review reveals that there are very few cloud-based medical image implementations. However, previous research discusses the benefits of cloud-based medical images, including scalability, cost effectiveness, and replication [29]. The same study presented a Hadoop-based PACS system, but failed to provide an appropriate management interface.

2.2. Hadoop and HDFS

Hadoop is one of the most salient pieces of the data-mining renaissance and offers the ability to tackle large datasets in ways that were not previously possible due to time and cost constraints. It is a part of the Apache Software Foundation and is being built by a global community of contributors. The Hadoop project promotes the development of open-source software and supplies a framework for the development of highly scalable distributed computing applications [30,36]. Hadoop is a top-level project of the Apache Software Foundation, which supports the development of open-source software [34]. Hadoop provides a framework for developing highly scalable distributed applications, letting the developer focus on applying logic to datasets instead of processing details.

The Hadoop Distributed File System (HDFS) stores large files on multiple machines. This approach achieves reliability by replicating data across multiple hosts, and hence does not require RAID storage on hosts. An HDFS cluster is built from a set of data nodes, each of which serves blocks of data over the network using a block protocol. HDFS also serves the data over HTTP, allowing access to all content from a web browser or other client. Data nodes can connect to each other to rebalance data, move copies around, and ensure high replication of data. The file system requires one unique server, the name node, which is a single point of failure for any HDFS installation. If the name node goes down, the file system goes off-line. When it comes back up, the name node must replay all outstanding operations. This replay process can take over half an hour for a large cluster [37].

2.3. PACS

PACS is an acronym that stands for Picture Archiving and Communication System. PACS has revolutionized the field of radiology, which now consists of all-digital, computer-generated images as opposed to the analog film of yesteryear. Analog film took up space and time for filing, retrieval, and storage, and was prone to being lost or misfiled. PACS saves time and money, and reduces the liability caused by filing errors and lost films. A PACS consists of four major components: imaging modalities such as CT and MRI, a secure network for transmitting patient information, workstations for interpreting and reviewing images, and archives for storing and retrieving images and reports.
Combined with available and emerging Web technology, PACS can deliver timely and efficient access to images, interpretations, and related data. PACS breaks down the physical and time barriers associated with traditional film-based image retrieval, distribution, and display.
PACS is primarily responsible for the inception of virtual radiology, as images can now be viewed from across town, or even from around the world. PACS also acts as a digital filing system that stores patients' images in an organized way, enabling records to be retrieved with ease as needed for future reference.

2.4. DICOM

DICOM is short for Digital Imaging and Communications in Medicine, a standard in the field of medical informatics for exchanging digital information between medical imaging equipment and other systems, ensuring interoperability. The standard specifies: a set of protocols for devices communicating over a network; the syntax and semantics of commands and associated information that can be exchanged using these protocols; a set of media storage services for devices claiming conformance to the standard; and a file format and a medical directory structure to facilitate access to the images and related information stored on shared media. The standard was developed jointly by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA) as an extension of an earlier standard for exchanging medical imaging data that did not include provisions for networking or offline media formats [38].

DICOM integrates scanners, servers, workstations, printers, and network hardware from multiple manufacturers into a picture archiving and communication system (PACS). These different devices come with DICOM conformance statements that clearly state the DICOM classes they support. DICOM has been widely adopted by hospitals and is making inroads in dentists' and doctors' offices. DICOM differs from some, but not all, data formats in that it groups information into datasets. This means that a file of a chest X-ray image actually contains the patient ID within the file, so that the image can never be separated from this information by mistake. This is similar to the way that image formats such as JPEG can have embedded tags to identify and otherwise describe the image.

A DICOM data object consists of a number of attributes, such as a name and ID, and one special attribute containing the image pixel data (i.e., logically, the main object has no ''header'' as such, merely a list of attributes, including the pixel data). The same basic format is used for all applications, including network and file usage, but when written to a file, a true ''header'' (containing copies of a few key attributes and details of the application which wrote it) is usually added. A single DICOM object can contain only one attribute containing pixel data. For many modalities, this corresponds to a single image. However, the attribute may contain multiple ''frames'', which makes it possible to store cine loops or other multi-frame data. Another example is NM data, where an NM image is by definition a multi-dimensional multi-frame image. In these cases, three- or four-dimensional data can be encapsulated in a single DICOM object. Pixel data can be compressed using a variety of standards, including JPEG, JPEG Lossless, JPEG 2000, and run-length encoding (RLE). LZW (zip) compression can be used for the whole dataset (not just the pixel data), but this is rarely implemented.

2.5. Co-allocation mechanism

A co-allocation architecture enables parallel downloading from data nodes. It can also speed up downloads and overcome network faults.
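To make the co-allocation idea concrete, the following is a minimal sketch, not the authors' implementation, of the simplest brute-force assignment: each replica server delivers one contiguous byte range of the same file in parallel, and the parts are reassembled locally. The replica URLs and the file size are hypothetical, and plain HTTP range requests stand in for the transfer services used by the schemes discussed in this section, which differ mainly in how they size and re-assign chunks based on measured server performance.

```python
# Minimal co-allocation sketch: fetch disjoint byte ranges of one file
# from several replica servers in parallel, then reassemble the parts.
# Replica URLs and the file size are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

REPLICAS = [  # hypothetical mirrors holding the same DICOM file
    "http://thu1.example.org/images/ct-0001.dcm",
    "http://thu2.example.org/images/ct-0001.dcm",
    "http://csmu.example.org/images/ct-0001.dcm",
]
FILE_SIZE = 65 * 1024 * 1024  # assumed known from an information service

def fetch_range(url, start, end):
    """Download bytes [start, end] of the file via an HTTP Range request."""
    req = Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urlopen(req) as resp:
        return start, resp.read()

def coallocated_download(replicas, size):
    # One contiguous chunk per replica (brute-force assignment; the
    # dynamic schemes instead adjust chunk sizes as transfers progress).
    chunk = size // len(replicas)
    jobs = []
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        for i, url in enumerate(replicas):
            start = i * chunk
            end = size - 1 if i == len(replicas) - 1 else start + chunk - 1
            jobs.append(pool.submit(fetch_range, url, start, end))
        parts = sorted(job.result() for job in jobs)  # order by offset
    return b"".join(data for _, data in parts)

if __name__ == "__main__":
    image = coallocated_download(REPLICAS, FILE_SIZE)
    print(f"reassembled {len(image)} bytes")
```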
A previously proposed architecture [13] consists of three main components: an information service, a broker/co-allocator, and local storage systems. Co-allocation of data transfers is an extension of the basic template for resource management [5]. Various applications specify the characteristics of the desired data and pass
attribute descriptions to a broker. The broker searches for available resources, obtains replica locations from the Information Service [4] and the Replica Management Service [10], and then obtains the lists of physical file locations. We have implemented the following eight co-allocation schemes: Brute-Force (Brute), History-based (History), Conservative Load Balancing (Conservative), Aggressive Load Balancing (Aggressive), Dynamic Co-allocation with Duplicate Assignments (DCDA), Recursively-Adjusting Mechanism (RAM), Dynamic Adjustment Strategy (DAS), and Anticipative Recursively-Adjusting Mechanism (ARAM) [23–25].

2.6. Related works

HDFS servers (i.e., DataNodes) and traditional streaming media servers are both used to support client applications that have access patterns characterized by long sequential reads and writes. As such, both systems are designed to favor high storage bandwidth over low access latency [39]. ''Cloud computing'' has recently become a hot topic in this field. S. Sagayaraj [29] observed that Apache Hadoop is a framework for running applications on large clusters of commodity hardware, and that the Hadoop framework transparently provides reliability and data motion. Hadoop implements a Map/Reduce computational paradigm that divides an application into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. Thus, simply replacing the PACS server with the Hadoop framework can produce a good, scalable, and cost-effective tool for health care imaging. The same study presents an HPACS system, but fails to provide an appropriate management interface.

J. Shafer et al.'s study ''The Hadoop distributed filesystem: Balancing portability and performance'' [40] was favorably received in this field. The poor performance of HDFS can be attributed to challenges in maintaining portability, including disk scheduling under concurrent workloads, file system allocation, and file system page cache overhead. Using application-level I/O scheduling can significantly improve HDFS performance under concurrent workloads while preserving portability. Further improvements from reducing fragmentation and cache overhead are also possible, at the expense of portability. However, maintaining Hadoop portability whenever possible will simplify development and benefit users by reducing installation complexity, thus encouraging the spread of this parallel computing paradigm.

Previous research [11,13,22,24,25,27] shows that co-allocation can solve the data transfer problem in grid environments; these results are categorized as RAM [12,24,22], DAS [27,13], and ARAM [28,23]. This concept is the foundation for the current study. However, the significant difference between this paper and previous works is that the proposed system is implemented in a cloud environment.

3. System design and implementation

Fig. 1 illustrates an overview of the distributed file system. MIFAS has three HDFS groups. The first group, THU1, and the second group, THU2, are both hosted at Tunghai University. The third group is hosted at Chung Shan Medical University (CSMU) Hospital. All of the groups are connected with 100 Mbps network bandwidth in the TANET (Taiwan Academic Network) environment. The number of HDFS groups is flexible: the minimum is one, but the maximum can be many. More HDFS groups increase the amount of duplication available, since the PACS image source is the HDFS.
Thus, if we increase the number of sources (i.e., build more HDFS groups), the download behavior will vary with the number of sources.
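As a rough illustration of how a file might be duplicated into each HDFS group, here is a sketch using the third-party HdfsCLI (WebHDFS) Python client. The NameNode endpoints, HDFS path, and user name are hypothetical; the actual MIFAS replication location service is described in Section 3.1.

```python
# Sketch of pushing one DICOM file into every HDFS group so that each
# private cloud holds its own duplicate. Requires `pip install hdfs`
# (HdfsCLI); endpoints, user name, and paths are hypothetical.
from hdfs import InsecureClient

HDFS_GROUPS = {  # one WebHDFS endpoint per private cloud
    "THU1": "http://thu1-namenode.example.org:50070",
    "THU2": "http://thu2-namenode.example.org:50070",
    "CSMU": "http://csmu-namenode.example.org:50070",
}

def replicate_image(local_path, hdfs_path):
    """Upload the same file into each HDFS group, one group at a time."""
    for name, url in HDFS_GROUPS.items():
        client = InsecureClient(url, user="mifas")
        client.upload(hdfs_path, local_path, overwrite=True)
        print(f"replicated {local_path} to {name}:{hdfs_path}")

if __name__ == "__main__":
    replicate_image("ct-0001.dcm", "/mifas/images/ct-0001.dcm")
```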
Fig. 1. Overview of distribution file system.
Fig. 2. System architecture of MIFAS.
3.1. System architecture

Fig. 2 shows the proposed MIFAS, based on a cloud environment. The distributed file system was built on the HDFS of a Hadoop environment (Section 2.2). This Hadoop platform can be described as PaaS (Platform as a Service), effectively extending SaaS (Software as a Service) to platforms. The top level of MIFAS consists of a web-based interface, which provides a user-friendly interface for medical image queries.

IaaS: in our previous work, we used OpenNebula to manage our VMs [29]. To achieve our goal, we can migrate the servers into the OpenNebula environment, which is also a key feature for developing the VFT (virtualization fault tolerance) approach [41].

Middleware: this mechanism, called the MIFAS Middleware in this study, handles the transmission issues in MIFAS. The Middleware's purpose is to assign and acquire the best transmission path to the distributed file system. It also collects necessary information, such as the bandwidth between servers, the server utilization rate, and network efficiency. This information allows the MIFAS co-allocation distributed file system to determine the best allocation of download jobs. The software components of MIFAS are shown in Fig. 3.

Information service: this component obtains and analyzes the status of hosts. The MIFAS Middleware fetches host information through a component called the
Fig. 3. The software components of MIFAS.
information service. The experiments in this study installed Ganglia [42] on each Hadoop node to determine the real-time state of all members. Therefore, we could obtain the best data transmission strategy from the information service, which is one of the components of the MIFAS Middleware.

Co-allocation: as mentioned in Section 2.5, the proposed co-allocation mechanism allows parallel downloading from data nodes, speeds up downloading, and mitigates network problems. When a user accesses medical images through MIFAS, co-allocation is enabled automatically. To realize parallel downloading, the system splits files into different parts and obtains the data from different clouds, depending on their status. This allows the best downloading strategy. Earlier research [11,13,22] adopts the same co-allocation mechanism.

Replication location service: the experiments in this study involved three HDFS groups in different locations, and each HDFS owned a certain number of data nodes. The replication location service automatically makes duplicates
Fig. 4. System workflow of MIFAS.
Fig. 5. Authorization interface.
from one private cloud to another when medical images are uploaded to MIFAS.

3.2. System workflow

Fig. 4 depicts the workflow of MIFAS operations. This study analyzes a real system to determine the performance of the proposed approach. First, users input a username and password for authentication. Second, users input search terms to query patient information. Third, users can view patients' medical images. Fourth, users can configure settings in MIFAS. Fifth, when users invoke the MIFAS downloading mechanism, the Middleware handles the transfer.

3.3. System interface

MIFAS offers an authentication interface (Fig. 5) that users must pass before logging in to MIFAS. After passing validation, users see the MIFAS Portal (Fig. 6). Fig. 6 shows that there are three main functional blocks, each with its own purpose.

Examination Type: this is the catalog of medical examination types. Medical images can come from various medical imaging instruments, including ultrasound (US), magnetic resonance (MR), positron emission tomography (PET), computed tomography (CT), endoscopy (ENDO), mammograms (MG), direct radiography (DR),
computed radiography (CR), etc. The examination type is cataloged according to the DICOM definition.

Filter: this block provides a search function; users can obtain patient information through inputted keywords. In a web-based interface system, this goal is easy to achieve, and users can capture any information they want using filters. Thus, like other systems on the Internet, MIFAS provides multidimensional information for its users. There are four main options in the filter function: ''Chart NO'' (Examination No), ''Patient Name'', ''Start of Examination Date'', and ''End of Examination Date''.

Patient information list: this block shows detailed information according to the search conditions of Blocks A and B. This block also possesses several important functions. Fig. 7 shows the functional items, which include the Image Status, Thumbnail Viewer, PACS Reporting, and Download File. Fig. 8 displays the file distribution status, including the photographic description and photographic catalog. Fig. 9 shows the thumbnail viewer function, which displays a thumbnail of the medical image and the examination report. Fig. 10 shows PACS Reporting, which is a detailed patient medical record. For more detail on the medical images, users can use the Download File function and then open the files with a professional DICOM viewer. Fig. 11 shows how medical images are uploaded to MIFAS; as described above, the Replication Location Service duplicates the images to each cloud. Finally, the fourth icon in Fig. 7 is for downloading DICOM
Fig. 6. Portal of the MIFAS system.
Fig. 7. MIFAS functions.
Fig. 8. File status.
Fig. 9. Medical image preview.
C.-T. Yang et al. / Future Generation Computer Systems 43–44 (2015) 61–73
67
Fig. 10. Patient records.
Fig. 11. Upload medical images.
Fig. 12. The node summary of the cloud test-bed.
format medical images from MIFAS. MIFAS uses a co-allocation mechanism to allocate files through an optimal strategy.

4. Experiments and results

4.1. Environments
In this section, we compare the performance of MIFAS with that of PACS. A cloud computing test-bed consisting of three distributed file systems was built with Hadoop by the High-Performance Computing Laboratory of the Computer Science Department at Tunghai University. A summary of the Hadoop nodes is shown in Fig. 12. Fig. 13 shows the topology of the three PACS systems used in Chung Shan Medical University Hospital. This is a production PACS in CSMU, and there are three PACS systems in total: CSMU, the CSMU Chung Kang Branch Hospital, and the CSMU Tai-Yuan Branch Hospital. Each PACS system has a synchronization mechanism, and the network bandwidth between each PACS is under 100 Mbps, as listed in Table 1.

4.2. Comparing proximal image retrieval times of PACS and MIFAS
This experiment analyzes the same medical images, listed in Table 2, in both PACS and MIFAS. Fig. 14 depicts the general interaction between PACS and MIFAS and illustrates the environment of experiment 1.

Fig. 13. The topology of three PACS systems in Chung Shan Medical University Hospital.

Fig. 14. System workflow of MIFAS.

Table 1
Bandwidth of PACS in CSMU: end-to-end transmission rates (Mbps).

Node from         | Node to           | Bandwidth (Mbps)
CSMU              | Chung Kang branch | 100
CSMU              | Tai-Yuan branch   | 100
Chung Kang branch | Tai-Yuan branch   | 100

Table 2
Experimental medical images.

Image type     | Pixel       | QTY. | Total size (MB) | Testing times
CR chest       | 2804 × 2931 | 1    | 7.1             | 500 times
A series of CT | 512 × 512   | 389  | 65              | 500 times

All nodes were in a healthy state, and the same files were then downloaded from each PACS and from MIFAS. The purpose of this experiment is to compare the image retrieval times of PACS and MIFAS. Fig. 15 shows the average results after testing each site 500 times. Fig. 16 shows that downloading from the proximal site can be more time-consuming. The results of Figs. 15 and 16 show that PACS has better retrieval efficiency than MIFAS. This is primarily because PACS is built on high-performance servers and is a high-cost medical imaging system; MIFAS cannot easily cross
this threshold. However, experiment 2 reveals the advantages of MIFAS. This study also compares PACS and MIFAS in terms of proximal failure problems. Because MIFAS provides a co-allocation strategy, a single-site failure does not affect MIFAS operation. On the other hand, if a PACS system encounters the same problem, the only solution is to use another PACS site. Thus, experiment 2 assumes that the PACS in CSMU encounters a system failure, and the medical staff must send medical images to another CSMU branch
Fig. 15. Image retrieval time results over 500 tests.

Fig. 18. Broken hardware in both environments.

Fig. 16. Image retrieval time results at the proximal site.

Fig. 19. Users accessing PACS and MIFAS over a WAN.
hospital. In the same situation, an HDFS group failure occurs in MIFAS. Even if MIFAS has only two HDFS groups left, users can still retrieve medical images from the remaining groups. This experiment assumes that both failures occur at a proximal site. Fig. 17 illustrates the experiment results: the PACS in CSMU is not available, but MIFAS can still function well.

Fig. 17. Hardware failures in both environments.

Fig. 18 depicts a broken-hardware state in both systems. Only one PACS is left, at the CSMU Tai-Yuan branch, but a strong contrast can be seen in MIFAS: MIFAS still supports medical image transfers to users under the co-allocation mechanism. In other words, the benefit of MIFAS is that it overcomes network and hardware failures. PACS has its own synchronization mechanism; however, if there is a hardware or network failure in PACS, the only option is to use the surviving site and try to reduce the recovery time. The PACS system also has limitations on the number of concurrent users. Unlike MIFAS, it cannot distribute the workload of accessing users under a co-allocation mechanism. This section shows that MIFAS can effectively mitigate single-site failures and the problems of broken networks and hardware.
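The failover behavior just described can be illustrated with a small sketch, reusing the same hypothetical WebHDFS endpoints as before: the client simply tries each HDFS group in turn and returns the first successful download. This is illustrative only; in MIFAS the co-allocation middleware makes this decision.

```python
# Minimal sketch of group-level failover: try each HDFS group in turn
# and return the first one that serves the file. Uses the third-party
# HdfsCLI package; endpoints, user name, and paths are hypothetical.
from hdfs import InsecureClient

HDFS_GROUPS = [
    "http://thu1-namenode.example.org:50070",
    "http://thu2-namenode.example.org:50070",
    "http://csmu-namenode.example.org:50070",
]

def fetch_with_failover(hdfs_path, local_path):
    """Return the endpoint that served the file, skipping failed groups."""
    for url in HDFS_GROUPS:
        try:
            client = InsecureClient(url, user="mifas")
            client.download(hdfs_path, local_path, overwrite=True)
            return url
        except Exception as exc:  # unreachable NameNode, missing file, ...
            print(f"{url} failed ({exc}); trying the next group")
    raise RuntimeError("all HDFS groups are unavailable")
```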
4.3. Comparing retrieval times of MIFAS and PACS across hospitals

Because HDFS performs well over a WAN, we simulated the environment shown in Fig. 19. Generally, a hospital PACS provides a single download node with many potential problems, while MIFAS is better suited to multi-user access on a WAN due to its flexible architecture. Fig. 20 shows PACS and MIFAS accessing the same CR medical record (7.1 MB) with different numbers of users. MIFAS provides comparatively better performance as the number of users increases. We then repeated the same experiment with the co-allocation mechanism enabled in MIFAS, as shown in Fig. 21.

4.4. Using DRBD with Heartbeat to increase the reliability of MIFAS

This section describes how we increased the reliability of HDFS access in MIFAS, using the THU1 HDFS in an experiment on the system's fault tolerance in the event of NameNode failure.
Fig. 20. CR image retrieval on both systems.

Fig. 21. CR image retrieval on PACS and MIFAS (co-allocation).
Viewing or downloading DICOM files requires going through the HDFS NameNode to reach each DataNode and fetch the file blocks; therefore, when the NameNode fails, the whole HDFS group becomes unavailable. Service IP: this is the service channel provided to external users, who access services through this IP. VM2 is the primary node, and VM1 is the secondary. For the difference between primary and secondary, please refer to Section 2. Debian1, Debian2, and Debian3 are the hosts: VM2 lives on Debian1, and the secondary node VM1 lives on Debian2. The whole environment is shown in Fig. 22. As Fig. 23 shows, shutting down Debian1 does not cause the Service IP to stop providing service. The connection status of the Service IP, shown at the bottom of Fig. 23, indicates that only one packet was lost. The reason is that, under our VFT mechanism, the secondary node immediately takes over from the primary node; in HA terms, this is called failover. This is a good realization of the HA mechanism. VFT also boots VM2 on the online host Debian3. This case shows that VFT is a good solution to the HA problem in virtualization. The GUI is shown in Fig. 24.
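DRBD mirrors the NameNode metadata at the block-device level, and Heartbeat performs the actual service-IP takeover; neither is shown here. The following sketch, under stated assumptions, only illustrates the failover decision itself: after a few missed heartbeats, promote the secondary. The host names, monitored port, and promotion command are hypothetical.

```python
# Illustrative failover loop in the spirit of Heartbeat: monitor the
# primary NameNode and, after repeated missed heartbeats, promote the
# secondary. Hosts, port, and the promotion command are hypothetical.
import socket
import subprocess
import time

PRIMARY = ("vm2.example.org", 8020)  # active NameNode behind the Service IP
PROMOTE_CMD = ["ssh", "vm1.example.org", "promote-namenode.sh"]  # hypothetical
MAX_MISSES = 3
INTERVAL = 2.0  # seconds between heartbeats

def alive(host, port, timeout=1.0):
    """One heartbeat: can we open a TCP connection to the NameNode?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def monitor():
    misses = 0
    while True:
        if alive(*PRIMARY):
            misses = 0
        else:
            misses += 1
            print(f"missed heartbeat {misses}/{MAX_MISSES}")
            if misses >= MAX_MISSES:
                # Failover: the secondary takes over the Service IP and
                # mounts the DRBD-mirrored NameNode metadata.
                subprocess.run(PROMOTE_CMD, check=True)
                print("failover complete; secondary is now primary")
                return
        time.sleep(INTERVAL)

if __name__ == "__main__":
    monitor()
```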
Fig. 22. Experimental environments.
Fig. 23. Shutdown host and VM2.
Fig. 24. The GUI of VFT mechanism.
Fig. 25. Experimental environments.
Fig. 26. Comparison of physical host and virtual machine throughput.
In this MIFAS-with-VFT environment, HDFS was built into three clusters (THU1, THU2, and CSMU). Each NameNode was configured on two VMs, with DRBD and Heartbeat handling the synchronization for this part of the configuration, and each NameNode was configured with four DataNodes. Fig. 25 shows the details of the environment. We then conducted stress testing with JMeter: we set 10 threads with a loop count of 5, and downloaded files of 1, 10, and 50 MB on both the physical machines and the virtual machines, measuring the resulting throughput and download capability. The results in Fig. 26 show that smaller file sizes yield greater throughput, and that the differences between physical machines and VMs become more obvious. Fig. 27 shows that when downloading a small file, the transmission performance of the VMs is better than that of the physical machines. In this experiment, we also downloaded the same files from each PACS and from MIFAS to compare their networking performance. Fig. 28 shows the results: for smaller download files, MIFAS has better transmission capacity, whereas for large files, PACS has better transmission performance.
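For readers who want to approximate this measurement without JMeter, the following sketch mirrors the test parameters (10 threads, loop count 5) with a plain Python client. The download URL is a hypothetical MIFAS endpoint, and the reported number is aggregate application-level throughput, not a reproduction of the figures above.

```python
# Rough analogue of the JMeter stress test: 10 threads each download a
# test file 5 times; report aggregate throughput. URL is hypothetical.
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "http://mifas.example.org/download/test-10mb.bin"  # hypothetical
THREADS, LOOPS = 10, 5

def worker(_):
    """Download the file LOOPS times and return the total bytes read."""
    total = 0
    for _ in range(LOOPS):
        with urlopen(URL) as resp:
            total += len(resp.read())
    return total

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    downloaded = sum(pool.map(worker, range(THREADS)))
elapsed = time.perf_counter() - start
print(f"{downloaded / elapsed / 1e6:.1f} MB/s aggregate throughput")
```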
Fig. 27. Comparison of physical host and VM networking performance.
Fig. 28. Comparison of PACS and MIFAS networking performance.
5. Conclusions and future work

This work develops a Medical Image File Accessing System (MIFAS) based on the HDFS of Hadoop in the cloud. MIFAS is a flexible,
stable, and reliable system that addresses medical image sharing, storing, and exchanging issues. Therefore, medical images can easily be shared between different hospitals. MIFAS offers the following advantages:

Scalability: the ability to extend the nodes as required. Adding a node to the network is as simple as hooking a Linux box to the network and copying a few configuration files. Hadoop provides details about the available space in the cluster, making it easy to decide whether a node must be added.

Cost effectiveness: since the Linux nodes are relatively cheap, there is no need to invest much in hardware and OS.

Best strategy: the Hadoop platform offers a distributed file system and uses a co-allocation mechanism to retrieve medical images.

Replication: because of the replication location service in the MIFAS middleware, data can be saved and easily shared among different private clouds.

Easy management: we provide a friendly management interface, which makes it easy to set up and manage the private cloud environment in MIFAS.

The experimental results show that high-reliability data storage clustering and fault tolerance capabilities can be achieved. The MIFAS system achieves acceptable redundancy in medical resources at much less expense. Furthermore, MIFAS will be improved by enhancing the performance of file accessing. The future system will be a good example of cloud-based medical image file accessing, achieving the goal of medical image exchange between patients and their caregivers.

Acknowledgments

This work is sponsored by Tunghai University under the U-Care ICT Integration Platform for the Elderly, No. 103GREEnS004-2, Aug. 2014. This work was supported in part by the Ministry of Science and Technology, Taiwan ROC, under grant numbers MOST 101-2218-E-029-004 and MOST 102-2218-E-029-002.

References

[1] Electronic Medical Record. http://en.wikipedia.org/wiki/Electronic_health_record.
[2] Canada Health Infoway. http://www.infoway-inforoute.ca.
[3] Mental Research Institute. http://www.mri.org.
[4] A. Chervenak, E. Deelman, I. Foster, L. Guy, W. Hoschek, A. Iamnitchi, C. Kesselman, P. Kunszt, M. Ripeanu, B. Schwarz, H. Stockinger, K. Stockinger, B. Tierney, Giggle: a framework for constructing scalable replica location services, in: Proc. SC, 2002, pp. 1–17.
[5] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, The data grid: towards an architecture for the distributed management and analysis of large scientific datasets, J. Netw. Comput. Appl. 23 (3) (2001) 187–200.
[6] G.V. Koutelakis, D.K. Lymperopoulos, A grid PACS architecture: providing data-centric applications through a grid infrastructure, in: Proceedings of the 29th Annual International Conference of the IEEE EMBS, Lyon, France, ISBN: 978-1-4244-0787-3, August 23–26, 2007.
[7] American Recovery and Reinvestment Act (ARRA). http://www.ama-assn.org.
[8] Department of Health, Executive Yuan, R.O.C. (TAIWAN), ''Electronic medical record project''. http://emr.doh.gov.tw.
[9] N.E. King, B. Liu, Z. Zhou, J. Documet, H.K. Huang, The data storage grid: the next generation of fault-tolerant storage for backup and disaster recovery of clinical images, in: Osman M. Ratib, Steven C. Horii (Eds.), Medical Imaging 2005: PACS and Imaging Informatics, in: Proceedings of SPIE, vol. 5748, 2005, pp. 208–217.
[10] S. Vazhkudai, Enabling the co-allocation of grid data transfers, in: Proceedings of the Fourth International Workshop on Grid Computing, 17 November 2003, pp. 44–51.
[11] C.T.
Yang, Chun-Pin Fu, Ching-Hsien Hsu, File replication, maintenance, and consistency management services in data grids, J. Supercomput. 53 (3) (2010) 411–439.
[12] C.T. Yang, I-Hsien Yang, Chun-Hsiang Chen, RACAM: design and implementation of a recursively-adjusting co-allocation method with efficient replica selection in data grids, Concurr. Comput.: Pract. Exper. 22 (15) (2010) 2170–2194.
[13] Chao-Tung Yang, Chih-Hao Lin, Ming-Feng Yang, Wen-Chung Chiang, A heuristic QoS measurement with domain-based network information model for grid computing environments, Int. J. Ad Hoc Ubiquitous Comput. 5 (4) (2010) 235–243.
[14] Lizhe Wang, Dan Chen, Jiaqi Zhao, Jie Tao, Resource management of distributed virtual machines, Int. J. Ad Hoc Ubiquitous Comput. 10 (2) (2012) 96–111.
[15] Lizhe Wang, Marcel Kunze, Jie Tao, Gregor von Laszewski, Towards building a cloud for scientific applications, Adv. Eng. Softw. 42 (9) (2011) 714–722.
[16] Lizhe Wang, Dan Chen, Fang Huang, Virtual workflow system for distributed collaborative scientific applications on grids, Comput. Electr. Eng. 37 (3) (2011) 300–310.
[17] Lizhe Wang, Gregor von Laszewski, Marcel Kunze, Jie Tao, Jai Dayal, Provide virtual distributed environments for grid computing on demand, Adv. Eng. Softw. 41 (2) (2010) 213–219.
[18] Lizhe Wang, Gregor von Laszewski, Jie Tao, Marcel Kunze, Virtual data system on distributed virtual machines in computational grids, Int. J. Ad Hoc Ubiquitous Comput. 6 (4) (2010) 194–204.
[19] Lizhe Wang, Jie Tao, Rajiv Ranjan, Holger Marten, Achim Streit, Jingying Chen, Dan Chen, G-Hadoop: MapReduce across distributed data centers for data-intensive computing, Future Gener. Comput. Syst. 29 (3) (2013) 739–750.
[20] Wanfeng Zhang, Lizhe Wang, Dingsheng Liu, Weijing Song, Yan Ma, Peng Liu, Dan Chen, Towards building a multi-datacenter infrastructure for massive remote sensing image processing, Concurr. Comput.: Pract. Exper. 25 (12) (2013) 1798–1812.
[21] Yan Ma, Lizhe Wang, Dingsheng Liu, Tao Yuan, Peng Liu, Wanfeng Zhang, Distributed data structure templates for data-intensive remote sensing applications, Concurr. Comput.: Pract. Exper. 25 (12) (2013) 1784–1797.
[22] Chao-Tung Yang, Yao-Chun Chi, Ming-Feng Yang, Ching-Hsien Hsu, An anticipative recursively-adjusting mechanism for parallel file transfer in data grids, Concurr. Comput.: Pract. Exper. 22 (15) (2010) 2144–2169.
[23] C.T. Yang, Ming-Feng Yang, Wen-Chung Chiang, Enhancement of anticipative recursively adjusting mechanism for redundant parallel file transfer in data grids, J. Netw. Comput. Appl. 32 (4) (2009) 834–845.
[24] C.T. Yang, I-Hsien Yang, Shih-Yu Wang, Ching-Hsien Hsu, Kuan-Ching Li, A recursively-adjusting co-allocation scheme with a cyber-transformer in data grids, Future Gener. Comput. Syst. 25 (7) (2009) 695–703.
[25] C.T. Yang, I.H. Yang, K.C. Li, S.Y. Wang, Improvements on dynamic adjustment mechanism in co-allocation data grid environments, J. Supercomput. 40 (3) (2007) 269–280.
[26] Z. Zhou, et al., A data grid for imaging-based clinical trials, in: Steven C. Horii, Katherine P. Andriole (Eds.), Medical Imaging 2007: PACS and Imaging Informatics, in: Proc. of SPIE, vol. 6516, 2007, p. 65160U.
[27] Chao-Tung Yang, Shih-Yu Wang, William C. Chu, Implementation of a dynamic adjustment strategy for parallel file transfer in co-allocation data grids, J. Supercomput. 54 (2) (2010) 180–205.
[28] Chao-Tung Yang, Chiu-Hsiung Chen, Ming-Feng Yang, Implementation of a medical image file accessing system in co-allocation data grids, Future Gener. Comput. Syst. 26 (8) (2010) 1127–1140.
[29] Gopinath Ganapathy, S. Sagayaraj, Circumventing picture archiving and communication systems server with Hadoop framework in health care services, J. Soc. Sci. 6 (2010) 310–314.
[30] J. Venner, Pro Hadoop, first ed., Apress, 2009, p. 440, ISBN-13: 9781430219439.
[31] L. Faggioni, et al., The future of PACS in healthcare enterprises, Eur. J. Radiol. (2010).
[32] E. Bellon, et al., Trends in PACS architecture, Eur. J. Radiol. (2010).
[33] Lizhe Wang, Dan Chen, Yangyang Hu, Yan Ma, Jian Wang, Towards enabling cyberinfrastructure as a service in clouds, Comput. Electr. Eng. 39 (1) (2013) 3–14.
[34] Apache Hadoop Project. http://hadoop.apache.org/hdfs/.
[35] L.N. Sutton, PACS and diagnostic imaging service delivery—a UK perspective, Eur. J. Radiol. 78 (2) (2011) 243–249. http://dx.doi.org/10.1016/j.ejrad.2010.05.012.
[36] L. Xuhui, et al., Implementing WebGIS on Hadoop: a case study of improving small file I/O performance on HDFS, in: Cluster Computing and Workshops, 2009, CLUSTER'09, IEEE International Conference on, 2009, pp. 1–8.
[37] R. Grossman, et al., Compute and storage clouds using wide area high performance networks, Future Gener. Comput. Syst. 25 (2009) 179–183.
[38] DICOM. http://medical.nema.org/.
[39] A.L.N. Reddy, J. Wyllie, K.B.R. Wijayaratne, Disk scheduling in a multimedia I/O system, ACM Trans. Multimedia Comput. Commun. Appl. 1 (1) (2005) 37–59.
[40] J. Shafer, et al., The Hadoop distributed filesystem: balancing portability and performance, in: Performance Analysis of Systems & Software (ISPASS), 2010 IEEE International Symposium on, White Plains, NY, 2010, pp. 122–133.
[41] Chao-Tung Yang, Jung-Chun Liu, Ching-Hsien Hsu, Wei-Li Chou, On improvement of cloud virtual machine availability with virtualization fault tolerance mechanism, J. Supercomput. (2013), in press.
[42] Ganglia Monitoring System. http://ganglia.sourceforge.net/.
Chao-Tung Yang is a Professor of Computer Science at Tunghai University in Taiwan. He received the Ph.D. in Computer Science from National Chiao Tung University in July 1996. In August 2001, he joined the faculty of the Department of Computer Science at Tunghai University. He serves on a number of journal editorial boards, including the International Journal of Communication Systems, the Journal of Applied Mathematics, the Journal of Cloud Computing, the ''Grid Computing, Applications and Technology'' Special Issue of the Journal of Supercomputing, and the ''Grid and Cloud Computing'' Special Issue of the International Journal of Ad Hoc and Ubiquitous Computing. Dr. Yang has published more than 250 papers in journals, book chapters, and conference proceedings. His present research interests are in cloud computing and services, grid computing, parallel computing, and multicore programming. He is a member of the IEEE Computer Society and the ACM.

Wen-Chung Shih received a B.S. degree in Computer and Information Science from National Chiao Tung University in 1992 and an M.S. degree in Computer and Information Science from National Chiao Tung University in 1994. He received the Ph.D. degree in Computer Science from National Chiao Tung University in 2008. He passed the second class of the National Higher Examination in the Information Processing field in 1994 and in the Library Information Management field in 2004, respectively. Since 2008, he has worked as an Assistant Professor at Asia University, Taiwan. Since August 2014, he has been an Associate Professor and Chairman of the Department of Computer Science and Information Engineering at Asia University. His research interests include e-Learning, data mining, and expert systems.

Lung-Teng Chen received the master's degree in Information Engineering from Tunghai University in 2011. Since 2006, he has served as a director of the information systems group at Chung Shan Medical University Hospital, working mainly on the use of information systems to assist medical clinical affairs. His interests include information computing architecture, storage architecture, computer networks, networking applications, etc.
Cheng-Ta Kuo received the M.S. degree from the Department of Computer Science at Tunghai University in 2011. Since 2007, he has worked as a Software Engineer at WE CAN MEDICINES CO., LTD. His research interests include cloud computing, virtualization, and data mining.
Fuu-Cheng Jiang is currently with the Department of Computer Science at Tunghai University in Taiwan. His research interests include network modeling, cloud computing, wireless networks, and simulation. Dr. Jiang was the recipient of the Best Paper Award at the 5th International Conference on Future Information Technology 2010 (FutureTech2010), which ranked his paper first among the 201 submissions. He has served on the TPC of the ICCCT 2011–2012 International Conference, BWCCA 2010, and IEEE CloudCom 2012, and as a Session Chair of CSE 2011. Moreover, he has served as a journal reviewer for the Computer Journal, Ad Hoc Networks, and the International Journal of Communication Systems.
Fang-Yie Leu received his B.S., M.S., and Ph.D. degrees from National Taiwan University of Science and Technology, Taiwan, in 1983, 1986, and 1991, respectively, and another M.S. degree from the Knowledge Systems Institute, USA, in 1990. His research interests include wireless communication, network security, grid applications, and Chinese natural language processing. He is currently a professor at Tunghai University, Taiwan, the director of the database and network security laboratory of the university, the workshop chair of the MCNCS and CWECS workshops, and an editorial board member of several international journals. He is also a member of the IEEE Computer Society.