IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. 8, NO. 1, JANUARY/FEBRUARY 2015
Designing High Performance Web-Based Computing Services to Promote Telemedicine Database Management System

Ismail Hababeh, Issa Khalil, and Abdallah Khreishah

Abstract—Many web computing systems run real-time database services whose information changes continuously and expands incrementally. In this context, web data services play a major role and bring significant improvements in monitoring and controlling information truthfulness and data propagation. Currently, web telemedicine database services are of central importance to distributed systems. However, the increasing complexity and rapid growth of challenging real-world healthcare applications make it hard for the database administrative staff to keep up. In this paper, we build integrated web data services that satisfy fast response time for large-scale tele-health database management systems. Our focus is on database management, with application scenarios in dynamic telemedicine systems, to increase care admissions and decrease care difficulties such as distance, travel, and time limitations. We propose a three-fold approach based on data fragmentation, database website clustering, and intelligent data distribution. This approach reduces the amount of data migrated between websites during applications' execution, achieves cost-effective communications during applications' processing, and improves applications' response time and throughput. The proposed approach is validated internally by measuring the impact of our computing services' techniques on various performance features such as communications cost, response time, and throughput. External validation is achieved by comparing the performance of our approach to that of other techniques in the literature. The results show that our integrated approach significantly improves the performance of web database systems and outperforms its counterparts.
Index Terms—Web telemedicine database systems (WTDS), database fragmentation, data distribution, sites clustering
1 INTRODUCTION

The rapid growth and continuous change of real-world software applications have provoked researchers to propose several computing services' techniques to achieve more efficient and effective management of web telemedicine database systems (WTDS). Significant research progress has been made in the past few years to improve WTDS performance. In particular, databases, as a critical component of these systems, have attracted many researchers. The web plays an important role in enabling healthcare services like telemedicine to serve inaccessible areas where there are few medical resources. It offers easy and global access to patients' data without having to interact with them in person, and it provides fast channels to consult specialists in emergency situations. Different kinds of patients' information, such as ECG, temperature, and heart rate, need to be accessed by means of various client devices in heterogeneous communications environments. WTDS enable high-quality continuous delivery of patients' information wherever and whenever needed.
I. Hababeh is with the Faculty of Computer Engineering & Information Technology, German-Jordanian University, Amman, Jordan. E-mail: [email protected].
I. Khalil is with the Qatar Computing Research Institute (QCRI), Qatar Foundation, Doha, Qatar. E-mail: [email protected].
A. Khreishah is with the Department of Electrical & Computer Engineering, New Jersey Institute of Technology, 181 Warren Street, Newark, NJ 07102. E-mail: [email protected].
Manuscript received 15 Feb. 2013; revised 17 Nov. 2013; accepted 7 Jan. 2014. Date of publication 15 Jan. 2014; date of current version 6 Feb. 2015.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TSC.2014.2300499
Several benefits can be achieved by using web telemedicine services, including medical consultation delivery, transportation cost savings, data storage savings, and mobile application support that overcomes obstacles related to performance (e.g., bandwidth, battery life, and storage), security (e.g., privacy and reliability), and environment (e.g., scalability, heterogeneity, and availability). The objectives of such services are to: (i) develop large applications that scale as the scope and workload increase, (ii) achieve precise control and monitoring of medical data to generate high telemedicine database system performance, and (iii) provide a large archive of medical data records, accurate decision support systems, and trusted event-based notifications in typical clinical centers.

Recently, many researchers have focused on designing web medical database management systems that satisfy certain performance levels. Such performance is evaluated by measuring the amount of relevant and irrelevant data accessed and the amount of medical data transferred during transactions' processing time. Several techniques have been proposed to improve telemedicine database performance, optimize medical data distribution, and control medical data proliferation. These techniques assume that high performance for such systems can be achieved by improving at least one of the database web management services, namely database fragmentation, data distribution, website clustering, distributed caching, and database scalability. However, the intractable time complexity of processing a large number of medical transactions and managing a huge number of communications makes the design of such methods a non-trivial task. Moreover, none of the
1939-1374 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
existing methods considers the three-fold services together, which makes them impracticable in the field of web database systems. Additionally, using multiple medical services from different web database providers may not fit the needs of improving telemedicine database system performance. Furthermore, services from different web database providers may not be compatible, or in some cases may increase the processing time because of constraints on the network [1]. Finally, there has been a lack of tools that support the design, analysis, and cost-effective deployment of web telemedicine database systems [1].

Designing and developing fast, efficient, and reliable incorporated techniques that can handle a huge number of medical transactions over a large number of web healthcare sites in near-optimal polynomial time are key challenges in the area of WTDS. Data fragmentation, website clustering, and data allocation are the main components of WTDS that continue to create great research challenges, as their current best near-optimal solutions are all NP-Complete.

To improve the performance of medical distributed database systems, we incorporate data fragmentation, website clustering, and data distribution computing services together in a new web telemedicine database system approach. This new approach intends to decrease data communication and to increase system throughput, reliability, and data availability. The decomposition of web telemedicine database relations into disjoint fragments allows database transactions to be executed concurrently and hence minimizes the total response time. Fragmentation typically increases the level of concurrency and, therefore, the system throughput. The benefits of generating telemedicine disjoint fragments cannot be realized unless these fragments are distributed over the websites in a way that reduces the communication cost of database transactions.
Database disjoint fragments are initially distributed over logical clusters (groups of websites that satisfy a certain physical property, e.g., communications cost). Distributing database disjoint fragments to clusters where a beneficial allocation is achieved, rather than allocating the fragments to all websites, has an important impact on database system throughput. This type of distribution reduces the number of communications required for query processing in terms of retrieval and update transactions; it always has a significant impact on web telemedicine database system performance. Moreover, distributing disjoint fragments among the websites where they are needed most improves database system performance by minimizing the data transferred and accessed during execution, reducing the storage overhead, and increasing availability and reliability, as multiple copies of the same data are allocated.

Database partitioning techniques aim at improving database system throughput by reducing the amount of irrelevant data packets (fragments) to be accessed and transferred among different websites. However, data fragmentation raises some difficulties, particularly when web telemedicine database applications have contradictory requirements that prevent decomposition of the relation into mutually exclusive fragments. Applications whose views are defined on more than one fragment may suffer performance degradation. In this case,
it might be necessary to retrieve data from two or more fragments and take their join, which is costly [31].

A data fragmentation technique describes how each fragment is derived from the database global relations. Three main classes of data fragmentation have been discussed in the literature: horizontal [2], [3], vertical [4], [5], and hybrid [6], [7]. Although various schemes describe data partitioning, few are known for the efficiency of their algorithms and the validity of their results [33].

The clustering technique identifies groups of network sites in large web database systems and discovers better data distributions among them. This technique is considered an efficient method that has a major role in reducing the amount of data transferred and accessed during the processing of database transactions. Accordingly, clustering techniques help eliminate the extra communications costs between websites and thus enhance distributed database system performance [32]. However, the assumptions on web communications and the restrictions on the number of network sites make many clustering solutions impractical [16], [31]. Moreover, constraints on network connectivity and transaction processing time limit the applicability of the proposed solutions to a small number of clusters [9], [10].

Data distribution describes the way the disjoint fragments are allocated among the web clusters and their respective sites of the database system. This process addresses the assignment of each data fragment to the distributed database websites [8], [17], [18], [21]. Data distribution techniques aim at improving distributed database system performance. This can be accomplished by reducing the number of database fragments that are transferred and accessed during execution. Additionally, data distribution techniques attempt to increase data availability, elevate database reliability, and reduce storage overhead [11], [27].
However, the restrictions on database retrieval and update frequencies in some data allocation methods may negatively affect fragment distribution over the websites [20].

In this work, we address the previous drawbacks and propose a three-fold approach that manages the web computing services required to promote telemedicine database system performance. The main contributions are:
- Develop a fragmentation computing service technique that splits telemedicine database relations into small disjoint fragments. This technique generates the minimum number of disjoint fragments to be allocated to the web servers in the data distribution phase. This in turn reduces the data transferred and accessed through different websites and accordingly reduces the communications cost.
- Introduce a high-speed clustering service technique that groups the web telemedicine database sites into sets of clusters according to their communications cost. This helps in grouping the websites that are most suitable to be in one cluster, minimizing data allocation operations and thus avoiding the allocation of redundant data.
- Propose a new computing service technique for telemedicine data allocation and redistribution services based on transactions' processing cost functions.
These functions guarantee the minimum communications cost among websites and hence accomplish better data distribution compared to allocating data to all websites evenly.
- Develop a user-friendly experimental tool that performs the telemedicine data fragmentation, website clustering, and fragment allocation services, and assists database administrators in measuring WTDS performance.
- Integrate telemedicine database fragmentation, website clustering, and data fragment allocation into one scenario to accomplish the ultimate web telemedicine system throughput in terms of concurrency, reliability, and data availability. We call this scenario the Integrated-Fragmentation-Clustering-Allocation (IFCA) approach.

Fig. 1. IFCA computing services architecture.

Fig. 1 depicts the architecture of the proposed telemedicine IFCA approach. In Fig. 1, the data request is initiated from the telemedicine database system sites. The requested data is defined as SQL queries that are executed on the database relations to generate data set records. Some of these data records may overlap or even be redundant, which increases the I/O transactions' processing time and thus the system communications overhead. To solve this problem, we execute the proposed fragmentation technique, which generates telemedicine disjoint fragments representing the minimum number of data records. The web telemedicine database sites are grouped into clusters by our clustering service technique in a phase prior to data allocation. The purpose of this clustering is to reduce the communications cost needed for data allocation. Accordingly, the proposed allocation service technique is applied to allocate the generated disjoint fragments to the clusters that show a positive allocation benefit. Then the fragments are allocated to the sites within the selected clusters. The database administrator is responsible for recovering any site failure in the WTDS.

The remainder of the paper is organized as follows. Section 2 summarizes the related work. Basic concepts of the web telemedicine database settings and assumptions are discussed in Section 3. Telemedicine computation services and the estimation model are discussed in Section 4. Experimental results and performance evaluation are presented in Section 5. Finally, in Section 6, we draw conclusions and outline future work.

2 RELATED WORK

Many research works have attempted to improve the performance of distributed database systems. These works have mostly investigated fragmentation, allocation, and sometimes clustering problems. In this section, we present the main contributions related to these problems and compare them with our proposed solutions.
2.1 Data Fragmentation

With respect to fragmentation, the unit of data distribution is a vital issue. A relation is not appropriate for distribution, as application views are usually subsets of relations [31]. Therefore, the locality of applications' accesses is defined on derived subsets of relations. Hence it is important to divide the relation into smaller data fragments and consider them for distribution over the network sites.

The authors in [8] considered each record in each database relation as a disjoint fragment subject to allocation among the distributed database sites. However, a large number of database fragments is generated in this method, causing high communication costs for transmitting and processing the fragments. In contrast, the authors in [11] considered the whole relation as a fragment, where not all records of the fragment have to be retrieved or updated, and proposed a selectivity matrix that indicates the percentage of a fragment accessed by a transaction. However, this research suffers from data redundancy and fragment overlapping.

2.2 Clustering Websites

The clustering service technique identifies groups of networking sites and discovers interesting distributions among large web database systems. This technique is considered an efficient method that has a major role in reducing transferred and accessed data during transaction processing [9]. Moreover, grouping distributed network sites into clusters helps to eliminate the extra communication costs between the sites and thus enhances distributed database system performance by minimizing the communication costs required for processing transactions at run time.

In a web database system environment where the number of sites has expanded tremendously and the amount of data has increased enormously, the sites are required to manage these data and should provide data transparency to the users of the database.
Moreover, to have a reliable database system, transactions should be executed very fast in a flexible, load-balanced database environment. When the number of sites in a web database system grows to a large scale, the problem of sustaining high system performance under consistency and availability constraints becomes crucial. Different techniques could be developed for this purpose; one of them is website clustering. Grouping websites into clusters reduces communications cost and thus enhances the performance of the web database system. However, clustering network sites is still an open problem, and finding the optimal solution is NP-Complete [12]. Moreover, in the case of a complex network where large numbers of sites are
connected to each other, a huge number of communications is required, which increases the system load and degrades its performance.

The authors in [13] proposed a hierarchical clustering algorithm that uses similarity upper approximation derived from a tolerance (similarity) relation, is based on rough set theory, and requires no prior information about the data. The presented approach results in rough clusters in which an object may be a member of more than one cluster. Rough clustering can help researchers discover multiple needs and interests in a session by looking at the multiple clusters the session belongs to. However, in order to carry out rough clustering, two additional inputs, namely an ordered value set of each attribute and a distance measure for clustering, need to be specified [14].

Clustering coefficients are needed in many approaches in order to quantify structural network properties. In [15], the authors proposed higher-order clustering coefficients defined as probabilities that determine the shortest distance between any two nearest neighbors of a certain node when neglecting all paths crossing this node. The outcomes of this method indicate that the average shortest distance in the node's neighborhood is smaller than all network distances. However, independent constant values and a natural logarithm function are used in the shortest-distance approximation to determine the clustering mechanism, which results in generating a small number of clusters.
2.3 Data Allocation (Distribution)

Data allocation describes the way the database fragments are distributed among the clusters and their respective sites in distributed database systems. This process addresses the assignment of network node(s) to each fragment [8]. However, finding an optimal data allocation is an NP-complete problem [12]. Distributing data fragments among database websites improves database system performance by minimizing the data transferred and accessed during execution, reducing the storage overhead, and increasing availability and reliability where multiple copies of the same data are allocated.

Many data allocation algorithms are described in the literature; their efficiency is measured in terms of response time. The authors in [19] proposed an approach that handles full replication of data allocation in database systems. In this approach, a database file is fully copied to all participating nodes through the master node. This approach distributes the sequences through fragments with a round-robin strategy for a sequence input set already ordered by size, so that the number of sequences is about the same and the number of characters at each fragment is similar. However, this replicated schema does not achieve any performance gain when increasing the number of nodes. When a non-predetermined number of input sequences is present, the replication model may not be the best solution, and other fragmentation strategies have to be considered.

In [20], the author addressed the fragment allocation problem in web database systems, presenting integer programming formulations for the non-redundant version of the fragment allocation problem. This formulation is extended to
address problems that have both storage and processing capacity constraints. In this method, the constraints essentially state that there is exactly one copy of a fragment across all sites, which increases the risk of data inconsistency and unavailability in case of any site failure. However, the fragment size is not addressed, even though the storage capacity constraint is one of the major objectives of this approach. In addition, the retrieval and update frequencies are not considered in the formulations; they are assumed to be the same, which affects fragment distribution over the sites. Moreover, this research is limited by the fact that none of the approaches presented has been implemented and tested on a real web database system.

A dynamic method for data fragmentation, allocation, and replication is proposed in [25]. The objective of this approach is to minimize the cost of access, re-fragmentation, and reallocation. The DYFRAM algorithm examines accesses for each replica and evaluates possible re-fragmentations and reallocations based on recent history. The algorithm runs at given intervals, individually for each replica. However, data consistency and concurrency control are not considered in DYFRAM. Additionally, DYFRAM doesn't guarantee data availability and system reliability when all sites have negative utility values.

In [28], the authors present a horizontal fragmentation technique capable of taking a fragmentation decision at the initial stage and then allocating the fragments among the sites of a DDBMS. A modified matrix, MCRUD, is constructed by placing predicates of attributes of a relation in rows and applications of the sites of a DDBMS in columns. Attribute locality precedence (ALP), the importance value of an attribute with respect to the sites of the distributed database, is generated as a table from MCRUD.
However, when all attributes have the same locality precedence, the same fragment has to be allocated to all sites, and huge data redundancy occurs. Moreover, the initial values of frequencies and weights don't reflect the actual ones in real systems, which may affect the number of fragments and their allocation accordingly.

The authors in [29] presented a method for modeling distributed database fragmentation using UML 2.0 to improve application performance. This method is based on a probability distribution function in which the execution frequency of a transaction is estimated mainly by the most likely time. However, the most likely time is not determined in a way that distinguishes priorities between transactions. Furthermore, no performance evaluations are performed and no significant results are generated from this method.

A database tool presented in [30] addresses the problem of designing DDBs in the context of the relational data model. Conceptual design, fragmentation issues, as well as the allocation problem are considered based on other methods in the literature. However, this tool doesn't consider the local optimization of the fragment allocation problem over the distributed network sites. In addition, many design parameters need to be estimated and entered by designers, so different results may be generated for the same application case.

Our fragmentation approach circumvents the problems associated with the aforementioned studies by introducing
TABLE 1 Comparison between Existing Methods in the Literature and the Proposed Approach
a computing service technique that generates disjoint fragments, avoids data redundancy, and assumes that all records of a fragment are retrieved or updated by a transaction. With such a technique, lower communication costs are required to transmit and process the disjoint database fragments. In addition, by applying the clustering service technique, the complexity of the allocation problem is significantly reduced. Therefore, the intractable problem of fragment allocation is turned into fragment distribution among the clusters, followed by replication among the related sites.
2.4 Commercial Outsource Databases

This section surveys current commercial outsource databases and compares them with our IFCA. Commercial outsource databases support enormous data storage architectures that distribute storage and processing across several servers, and can be used to address web database system performance and scalability requirements.

Amazon Dynamo [34] is a cloud distributed database system with high availability and scalability. In contrast to traditional relational distributed database management systems, Dynamo only supports database applications with simple read and write queries on data records. However, in Dynamo availability is more important than consistency; accordingly, at certain times data consistency can't be guaranteed.

MongoDB [35] is an open-source, document-oriented cloud distributed database developed by the 10gen software company. MongoDB is a highly available and scalable system that supports search by fields in the documents. However, MongoDB uses a special query optimizer to make queries more efficient and executes different query plans concurrently, which increases the search time complexity.
Google BigTable [36] is a cloud distributed database system built on the Google file system and a highly available and persistent distributed lock service. The BigTable data model is designed as a distributed multidimensional (row key, column key, timestamp) sorted map, implemented in three parts: tablet servers, a client library, and a master server. However, the failure of the master server results in the failure of the whole database system, and thus another server usually backs up the master server. This incurs additional costs to the cloud distributed database system, in addition to the costs of operating and maintaining the new servers.

Due to their contrast in priorities, architecture, data consistency, and search time complexity compared to our IFCA, the previous outsource databases can't optimize and secure telemedicine data in terms of data fragmentation, website clustering, and data distribution altogether as our approach successfully does. Table 1 compares our approach with the outsource approaches in terms of integrity, reliability, communications overhead, manageability, data consistency, security, and performance evaluation.
3 TELEMEDICINE IFCA ASSUMPTIONS AND DEFINITIONS
Incorporating database fragmentation, web database site clustering, and data fragment allocation computing service techniques in one scenario distinguishes our approach from other approaches. The functionality of this approach depends on the settings, assumptions, and definitions that identify the WTDS implementation environment and guarantee its efficiency and continuity. Below is the
description of the IFCA settings, assumptions, and definitions.
3.1 Web Architecture and Communications Assumptions

The telemedicine IFCA approach is designed to support a web database provider with computing services that can be implemented over multiple servers, where data storage, communication, and transaction processing are fully controlled, communication costs are symmetric, and patients' information privacy and security are met. We propose fully connected sites on a heterogeneous web telemedicine network system with different bandwidths: 128 kbps, 512 kbps, or multiples thereof. In this environment, some servers are used to execute the telemedicine queries triggered from different web database sites. A few servers run the database programs and perform the fragmentation-clustering-allocation computing services, while the other servers are used to store the database fragments.

Communications cost (ms/byte) is the cost of loading and processing data fragments between any two sites in the WTDS. To control and simplify the proposed web telemedicine communication system, we assume that communication costs between sites are symmetric and proportional to the distance between them. Communication costs within the same site are neglected.

3.2 Fragmentation and Clustering Assumptions

Telemedicine queries are triggered from web servers as transactions to determine the specific information that should be extracted from the database. Transactions include, but are not limited to: read, write, update, and delete. To control the process of database fragmentation and to achieve data consistency in the telemedicine database system, the IFCA fragmentation service technique partitions each database relation according to the Inclusion-Integration-Disjoint assumptions: the generated fragments must contain all records in the database relations, the original relation must be reconstructible from its fragments, and the fragments must be neither repeated nor intersecting.
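The Inclusion-Integration-Disjoint assumptions can be checked mechanically. The following Python sketch is ours, not the paper's; it models a relation and its fragments as sets of record identifiers, and the helper name and record IDs are illustrative assumptions.

```python
def satisfies_iid(relation_records, fragments):
    """Check the Inclusion-Integration-Disjoint assumptions.

    relation_records: set of record ids in the original relation.
    fragments: list of sets of record ids (one set per fragment).
    """
    union = set().union(*fragments) if fragments else set()
    inclusion = relation_records <= union    # every record appears in some fragment
    integration = union <= relation_records  # the union rebuilds exactly the relation
    # Disjointness: fragment sizes sum to the union size only if no record repeats.
    disjoint = sum(len(f) for f in fragments) == len(union)
    return inclusion and integration and disjoint

print(satisfies_iid({1, 2, 3, 4}, [{1, 2}, {3}, {4}]))   # True: valid fragmentation
print(satisfies_iid({1, 2, 3, 4}, [{1, 2}, {2, 3, 4}]))  # False: fragments intersect
```

A real WTDS relation would of course carry full records rather than bare IDs, but any fragmentation scheme that fails this set-level check necessarily violates one of the three assumptions.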
The logical clustering decision is defined as a logical value that specifies whether a website is included in or excluded from a certain cluster, based on the communications cost range. The communications cost range is a value (ms/byte) that specifies how much time websites are allowed to take to transmit or receive their data in order to be considered in the same cluster; this value is determined by the telemedicine database administrator.

3.3 Fragments Allocation Assumptions

The allocation decision value (ADV) is defined as a logical value (1, 0) that determines the fragment allocation status for a specific cluster. Fragments that achieve an allocation decision value of 1 are considered for the allocation and replication process. The advantage of this assumption is that more communications costs are saved, because the fragments are located in the same place where they are processed, hence improving WTDS performance. On the other hand, fragments that carry an allocation decision value of 0 are considered for
Fig. 2. Data fragmentation service architecture.
the allocation process only, in order to ensure data availability and fault tolerance in the WTDS. In this case, each fragment should be allocated to at least one cluster and to one site in that cluster.

The allocation decision value ADV is computed by comparing the cost of allocating the fragment to the cluster with the cost of not allocating the fragment to the same cluster. The allocation cost function is composed of the following sub-cost functions required to perform the fragment transactions locally: the cost of local retrieval, the cost of local updates to maintain consistency among all the fragments distributed over the websites, the cost of storage, and the cost of remote updates and remote communications (for remote clusters that do not have the fragment and still need to perform the required transactions on that fragment). The non-allocation cost function consists of the following sub-cost functions: the cost of local retrieval and the cost of the remote retrievals required to perform the fragment transactions remotely when the fragment is not allocated to the cluster.
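As a rough illustration of the ADV comparison described above (a sketch, not the paper's exact cost model), the decision reduces to comparing two aggregate costs per fragment-cluster pair. The numeric cost figures and function name below are hypothetical:

```python
def allocation_decision(alloc_cost, not_alloc_cost):
    """Return ADV = 1 (allocate and replicate the fragment in the cluster)
    when allocating is no more expensive than serving the fragment remotely;
    otherwise return ADV = 0."""
    return 1 if alloc_cost <= not_alloc_cost else 0

# Allocation cost: local retrieval + local update + storage (fragment held locally).
alloc = 120 + 40 + 10
# Non-allocation cost: local retrieval + remote retrievals over the network.
not_alloc = 120 + 95
print(allocation_decision(alloc, not_alloc))  # 1: allocating is cheaper here
```

In the actual model the two sides would be evaluated from the retrieval/update frequencies and communication costs of each cluster; the comparison itself is the whole decision rule.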
4 TELEMEDICINE IFCA COMPUTATION SERVICES AND ESTIMATION MODEL
In the following sections, we present our IFCA and provide mathematical models of its computation services.
4.1 Fragmentation Computing Service

To control the process of database fragmentation and maintain data consistency, the fragmentation technique partitions each database relation into sets of data records that guarantee data inclusion, integration, and non-overlap. In a WTDS, neither complete relations nor single attributes are suitable data units for distribution, especially for very large data. Therefore, it is appropriate to use data fragments that are allocated to the WTDS sites. Data fragmentation is based on the data records generated by executing the telemedicine SQL queries on the database relations. The fragmentation process goes through two consecutive internal processes: (i) overlapped and redundant data records fragmentation and (ii) non-overlapped data records fragmentation. The fragmentation service generates disjoint fragments that represent the minimum number of data records to be distributed over the websites by the data allocation service. The proposed fragmentation service architecture is described through the Input-Processing-Output phases depicted in Fig. 2. Based on this fragmentation service, the global database is partitioned into disjoint fragments. The overlapped and redundant data records fragmentation process is described in Table 2. In this algorithm, database fragmentation starts with any two random data fragments
TABLE 2 Overlapped and Redundant Data Records Fragmentation Algorithm
TABLE 3 Non-Overlapped Data Records Fragmentation Algorithm
TABLE 4 Fragmentation Process over Patient Relation
(i, j) with intersecting records and proceeds as follows: if an intersection exists between the two fragments, three disjoint fragments are generated: $F_k = F_i \cap F_j$, $F_{k+1} = F_i - F_k$, and $F_{k+2} = F_j - F_k$. The first contains the common records, the second contains the records in the first fragment but not in the second, and the third contains the records in the second fragment but not in the first. The intersecting fragments are then removed from the fragments list. This process continues until no intersecting fragments or data set records remain for this telemedicine relation, and likewise for the other relations. The non-overlapped data records fragmentation process is shown in Table 3. The fragments derived in Table 2, together with the non-overlapping fragments, result in totally disjoint fragments. The non-overlapped fragments that do not intersect with any other fragment in Table 2 are included in the final set of the relation's disjoint fragments. For example, consider that the transactions triggered from the telemedicine database sites hold queries that extract records from different telemedicine database relations. Assume three transactions on the patient relation (T1, T2, T3) at sites S1, S2, and S3, respectively: T1 selects patients treated by doctor #1, T2 selects patients who took medicine #4, and T3 selects patients who paid more than $1,000. Data set 1 in the patient relation intersects with data set 2 in the same relation, producing the new disjoint fragments F1, F2, and F3. Data sets 1 and 2 are then omitted. Fragment F1 intersects with data set 3 in the same relation, producing the new disjoint fragments F4, F5, and F6; F1 and data set 3 are then omitted. The next intersection, between F2 and F6, produces the new disjoint fragments F7, F8, and F9; F2 and F6 are then omitted. The final disjoint fragments generated from the transactions over the patient relation are F3, F4, F5, F7, F8, and F9. Table 4 depicts the fragmentation process over the patient relation.
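The splitting rule above can be sketched in Python. The record IDs and transaction sets below are hypothetical stand-ins for the patient-relation data sets, not values from the paper:

```python
def make_disjoint(fragments):
    """Sketch of the Table 2 procedure: repeatedly replace any two
    intersecting fragments (Fi, Fj) with the three disjoint sets
    Fi & Fj, Fi - Fj, and Fj - Fi until no intersections remain."""
    frags = [set(f) for f in fragments if f]
    changed = True
    while changed:
        changed = False
        for i in range(len(frags)):
            for j in range(i + 1, len(frags)):
                common = frags[i] & frags[j]
                if common:
                    fi, fj = frags[i], frags[j]
                    # drop the intersecting pair, keep the three disjoint parts
                    frags = [f for k, f in enumerate(frags) if k not in (i, j)]
                    frags += [p for p in (common, fi - common, fj - common) if p]
                    changed = True
                    break
            if changed:
                break
    return frags

# three hypothetical transaction record sets over the patient relation
t1, t2, t3 = {1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6}
disjoint = make_disjoint([t1, t2, t3])  # pairwise-disjoint fragments
```

Every pair of returned fragments is disjoint and their union covers exactly the input records, mirroring the inclusion and non-overlap guarantees stated above.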
4.2 Clustering Computing Service

The benefit of generating disjoint database fragments is not fully realized unless it enhances the performance of the WTDS. As the number of database sites grows very large, supporting high system performance under consistency and availability constraints becomes crucial. Several techniques have been developed for this purpose; one of them is clustering websites. However, grouping sites into clusters is still an open problem whose optimal solution is NP-complete [12]. Therefore, developing a near-optimal solution for grouping web database sites into clusters helps eliminate the extra communication costs between sites during data allocation. Clustering web telemedicine database sites speeds up data allocation by distributing fragments to the clusters that achieve an allocation benefit rather than distributing the fragments to all websites at once. Thus, the communication costs are minimized and the
TABLE 5 Clustering Algorithm
WTDS performance is improved. In this work, we introduce a high speed clustering service based on the least average communication cost between sites. The parameters used to control the input/output computations for generating clusters and determining the set of sites in each are computed as follows:
Communications cost between sites: CC(S_i, S_j) = data creation cost + data transmission cost between S_i and S_j. Communications cost range CCR (ms/byte), which is determined by the telemedicine database system administrator. Clustering decision value (cdv):

$$cdv(S_i, S_j) = \begin{cases} 1, & CC(S_i, S_j) \leq CCR \wedge i \neq j \\ 0, & CC(S_i, S_j) > CCR \vee i = j. \end{cases} \qquad (1)$$

Accordingly, if cdv(S_i, S_j) equals 1, the sites S_i and S_j are assigned to the same cluster; otherwise they are assigned to different clusters. If a site S_i qualifies for more than one cluster, it is assigned to the cluster with the least average communication cost. Based on this clustering service, we develop the clustering algorithm shown in Table 5. The communications cost within and between clusters is required for the fragment allocation phase, where fragments are allocated to web clusters and then to their respective websites. The optimal way to compute this cost is to find the cheapest path between the clusters, which is an NP-complete problem [12]. Instead, we use the cluster symmetric average communication cost, as it has been shown to be a fast, reliable,
and efficient method for computing the fragments' allocation and replication service in many heterogeneous web telemedicine database environments. To test the validity of our clustering service technique, and based on the assumptions on the communications costs between sites (symmetric between any two sites, zero within the same site, and proportional to the distance between the sites), we collected a real sample of communication costs (multiples of 0.125 ms/byte) between 12 websites (hospitals), presented in Table 6. We then apply the clustering service in our IFCA tool to the communication costs in Table 6 with a clustering range of 2.5. Fig. 3 depicts the experimental results that generate the clusters and their respective sites. The figure shows that sites 1 and 3 are placed in separate clusters because the communication costs between them and the other sites exceed the communications cost range. Moreover, the number of clusters increases as the communication costs between sites increase or as the communication cost range becomes smaller.
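The clustering decision of Equation (1) can be sketched as a greedy pass over the sites. The cost matrix below is illustrative, not the 12-site sample of Table 6:

```python
def cluster_sites(cc, ccr):
    """Greedy sketch of the clustering service: cc[i][j] is the symmetric
    communication cost (ms/byte) between sites i and j.  A site joins a
    cluster only if its cost to every member is within the range ccr
    (cdv = 1); if several clusters qualify, the one with the least
    average communication cost to the site wins."""
    clusters = []
    for s in range(len(cc)):
        candidates = [c for c in clusters
                      if all(cc[s][t] <= ccr for t in c)]
        if candidates:
            best = min(candidates,
                       key=lambda c: sum(cc[s][t] for t in c) / len(c))
            best.append(s)
        else:
            clusters.append([s])  # site starts its own cluster
    return clusters

# four sites with hypothetical symmetric costs
costs = [[0, 1, 5, 5],
         [1, 0, 5, 5],
         [5, 5, 0, 2],
         [5, 5, 2, 0]]
print(cluster_sites(costs, ccr=2.5))  # [[0, 1], [2, 3]]
```

Shrinking `ccr` splits the groups apart, matching the observation above that a smaller communication cost range yields more clusters.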
4.3 Data Allocation and Replication

Data allocation techniques aim at distributing the database fragments over the web database clusters and their respective sites. We introduce a heuristic fragment allocation and replication computing service to perform fragment allocation in the WTDS. Initially, all fragments are candidates for allocation to all clusters that need these fragments at their sites. If a fragment shows a positive allocation decision value (i.e., an allocation benefit greater
TABLE 6 The Communication Costs between Websites
Fig. 3. Clustering telemedicine websites.
than zero) for a specific cluster, the fragment is allocated to this cluster and tested for allocation at each of its sites; otherwise, the fragment is not allocated to this cluster and is subsequently tested for replication in each other cluster of the WTDS. Accordingly, a fragment that shows a positive allocation decision value for any WTDS cluster is allocated at that cluster and then tested for allocation at its sites: if the fragment shows a positive allocation decision value at any site of a cluster that itself shows a positive allocation decision value, the fragment is allocated to that site; otherwise it is not. This process is repeated for all sites in each cluster that shows a positive allocation decision value. Fig. 4 illustrates the structure of our data allocation and replication technique. If a fragment shows a negative allocation decision value at all clusters, it is allocated to the cluster with the least average communications cost, and then to the site that achieves the least communications cost with the other sites in that cluster. To clarify the computation of the query processing cost functions, a mathematical model is used to formulate them. The allocation decision value ADV is computed as the logical result of the comparison between two compound cost components: the cost of allocating the fragment to the cluster and the cost of not allocating the fragment to the same cluster. The cost of allocating the fragment Fi issued by the transaction Tk to the cluster Cj, denoted CA(Tk, Fi, Cj), is defined as the sum of the following sub-costs: retrievals and updates issued by the transaction Tk to the fragment Fi at cluster Cj, storage
Fig. 4. Data allocation and replication technique.
occupied by the fragment Fi in the cluster Cj , and remote updates and remote communications sent from remote clusters. The mathematical model for each sub-cost function is detailed below:
Cost of local retrievals issued by the transaction Tk to the fragment Fi at cluster Cj. This cost is determined by the frequency and cost of retrievals. The frequency of retrievals represents the average number of retrieval transactions that occur on each fragment at all sites of a certain cluster. The cost of retrievals is the average cost of retrieval transactions that occur on each fragment at all sites of a specific cluster, which is set by the telemedicine database system administrator in time units (millisecond, microsecond, etc.) per byte. The cost of local retrievals is computed as the product of the average cost of local retrievals over all sites (m) at cluster Cj and the average frequency of local retrievals issued by the transaction Tk to the fragment Fi over all sites at cluster Cj:

$$CLR(T_k, F_i, C_j) = \frac{\sum_{q=1}^{m} CLR(T_k, F_i, C_j, S_q)}{m} \times \frac{\sum_{q=1}^{m} FREQLR(T_k, F_i, C_j, S_q)}{m}. \qquad (2)$$

Cost of local updates issued by the transaction Tk to the fragment Fi at cluster Cj. This cost is determined by the frequency and cost of updates. The frequency of updates is computed as the average number of update transactions that occur on each fragment at all sites of a certain cluster. The cost of updates is the average cost of update transactions that occur on each fragment at all sites of a specific cluster, and is determined by the WTDS administrator in time units per byte. The cost of local updates is computed as the product of the average cost of local updates over all sites (m) at cluster Cj and the average frequency of local updates issued by the transaction Tk to the fragment Fi over all sites at cluster Cj:

$$CLU(T_k, F_i, C_j) = \frac{\sum_{q=1}^{m} CLU(T_k, F_i, C_j, S_q)}{m} \times \frac{\sum_{q=1}^{m} FREQLU(T_k, F_i, C_j, S_q)}{m}. \qquad (3)$$

Cost of storage occupied by the fragment Fi in the cluster Cj. This cost is determined by the storage cost and the fragment size. The storage cost is the cost of storing a data byte at a certain site in a specific cluster. The fragment size is the storage occupied by the fragment in bytes. The cost of storage is computed as the product of the average storage cost over all sites (m) at cluster Cj occupied by the fragment Fi and the fragment size:

$$SCP(T_k, F_i, C_j) = \frac{\sum_{q=1}^{m} SCP(T_k, F_i, C_j, S_q)}{m} \times F_{size}(T_k, F_i). \qquad (4)$$

Cost of remote updates issued by the transaction Tk to the fragment Fi sent from all clusters (n) in the WTDS except the current cluster Cj. This cost is determined by the cost of local updates and the frequency of remote updates. The frequency of remote updates is the number of update transactions that occur on each fragment at remote web clusters in the WTDS. The cost of remote updates is the product of the average cost of local updates at each remote cluster Cq and the average frequency of remote updates issued by Tk to the fragment Fi at each remote cluster in the WTDS:

$$CRU(T_k, F_i, C_j) = \sum_{q=1, q \neq j}^{n} CLU(T_k, F_i, C_q) \times \frac{\sum_{q=1, q \neq j}^{n} \sum_{a} FREQRU(T_k, F_i, C_q, S_a)}{m}. \qquad (5)$$

The update ratio is used in the evaluation of the cost of remote communications and represents the estimated percentage of update transactions in the web telemedicine database system. It is computed by dividing the number of local update frequencies in all WTDS sites by the total number of all local retrieval and update frequencies in all sites:

$$U_{ratio} = \frac{\sum_{q=1}^{m} FREQLU(T_k, F_i, C_j, S_q)}{\sum_{q=1}^{m} FREQLR(T_k, F_i, C_j, S_q) + \sum_{q=1}^{m} FREQLU(T_k, F_i, C_j, S_q)}. \qquad (6)$$

Cost of remote communications issued for the transaction Tk to the fragment Fi sent from remote clusters (n) in the WTDS. This cost defines the cost required for updating the fragment from remote clusters. It is computed as the product of the average cost of remote communication, the average frequency of local updates, and the update ratio:

$$CRC(T_k, F_i, C_j) = \frac{\sum_{q=1, q \neq j}^{n} \sum_{a=1, b=1, a \neq b}^{m} CRC(T_k, F_i, C_q, (S_a, S_b))}{m} \times \frac{\sum_{q=1}^{m} FREQLU(T_k, F_i, C_j, S_q)}{m} \times U_{ratio}. \qquad (7)$$

According to the definition of the cost of allocating fragments, and based on the previous allocation sub-costs, the cost of fragment allocation to a certain cluster in the WTDS is computed as:

$$CA(T_k, F_i, C_j) = CLR(T_k, F_i, C_j) + CLU(T_k, F_i, C_j) + SCP(T_k, F_i, C_j) + CRU(T_k, F_i, C_j) + CRC(T_k, F_i, C_j). \qquad (8)$$

To complete the computation of ADV, the cost of not allocating the fragment to a certain cluster (the fragment being retrieved remotely) must also be defined. The cost of not allocating the fragment Fi issued by the transaction Tk to the cluster Cj, denoted CN(Tk, Fi, Cj), is defined as the sum of the following sub-costs: the cost of local retrievals issued by the transaction Tk to the fragment Fi at cluster Cj (computed in Equation (2)), and the cost of remote retrievals.

The retrieval ratio is used in the evaluation of the cost of remote retrievals and represents the estimated percentage of retrieval transactions in the WTDS. It is computed by dividing the number of local retrieval frequencies in all WTDS sites by the total number of all local retrieval and update frequencies in all WTDS sites:

$$R_{ratio} = \frac{\sum_{q=1}^{m} FREQLR(T_k, F_i, C_j, S_q)}{\sum_{q=1}^{m} FREQLR(T_k, F_i, C_j, S_q) + \sum_{q=1}^{m} FREQLU(T_k, F_i, C_j, S_q)}. \qquad (9)$$

Cost of remote retrievals issued by the transaction Tk to the fragment Fi sent from remote clusters (n) in the WTDS. The cost of remote retrievals is determined by the retrieval ratio, the retrieval frequency, and the communications cost between all clusters. It is computed as the product of the average cost of communications between all clusters, the average frequency of remote retrievals, and the retrieval ratio:

$$CRR(T_k, F_i, C_j) = \frac{\sum_{q=1}^{n} \sum_{a=1, b=1, a \neq b}^{m} CCC(T_k, F_i, C_q, (S_a, S_b))}{m} \times \frac{\sum_{q=1, q \neq j}^{n} \sum_{a=1}^{m} FREQRR(T_k, F_i, C_q, S_a)}{m} \times R_{ratio}. \qquad (10)$$

Based on the previous non-allocation sub-costs, the cost of not allocating a fragment to a certain cluster in the WTDS is computed as:

$$CN(T_k, F_i, C_j) = CLR(T_k, F_i, C_j) + CRR(T_k, F_i, C_j). \qquad (11)$$
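Under this cost model, the allocation decision reduces to a comparison of the two aggregate costs. A minimal Python sketch follows; the cost values, cluster names, and the helper `choose_clusters` are illustrative placeholders, not part of the paper's notation:

```python
def allocation_decision(clr, clu, scp, cru, crc, crr):
    """ADV per Eqs. (8), (11), and (12): allocate (1) when the cost of NOT
    allocating (remote retrievals) is at least the cost of allocating
    (local work + storage + remote updates and communications)."""
    ca = clr + clu + scp + cru + crc   # Eq. (8)
    cn = clr + crr                     # Eq. (11)
    return 1 if cn >= ca else 0        # Eq. (12)

def choose_clusters(adv_by_cluster, avg_cost_by_cluster):
    """Allocate to every cluster with ADV = 1; if none qualifies, fall back
    to the cluster with the least average communication cost so the
    fragment stays available somewhere (availability fallback of Sec. 4.3)."""
    chosen = [c for c, adv in adv_by_cluster.items() if adv == 1]
    if not chosen:
        chosen = [min(avg_cost_by_cluster, key=avg_cost_by_cluster.get)]
    return chosen
```

For instance, a fragment with heavy remote retrievals (large CRR) yields ADV = 1 and is replicated locally, while one that is mostly updated remotely yields ADV = 0 and is fetched on demand.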
Accordingly, the allocation decision value ADV determines the allocation status of the fragment at the cluster and its sites. ADV is computed as a logical value from the comparison between the cost of allocating the fragment to the cluster CA(Tk , Fi , Cj ) and the cost of not allocating the fragment to the same cluster CN(Tk , Fi , Cj ):
$$ADV(T_k, F_i, C_j) = \begin{cases} 1, & CN(T_k, F_i, C_j) \geq CA(T_k, F_i, C_j) \\ 0, & CN(T_k, F_i, C_j) < CA(T_k, F_i, C_j). \end{cases} \qquad (12)$$

Therefore, if CN(Tk, Fi, Cj) is greater than or equal to CA(Tk, Fi, Cj), the allocation status ADV(Tk, Fi, Cj) shows a positive allocation benefit and the fragment is allocated at that cluster; otherwise, CN(Tk, Fi, Cj) is less than CA(Tk, Fi, Cj), the allocation status shows a non-positive allocation benefit, and the fragment is not allocated at that cluster. To formalize the fragment allocation procedure, we developed the allocation and replication algorithm shown in Table 7. Once the fragments are allocated and replicated at the web clusters, investigating the benefit of allocating the fragments at all sites of each cluster becomes crucial. To get better WTDS performance, fragments should be allocated over the sites of clusters where a clear allocation benefit is accomplished or where data availability is required. The same allocation and replication algorithm is applied in this case, taking into consideration the costs between sites in each cluster. For example, consider fragment 1 for allocation in clusters 1, 2, and 3, respectively. Based on Equations (2)-(12), which determine the allocation decision value, Tables 8, 9, and 10 depict the calculations of this process. The allocation decision value for fragment 1 is computed in the same way for the other clusters and for the sites of cluster 1 (5, 6, and 7) to determine which clusters and sites allocate the fragment and which do not.

4.4 Complexity of Computation

The time complexity of our approach is bounded by the time complexities of the following incorporated entities: defining fragments, clustering websites, allocating fragments, and computing average retrieval and update frequencies. The complexity of defining fragments is O(R(R.size() − 1)²), where R is the number of relations and R.size() is the average current number of records in each relation. The complexity of clustering websites is O((N² − N) log₂ N), where N is the number of sites in the WTDS. The complexity of fragment allocation is O(R.size() · R · S), where S is the number of sites. The complexity of computing average retrieval and update frequencies is O(F · S · A · A.size()), where F is the number of fragments in the database, A is the number of applications at each site, and A.size() is the average current number of records in each application. Based on these computations, the time complexity of the approach is bounded by O(R(R.size() − 1)² + (N² − N) log₂ N + R.size() · R · S + F · S · A · A.size()).
5 EXPERIMENTAL RESULTS AND PERFORMANCE EVALUATION
To analyze the behavior of our computing service techniques, we develop an IFCA software tool used to validate the telemedicine database system performance. This tool is not only experimental but also supports knowledge extraction and decision making. Using this tool, we test the feasibility of the three proposed computing service techniques. The experimental results and the performance evaluation of our approach are discussed in the following sections.
5.1 Evaluation of Fragmentation Service

The internal validity of our fragmentation service technique is tested on multiple relations of a WTDS. We apply the fragmentation computing service in our IFCA tool to a set of data records obtained from eight different transactions (Table 11). Fig. 5 depicts the experimental results that generate the disjoint fragments and their respective records. Fig. 5 shows that our fragmentation computing service satisfies the optimal solution of data fragmentation in the WTDS and generates the minimum number of fragments for each relation. This helps reduce the communications cost and the data records storage, hence improving WTDS performance. We now evaluate the fragmentation performance in terms of fragment storage size reduction. Let Fper_r represent the fragmentation performance percentage of relation r, Rdrs_r represent the size of the relation's data records generated from the database queries, and Rfs_r represent the size of the relation's final fragment data records. Fper_r is computed as:

$$Fper_r = (Rdrs_r - Rfs_r) / Rdrs_r. \qquad (13)$$
TABLE 7 Fragments Allocation and Replication Algorithm
For example, when fragmentation is performed for relation 1 on data record sets of total size 115 kb generated from the database queries, and the final fragments have a total size of 33 kb, the storage considered for this relation is reduced from 115 to 33 kb with no data redundancy. This improves system performance by reducing the data storage size by about 71 percent.
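The 71 percent figure can be reproduced directly from Equation (13), with sizes in kb taken from the example above:

```python
def fragmentation_gain(rdrs_kb, rfs_kb):
    """Eq. (13): Fper = (Rdrs - Rfs) / Rdrs, the fraction of storage saved
    for one relation by keeping only the final disjoint fragments."""
    return (rdrs_kb - rfs_kb) / rdrs_kb

# relation 1: 115 kb of query-generated records reduced to 33 kb of fragments
print(round(fragmentation_gain(115, 33), 2))  # 0.71
```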
TABLE 8 Cluster 1 Costs of Retrieval, Update, and Storage
TABLE 9 Average # of Retrieval, Update Frequencies, and Communication Costs
TABLE 10 Computation of Fragment Allocation Decision Value
To externally validate our approach, we implemented the two fragmentation techniques proposed by Ma et al. [8] and Huang and Chen [12] and compared their performance against our approach. Fig. 6 shows the number of fragments against the relation number for the three techniques. The results show that larger fragments are produced by the methods in [8] and [12] due to their fragmentation assumptions (see Section 2), while our computing service generates a smaller number of small fragments. Therefore, our fragmentation technique significantly outperforms the approaches proposed by Ma et al. [8] and Huang and Chen [12].

TABLE 11 Computation of Fragment Allocation Decision Value
Fig. 5. Generating database fragments.
5.2 Evaluation of Clustering Service

To evaluate the performance gained by grouping the telemedicine websites under our clustering service technique, we introduce a mathematical model that calculates the performance gain in terms of the communication costs saved by clustering websites. The clustering performance gain is computed as the reduced communications cost divided by the sum of communications costs between sites. The reduced communications cost is the difference between the sum of costs required for each site to communicate with remote sites in the web system and the sum of costs required for each cluster to communicate with remote clusters. The performance evaluation model is expressed as follows. Let CPE represent the clustering performance evaluation and CC(S_i, S_j)
Fig. 6. Database fragmentation performances.
Fig. 7. Clustering performance comparisons.
represent the communication cost between any two sites S_i and S_j in the cluster C_i, where i, j = 1, 2, 3, ..., n. Let CC(C_i, C_j) represent the communication cost between any two clusters C_i and C_j, where i, j = 1, 2, 3, ..., n. CPE is defined as:

$$CPE = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} CC(S_i, S_j) - \sum_{i=1}^{n} \sum_{j=1}^{n} CC(C_i, C_j)}{\sum_{i=1}^{n} \sum_{j=1}^{n} CC(S_i, S_j)}. \qquad (14)$$
For example, consider a telemedicine database system simulated on 10 websites grouped in four clusters, where each site communicates with the other nine sites and each cluster communicates with the other three clusters, taking into consideration the communications within the cluster itself. For simplicity, we assume the sum of communications costs between sites is 697.5 ms/byte and between clusters is 181.42 ms/byte. Applying CPE (Equation (14)), the communications cost is reduced by about 74 percent of the original communications cost. To externally validate our clustering service, we implement the clustering techniques by Kumar et al. [13] and Franczak et al. [15] and compare them with our clustering service in terms of the number of generated clusters against the number of websites (Fig. 7). The results in Fig. 7 show that our clustering service generates more clusters for a smaller number of sites, and hence induces less communication cost within the clusters. The other techniques generate fewer clusters for a large number of sites and thus induce more communication cost within the clusters. Fig. 7 also shows that the clustering trend increases with the number of network sites in [13] and in our approach. In contrast, the number of clusters generated by the clustering approach in [15] is smaller because its clustering approximation uses a natural logarithmic function; this maximizes the number of sites in each cluster, which increases the communications cost.
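The roughly 74 percent saving follows directly from Equation (14) with the assumed totals:

```python
def clustering_gain(site_cost_sum, cluster_cost_sum):
    """Eq. (14): CPE = (sum of site-to-site costs - sum of cluster-to-cluster
    costs) / sum of site-to-site costs, i.e., the fraction of communication
    cost saved by communicating at cluster granularity."""
    return (site_cost_sum - cluster_cost_sum) / site_cost_sum

# totals assumed in the worked example (ms/byte)
print(round(clustering_gain(697.5, 181.42), 2))  # 0.74
```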
5.3 Evaluation of Fragment Allocation Service

To evaluate our fragment allocation computing service, we use a set of disjoint fragments obtained from our fragmentation service, the set of clusters and their sites generated from our clustering service, and the average numbers of retrievals and updates at each site (Table 12a). We apply the allocation service in our IFCA software tool using the site costs of storage, retrieval, and update depicted in Table 12b; the results are shown in Fig. 8. We propose the following mathematical model to evaluate the performance of the fragment allocation and replication service. Let IAFS_s represent the initial size of allocated fragments at site s, n represent the maximum number of sites in the current cluster, FAFS_s represent the final size of allocated fragments at site s, and APEC_c represent the fragment allocation and replication performance evaluation at cluster c. APEC_c is computed as:
TABLE 12 (a) Fragments Retrieval and Update Frequencies (b) Costs of Storage, Retrieval, and Update
Fig. 8. Fragments allocation at clusters.
$$APEC_c = \frac{\sum_{s=1}^{n} IAFS_s - \sum_{s=1}^{n} FAFS_s}{\sum_{s=1}^{n} IAFS_s}. \qquad (15)$$
For example, assume that 35 fragments with a total size of 91 kb are initially allocated to the WTDS clusters, and the final allocation using our approach is 14 fragments with a total size of 27 kb. The storage gain of our allocation technique is then about 70 percent, which in turn reduces the average fragment allocation computation time. Our proposed allocation technique is compared with the allocation methods by Ma et al. [8] and Menon [20]. These two approaches allocate large numbers of fragments to each site and hence consume more storage, which in turn increases the average fragment allocation computation time in each cluster. Fig. 9 illustrates the effectiveness of our fragment allocation and replication technique in terms of average computation time compared to the fragment allocation methods in [8] and [20]. The figure shows that our technique incurs the least average computation time required for fragment allocation. This is because our clustering technique produces more clusters with a small number of websites each, which reduces the average fragment allocation computation time per cluster. We infer from Fig. 9 that the average computation time of fragment allocation increases as the number of websites in the
Fig. 9. Fragments allocation after clustering.
cluster increases. Most importantly, the figure shows that our technique outperforms the ones in [8] and [20].
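The roughly 70 percent storage gain in the example above follows from Equation (15). The per-site sizes below are a hypothetical breakdown of the 91 kb and 27 kb totals, used only for illustration:

```python
def allocation_gain(initial_kb, final_kb):
    """Eq. (15): APEC = (sum of initially allocated fragment sizes - sum of
    finally allocated fragment sizes) / sum of initial sizes, per cluster."""
    total_initial = sum(initial_kb)
    return (total_initial - sum(final_kb)) / total_initial

# hypothetical per-site split of the example totals (91 kb -> 27 kb)
print(round(allocation_gain([40, 30, 21], [12, 9, 6]), 2))  # 0.7
```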
5.4 Evaluation of Web Servers Load and Network Delay

The telemedicine database system network workload involves queries from a number of database users who access the same database simultaneously. The amount of data required per second and the time in which the queries should be processed depend on the database servers running under such a network. For simplicity, 12 websites grouped in four clusters are considered for the network simulation used to evaluate the performance of the telemedicine database system. The performance factors under evaluation are the server load and the network delay, simulated in OPNET [23]. The proposed network simulation for building the WTDS is represented by web nodes. Each node is a stack of two 3Com SuperStack II 1100 and two SuperStack II 3300 chassis (3C_SSII_1100_3300) with four slots (4s), 52 auto-sensing Ethernet ports (ae52), 48 Ethernet ports (e48), and 3 Gigabit Ethernet ports (ge3). The center node model is a 3Com switch, the periphery node model is an internet workstation (Sm_Int_wkstn), and the link model is 10BaseT. The center nodes are connected with internet servers (Sm_Int_server) by 10BaseT, and the servers are connected with a router (Cisco 2514, node 15) by 10BaseT. Fig. 10 depicts the network topology of the simulated WTDS, which consists of 12 website nodes (0,
Fig. 10. The WTDS network topology.
Fig. 11. WTDS servers’ load comparison. Fig. 12. Servers’ Max. load for clustering techniques.
1, 2, 3, 4, 9, 10, 11, 16, 17, 18, 20) and four clusters (6, 13, 14, and 22) connected with internet servers (7 and 8) that are also connected with a router (Cisco 2514, node 15) by 10BaseT. The following sections evaluate the effect of server load and network delay on the distributed database performance.
5.4.1 Server Load

The server load determines the speed of the server in bits/sec. Fig. 11 shows the servers' load comparison between our approach and the approaches of Kumar et al. [13] and Franczak et al. [15]. The servers are denoted by the cluster legends C1, C2, C3, and C4, respectively. The results of the comparison are drawn from Figs. 11 and 12 and summarized in Table 13. Table 13 shows that the server load increases as the number of represented nodes (sites) decreases, due to load distribution over the nodes under the assumption that all nodes have the same capacity; conversely, the server load decreases as the number of represented nodes increases. The results indicate that the network remains almost balanced throughout processing of the web services, although some variation is expected from differences in the number of transactions processed on each node. Fig. 12 shows the maximum load (bits/sec) on the servers' clusters for our proposed clustering technique and the clustering methods in [13] and [15]. The results clearly show that our approach outperforms the approaches in [13] and [15].

5.4.2 Network Delay

The network delay is the delay caused by the transaction traffic on the web database system servers. It is defined as the maximum time (milliseconds) required for the network system to reach the steady state. Fig. 13 shows the network delay caused by the WTDS servers, represented by the legends C1, C2, C3, and C4. The figure indicates that the web database system reaches the
steady state after 0.065 milliseconds. It shows that the network delay is less when distributing the websites over four servers compared to the delay consumed when all websites connect 1, 2, or 3 servers. Fig. 14 shows the maximum network delay (sec) caused by web servers against distance range for our clustering technique, and the techniques in [13] and [15]. Note that the network delay in our approach is always less than in [13] and [15]; this is due to the better clustering computations of our technique.
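The two measurements above can be sketched in a few lines: per-node load under the equal-capacity balancing assumption, and the time to reach steady state read off a delay trace. This is an illustrative sketch only; the function names, the relative-tolerance criterion, and all numbers are our assumptions, not the paper's OPNET measurements.

```python
def per_node_load(cluster_traffic_bps, num_nodes):
    """Per-node load (bits/sec) when a cluster's transaction traffic is
    spread evenly over nodes of equal capacity, as assumed above.
    Fewer represented nodes => higher load per node."""
    if num_nodes <= 0:
        raise ValueError("a cluster must represent at least one node")
    return cluster_traffic_bps / num_nodes

def steady_state_time(samples, tol=0.05):
    """Earliest time after which every delay sample stays within a
    relative tolerance of the final observed delay -- one simple way to
    read the 'time to reach steady state' off a simulation trace.
    `samples` is a list of (time_ms, delay_ms) pairs."""
    final = samples[-1][1]
    t_steady = None
    for t, d in samples:
        if abs(d - final) > tol * final:
            t_steady = None          # still settling; reset the candidate
        elif t_steady is None:
            t_steady = t             # first sample of a stable suffix
    return t_steady

# Hypothetical numbers, not the paper's measurements:
print(per_node_load(1_000_000, 4))   # 250000.0
trace = [(0.00, 9.0), (0.02, 5.0), (0.05, 2.3),
         (0.065, 2.0), (0.08, 2.0), (0.10, 2.0)]
print(steady_state_time(trace))      # 0.065
```

The equal-capacity assumption is what makes the load inversely proportional to cluster size; dropping it would require weighting each node's share by its capacity.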
5.5 Threats to Validity

Threats to external validity limit the ability to generalize the results of the experiments to industrial practice. To avoid such threats in the evaluation of our approach, we compared each of our proposed computing service techniques (fragmentation, clustering, and data allocation) with two similar techniques proposed by other groups of researchers and well accepted by the database systems community. The results show that our fragmentation, clustering, and allocation techniques outperform their counterparts in the literature.
6 CONCLUSION
In this work, we proposed a new approach to promote WTDS performance. Our approach integrates three enhanced computing service techniques, namely database fragmentation, network site clustering, and fragment allocation. We developed these techniques to solve technical challenges such as distributing data fragments among multiple web servers, handling failures, and making tradeoffs between data availability and consistency. We also proposed an estimation model to compute the communications cost, which helps in finding cost-effective data allocation solutions. The novelty of our approach lies in the integration of web database site clustering as a new component of the process of
WTDS design in order to improve performance and satisfy a certain level of quality in web services. We performed both internal and external evaluation of our integrated approach. In the internal evaluation, we measured the impact of using our techniques on WTDS and web service performance measures such as communications cost, response time, and throughput. In the external evaluation, we compared the performance of our approach to that of other techniques in the literature. The results show that our integrated approach significantly improves service requirement satisfaction in web systems. This conclusion requires further investigation and experiments; therefore, as future work, we plan to evaluate our approach on larger-scale networks involving a large number of sites over the cloud. We will consider applying different types of clustering and will introduce search-based techniques to perform more intelligent data redistribution. Finally, we intend to address the security concerns that arise over data fragments.

TABLE 13. WTDS Servers' Load Comparison.
Fig. 13. The WTDS network delay.
Fig. 14. Max. net. delay for clustering techniques.
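The conclusion refers to an estimation model for communications cost that guides fragment allocation. That model is developed earlier in the paper; the sketch below only illustrates the general shape of such a cost-driven allocation decision. All names, numbers, and the cost formula (frequency × fragment size × per-KB transfer cost, summed over requesting sites) are illustrative assumptions, not the paper's exact model.

```python
def allocation_cost(freq, frag_size_kb, unit_cost):
    """Estimated communications cost of serving a fragment from one
    candidate cluster: sum over requesting sites of
    (access frequency) x (fragment size) x (per-KB transfer cost).
    freq[s]      -- accesses per period issued by site s
    unit_cost[s] -- transfer cost per KB between the cluster and site s
    """
    return sum(freq[s] * frag_size_kb * unit_cost[s] for s in freq)

def best_cluster(freq, frag_size_kb, unit_costs_by_cluster):
    """Allocate the fragment to the cluster with the minimum estimated
    communications cost."""
    return min(unit_costs_by_cluster,
               key=lambda c: allocation_cost(freq, frag_size_kb,
                                             unit_costs_by_cluster[c]))

# Hypothetical two-cluster example: most accesses come from site s1,
# so the fragment should land on the cluster that is cheap to reach s1.
freq = {"s1": 30, "s2": 5}
costs = {"C1": {"s1": 0.1, "s2": 0.9},
         "C2": {"s1": 0.8, "s2": 0.2}}
print(best_cluster(freq, 64, costs))  # prints C1
```

A greedy per-fragment minimization like this is the simplest instance of cost-effective allocation; the paper's integrated approach additionally accounts for clustering and redistribution during execution.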
REFERENCES

[1] J.-C. Hsieh and M.-W. Hsu, "A Cloud Computing Based 12-Lead ECG Telemedicine Service," BMC Medical Informatics and Decision Making, vol. 12, pp. 12-77, 2012.
[2] A. Tamhanka and S. Ram, "Database Fragmentation and Allocation: An Integrated Methodology and Case Study," IEEE Trans. Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 28, no. 3, pp. 288-305, May 1998.
[3] L. Borzemski, "Optimal Partitioning of a Distributed Relational Database for Multistage Decision-Making Support Systems," Cybernetics and Systems Research, vol. 2, no. 13, pp. 809-814, 1996.
[4] J. Son and M. Kim, "An Adaptable Vertical Partitioning Method in Distributed Systems," J. Systems and Software, vol. 73, no. 3, pp. 551-561, 2004.
[5] S. Lim and Y. Ng, "Vertical Fragmentation and Allocation in Distributed Deductive Database Systems," J. Information Systems, vol. 22, no. 1, pp. 1-24, 1997.
[6] S. Agrawal, V. Narasayya, and B. Yang, "Integrating Vertical and Horizontal Partitioning into Automated Physical Database Design," Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 359-370, 2004.
[7] S. Navathe, K. Karlapalem, and R. Minyoung, "A Mixed Fragmentation Methodology for Initial Distributed Database Design," J. Computer and Software Eng., vol. 3, no. 4, pp. 395-425, 1995.
[8] H. Ma, K. Schewe, and Q. Wang, "Distribution Design for Higher-Order Data Models," Data and Knowledge Eng., vol. 60, pp. 400-434, 2007.
[9] W. Yee, M. Donahoo, and S. Navathe, "A Framework for Server Data Fragment Grouping to Improve Server Scalability in Intermittently Synchronized Databases," Proc. ACM Conf. Information and Knowledge Management (CIKM), 2000.
[10] A. Jain, M. Murty, and P. Flynn, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
[11] L. Goud, "Achieving Availability, Elasticity and Reliability of the Data Access in Cloud Computing," Int'l J. Advanced Eng. Sciences and Technologies, vol. 5, no. 2, pp. 150-155, 2011.
[12] Y. Huang and J. Chen, "Fragment Allocation in Distributed Database Design," J. Information Science and Eng., vol. 17, pp. 491-506, 2001.
[13] P. Kumar, P. Krishna, R. Bapi, and S. Kumar, "Rough Clustering of Sequential Data," Data and Knowledge Eng., vol. 63, pp. 183-199, 2007.
[14] K. Voges, N. Pope, and M. Brown, "Cluster Analysis of Marketing Data Examining Online Shopping Orientation: A Comparison of K-Means and Rough Clustering Approaches," Heuristics and Optimization for Knowledge Discovery, H.A. Abbass, R.A. Sarker, and C.S. Newton, eds., pp. 207-224, Idea Group Publishing, 2002.
[15] A. Fronczak, J. Holyst, M. Jedynak, and J. Sienkiewicz, "Higher Order Clustering Coefficients in Barabasi-Albert Networks," Physica A: Statistical Mechanics and Its Applications, vol. 316, no. 1-4, pp. 688-694, 2002.
[16] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, "Clustering Algorithms and Validity Measures," Proc. 13th Int'l Conf. Scientific and Statistical Database Management (SSDBM), 2001.
[17] A. Ishfaq, K. Karlapalem, and Y. Kaiso, "Evolutionary Algorithms for Allocating Data in Distributed Database Systems," Distributed and Parallel Databases, vol. 11, pp. 5-32, Kluwer Academic Publishers, 2002.
[18] C. Danilowicz and N. Nguyen, "Consensus Methods for Solving Inconsistency of Replicated Data in Distributed Systems," Distributed and Parallel Databases, vol. 14, pp. 53-69, 2003.
[19] R. Costa and S. Lifschitz, "Database Allocation Strategies for Parallel BLAST Evaluation on Clusters," Distributed and Parallel Databases, vol. 13, pp. 99-127, 2003.
[20] S. Menon, "Allocating Fragments in Distributed Databases," IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 7, pp. 577-585, July 2005.
[21] N. Daudpota, "Five Steps to Construct a Model of Data Allocation for Distributed Database Systems," J. Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies, vol. 11, no. 2, pp. 153-168, 1998.
[22] Microsoft SQL Server, Available from: , 2012.
[23] MySQL 5.6, , 2013.
[24] OPNET IT Guru Academic, OPNET Technologies, Inc., , 2003.
[25] J. Hauglid, N. Ryeng, and K. Norvag, "DYFRAM: Dynamic Fragmentation and Replica Management in Distributed Database Systems," Distributed and Parallel Databases, vol. 28, pp. 157-185, 2010.
[26] Z. Wang, T. Li, N. Xiong, and Y. Pan, "A Novel Dynamic Network Data Replication Scheme Based on Historical Access Record and Proactive Deletion," J. Supercomputing, vol. 62, pp. 227-250, Oct. 2011, doi: 10.1007/s11227-011-0708-z.
[27] S. Khan and I. Ahmad, "Replicating Data Objects in Large Distributed Database Systems: An Axiomatic Game Theoretic Mechanism Design Approach," Distributed and Parallel Databases, vol. 28, no. 2/3, pp. 187-218, 2010.
[28] S. Khan and L. Hoque, "A New Technique for Database Fragmentation in Distributed Systems," Int'l J. Computer Applications, vol. 5, no. 9, pp. 20-24, 2010.
[29] S. Jagannatha, M. Mrunalini, T. Kumar, and K. Kanth, "Modeling of Mixed Fragmentation in Distributed Database Using UML 2.0," IPCSIT, vol. 2, pp. 190-194, 2011.
[30] A. Morffi, C. Gonzalez, W. Lemahieu, and L. Gonzalez, "SIADBDD: An Integrated Tool to Design Distributed Databases," Revista Facultad de Ingenieria Universidad de Antioquia, no. 47, pp. 155-163, Mar. 2009.
[31] M.T. Özsu and P. Valduriez, Principles of Distributed Databases, third ed., 2011.
[32] G. Mao, M. Gao, and W. Yao, "An Algorithm for Clustering XML Data Stream Using Sliding Window," Proc. Third Int'l Conf. Advances in Databases, Knowledge, and Data Applications, pp. 96-101, 2011.
[33] M.P. Paixão, L. Silva, and G. Elias, "Clustering Large-Scale, Distributed Software Component Repositories," Proc. Fourth Int'l Conf. Advances in Databases, Knowledge, and Data Applications, pp. 124-129, 2012.
[34] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, "Dynamo: Amazon's Highly Available Key-Value Store," Proc. ACM Symp. Operating Systems Principles, pp. 205-220, 2007.
[35] http://en.wikipedia.org/wiki/MongoDB, Nov. 2013.
[36] F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R.E. Gruber, "Bigtable: A Distributed Storage System for Structured Data," Proc. Seventh Symp. Operating Systems Design and Implementation, pp. 205-218, 2006.

Ismail Hababeh received the bachelor's degree in computer science from the University of Jordan, the master's degree in computer science from Western Michigan University, and the PhD degree in computer science from Leeds Metropolitan University, United Kingdom. His research areas of particular interest include, but are not limited to, the following: distributed databases, cloud computing, network security, and systems performance.
Issa Khalil received the BSc and MS degrees from Jordan University of Science and Technology in 1994 and 1996, and the PhD degree from Purdue University, in 2007, all in computer engineering. Immediately thereafter, he joined the College of Information Technology (CIT) of the United Arab Emirates University (UAEU) where he was promoted to an associate professor in September 2011. In June 2013, he joined Qatar Computing Research Institute (QCRI) as a senior scientist with the cyber security group. His research interests span the areas of wireless and wire-line communication networks. He is especially interested in security, routing, and performance of wireless sensor, ad hoc and mesh networks. His recent research interests include malware analysis, advanced persistent threats, and ICS/SCADA security. He served as the technical program co-chair of the Sixth International Conference on Innovations in Information Technology, and was appointed as a Technical Program Committee member and a reviewer for many international conferences and journals. In June 2011, he was granted the CIT outstanding professor award for outstanding performance in research, teaching, and service.
Abdallah Khreishah received the BS degree with honors from Jordan University of Science & Technology in 2004. He received the MS and PhD degrees in electrical and computer engineering from Purdue University in 2006 and 2010, respectively. In Fall 2012, he joined the ECE Department of New Jersey Institute of Technology as an assistant professor. His research spans the areas of network coding, wireless networks, cloud computing, and network security.