University General Requirements Unit

A Method for Fragment Allocation Design in the Distributed Database Systems

Ismail O. Hababeh, UGRU, E-mail: [email protected] U.A.E University, Al-Ain, P.O.Box 17172, U.A.E
Nicholas Bowring, Muthu Ramachandran, Faculty of Information and Engineering Systems - School of Computing, Leeds Metropolitan University, Leeds - LS6 3QS, U.K

Abstract
Fragment allocation design is an essential issue that affects the performance of application processing in Distributed Database systems (DDBs). Database queries access the applications on the distributed database sites and should be performed effectively. Therefore, the fragments accessed by queries need to be allocated to the DDBs sites so as to reduce the communication cost during application execution and to handle their operational processing. We present a method for grouping the sites of the DDBs according to their communication cost, so that fragments are allocated to a group of sites instead of site by site. Optimizing the cost of the fragment allocation functions to reduce query processing time, and determining the fragments to be allocated at the DDBs sites, are also main objectives of our research.

1. INTRODUCTION
Various approaches to data allocation in distributed systems have already been described. Some methods are limited in their theoretical and implementation parts. Other strategies ignore the optimization of the transaction response time. Still other approaches have exponential time complexity and test their performance only on specific types of network connectivity.

The design of the DDBs enhances the performance of applications by minimizing the amount of irrelevant data accessed by the applications [1], and by minimizing the amount of data transferred in processing the applications [2]. There are two ways by which the performance of applications can be enhanced: grouping sites and fragment allocation. Grouping sites of DDBs places the relevant data accessed by an application into a group of sites. Fragment allocation is the process of placing the fragments at the sites of DDBs to minimize the data transfer cost and the number of messages during application processing.

This paper presents an approach for allocating database fragments in DDBs and shows a way of grouping the sites to which fragments are allocated. The approach minimizes the transaction communication cost by distributing the database fragments over the DDBs sites, increases data availability and integrity by allocating multiple copies of the same database fragments over the sites where possible, and minimizes the total transaction response time.

The remainder of this paper is organized as follows. Section 2 presents the background of grouping sites and fragment allocation. Section 3 describes our grouping method. Section 4 describes how to allocate fragments to the groups. Section 5 presents the performance evaluation. Finally, we give some concluding remarks.

2. BACKGROUND
Several approaches have been proposed for database partitioning and fragment allocation in DDBs. Son J. et al. [3] introduced an adaptable vertical partitioning method in distributed systems. Yee W. et al. [4] proposed a method of grouping fragments based on a measure of inter-client sharing, which describes how many fragments each client's subscription has in common with those of others.


The Sixth Annual U.A.E. University Research Conference

Chun-Hung C. et al. [5] explored the use of a genetic search-based clustering algorithm for data partitioning to achieve high database retrieval performance. They formulate the clustering problem in data partitioning as a Travelling Salesman Problem (TSP) and propose two genetic operators, SE and SP, as well as a modified version of the ER operators, to solve the associated TSP. A vertical partitioning technique is used in their algorithm, and they show that their model is applicable to the horizontal partitioning problem.

Tamhankar A. et al. [6] developed a comprehensive methodology for fragmentation and distribution of data across multiple sites such that design objectives in terms of response time and availability for transactions, and constraints on storage space, are adequately addressed.

Daudpota N. et al. [7] constructed a formal model of data allocation and derived an algorithm to fragment and allocate the relations. Their work is not applied to distributed applications that have different network connectivity (LAN/WAN).

H. Lee et al. [8] proposed a heuristic methodology for determining file and workload allocation simultaneously on a LAN. This method minimizes the response time for processing transactions. Only transactions with the same properties are routed to the same server, which does not guarantee the minimization of the communication cost. Their assumption of non-redundant allocation decreases the reliability of the system, and the impact of storing fragment copies on the sites of the LAN is not very significant.

Bellatreche L. et al. [9] formulated the combined methods and class allocation problem and developed a model to calculate the total data transfer cost incurred; their allocation algorithm generates a near-optimal solution to the problem.

Peddemors A. et al. [10] described the first-phase realization of a distributed database system in which an iterative process is used to build the system. Each phase has a set of objectives, spans a limited amount of time, and adds functionality, and the output of every phase serves as input for the next phase. However, in their work the generic server interface is not easily usable; for every application, a new server interface has to be written.

Yin-Fu H. et al. [11] proposed a heuristic algorithm that reflects transaction behavior in a distributed database. Their model determines the number of replicas of each fragment and finds a near-optimal allocation of all fragments in a WAN such that the total communication cost is minimized. The fragments accessed by a transaction are all assumed independent, which is not the case in the real world. This method neglects site information such as storage and processing capacity, and it is applied only on a LAN network. They consider the CPU processing time and I/O access time as minor factors in minimizing the total cost in a WAN environment.

3. GROUPING SITES
Grouping sites (clustering) is a method of grouping sites according to a certain criterion to increase the system I/O performance and reduce storage overheads. Grouping sites into clusters helps reduce the communication costs between the sites during the process of data allocation. We propose a method for clustering sites according to their communication cost, which determines whether or not a set of sites is assigned to a certain cluster; it is a fast way to determine the data allocation to a set of sites rather than site by site. Two sites (Si, Sj) are grouped in one cluster if the communication cost between them is less than or equal to the Communication Cost Range (CCR), that is, the number of communication units allowed as the maximum difference in communication cost between sites grouped in the same cluster. This number is determined by the network of the DDBs (Hababeh I. et al. [12]). Our clustering algorithm is defined as follows:

Input:
  Sites communication cost matrix
  CCR value
  The sites of DDBs
Output:
  The set of clusters and their respective sites
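The grouping rule stated above (two sites share a cluster when their communication cost is at most CCR) can be illustrated with a short Python sketch. This is not the authors' algorithm, whose full pseudocode is not reproduced here; it is a minimal greedy variant under the assumption that a site joins the first existing cluster in which its cost to every member is within CCR, and otherwise opens a new cluster. The function name `cluster_sites` and the cost matrix are hypothetical and do not come from the paper.

```python
def cluster_sites(cost, ccr):
    """Greedily group sites so that every pair inside a cluster costs <= ccr.

    cost is a square matrix (list of lists); cost[i][j] is the communication
    cost between site i and site j. Sites are numbered from 0.
    """
    clusters = []
    for site in range(len(cost)):
        for members in clusters:
            # Assumption: the site must be within CCR of ALL current members.
            if all(cost[site][m] <= ccr for m in members):
                members.append(site)
                break
        else:
            # No existing cluster accepts the site, so it starts a new one.
            clusters.append([site])
    return clusters


# Hypothetical symmetric cost matrix for six sites (illustrative values only).
cost = [
    [0, 2, 9, 8, 12, 11],
    [2, 0, 8, 9, 11, 12],
    [9, 8, 0, 3, 10, 9],
    [8, 9, 3, 0, 9, 10],
    [12, 11, 10, 9, 0, 1],
    [11, 12, 9, 10, 1, 0],
]
clusters = cluster_sites(cost, ccr=3)
print(clusters)  # [[0, 1], [2, 3], [4, 5]]
```

With these illustrative costs and CCR = 3, the six sites fall into three clusters of two sites each. A stricter or looser membership test (for example, requiring closeness to only one member) would be an equally plausible reading of the rule and may give different clusters.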


Begin
  Repeat
    For I = 1 to the number of sites in the database
      For J = 1 to the number of sites in the database
        If I ≠ J and communication cost between site I and site J <= CCR [...]

[...]

$$D(T_k, F_i, C_j) = \big( CN(T_k, F_i, C_j) \geq CA(T_k, F_i, C_j) \big) \qquad (9)$$

We define our fragment allocation algorithm as follows:


Input:
  Number of transactions issued in the database
  Number of fragments used for allocation in the database
  Number of clusters used for allocation in the database
Output:
  The fragments that are allocated to the clusters
Begin
  For k = 1 to the number of transactions do
    For i = 1 to the number of fragments do
      For j = 1 to the number of clusters at fragment i do
        CRUsum(Tk,Fi,Cj) = 0; CRCsum(Tk,Fi,Cj) = 0; CRRsum(Tk,Fi,Cj) = 0;
        For x = 1 to the number of clusters at fragment i do
          If x ≠ j Then
            CRUsum(Tk,Fi,Cj) = CRUsum(Tk,Fi,Cj) + CLU(Tk,Fi,Cx) * FREQRU(Tk,Fi,Cx)
            CRCsum(Tk,Fi,Cj) = CRCsum(Tk,Fi,Cj) + Uratio * FREQLU(Tk,Fi,Cx) * CRC(Tk,Fi,Cx)
            CRRsum(Tk,Fi,Cj) = CRRsum(Tk,Fi,Cj) + Rratio * FREQRR(Tk,Fi,Cx) * CCC
          End if;
        End for;
        CA(Tk,Fi,Cj) = CLRsum(Tk,Fi,Cj) + CLUsum(Tk,Fi,Cj) + CSPsum(Tk,Fi,Cj) + CRUsum(Tk,Fi,Cj) + CRCsum(Tk,Fi,Cj)
        CN(Tk,Fi,Cj) = CLRsum(Tk,Fi,Cj) + CRRsum(Tk,Fi,Cj)
        D(Tk,Fi,Cj) = (CN(Tk,Fi,Cj) >= CA(Tk,Fi,Cj))
        If D(Tk,Fi,Cj) = True Then
          Allocate the fragment to the current cluster
        Else
          Cancel the fragment from the current cluster
        End if;
      End for;
    End for;
  End for;
End.

We illustrate our fragment allocation method with the following example. We assume the fragments and their frequencies of retrieval and update requested from each cluster and its respective sites (Table 4), the costs of space, retrieval, and update (Table 5), and the following numbers of bytes required for computing the update and retrieval ratios according to their use in the DDBs: 2 bytes per unit of retrieval, 3 bytes per unit of update, and 5 bytes per unit of communication.
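Before turning to the example data, the inner loop and decision step of the allocation pseudocode can be sketched in Python for a single transaction, fragment, and cluster. The sketch is a direct transliteration and not a definitive implementation: the function names `remote_sums` and `decide` are hypothetical, CCC is assumed to be a scalar because the pseudocode gives it no indices, the local sums (CLRsum, CLUsum, CSPsum) are taken as ready-made inputs, and all numbers are invented for illustration.

```python
def remote_sums(j, clusters, clu, freq_ru, freq_lu, crc, freq_rr, ccc, u_ratio, r_ratio):
    """Accumulate CRUsum, CRCsum and CRRsum for cluster j over every cluster x != j."""
    cru_sum = crc_sum = crr_sum = 0.0
    for x in clusters:
        if x != j:
            cru_sum += clu[x] * freq_ru[x]
            crc_sum += u_ratio * freq_lu[x] * crc[x]
            crr_sum += r_ratio * freq_rr[x] * ccc  # ccc assumed scalar
    return cru_sum, crc_sum, crr_sum


def decide(clr_sum, clu_sum, csp_sum, cru_sum, crc_sum, crr_sum):
    """Return (CA, CN, D): allocation cost, non-allocation cost, decision CN >= CA."""
    ca = clr_sum + clu_sum + csp_sum + cru_sum + crc_sum
    cn = clr_sum + crr_sum
    return ca, cn, cn >= ca


# Hypothetical per-cluster inputs for one transaction and one fragment.
clusters = ["C1", "C2", "C3"]
clu = {"C1": 0.25, "C2": 0.5, "C3": 0.25}
freq_ru = {"C1": 10, "C2": 4, "C3": 8}
freq_lu = {"C1": 6, "C2": 2, "C3": 4}
crc = {"C1": 2.0, "C2": 1.0, "C3": 0.5}
freq_rr = {"C1": 30, "C2": 10, "C3": 20}

sums = remote_sums("C1", clusters, clu, freq_ru, freq_lu, crc, freq_rr,
                   ccc=0.5, u_ratio=0.5, r_ratio=0.25)
# Local sums (CLRsum, CLUsum, CSPsum) are placeholder values here.
ca, cn, allocate = decide(1.0, 1.0, 0.5, *sums)
print(sums, ca, cn, allocate)  # (4.0, 2.0, 3.75) 8.5 4.75 False
```

In this invented case the cost of allocating (8.5) exceeds the cost of not allocating (4.75), so the fragment would be cancelled from cluster C1.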
Table 4: Fragments and their frequencies of retrievals and updates in the clusters and their respective sites

Fragment #  Cluster #  Site #  Retrieval Frequency  Update Frequency
F1          C1         S1      80                   10
F1          C1         S2      60                   26
F1          C2         S3      60                   16
F1          C2         S4      0                    0
F1          C3         S5      35                   5
F1          C3         S6      25                   5
F2          C2         S3      20                   4
F2          C2         S4      20                   6
F2          C3         S5      5                    30
F2          C3         S6      105                  20
F3          C1         S1      0                    20
F3          C1         S2      0                    10
F3          C2         S3      30                   0
F3          C2         S4      0                    0
F3          C3         S5      40                   30
F3          C3         S6      30                   10
F4          C1         S1      10                   20
F4          C1         S2      10                   20
F4          C2         S3      65                   12
F4          C2         S4      5                    12
F5          C1         S1      70                   20
F5          C1         S2      6                    10
F5          C2         S3      20                   10
F5          C2         S4      20                   10
F5          C3         S5      35                   10
F5          C3         S6      45                   20
F6          C1         S1      0                    10
F6          C1         S2      0                    0
F6          C3         S5      25                   5
F6          C3         S6      5                    5
F7          C2         S3      25                   5
F7          C2         S4      35                   10
F7          C3         S5      10                   0
F7          C3         S6      30                   0
F8          C1         S1      10                   20
F8          C1         S2      80                   20
F8          C2         S3      20                   0
F8          C2         S4      60                   10
F8          C3         S5      0                    20
F8          C3         S6      20                   0

Table 5: Cost of space, retrieval, and update

Site #  Cost of space  Cost of Retrieval  Cost of Update
S1      0.004          0.15               0.25
S2      0.006          0.25               0.35
S3      0.005          0.15               0.25
S4      0.007          0.17               0.27
S5      0.003          0.13               0.23
S6      0.005          0.15               0.25

After applying the formulas described in 4.1, 4.2 and 4.3 to the given data, we determine the allocated and cancelled fragments in all clusters, as listed in Table 6.

Table 6: Allocated and cancelled fragments in all clusters

Fragment #  Cluster #  Cost of allocation  Cost of not allocation  Decision Value  Allocation Status
F1          C1         59.45               177.24                  1               Allocated
F1          C2         74.83               74.76                   0               Cancelled
F1          C3         85.5                74.16                   0               Cancelled
F2          C2         74.26               49.84                   0               Cancelled
F2          C3         30.01               135.96                  1               Allocated
F3          C1         60.32               0                       0               Cancelled
F3          C2         103.23              37.38                   0               Cancelled
F3          C3         54.72               86.52                   1               Allocated
F4          C1         47.13               25.32                   0               Cancelled
F4          C2         68.73               87.22                   1               Allocated
F5          C1         86.56               96.21                   1               Allocated
F5          C2         92.66               49.84                   0               Cancelled
F5          C3         86.80               98.88                   1               Allocated
F6          C1         15.46               0                       0               Cancelled
F6          C3         18.31               37.08                   1               Allocated
F7          C2         7.41                74.76                   1               Allocated
F7          C3         34.71               37.08                   1               Allocated
F8          C1         59.22               113.94                  1               Allocated
F8          C2         95.63               99.68                   1               Allocated
F8          C3         80.12               24.72                   0               Cancelled

Figure 3 shows the distribution of the fragments over the clusters.

Fig. 3: Fragment allocation at the clusters
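As a cross-check on Table 6, the decision rule from the allocation algorithm (a fragment is allocated to a cluster when the cost of not allocating it is at least the cost of allocating it) can be applied to the two cost columns. The Python sketch below copies the cost values from Table 6; the names `TABLE6` and `is_allocated` are illustrative and not part of the paper.

```python
from collections import Counter

# (fragment, cluster, cost of allocation, cost of not allocation), from Table 6.
TABLE6 = [
    ("F1", "C1", 59.45, 177.24), ("F1", "C2", 74.83, 74.76), ("F1", "C3", 85.5, 74.16),
    ("F2", "C2", 74.26, 49.84), ("F2", "C3", 30.01, 135.96),
    ("F3", "C1", 60.32, 0.0), ("F3", "C2", 103.23, 37.38), ("F3", "C3", 54.72, 86.52),
    ("F4", "C1", 47.13, 25.32), ("F4", "C2", 68.73, 87.22),
    ("F5", "C1", 86.56, 96.21), ("F5", "C2", 92.66, 49.84), ("F5", "C3", 86.80, 98.88),
    ("F6", "C1", 15.46, 0.0), ("F6", "C3", 18.31, 37.08),
    ("F7", "C2", 7.41, 74.76), ("F7", "C3", 34.71, 37.08),
    ("F8", "C1", 59.22, 113.94), ("F8", "C2", 95.63, 99.68), ("F8", "C3", 80.12, 24.72),
]


def is_allocated(cost_alloc, cost_not_alloc):
    """Decision rule: allocate when not allocating costs at least as much."""
    return cost_not_alloc >= cost_alloc


allocated = [(frag, clus) for frag, clus, ca, cn in TABLE6 if is_allocated(ca, cn)]
per_cluster = Counter(clus for _, clus in allocated)

print(len(TABLE6), len(allocated))  # 20 11
print(dict(per_cluster))            # {'C1': 3, 'C3': 5, 'C2': 3}
```

The rule reproduces the decision values in Table 6: 11 of the 20 candidate placements are kept, with 3, 3, and 5 fragments allocated to clusters C1, C2, and C3 respectively.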

5. PERFORMANCE EVALUATION
Grouping sites into clusters minimizes the communication costs between the sites and improves the system performance. Our fragment allocation method uses the average communication cost between clusters and sites, and the average number of retrievals and updates, because computing averages takes less processing time than techniques that depend on sorting sites according to specific fields. The system performance is enhanced by removing (cancelling) the redundant fragments from the database clusters, and by increasing availability and reliability where multiple copies of the same fragment are allocated; this reduces the communication costs where the fragments are needed frequently. Table 7 shows the performance of allocating fragments to the DDBs clusters before and after applying our method.

Table 7: Performance evaluation of fragment allocation

Cluster #  Initial # of alloc. frag.  Final # of alloc. frag.  Improvement %
C1         6                          3                        50 %
C2         7                          3                        57.14 %
C3         7                          5                        28.57 %

Before applying our clustering method, allocating fragments to all clusters that have applications requesting those fragments generates 20 allocations, while 11 allocations are generated after applying our clustering algorithm, which improves the system performance by 45.00 % (9 fewer allocations out of 20). Figure 4 shows the improvement of the system performance achieved by our clustering and allocation methods on the clusters.
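The improvement percentages quoted above follow from the initial and final allocation counts. A small Python sketch, assuming the improvement is computed as the relative reduction in the number of allocated fragments (the function name `improvement` is illustrative):

```python
def improvement(initial, final):
    """Relative reduction in allocated fragments, as a percentage (2 decimals)."""
    return round(100 * (initial - final) / initial, 2)


# (initial, final) numbers of allocated fragments per cluster, from Table 7.
table7 = {"C1": (6, 3), "C2": (7, 3), "C3": (7, 5)}
per_cluster = {c: improvement(i, f) for c, (i, f) in table7.items()}

total_initial = sum(i for i, _ in table7.values())
total_final = sum(f for _, f in table7.values())
overall = improvement(total_initial, total_final)

print(per_cluster)                          # {'C1': 50.0, 'C2': 57.14, 'C3': 28.57}
print(total_initial, total_final, overall)  # 20 11 45.0
```

Under that assumption the per-cluster figures match Table 7, and the totals of 20 and 11 allocations give the overall 45 % figure.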


Fig. 4: Fragment allocation to the clusters (chart titled "Fragment Allocation to Clusters": number of fragments against cluster number, for the initial and final fragment allocations)

CONCLUSION
Our method is designed to meet the requirements of clustering sites and determining fragment allocation in a distributed database system, minimizing the communication cost between sites, and enhancing the performance in a heterogeneous network environment. The clustering method groups the sites into clusters, which helps reduce the communication costs between the sites during the allocation process. The fragment allocation method enhances system performance by increasing availability and reliability where multiple copies of the same fragments are allocated. This approach can be implemented in different network environments even if the input parameters are very large.

REFERENCES
[1] Ezeife, C.I. and Ken Barker. "Distributed Object Based Design: Vertical Fragmentation of Classes". International Journal of Distributed and Parallel Databases, Vol. 6, No. 4, Kluwer Academic Publishers, October 1998. pp. 327-360.
[2] Kamalaakar Karlapalem, Shamkant Navathe, and Magdi Morsi. "Issues in Distribution Design of Object Oriented Databases". Distributed Object Management. Morgan Kaufmann Publishers, 1994.
[3] Jin Hyun Son and Myoung Ho Kim. "An Adaptable Vertical Partitioning Method in Distributed Systems". The Journal of Systems and Software. Elsevier, Dec. 2003.
[4] Wai Wai Gen Yee, Michael J. Donahoo, and Shamkant B. Navathe. "A Framework for Server Data Fragment Grouping to Improve Server Scalability in Intermittently Synchronized Databases". CIKM 2000, November 2000.
[5] Chun-Hung Cheng, Wing-Kin Lee, and Kam-Fai Wong. "A Genetic Algorithm-Based Clustering Approach for Database Partitioning". IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews, Vol. 32, No. 3, August 2002.
[6] Tamhankar, A.M. and Ram, S. "Database Fragmentation and Allocation: An Integrated Methodology and Case Study". IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, Vol. 28, No. 3, May 1998. pp. 288-305.
[7] Daudpota, N.H. "Five steps to construct a model of data allocation for distributed database systems". Journal of Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies, Vol. 11, No. 2, Sept.-Oct. 1998. pp. 153-68.
[8] H. Lee, Y.-K. Park, G. Jang, and S.-Y. Huh. "Designing a distributed database on a local area network: A methodology and decision support system". Information and Software Technology, 42, 2000. pp. 171-184.
[9] Ladjel Bellatreche, Kamalakar Karlapalem, and Qing Li. "Complex Methods and Class Allocation in Distributed OODBs". Proceedings of the 5th International Conference on Object Oriented Information Systems. Paris, Sept. 1998. pp. 239-256.
[10] Peddemors, A.J.H. and Hertzberger, L.O. "A high performance distributed database system for enhanced Internet services". Future Generation Computer Systems, Vol. 15, No. 3, April 1999. pp. 407-15.
[11] Yin-Fu Huang and Jyh-Her Chen. "Fragment Allocation in Distributed Database Design". Journal of Information Science and Engineering, 17, 2001. pp. 491-506.
[12] Ismail O. Hababeh, Bowring, N. and Ramachandran, M. "An Integrated Strategy for Data Fragmentation and Allocation in A Distributed Database Design". Proceedings of the International Conference on Information Technology and Natural Science (ICITNS). Amman, Jordan, October 2003. pp. 268-274.
[13] Ismail O. Hababeh, Bowring, N. and Ramachandran, M. "A Method for Fragment Allocation in Distributed Object Oriented Database Systems". Proceedings of the 5th Annual PostGraduate Symposium on The Convergence of Telecommunications, Networking & Broadcasting (PGNet). Liverpool, UK, June 2004. pp. 54-59.
