An algorithm for replication in distributed databases

Adrian Runceanu, Marian Popescu
University Constantin Brâncuşi of Târgu-Jiu, Romania

Abstract. Distributed databases are a well-suited solution for managing large amounts of data. We present an algorithm for replication in distributed databases that improves query performance. The algorithm moves information between databases by replicating pieces of data across them.

1. DISTRIBUTED DATABASES

1.1 DISTRIBUTED SYSTEM
We consider a distributed system composed of a set S of database sites that are fully connected. The sites communicate through message passing. The system is asynchronous: there is no bound on relative process speeds, clock drifts, or communication delays. Sites can fail only by crashing, and we do not rely on site recovery for correctness. However, we assume that when a site recovers, it does so with the state it had before the failure. Furthermore, we assume that the asynchronous model is augmented with a failure detector oracle, so that consensus is solvable [8].

1.2 DISTRIBUTED DATABASE
We consider a distributed relational database to be a relational database whose relations are distributed, i.e. fragmented, among the set S of database sites. This distributed database is given by the pair $DDB = \langle DB, S \rangle$. The relations of the database can be fragmented horizontally, using a selection operation from the relational algebra, or vertically, using a projection operation. To avoid semantic changes to the relational database as a consequence of the fragmentation process, the following properties must be enforced [7], where the function frags(R) gives the fragments of a relation R:

Completeness. The fragmentation cannot generate any loss of information: every data item of R must appear in at least one fragment $R_i \in frags(R)$.

Reconstruction. It must be possible to rebuild the original relation R using a relational algebra operation $\nabla$: $R = \nabla R_i,\ \forall R_i \in frags(R)$. This operation is a union in the case of horizontal fragmentation and a join in the case of vertical fragmentation.

Disjointness. If a relation R is horizontally fragmented, then two distinct fragments cannot have any tuple in common, i.e. $\forall R_i, R_j \in frags(R):\ R_i \cap R_j = \emptyset$, where $i \neq j$. If a relation R is vertically fragmented, then any two distinct fragments share only the key attributes, i.e. $\forall R_i, R_j \in frags(R):\ R_i \cap R_j = \{\text{set of primary key attributes of } R\}$, where $i \neq j$.

An important assumption we make is that for every fragment of a relation there is a correct site that replicates it.

2. DATA FRAGMENTATION
We propose a software application that integrates different tools (management, communication, evaluation, monitoring, etc.) on a common platform. The aim is to provide technological support to teachers and students for optimizing the phases of the teaching-learning process in e-learning/e-work study. Given these requirements, the application must be designed to improve the performance of data access. Every database is built to provide users with high availability of data; however, a major cost of executing queries in a distributed database is the data transfer cost incurred in transferring multiple database objects (fragments).

2.1 FRAGMENTATION
The objective of fragmentation is to determine the fragments to place at different sites so as to minimize the total data transfer cost incurred in executing a set of queries. Three kinds of fragmentation can be applied to a relation in a distributed database: vertical fragmentation, horizontal fragmentation and mixed fragmentation.

2.1.1 VERTICAL FRAGMENTATION
Vertical fragments are created by dividing a global relation R on its attributes by applying the project operator:

$R_j = \Pi_{\{A_j\},\,key}(R), \quad 1 \le j \le m \qquad (1)$

where $\{A_j\}$ is a set of attributes not in the primary key, upon which the vertical fragment is defined, and m is the maximum number of fragments. A vertical fragmentation schema is complete when every attribute in the original global relation can be found in some vertical fragment defined on that relation. The reconstruction rule is then satisfied by a join on the primary key(s):

$\forall R_j \in \{R_1, R_2, \ldots, R_m\} : R = \bowtie_{key} R_j \qquad (2)$
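As a minimal illustration of equations (1) and (2), the following Python sketch projects a small relation into two vertical fragments that both keep the primary key, then rebuilds it with a join on that key. The relation contents, the attribute names and the helper functions `project` and `join_on_key` are assumptions made for this example only; they are not part of the application described in this paper.

```python
# Hypothetical global relation R: primary key "id" plus three non-key attributes.
R = [
    {"id": 1, "name": "Ana", "city": "Cluj", "grade": 9},
    {"id": 2, "name": "Dan", "city": "Iasi", "grade": 7},
]
KEY = "id"


def project(relation, attrs):
    """Vertical fragment: projection on attrs, always keeping the primary key."""
    return [{a: row[a] for a in (KEY, *attrs)} for row in relation]


def join_on_key(left, right):
    """Join two vertical fragments on the primary key."""
    right_by_key = {row[KEY]: row for row in right}
    return [
        {**row, **right_by_key[row[KEY]]}
        for row in left
        if row[KEY] in right_by_key
    ]


# Two vertical fragments, as in equation (1).
R1 = project(R, ["name"])
R2 = project(R, ["city", "grade"])

# Reconstruction by a join on the key, as in equation (2).
reconstructed = join_on_key(R1, R2)
```

In this sketch the two fragments overlap only on the key attribute, and together they cover every attribute of R.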
The disjointness rule does not apply in a strict sense to vertical fragmentation, because the reconstruction rule can only be satisfied when the primary key is included in each fragment. So, excluding the primary key, no data item should occur in more than one vertical fragment [1, 6].

2.1.2 HORIZONTAL FRAGMENTATION
Horizontal fragmentation divides a global relation R on its tuples by use of the selection operator:

$R_j = \sigma_{P_j}(R), \quad 1 \le j \le m \qquad (3)$

where $P_j$ is the selection condition, expressed as a simple predicate, and m is the maximum number of fragments. The horizontal fragmentation schema satisfies the completeness rule if the selection predicates are complete. Furthermore, if a horizontal fragmentation schema is complete, the reconstruction rule is satisfied by a union operation over all the fragments:

$\forall R_j \in \{R_1, R_2, \ldots, R_m\} : R = \bigcup R_j \qquad (4)$
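Equations (3) and (4) can be illustrated with a short Python sketch. It fragments a small relation with two selection predicates, checks that every tuple satisfies at least one predicate (completeness) and at most one (mutual exclusion), and rebuilds the relation by a union. The data, the predicates and the helper names are hypothetical and serve only as an example.

```python
# Hypothetical global relation R with primary key "id".
R = [
    {"id": 1, "city": "Cluj"},
    {"id": 2, "city": "Iasi"},
    {"id": 3, "city": "Cluj"},
]

# Selection predicates P_1 and P_2 (assumed for this example).
predicates = [
    lambda row: row["city"] == "Cluj",
    lambda row: row["city"] != "Cluj",
]


def select(relation, predicate):
    """Horizontal fragment: the tuples of the relation that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def is_complete(relation, preds):
    """Every tuple is selected by at least one predicate."""
    return all(any(p(row) for p in preds) for row in relation)


def is_disjoint(relation, preds):
    """No tuple is selected by more than one predicate."""
    return all(sum(p(row) for p in preds) <= 1 for row in relation)


def union(frags):
    """Reconstruction: union of all fragments, ordered by key for comparison."""
    rows = [row for frag in frags for row in frag]
    return sorted(rows, key=lambda row: row["id"])


fragments = [select(R, p) for p in predicates]
```

Calling `union(fragments)` on this data gives back the tuples of R, which is what equation (4) states for a complete schema.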
Finally, disjointness is ensured when the selection predicates defining the fragments are mutually exclusive [1, 6]. Derived horizontal fragmentation occurs when a member relation inherits the horizontal fragmentation of its owner. If the completeness and disjointness rules are satisfied for the owner fragments, they are intrinsically satisfied for the child fragments. The global relation can be reconstructed by applying the union operator, as for primary horizontal fragmentation [9]. A general approach to the fragmentation of data in a distributed database is presented in [10]. Using that tool to obtain the best partitioning scheme, through the implementation of several classic algorithms, is one option in the design phase of a distributed database.

2.1.3 MIXED FRAGMENTATION
A hybrid fragmentation schema is a combination of horizontal and vertical fragments. If the correctness and disjointness rules are satisfied for the comprising fragments, they are implicitly satisfied for the entire hybrid schema. Reconstruction is achieved by applying the reconstruction operators in reverse order of fragment definition. That is, if a global relation underwent horizontal followed by vertical fragmentation, the reconstruction would consist of a join followed by a union [10, 9].

2.2. REPLICATION

Assuming that the database is fragmented, we have to decide how the fragments will be allocated to the different stations of the network. When data are allocated, they are either replicated or kept in a single copy. The reasons for replication are reliability and the efficiency of read-only queries. If there are multiple copies, then even if part of the system fails, some copies are likely to remain accessible, so no data is lost. In addition, read-only queries that access the same items can be executed in parallel, because there are copies on several stations. On the other hand, executing update queries causes problems, because the system must ensure that all copies are updated correctly. Therefore, the decision on replication is a compromise that depends on the ratio between read-only queries and update queries. This decision affects almost all control functions and almost all algorithms of the DDBMS. Creating multiple copies of the same information is economically justified only if the following inequality is satisfied:

$CMC + \sum_{i=1}^{p} CAL_i + CA < \sum_{i=1}^{p} CAD_i \qquad (5)$

where the parameters have the following meaning:
CMC - the memory cost of the copy
CAL_i - the cost of local access for user i
CA - the cost of updating the copy
CAD_i - the cost of remote access to the primary copy for user i
p - the number of users

Managing multiple copies is the responsibility of the DDBMS. Updates can be performed either simultaneously on all copies, or first on the primary copy, with the secondary copies updated after a certain delay.

3. OUR SOLUTION

We developed an application that uses horizontal fragmentation with partial replication. Figure 1 shows the structure used for replication in the developed system. This solution is specific, that is, it is built for a particular database schema. However, the algorithm used for distribution and for keeping consistency can be applied to other applications.

Fig. 1 Structure of replication schema
In this algorithm (MS, Master/Slave), the system checks each query and keeps a counter that records the place and frequency of each user's accesses. If the system detects an access pattern, it copies a set of records into a slave database. This approach improves the performance of queries across the database. Although the solution is very simple, it meets the requirements. Every computer that hosts a slave database must execute the following algorithm:

Algorithm MS
Input: a number of local database computers
Output: replicated records at each local database
Begin
Step 1: For each query requested, the slave computer increments a counter for the user that made the request.
Step 2: If the counter reaches a constant number (a parameter of this algorithm), then this computer is a candidate to have a set of records replicated and must follow steps 3 and 4; otherwise go to End.
Step 3: Request the set of records that the user is asking for and save this information in the slave database.
Step 4: Reset the local counter of the user to zero.
End.
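The four steps of Algorithm MS can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the master database is modelled as an in-memory dictionary from user to records, and the class name `SlaveNode`, the method `on_query` and the default threshold of 3 are hypothetical choices, not the implementation used in the application.

```python
from collections import defaultdict


class SlaveNode:
    """Sketch of a slave computer running Algorithm MS (assumed structure)."""

    def __init__(self, master, threshold=3):
        self.master = master              # stand-in for the master database
        self.threshold = threshold        # the constant parameter of the algorithm
        self.counters = defaultdict(int)  # per-user query counters
        self.local = {}                   # stand-in for the slave database

    def on_query(self, user):
        """Process one query; return True if the user's records were replicated."""
        # Step 1: increment the counter of the user that made the request.
        self.counters[user] += 1
        # Step 2: below the threshold nothing else happens (go to End).
        if self.counters[user] < self.threshold:
            return False
        # Step 3: request the user's records and save them in the slave database.
        self.local[user] = list(self.master.get(user, []))
        # Step 4: reset the local counter of the user to zero.
        self.counters[user] = 0
        return True
```

With a threshold of 3, the third query from the same user triggers the copy, after which the counter starts again from zero.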
Fig. 3 Sending query from slave database to master database

In this case, all the information can be found in the master database, but the system can detect when a user has an access pattern and copy the needed information to a slave computer (partial replication), so that users have their information closer to them, which reduces search time. For the application developed, we used socket communication in a client-server model (fig. 5). We designed a basic database schema that is used to manage all the information about the users of this application; this schema is shown in figure 4.
Fig. 4 Database schema used for the application

The schema shown in figure 4 is only a subschema of the real one, which has around 45 tables and thousands of fields. The tests were not limited to a local network.

To keep the distributed database consistent, both master and slave computers must hold the same information. In this case, however, since users access their own information from the slave computer, the information does not have to be copied immediately; it can be copied later. This way of managing information keeps the database available even when the connection between the slave and master databases is broken.
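One possible way to realise this deferred copying is a queue of pending changes on the slave, flushed to the master when the connection is available. The sketch below is an assumption about how such a mechanism could look; the paper does not specify it, and the class `DeferredSync`, its methods and the dictionary used as a master stand-in are hypothetical.

```python
class DeferredSync:
    """Slave-side store that applies writes locally and forwards them later."""

    def __init__(self):
        self.local = {}    # stand-in for the slave database
        self.pending = []  # changes not yet copied to the master

    def write(self, key, value):
        """Apply a change locally right away and remember it for later."""
        self.local[key] = value
        self.pending.append((key, value))

    def flush(self, master, connected):
        """Copy pending changes to the master if it is reachable.

        Returns the number of changes that were sent.
        """
        if not connected:
            return 0  # the slave stays usable; changes remain queued
        sent = len(self.pending)
        for key, value in self.pending:
            master[key] = value
        self.pending.clear()
        return sent
```

While the link is down, reads and writes keep working against `local`; the master catches up on the next successful `flush`.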
Another algorithm (Slave/Master Search) has been designed to access the local database before sending the query to the master computer. The algorithm looks for the information in the slave database (fig. 2); if such records are found, the information is given to the user, otherwise the query is sent to the master database, as shown in fig. 3.

Fig. 2 Query on local database

Fig. 5. Communication with sockets using TCP/IP protocol.
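The Slave/Master Search can be sketched with Python's standard `socket` module: the slave answers from its local data when it can and otherwise sends the key to the master over TCP. Everything specific here is an assumption for illustration: the one-key-per-connection text protocol, the `NOT_FOUND` reply, the dictionaries standing in for the two databases, and the function names.

```python
import socket
import threading


def serve_master_once(server_sock, master_db):
    """Master side: answer a single lookup request, then return."""
    conn, _ = server_sock.accept()
    with conn:
        key = conn.recv(1024).decode("utf-8")
        conn.sendall(master_db.get(key, "NOT_FOUND").encode("utf-8"))


def slave_master_search(key, slave_db, master_address):
    """Try the local slave database first; fall back to the master over TCP."""
    if key in slave_db:
        return slave_db[key], "slave"
    with socket.create_connection(master_address) as client:
        client.sendall(key.encode("utf-8"))
        return client.recv(1024).decode("utf-8"), "master"


def demo():
    """Run one local hit and one remote lookup on the loopback interface."""
    master_db = {"user:1": "Ana", "user:2": "Dan"}
    slave_db = {"user:1": "Ana"}  # record already replicated locally
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server_sock.listen(1)
        address = server_sock.getsockname()
        worker = threading.Thread(
            target=serve_master_once, args=(server_sock, master_db)
        )
        worker.start()
        local_hit = slave_master_search("user:1", slave_db, address)
        remote_hit = slave_master_search("user:2", slave_db, address)
        worker.join()
    return local_hit, remote_hit
```

Only the second lookup in `demo` opens a connection, which mirrors the idea that the master is contacted only when the slave database has no matching records.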
4. CONCLUSIONS
We have presented an algorithm that provides a specific solution for our application. The algorithm can move information between databases, replicating pieces of data across slave databases. It is based on two techniques: Slave/Master Search, which provides fast access to the database for user queries, and Slave/Master Replication (two computers), which provides availability. We tested the algorithm in a database laboratory with 40 computers, and it appears to be a solution for the problem posed by the enterprise, since the results obtained in a local computer network were good.

REFERENCES
[1] L. Bellatreche, K. Karlapalem, and A. Simonet, “Horizontal class partitioning in object-oriented databases,” in Proceedings of the 8th International Conference on Database and Expert Systems Applications (DEXA’97), September 1997, pp. 58-67, Lecture Notes in Computer Science 1308.
[2] S. Ceri, M. Negri, and G. Pelagatti, “Horizontal data partitioning in database design,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, 1982, pp. 128-136.
[3] S. Chakravarthy, R. Muthuraj, R. Varadarajan, and S. Navathe, “An objective function for vertically partitioning relations in distributed databases and its analysis,” Distributed and Parallel Databases, Kluwer Academic Publishers, 1994, pp. 183-207.
[4] C.I. Ezeife and K. Barker, “A comprehensive approach to horizontal class fragmentation in distributed object based system,” International Journal of Distributed and Parallel Databases, vol. 3, no. 3, pp. 247-272, 1995.
[5] S.B. Navathe, K. Karlapalem, and M. Ra, “A mixed partitioning methodology for distributed database design,” Journal of Computer and Software Engineering, vol. 3, no. 4, pp. 395-426, 1995.
[6] S.B. Navathe and M. Ra, “Vertical partitioning for database design: A graphical algorithm,” in Proceedings of the International Conference on Management of Data, ACM-SIGMOD’89, 1989, pp. 440-450.
[7] M.T. Özsu and P. Valduriez, Principles of Distributed Database Systems, Second Edition, Prentice Hall, Englewood Cliffs, New Jersey, 1999.
[8] T.D. Chandra and S. Toueg, “Unreliable Failure Detectors for Reliable Distributed Systems,” Journal of the ACM, 1996.
[9] A. Runceanu, “Fragmentation in distributed Databases,” International Joint Conferences on Computer, Information, and Systems Sciences, and Engineering (CISSE 2007), December 3-12, 2007, University of Bridgeport, USA, published in Innovations and Advanced Techniques in Systems, Computing Sciences and Software Engineering, Springer Science, 2008, ISBN 978-1-4020-8734-9 (Print) 978-1-4020-8735-6 (Online), DOI 10.1007/978-1-4020-8735-6_12.
[10] A. Runceanu and M. Popescu, “EvalTool for Evaluate the Partitioning Scheme of Distributed Databases,” International Joint Conferences on Computer, Information, and Systems Sciences, and Engineering (CISSE 2009), December 4-12, 2009, University of Bridgeport, USA, published in Volume 1 - Technological Developments in Networking, Education and Automation (Elleithy, Sobh, Iskander, Karim, Mahmood), SCSS 09, ISBN: 978-90481-9150-5.