Materialization of Redesigned Distributed Relational Databases1 Kamalakar Karlapalem
Hong Kong University of Science and Technology Department of Computer Science Clearwater Bay, Kowloon, Hong Kong e-mail:
[email protected] Phone: (852) 358 6991 Fax: (852) 358 1477
Shamkant B. Navathe
Georgia Institute of Technology College Of Computing Atlanta GA 30332-0280, USA e-mail:
[email protected] Phone: 404 853 0357 Fax: 404 894 9442 Technical Report HKUST-CS94-14, September 1994. 1
This research has been funded by RGC CERG grant No. HKUST 609/94E and the NSF Grant No. IRI 8716798.
1
Contents
1 Introduction
1
2 Overview of the mixed fragmentation methodology 3 Representation Scheme for Mixed Fragments
3 6
1.1 Background concepts : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1.2 Distributed database redesign : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1.3 Two approaches to materialize a distributed database design : : : : : : : : : : : : :
3.1 Characteristics of and valid operations on grid cells : : : : : : : : : : : : : : : : : : :
4 Algorithms for Materialization
4.1 Materialization of a redesigned distributed relational database by automatic generation of SQL statements : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 4.1.1 Problems with Query Generator approach : : : : : : : : : : : : : : : : : : : : 4.2 Materialization using operations on fragments : : : : : : : : : : : : : : : : : : : : : : 4.2.1 Intersection of fragmentation schemes : : : : : : : : : : : : : : : : : : : : : : 4.2.2 Algorithms for the operator method : : : : : : : : : : : : : : : : : : : : : : : 4.2.3 Example illustrating the operator method : : : : : : : : : : : : : : : : : : : :
1 2 2
7
10 11 15 15 16 18 22
5 Incorporating Change in Grid Structure 6 Cost Model 7 Materialization by using Multiple Query Optimization
23 25 27
8 Conclusions 1 Break, Form and Cluster Operations 2 MQO Example
32 36 37
7.1 MQO Algorithm to Materialize Distributed Database Design : : : : : : : : : : : : : 28 7.2 Performance Improvement : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 32
2
Abstract
The changes in the distributed database environment during its lifetime necessitate a redesign of the distributed databases so as to keep the performance of the applications/transactions from degrading. Till now most of the work in distributed database design dealt with developing better fragmentation and allocation algorithms. This paper introduces an hitherto unexplored step in distributed database design dealing with the materialization of a new design after the redesign for an existing populated distributed relational database. We de ne materialization of a distributed database design as a process of global restructuring of the populated local databases in order to achieve conformance with the logical de nition of the distributed database. Work on distributed database design dealing with mixed fragmentation facilitates the development of a formal basis for materializing distributed database designs. We present two approaches (Query Generator method and Operator method) to achieve the solution. We develop a cost model for evaluating these two approaches and present a multiple query optimization algorithm to improve the eciency of the query generator approach. These algorithms need to be initiated by the distributed database administrator to automatically materialize redesigned distributed databases. The operator method is suitable for incremental materialization whereas the query generator method is suitable for one-time materialization of the distributed databases.
Keywords: distributed realational database systems, distributed database design, redesign, materialization, mixed fragmentation, cost model, multiple query optimization
3
1 Introduction Distributed relational databases are becoming a reality now; a number of distributed relational database systems are coming into the market. The \client-server" architecture is becoming very popular and will allow us to realize distribution of data with a lot of exibility. With large distributed databases containing hundreds of tables with each table being fragmented into a number of fragments allocated at dierent sites, it is imperative that in order to eciently support the applications, we need to have not only a good initial distributed database design but also a good methodology for redesign, and an ecient way for materializing the new designs. In this paper we address the problem of materializing a new distributed database design from the current design for a populated distributed database. The problem is complicated because materialization of new design consists of global restructuring of populated local databases so as to conform to the new fragmentation and allocation scheme. This procedure for materialization must not only provide the mechanisms to eciently materialize the new design, but also be proved to be correct. The \correctness" implies that the materialization of new design should not result in loss of data or generation of spurious data. It is for this reason that the redesign problem for distributed databases was considered very dicult. We develop a representational notation for the distributed database design (generated by using mixed fragmentation scheme proposed in [8]) that enables us to develop algorithmic solutions to this problem. We assume that the current and new distributed database designs are completely speci ed; our objective is to develop algorithms to materialize a new design. In order to put the materialization problem in proper perspective, we now present a synopsis of relevant distributed database design work.
1.1 Background concepts
In any distributed database system there are two main factors that aect the performance of the applications2 , namely, number of disk accesses and data transfer cost [5]. For a given distributed relational database environment 3(DDE) the performance of the applications is dependent on the design of the distributed database. There are two ways by which performance of the applications can be improved by the design. Fragmentation is the process of clustering data from a relation by grouping relevant columns and tuples accessed by an application into a fragment. This reduces the amount of irrelevant data accessed by the application thus reducing the number of disk accesses. The result of the fragmentation process is a set of fragments de ned by a fragmentation scheme. A fragment can either be vertical [7, 9, 6, and others], horizontal [4, 3, and others], or mixed [1, 8]. Allocation is the process of placing the fragments generated by the fragmentation scheme at the sites of the distributed database system so as to minimize the data transfer cost and the number of messages to process a given set of applications [1, 2, and others]. The result of an allocation process is an allocation scheme. The fragmentation and allocation schemes de ne the distributed database design (henceforth referred to as design). We shall assume that the mixed fragmentation methodology (see Section 2) is used to generate the designs for a distributed relational database. An application consists of a set of transactions accessing the distributed database system. A distributed database environment refers to the distributed relational database system, distributed databases, transactions, applications, network, and computers at various nodes. 2 3
1
1.2 Distributed database redesign
In a DDE over a period of time, there may be a degradation in the performance of the applications accessing the distributed database. This is due to the changes in the DDE classi ed as follows: Physical Changes: The changes pertaining to the physical systems of the DDE are classi ed as the physical changes. Types of physical changes include: change in network topology, new nodes being added, changes in \processing speed" of a processor, placing a more ecient disk pack at a node, or changes in database system parameters. Logical Changes: The changes in the DDE that are not physical in nature are termed as the logical changes. Logical changes include: migration of applications/ transactions from one node to another, change in the frequency of execution of an application, addition or deletion of applications being executed on the distributed database. These changes necessitate redesign of the distributed databases in order to keep the performance of the applications/transactions being supported by the distributed database system from degrading. These changes in the DDE need to be taken into consideration to generate new fragmentation and allocation schemes. The redesign problem has been classi ed into limited redesign and total redesign in [14]. Limited redesign is the case where the allocation scheme changes but not the fragmentation scheme. Total redesign on the other hand considers the case where both the fragmentation and the allocation schemes change. There are three scenarios which require the redesign of the distributed database. The rst scenario is that of corrective redesign, the case where there is a deterioration in the application's performance because of the changes in the DDE that had already taken place. The second scenario is that of preventive redesign, the case where the distributed database administrators know beforehand the changes that will take place in the DDE and initiate a redesign to change the current design so as to keep processing applications eciently. The third scenario is that of adaptive redesign. That is, monitor some parameters in the DDE and initiate the redesign process whenever there is a relevant change in the parameter value. Any redesign of a distributed database results in a change of fragmentation scheme and/or allocation scheme. This requires that the new design be materialized.
1.3 Two approaches to materialize a distributed database design
The distributed database design (F; A) consists of a fragmentation (F ) and an allocation (A) scheme. The total redesign generates a new fragmentation and allocation scheme from the current design represented by (F 0 ; A0). The objective of this paper is to develop a methodology and a set of algorithms to materialize (F 0 ; A0) from (F; A) for a populated distributed database. Two approaches may be considered:
Query Generator In this approach each of the fragments in the new fragmentation scheme is
expressed as a database operation on the distributed database so as to let the distributed database system materialize the new design. This approach depends on the ability to generate correct database operations (like queries, insert, create, drop) for materializing each of the fragments. The distributed database system undertakes the task of materialization of the new design. The module which generates the above mentioned database operations is known as the query generator. Developing a query generator and proving its correctness is a complicated problem. We shall present an algorithm for query generation in the Section 4.1. 2
Operator Method This method was rst proposed for limited redesign by Wilson and Navathe
[14]. The operator method consists of de ning a set of primitive operations which are performed on the fragments. These operations are \split", \move", \replicate", \remove" and \merge". The operation split is used to split a fragment into two sub-fragments. The operations move, replicate and remove are used to move a fragment from one site to another site, replicate a given fragment at another site, and delete a fragment at some site respectively. The operation merge is used to form a single fragment by merging two fragments. The operator method generates a program with above mentioned operations which when executed will materialize the new design. The complexity of these algorithms depends on the complexity of the underlying split, merge and move operations. The module supporting these operations is external to the distributed database system. The motivation for this approach is that the query generator method depends on the processing capability of the distributed database system and it may become unwieldy and out of control to give optimal performance for a database with large relations with large number of fragments. Also the eciency of the query generator method depends on the eciency of the distributed database system and is typically beyond the control of the distributed database administrator. We shall present the algorithms for the operator method in Section 4.2.
The main contributions of this paper are in: 1. developing a representation scheme for the fragmentation and allocation scheme based on the mixed fragmentation approach to the distributed database design process. 2. presenting two approaches to materialize the new design, namely: Query Generator method and Operator method. 3. developing a set of algorithms for both the methods and proving their correctness. 4. developing a cost model for materializing populated redesigned distributed relational databases using these two approaches. 5. using a multiple query optimization technique to enhance the eciency of the query generator approach. In Section 2 we present an overview of the mixed fragmentation and allocation methodology. Section 3 describes the representation scheme for the distributed database design, Section 4 presents the materialization algorithms, proves their correctness and presents an example, Section 5 extends the materialization methodology to take into consideration the change in grid structure. In Section 6 a cost model is developed to evaluate the cost of redesign. Section 7 presents a multiple query optimization algorithm to enhance the eciency of the query generator approach, and evaluates the performance improvement due to multiple query optimization technique. Section 8 presents the conclusions and future work.
2 Overview of the mixed fragmentation methodology
Grid creation consists of simultaneously applying the vertical fragmentation and horizontal frag-
mentation algorithms on the relation to generate the grid consisting of minimal fragments or cells. Each of the relations of the distributed database is decomposed into a set of grid cells. A grid cell 3
de nes a projection of a set of certain subset of attributes corresponding to exactly one vertical fragment V and all the tuples of the grid satisfy a set of predicates corresponding to exactly one horizontal fragment H . A vertical fragmentation scheme and a horizontal fragmentation scheme de nes the grid structure for a relation. Grid optimization refers to the merging of the grid cells to form mixed fragments while minimizing the overall transaction processing cost. Having generated the cells in a top-down fashion as the minimal fragments of a relation, we now consider whether we should combine them in a bottom-up fashion. Merging is considered desirable if it reduces the overall transaction processing cost. We term the result fragments after the grid optimization process as mixed fragments. Parameters Logical Parameters Relational Schema Application Processing Characteristics Transaction Parameters Columns accessed Rows Accessed Frequency Sites of Origin Database System Parameters Cardinality of Relations Column Sizes Indexes Used Query Processing Strategy System Parameters Network Topology Baud Rates Node Parameters (MIPS, Disk I/O time)
Fragmentation Algorithms Horizontal
Vertical Vert_Frag()
Horz_Frag()
Grid Cells
Grid_Opt()
Allocation Algorithm
Grid Optimization
Mixed Fragments (Fragmentation Scheme) Alloc()
Allocation Scheme Distributed Database Design
Figure 1: Distributed Database Design Methodology Figure 1 shows the mixed fragmentation approach [8] to the distributed database design process. In this approach the fragmentation algorithms are executed to generate a set of mixed fragments, then these mixed fragments are allocated using the allocation algorithm. Mixed fragments are generated by (i) vertically fragmenting a relation using algorithm Vert Frag() presented in [9], (ii) horizontally fragmenting a relation using algorithm Horz Frag() presented in [8], and (iii) forming grid cells by simultaneously applying both the horizontal and vertical fragmentation algorithms, and then merging them by using algorithms Grid Opt() presented in [8]. These mixed fragments are allocated to the sites using the algorithm Alloc() based on [2]. There are four dierent kinds of parameters that are used by the fragmentation and allocation algorithms. They are logical parameters (relational schema and application processing characteristics), transaction parameters (columns accessed, rows accessed, frequency, and sites of origin), database system parameters (cardinality of relations, column sizes, indexes used, and query processing strategy), and DDE parameters (network topology, baud rates, and node parameters ). Figure 2 illustrates the Employee relation with attributes (Emp#, Name, City, Dept#, Proj#, Salary, Bonus, School, Degree). The attribute Emp# is the key of the relation Employee. Let each vertical fragment be de ned by a projection of each of the following subsets of attributes on relation Employee: 1. fEmp#, Name , Cityg, 4
EMPLOYEE Emp#
Name City
Dept#
Proj#
Salary
Bonus
School
Degree
123
20K
1K
GaTech
BS
124
50K
2K
UnivFL
MS
1
Ian
Atl
10
2
Jim
SF
11
3
Tom
LA
2
231
30K
1K
Duke
BS
4
Ann
NY
3
231
65K
5K
CalTech
MS
5
Kate
NO
21
123
33K
2K
UCB
BS
6
John
LDN
22
145
110K
8K
GaTech
Ph.D.
7
Ram
BOM
13
165
68K
6K
IIT
M.Tech
8
Mary STL
33
213
45K
3K
NYU
BS
9
Vijay DEL
33
213
40K
2K
GaTech
BS
Figure 2: Relation Employee
2. fEmp#, Dept#, Proj#g, 3. f Emp#, Salary, Bonusg, 4. fEmp#, School, Degreeg. and each horizontal fragment by a selection based on each of the following predicates on relation Employee respectively:
a. b. c. d.
f Dept# < 10g, f 10 Dept# < 20g, f 20 Dept# < 30g, f Dept# 30 g.
Figure 3 illustrates the 16 grid cells generated by simultaneously applying the vertical fragmentation ( i.e. four vertical fragments 1, 2, 3 and 4) and horizontal fragmentation ( i.e. four horizontal fragments a, b, c and d). Figure 4 shows the result of the grid optimization phase on the grid cells in Figure 2. This results in four mixed fragments DEPT PROJ , SAL SCHOOL1, SAL SCHOOL2 and SAL SCHOOL3. The details of the mixed fragmentation methodology are given in [8]. The allocation of fragments is achieved by using the techniques presented in [2]. In this paper we shall assume that the redesign of a distributed relational database is done by applying the mixed fragmentation approach [8]. We shall use the following characteristics of the mixed fragmentation approach in developing a representation scheme for mixed fragments. This representation scheme is used in developing algorithms to materialize distributed database designs. 5
Emp# Name 3 4
City LA
3
2
Ann NY
4
3
Tom
Emp# Name City 1
Ian
Atl
2
Jim
SF
7
Emp# Dept# Proj#
Ram BOM
Emp# Name City 5
Kate NO
6
John LDN
Emp# Name City
Emp# School Degree
231
3
30k
1k
3
Duke
BS
231
4
65k
5k
4
CalTech
MS
Emp# Dept# Proj#
Emp# Salary Bonus
Emp# School Degree
1
10
123
1
20k
1k
1
2
11
124
2
50k
2k
2
7
13
165
7
68k
6k
7
Emp# Dept# Proj# 5 6
21 22
Emp# Salary Bonus 5
123 145
Emp# Dept# Proj#
8
Mary STL
8
33
213
9
Vijay
9
33
213
DEL
Emp# Salary Bonus
6
Emp#
GaTech
BS
UnivFL MS IIT
M.Tech
School Degree
33k
2k
5
UCB
110k
8k
6
GaTech Ph.D.
Emp# Salary Bonus
Emp# School
8
45k
3k
8
9
40k
2k
9
BS
Degree
NYU BS GaTech BS
Figure 3: Basic Grid Fragments of a Relation Employee
A grid is created consisting of basic grid of fragments (i.e. grid cells), and a fragment is
formed by merging the basic grid fragments. The minimum granularity for allocation of a fragment to a site is the grid cell. A fragment can be split into sub-fragments only along the \grid lines" and the sub-fragments can be merged along the same grid lines to form the fragment.
3 Representation Scheme for Mixed Fragments In order to design and evaluate algorithms to materialize distributed relational databases we use a representational notation for the mixed fragments and their allocation. This representation facilitates a succinct presentation of the algorithms to materialize distributed relational databases. This representation scheme will also be used in elegantly proving the correctness of the algorithms, evaluating the algorithms for their eciency, and making improvements to these algorithms. Let V = (1; 2; : : :; n) denote the set of vertical fragments and H = (a; b; : : :; m) denote the set of horizontal fragments of the relation respectively. A grid is created by applying both the horizontal and vertical fragmentation schemes on the relation. The grid generates a set of grid cells wherein each grid cell belongs to exactly one horizontal and one vertical fragment of the relation. The set of grid cells are represented as G = (1a; 1b; : : :; 1m; 2a; 2b; : : :; 2m; : : : ; na; nb ; : : :; nm). Figure 5 shows the grid cells formed by simultaneously applying both the horizontal and vertical fragmentation schemes on the relation Employee. The set of grid cells are classi ed as horizontal grid cells or vertical grid cells. The set of vertical fragments of a given horizontal fragment form horizontal grid cells. For a horizontal fragment p they are (1p; 2p; : : :; np). In Figure 5, the set of grid cells f1b ; 2b; 3b; 4bg forms the set of horizontal grid cells for the horizontal fragment b. The set of horizontal fragments of a vertical fragment are 6
SAL_SCHOOL1 Emp#
DEP_PROJ Emp# 1
5
Name Ian
City Dept# Atl
10
Proj# 123
2
Jim
SF
11
124
3
Tom
LA
2
231
4
Ann
NY
3
231
7
Ram
BOM
13
165
Kate
NO
21
123
6
John
LDN
22
145
8
Mary
STL
33
213
9
Vijay
DEL
33
213
Salary
Bonus
School
Degree
1
20k
GaTech
BS
2
50k
2k
UnivFL
MS
3
30k
1k
Duke
BS
4
65k
5k
CalTech
MS
68k
6k
IIT
M.Tech
7
1k
SAL_SCHOOL2 Emp# 8 9
Salary
Bonus
45k
3k
NYU
BS
2k
GaTech
BS
40k
School
Degree
SAL_SCHOOL3 Emp#
Salary
Bonus
School
5
33k
2k
UCB
6
110k
8k
GaTech
Degree BS Ph.D.
Figure 4: Current Design (F; A) known as vertical grid cells. The vertical grid cells of a vertical fragment i are represented as (ia; ib; : : :; im). For example, in Figure 5 the set of grid cells f2a; 2b; 2c; 2d g form the vertical grid cells for the vertical fragment 2. Two binary operations concatenate (k), the horizontal merging S operator and union ( ), the vertical merging operator on the set of horizontal grid cells and vertical grid cells respectively are de ned. The concatenate operator is a special case of the join operator where only corresponding tuple identi ers of the relations are matched. All the relations involved in the concatenate operation have the same number of tuples and the same set of tuple identi ers. S Given two vertical grid cells ip and iq , the union of ip and iq is represented as (ip;q ) = ip iq and given two horizontal grid cells ip and jp the concatenation of ip and jp is represented as S (ip; jp) = ip kjp. The binary operations and k are commutative and associative over the set of vertical grid cells and horizontal grid cells respectively.
3.1 Characteristics of and valid operations on grid cells
In general, the grid cells will be represented as where = 1; 2; : : :; n and = a; b; : : :; m. is the \column" index and is the \row" index for a grid cell. A well formed expression over the set of grid cells is de ned as follows:
De nition 1 A well formed expression w is de ned as follows: 1. w = , is well formed where = 1; 2; : : :; n and = a; b; : : :; m. r(w) = is the representation of w.
2. w = 1 1 k2 2 is well formed if 1 6= 2 and 1 = 2. r(w) = (1 1 ; 2 1 ) is its representation. 7
1a
2a
3a
4a
1
2
3
4
1
1
b
c
d
2
2
b
c
d
b
b
3c
4c
3
4
d
d
Figure 5: Representation of grid cells
w = 1 1 S 2 2 is well formed if 1 = 2 and 1 6= 2 . r(w) = (1( 1 ; 2 )) is its representation. 3. w0 = S kw is well formed if there exists a grid cell represented by 0 in r(w). r(w0) = r(w) is its representation. w0 = S w is well formed if there exits a grid cell represented by in r(w). r(w0) = r(w) S is its representation. 4. Any number of possible invocations of the above set of rules.
0
The representation of a well formed expression is a shorthand notation for the set of grid cells over which the well formed expression is de ned. Since r(w) is a set of grid cells, we can perform set theoretic operations on the representations of well formed expressions, like, membership 2 r(w), intersection r1(w) \ r2 (w), and union r1(w) [ r2(w). The above de nition of well formed expression incorporates the discipline that is to be imposed while merging the grid cells during the grid optimization phase of the mixed fragmentation methodology [8]. With reference to Figure 5, the grid cells 1a and 1b canS be merged (using the rule 2 above, since 1 = 2 = 1, and 1 = a, and 2 = b, implies 1(a;b) = 1a 1b ), whereas the grid cells 1a and 2c cannot be merged.S The representation 1(a;b) de nes a set of grid cells f1a; 1b g over which well formed expression 1a 1b is de ned. Note that in forming the well formed expression, two grid cells are concatenated only if both of them constitute horizontal grid cells of the same horizontal fragment, and a union of two grid cells is allowed only if both of them constitute vertical grid cells of the same vertical fragment. Hence the conditions on the grid cells ( ) in the rules 2 and 3 of the de nition above. From the above de nition of the well formed expression, a set of valid operations which need to be performed on the grid cells to generate a set of legal mixed fragments are derived. In general, a well formed expression w is represented as r(w) = (1A1 ; 2A2 ; : : :; pA ) with each i representing a vertical fragment and each Ai representing a set ofS vertical grid cells of vertical fragment i in the union =n f operation. Note that the expression kii=1 2H (i )g generates the relation R. p
8
De nition 2 A regular well formed expression is a well formed expression w = kii==1p fS 2A (i )g where A1 = A2 = : : : = Ap = A. i
S
That is, w = kii==1p f 2A (i )g and r(w) = (1A ; 2A ; : : :; pA ).
The above condition states that the set of vertical grid cells Ai in the formation of the fragment is the same for all vertical fragments i .
De nition 3 A Fragment is de ned by a well formed expression over a set of grid cells. A regular fragment is de ned by a regular well formed expression over the set of grid cells. For example, in Figure 5, the set of grid cells 1a , 1b , 2a and 2b form a regular fragment (1(a;b), 2(a;b)), whereas the fragment (1(a;b), 2b ) formed by merging the grid cells 1a, 1b, 2b is not regular. It should be noted that each regular fragment corresponds to a table in the relational database. If a fragment is not a regular fragment, it will be dicult to materialize it as a single relation without introducing null values. Figure 4 shows the result of the grid optimization phase on the grid cells in Figure 3. This results in four fragments DEPT PROJ , SAL SCHOOL1, SAL SCHOOL2 and SAL SCHOOL3.
( f1,s1)
1a
2a
1b
2b
1c 1d
2c 2d
3a
4a
3b
4b
3c
4c
( f3 ,s1)
3d
4d
( f4,s3)
(f2 ,s2)
Figure 6: Current Fragmentation Scheme (F; A) Table 1: Representation of Current Design (F; A). Fragment Name
f1 f2 f3 f4
Table Name Representation Site DEPT PROJ (1(a;b;c;d); 2(a;b;c;d)) s1 SAL SCHOOL1 (3(a;b); 4(a;b)) s2 SAL SCHOOL2 (3(c); 4(c)) s1 SAL SCHOOL3 (3(d); 4(d)) s3
9
De nition 4 Fragmentation scheme is a set of regular fragments over the set of grid cells of a relation such that each grid cell belongs to at least one fragment. De nition 5 Non-overlapping fragmentation scheme is a fragmentation scheme of a relation such that each of the grid cells belongs to exactly one fragment of the fragmentation scheme. Let S be the set of sites in the distributed database environment. An allocation scheme A is a mapping between the set of fragments in the fragmentation scheme F and the set of sites S . That is, A : F ?! S . It is a many to many mapping. For each fragment fi , let i be the set of sites it is allocated to. If each of the i 's is a singleton set, then the allocation scheme is non-replicated. The design for a distributed database is represented as (F; A) = f(f1 ; 1); (f2; 2); : : :; (fn; n )g. Figure 6 shows the representation of the fragmentation scheme (F; A) (illustrated in Figure 4) with four fragments f1 , f2 , f3 and f4 .
4 Algorithms for Materialization The corrective, preventive, and adaptive redesign of distributed relational databases requires that the new design be materialized. This implies that this materialization be performed automatically and eciently. In this section by employing the representation scheme from the previous section a methodology for materialization of a new design for both the approaches the query generator and the operator method is developed. A query generator algorithm which generates a set of SQL statements (database operations) to materialize the new design from the current design is developed. In case of the operator method, algorithms for splitting, reallocating and merging the fragments of the current design to materialize the new design are presented. Let R be a relation, with V = f1; 2; : : :; ng as the set of vertical fragments, H = fa; b; : : :; xg as the set of horizontal fragments and f1a; 1b ; : : :; 1x; 2a; 2b; : : :; 2x; : : : ; na; nb ; : : :; nx g as the set of grid cells. We assume that the current design (F; A) and the new design (F 0 ; A0) de ned on the set of grid cells generated by vertical fragments V and H for the relation R are given. We shall address the case of when the fragmentation schemes F and F 0 are de ned on dierent vertical and/or horizontal fragments (i.e. change in grid structure) in Section 5. The algorithms presented in this section will work for overlapping/non-overlapping fragmentation schemes and replicated/non-replicated allocation schemes. In order to illustrate the working of these algorithms we shall consider another design for the relation Employee shown in Figure 2. This new design (F 0 ; A0) is shown in Figure 7 and the representation of its fragments is shown in Figure 8. Table 2: Representation of New Design (F 0 ; A0). Fragment Name Table Name Representation Site f1 EDP (1(a;b); 2(a;b)) s1 f2 EDPS (1(c;d) ; 2(c;d); 3(c;d)) s3 f3 ESB (3(a;b)) s3 f4 ESD1 (4(a;b;c)) s2 f5 ESD2 (4(d)) s4 0 0 0 0 0
10
ESD1 EDP
ESB
Emp# Name City Dept#
Proj#
1
Ian
Atl
10
123
2
Jim
SF
11
124
3
Tom
LA
2
4
Ann
NY
7
Ram
Emp# Salary
Bonus
1
20k
1k
2
50k
2k
231
3
30k
1k
3
231
4
65k
5k
BOM 13
165
68k
6k
7
City Dept#
Proj#
5
Kate NO
21
123
6
John
LDN
22
145
8
Mary
STL
33
213
9
Vijay DEL
33
213
School
1
GaTech
BS
2
UnivFL
MS
3
Duke
BS
4
Salary
Degree
CalTech
7
IIT
5
UCB
6
EDPS Emp# Name
Emp#
MS
M.Tech BS
GaTech Ph.D.
Bonus
33k
2k
110k
8k
45k
3k
40k
2k
ESD2 Emp# School
Degree
8
NYU
9
GaTech BS
BS
Figure 7: New Fragmentation Scheme (F 0 ; A0) Before presenting the algorithms we relate the grid cells, mixed fragments, and tables in the local relational databases. This will be used to come up with the operations to materialize distributed relational databases. Each grid cell belongs to exactly one horizontal and one vertical fragment and hence each grid cell is bounded by columns in the vertical fragment and the predicate de ning the horizontal fragment it belongs to. A regular fragment has been de ned as the result of a regular well formed expression over the set of grid cells. The attributes of a regular fragment are the columns (given by the union of the columns of all vertical fragments in the well formed expression) of the fragment and the binding predicate (given by the disjunction of all the predicates de ning the horizontal fragments in the well formed expression) of the fragment. All the tuples of a regular fragment satisfy the binding predicate. The system catalog of the distributed database system needs to store and manage the meta data describing these attributes and/or binding predicates of a regular fragment. The system catalog also stores and maintains the representation scheme along with the fragment names. In order to describe general purpose algorithms to materialize redesigned distributed database the fragment names (like f1 ; f2; : : :; fn ) instead of speci c table names (like EDP, EDPS, ESB, etc.) will be used, but they need to be replaced by the actual table names by using the mapping information from the system catalog at the run-time during materialization.
4.1 Materialization of a redesigned distributed relational database by automatic generation of SQL statements A grid cell is represented by , where is the vertical fragment and is the horizontal fragment the grid cell belongs to. Let c () be the set of columns spanning the vertical fragment , and p ( ) be the binding predicate of the horizontal fragment . Then the SQL query to de ne the grid 11
(f’1, s1)
1a
2a
3a
4a
1b
2b
3b
4b
(f’3 ,s3)
4c
1c
2c
3c
1d 2d
3d
4d
( f’ ,s ) 4
2
(f’5 ,s4)
( f’2 ,s3) Figure 8: Representation of Fragmentation Scheme (F 0; A0) cell is
SELECT c () FROM R WHERE p ( ).
: : : (1)
The concatenation of two horizontal grid cells is represented by ( ; ). The SQL statement to de ne the above well formed expression is SELECT c () S c ( ) FROM R WHERE p ( ). : : : (2) The union of two vertical grid cells is represented by (( ; )). The union of two horizontal fragments and is de ned as the disjunction of the binding predicates of the horizontal fragments. The SQL statement which de nes this well formed expression is SELECT c () FROM R WHERE p ( ) or p ( ). : : : (3) Given a regular fragment f belonging to the fragmentation scheme F with its representation as f1A ; 2A ; : : :; nA g. The SQL statement de ning the regular fragment is : : : (4) SELECT Sni=1 fc (i)g FROM R WHERE Wa 2 A fp (a)g. W where denotes the disjunction operator. The above expression is derived in a straight-forward manner from the statements (1), (2) and (3). Let (F; A) be the current design and (F ; A ) be the new design. Then the algorithm to generate the queries to materialize the new design from the current design is illustrated in the Figure 9. The routine De ne Columns() generates the speci cation of the columns according to the syntax of the CREATE data de nition language statement for the underlying distributed database management system. 0
0
0
0
0
0
0
0
0
Claim 1 The algorithm Query Generator() materializes the redesigned distributed database. Justi cation: Each fragment in the new fragmentation scheme F 0 is materialized at some site. The above algorithm generates the set of SQL statements that must be executed at each of the 12
Query Generator ((F; A); (F ; A )) Input: R relation, S set of sites in DDE, (F; A); (F 0; A0) Output: Set of CREATE,INSERT and DROP SQL statements 1. repeat for each s 2 S (a) generate frg (s) = ffi j s 2 i & (fi0 ; 0i) 2 F 0 g (b) repeat for each f 2 frg (s) i. let representation of f be f1 A ; 2A ; : : :; nA g 0
0
0
0
0
0
0
0
ii. Generate SQL statement: S \CREATE TABLE f De ne Columns ( ni=1 fc (i)g) ;" iii. Generate SQL statement: \INSERT INTO f Sni=1 fc (i)g S SELECT ni=1 fc (i)g FROM R WHERE Wa2Afp (a)g ;" 2. repeat for each s 2 S (a) generate frg (s) = ffi j s 2 i & (fi ; i) 2 F g (b) repeat for each f 2 frg (s) i. Generate SQL statement: \DROP TABLE f ;" 0
0
Figure 9: Query Generator
13
sites where fragments of the new design are allocated in the distributed database environment. The SQL statements create a fragment (Step 1(b)ii of the above algorithm) of the new fragmentation scheme at its allocated site (Step 1(a) of the algorithm), and insert the data into the fragment using the \INSERT INTO TABLE table name SELECT : : : ;" statement (Step 1(b)iii. of the algorithm). The important statement in the algorithm is the SELECT statement in the INSERT statement. The correctness of the above algorithm depends on the correctness of the SELECT statement. The statements (1), : : : , (4) prove that the SELECT statement is correct. Hence the above algorithm generates the correct set of statements that need to be executed at each of the sites to materialize the new fragmentation scheme. The second repeat loop generates a set of \Drop table : : : ;" statements to drop the current fragmentation scheme. The current design is automatically replaced by the new design. 2 The algorithm Query Generator gives us the following set of SQL statements that need to be executed at site s3 where the fragments of the fragmentation scheme F are located. The fragment names (see Tables 1 and 2) are substituted by actual table names of the local databases by the Query Generator algorithm. Fragments f2 ; f3 are allocated to site s3 . The set of SQL statements to be executed at site s3 to materialize fragments f2 ; f3 are: 1. CREATE TABLE EDPS ( 0
0
0
0
2.
Emp# Name City Dept# Proj# Salary Bonus
0
CREATE TABLE ESB (
Integer, Char(15), Char(20), Integer, Integer, Char(6), Char(6) );
Emp# Salary Bonus
Integer, Char(6), Char(6) ); 3. INSERT INTO EDPS(Emp#, Name, City, Dept#, Proj#, Salary, Bonus) SELECT Emp#, Name, City, Dept#, Proj#, Salary, Bonus FROM Employee WHERE (20 Dept# < 30) or (Dept# 30); 4. INSERT INTO ESB(Emp#, Salary, Bonus) SELECT Emp#, Salary, Bonus FROM Employee WHERE (Dept# < 10) or (20 Dept# < 30);
Similarly the Query Generator will generate appropriate SQL statements that need to be executed at Sites s1 , s2 and s4 to materialize fragments EDP , ESD1 and ESD2 respectively. After this a set of SQL statements to drop the fragments belonging to the fragmentation scheme F need to be executed at each of the sites where they are located. The set of DROP SQL statements which are executed at site s3 are: DROP TABLE SAL SCHOOL3; Thus the execution of above set of SQL statements at their respective sites will materialize the new design. 14
4.1.1 Problems with Query Generator approach The query generator approach, though simple and elegant depends on the processing capability of the distributed relational database system to materialize the new design. This approach will work well for small distributed databases with small number of fragments in a local area network environment. But for very large relations (gigabyte in size) with large number of fragments, it will become too unwieldy for the distributed database system to handle materialization process. In case of a wide area network environment materialization by this approach will not be practical because of the amount of time that it will take. Further this approach will overload the storage manager, concurrency control and recovery modules of the distributed database systems. In order to materialize large distributed databases, the distributed database administrator needs to have control over the materialization process so that the distributed database administrator can optimize the materialization process and perform incremental materialization. In Section 4.2 we present the operator method that provides such control over materialization process and in Section 7 we present a multiple query optimization technique to improve the eciency of the query generator approach.
4.2 Materialization using operations on fragments
Let f be a regular fragment with its representation r(f ) = f1A ; 2A ; :::; nA g, where 1 ; 2; : : :; n, are set of vertical fragments and A consists of the set of horizontal fragments. Let N = Sn the i=1 fc (i)g, where c(i) is the set of columns spanning the vertical fragment i . The operation split generates two sub-fragments by splitting a fragment either horizontally or vertically. This gives the following two types of the split operation: split v (NA; (N1)A; (N2)A) is the operation which splits vertically the fragment represented by S NA to generate vertical sub-fragments represented by N1 A and N2A where N = N1 N2. split h (NA; (N )A1 ; (N )A2 ) is the operation which splits horizontally the fragment represented by NA Wto generate two horizontal sub-fragments represented by NA1 and NA2 where p (A) = p (A1) p (A2) where the predicates A1 and A2 de ne the horizontal fragments. Merge is similarly de ned as merging horizontally or vertically, vertical merging is equivalent to forming a union of tuples, whereas horizontal merging is equivalent to concatenating the subfragments. These merging operations are de ned as follows. merge v ((N1)A; (N2)A ; NA) merges two vertical fragments represented by N1 A and N2A vertiS cally to form the fragment NA where N = N1 N2. by NA1 and NA2 are merge h ((N )A1 ; (N )A2; NA) merges two horizontal fragments represented W merged horizontally to form the fragment NA where p (A) = p (A1) p (A2). For example, fragment f2 of the fragmentation scheme F shown in Figure 6 has its representation as (3(a;b); 4(a;b)); given the representation of fragment f 0 as 4(a;b), the operation split v(f2; f 0 ; f 00) splits the fragment f2 into fragments f 0 (represented by 4(a;b)) and f 00 (represented by 3(a;b)). Similarly the operation merge v(f 0 ; f 00; f2 ), merges fragments f 0 and f 00 to form the fragment f2 . The number of tuples in (N1)A and (N2)A for split v operation are same as that of NA , and vice versa for the merge v operation. Similarly, the number of columns in NA1 and NA2 are the same as that of NA for the split h operation, and vice versa for merge h operation. The operations for reallocation are de ned as follows. 15
move (f; s; s ) moves the fragment f from site s to site s0. replicate (f; s) materializes a copy of fragment f at site s. remove (f; s) removes the fragment f at site s. 0
In order to come up with the exact split and merge operations to materialize the new design we need to identify intersection (overlap) between fragments of the current design and the new design.
4.2.1 Intersection of fragmentation schemes De nition 6 Given f1 and f2 as two fragments with their representations r1 and Tr2 respectively, the intersection of two fragments, f1 intersection f2 is de ned as the fragment f1 f2 de ned by T the well formed expression over the set of grid cells r(f1 f2 ) = f j 2r1 and 2r2g where r1 and r2 are the representations of fragments f1 and f2 respectively.
Theorem 1 [Regular Fragment Intersection Closure] Intersection of two regular fragments is a regular fragment.
Proof: Let f1 and f2 be two regular fragments with representations r1 and r2 respectively.
Let r1 = (1A1 ; 2A2 ; : : :; pA ) where A1 = A2 = : : : = Ap . and r2 = (1A1 ; 2A ; : : :; q A ) such that A1 = A2 = : : : = Aq . T2 Case 1: If r1 Tr2 is empty then it is a regular fragment. Case 2: Let r1 r2 be not empty; then there exists such that 2 r1 and 2 r2 . This implies that there exists some i and some j such that 2iA and 2j A . T T T That is, Ai Aj 6= ; which implies that Ai Aj 6= ;; 8 i; j . Therefore, let B = Ai Aj for all i and j . T Let I = fB j B 2 r1 r2 g. Let 1 be the smallest value of 2 I and let 2 be the largest value of 2 I . T Therefore 1 2. Then by the de nition of fragments f1 and f2 it follows that r(f1 f2 ) = f 1B ; : : :; kB ; : : :; 2B g. This representation is that of a regular fragment. Hence the intersection of two regular fragments is a regular fragment. 2 p
0
0
0
0
0
0
0
0
0
q
0
i
0
0
0
j
0
Let F = ff1; f2 ; : : :; fn g and F = ff1; f2 ; : : :; fm g be the two fragmentation schemes on R. As each fragment fi is a fragment of R, R covers4 fi . It may be possible to cover a fragment of F by using a subset of the fragments of F . 0
0
0
0
0
0
0
De nition 7 The Minimal Cover of a fragment fi of a fragmentation scheme F is the set 0
0
of fragments of fragmentation scheme F which have non-empty intersection with fi . That is, m(fi ) = ffj j r(fj ) T r(fi ) 6= ; where fj 2 F g. 0
0
0
Thus the minimal cover of a fragment (f ) of the new fragmentation scheme (F 0 ) consists of all the fragments (f ) from the current fragmentation scheme (F ) containing data to form it. The lemma below guarantees that no spurious data is generated when materializing a new fragment. 0
4
By covers, it means that each tuple of the fragment fi is a tuple or part of a tuple of the relation R. 0
16
Lemma 1 [No Spurious Data Generation] There is no fragment in F 0 whose minimal cover is empty.
Proof: Every grid cell of relation R belongs to some fragment fi of F and fj of F ; therefore, T r(fi ) r(fj ) = 6 ;. Hence the minimal cover of any fragment of F is not empty. 2 0
0
0
0
The basic idea is that no new grid cells are generated during the materialization of the new design, and all the grid cells are accounted for. Thus there is no spurious data generated while de ning the fragments of the new design. T The intersection scheme of fragmentation schemes F and F is de ned as F F = fm(f1 ); m(f2 ); : : :; m(fmT)g. The Regular Fragment Intersection Closure theorem assures that the sets of fragments in F F 0 are all regular fragments. The motivation for the intersection scheme is that for forming any fragment f of the fragmentation scheme F , data from fragments m(f ) of the fragmentation scheme F is needed. The cardinality of m(fi0 ) gives the number of sub-fragments that need to be merged to form the fragment f 0 . This intersection scheme provides an approach for generating the merge operations on the fragments of the fragmentation scheme F . The minimal covers for the fragments of the fragmentation scheme F are given in Table 3. For example, from Table 3 we notice that to form the fragment f20 , data from the fragments f1 , f3 and f4 (which are the elements of fragment f20 's minimal cover) is needed. 0
0
0
0
0
0
0
0
Table 3: Minimal Cover for Fragments in the New Design New Fragment (f 0 ) Elements (f ) of Representation Minimal Cover Site Minimal Cover r(f T f 0 ) (From Current Allocation) f1 f1 (1(a;b); 2(a;b)) s1 f2 f1 (1(c;d); 2(c;d)) s1 f3 (3c ) s1 f4 (3d ) s3 f3 f2 (3(a;b)) s2 f4 f2 (4(a;b)) s2 f3 (4c ) s1 f5 f4 (4d ) s3 0 0
0 0
0
De nition 8 An intersect of a fragment fi of fragmentation scheme F is de ned as the set of fragments of fragmentation scheme F which intersect with the fragment fi . That is, int(fi ) = ffj j r(fi ) T r(fj ) = 6 ; where fj 2 F g. 0
0
0
0
0
Thus the intersect of a fragment (f ) of the current fragmentation scheme (F ) consists of the fragments (f ) of the new fragmentation scheme (F 0 ) that need data from it. The lemma below guarantees that there is no loss of data when the new design is materialized. 0
Lemma 2 [No Loss of Data] There is no fragment in F whose intersect is empty. Proof: The proof is similar to that of theorem 2. 17
2
Intuitively, the above lemma says that every grid cell in the old design must occur somewhere in the new design. Otherwise, some data is lost. The cardinality of the int(fi ) of a fragment fi gives the number of sub-fragments the fragment fi is split into. In the above lemma it is proved that each fragment fi has a non-empty intersect, therefore each fragment fi needs to contribute its data to form some fragment fj0 of fragmentation scheme F 0 . Table 4 lists the intersect for each of the fragments of the fragmentation scheme F 0 . For example, the intersect of fragment f2 consists of fragments f30 and f40 . Table 4: Intersect for Fragments in the Current Design Old Fragment (f ) Elements (f 0) of Representation Intersect Site T 0 Intersect r(f f ) (New Allocation) f1 f1 (1(a;b); 2(a;b)) s1 f2 (1(c;d); 2(c;d)) s3 f2 f3 (3(a;b)) s3 f4 (4(a;b)) s2 f3 f2 (3(c)) s3 f4 (4(c)) s2 f4 f2 (3(d)) s3 f5 (4(d)) s4 0 0 0 0 0 0 0 0
4.2.2 Algorithms for the operator method
Given two designs (F; A) and (F ; A ), the methodology to materialize (F 0 ; A0) from (F; A) is illustrated as follows: 0
T
0
T
(F; A) ! Divide ! (F F ; A) ! Relocate ! (F F ; A ) ! Conquer ! (F ; A ). 0
0
0
0
0
That is, at each of the local sites the fragments of the current fragmentation scheme are split into sub-fragments based on the intersection of the current fragmentation scheme with the new fragmentation scheme (i.e. Divide). Next, these sub-fragments are reallocated by comparing the current and new allocation schemes (i.e Relocate). After that, the sub-fragments at each of the local sites are merged to form the fragments of the new fragmentation scheme (i.e. Conquer).
De nition 9 sub frgs(f ) de nes the sub-fragments into which the fragment f is split. sub frgs(f ) = T 0 f(f f ) j f 2 int(f )g. 0
In order to materialize a new design, fragments are split to their grid cells and clustered according to their sub-fragment de nitions. Then these clusters of sub-fragments are located at sites according to the new allocation scheme. Note that a fragment consists of a cluster of grid cells which are horizontally and/or vertically merged. This merging sequence is derived from the well formed expression de ning the fragment. The grid cells ( ) for each of the fragments (f ) of the current fragmentation scheme (F ) are generated by splitting the fragments horizontally and vertically, and 18
T
then these grid cells ( ) are clustered so as to form the fragments (f f 0 ) 2 sub frgs(f ). After T 0 that, these clusters of fragments (f f ); f 2 F; f 0 2 F 0 are then reallocated according to the new allocation scheme (A0). After all the grid cells ( ) for forming the new fragments f 0 2 F 0 are allocated according to the new allocation scheme (A0 ). The fragments (f 0) are formed by merging all the grid cells ( ) 2 r(f 0 ) so as to materialize the new fragmentation scheme (F 0 ). In order to support above mentioned steps, the following intermediate operations are de ned:
break (f ) which splits the fragments f into its grid cells 2 r(f ). This operation uses the
split v and split h operations. cluster (f ) which clusters the set of grid cells forming a fragment f into one cluster. locate (cluster(f ); ) which moves the cluster of grid cells forming the fragment f to a set of sites . form (f ) which merges the set of grid cells 2 r(f ) to form the fragment f . This operation uses the operations merge h and merge v .
The algorithms to de ne the operations break, cluster and form on the basis of the primitive operations split and merge are given in the Appendix 1. Materialization Strategy Operations
Divide, Relocate, Conquer.
Intermediate Break, Cluster, Locate , Form.
.
Split_v, Split_h, Merge_v, Merge_h Move, Replicate, Remove.
Operations
Primitive Operations
Figure 10: A Hierarchy of Fragment Operations Figure 10 illustrates the hierarchy of the materialization operations. The operations split h; split v; merge h and merge v are primitive operations. The operations break, locate and form are built on these primitive operations. The eciency of the operations break and form depends on the eciency of the split and merge operations. The operations Divide, Relocate and Conquer are the high level operations which are used by the algorithm that materializes the new design. The whole process of materialization of the new fragmentation and allocation scheme is done in three phases, namely: divide, relocate and conquer. The divide phase splits the fragments into 19
grid cells and clusters them according to the sub-fragments in the set sub frgs(f ). The relocate phase allocates these clusters according to the new allocation scheme. The conquer phase forms the fragments of the new fragmentation by merging the clusters of the sub-fragments. In the divide phase the routines break and cluster are used, in the relocate phase the routine locate is used, and in the conquer phase the routine form is used. Each of these phases is supported by an algorithm. Hence, the materialization of the new fragmentation and allocation scheme is equivalent to executing the routines presented in Figure 11. In Relocate routine s(f ) gives all the sites where the fragment f is located according to the new allocation scheme A . 0
0
0
Claim 2 The algorithms Divide(F), Relocate(F T F 0), and Conquer(F ) correctly materialize the 0
redesigned distributed database.
Justi cation: The correctness of the materialization of the new design by the above three algo-
rithms is supported by the following points: 1. The Divide algorithm does the following: The break operation is executed to get all the grid cells of a fragment f 2 F . As each cluster of grid cells (f T f 0) is required to form the fragment f 0 of the new fragmentation scheme F 0 . The T set of grid cells f 2 r(f )g are the clustered according to the sets of grid cells (f f 0 ) 2 sub frgs(f ). Note that the sets of grid cells in sub frgs(f ) are de ned by the fragments f 0 2 F 0 that have non empty intersection with fragment f 2 F . Therefore, the routine Divide breaks the fragments f 2 F into its grid T 0 cells, and clusters them into sets of grid cells r(f f ) so as to facilitate formation of the fragments f 0 2 F 0 . (Step 1(a) of Divide routine). 2. The Relocate algorithm does the following: At each site where a fragment f 2 F is allocated, for each cluster of grid cells (f T f 0) 2 sub frgs(f ); if the fragment f 0 isTnot allocated at this site, then the routine Relocate locates the cluster of grid cells (f f 0 ) at the site where the fragment f 0 is allocated. At the end of this step all the clusters of grid cells (f T f 0) 2 sub frgs(f ) are located at sites where the corresponding fragments f 0 2 F 0 are allocated. Moreover, as this routine is executed at all sites, all the grid cells required to form the fragments f 0 also get located at the site where f 0 is allocated (Step 1(a) of the Relocate routine). 3. The Conquer algorithm forms the fragments f 0 2 F 0 at each of the sites where they are allocated by merging the grid cells f 2 r(f 0)g that de ne the fragments f 0 2 F 0 . The execution of algorithms Divide, and Relocate before executing Conquer implies that the grid cells are already materialized at the sites where they are needed to form the fragments f 0 of the new fragmentation scheme F 0 . 4. Thus the execution of the Divide, Relocate, and Conquer algorithms correctly materializes the new design, by materializing the new allocation scheme A0 by executing routine Relocate and materializing the new fragmentation scheme F 0 by executing the routine Conquer. 2 The operator method gives the distributed database designer more freedom to eciently materialize the new fragmentation scheme. This is because the above set of operations are supported 20
Divide(F ) Input: Fragmentation scheme F; sub frgs(f ) for each f 2 F Output: Clusters of grid cells cluster(f \ f 0), f 2 F and f 0 2 F 0 1. repeat for each site s 2 S (a) repeat for each f 2 F at s i. break(f) ii. generate sub frgs(f ) T iii. repeat for each (f f 0) 2 sub frgs(f ) T A. cluster( f f 0 );
Relocate(F T F 0) Input: Clusters of grid cells cluster(f \ f 0), f 2 F , A0 allocation scheme 1. repeat for each site s 2 S (a) repeat for each f 2 F at s T i. repeat for each (f f ) 2 sub frgs(f ) T 0
A. locate (cluster (f f ); s(f )); 0
0
Conquer(F ) Input: Fragmentation scheme F 1. repeat for each site s 2 S (a) repeat for each f 2 F at s 0
0
0
0
0
i. form(f ); 0
Figure 11: Divide, Relocate and Conquer Routines
21
by the module which is external to the distributed database system. Once these set of routines have been successfully executed, changes are made in the system directory to represent the new fragmentation scheme. This is done by executing the SQL statements \CREATE TABLE : : : , " as described in the query generator method.
4.2.3 Example illustrating the operator method Figures 4 and 7 show the fragmentation schemes, their representation is shown in Figures 6 and 8. Table 4 shows the intersect for each of the fragments of the fragmentation scheme F and Table 3 shows the minimal cover for each of the fragments of the fragmentation scheme F . With reference to the above set of gures the divide, relocate and conquer algorithms will execute the following intermediate operations. The fragment names (f1; f2 ; f3; f4) for fragmentation scheme F and (f10 ; f20 ; f30 ; f40 ; f50 ) for fragmentation scheme F 0 will be replaced by the corresponding table names at run time by the routines em Divide, Relocate, and Conquer. The routines that need to be executed are listed for site s1 where the fragments DEPT PROJ and SAL SCHOOL2 of the fragmentation scheme F are located and the fragment EDP is located. 0
Site s1
Old design: The fragments DEPT PROJ and SAL SCHOOL2 are located at site s1 . New design: EDP is allocated to site s1 . int(DEPT PROJ ) = fEDP; EDPS g. int(SAL SCHOOL2) = fEDPS; ESD1g. T DEPT PROJ T EDPS T g. sub frgs(DEPT PROJ ) = fDEPT PROJ EDP; T sub frgs(SAL SCHOOLT2) = fSAL SCHOOL2 EDPS; SAL SCHOOL2 ESD1g. that DEPT PROJ EDP 2 sub frgs(DEPT PROJ ), and hence sub-fragment DEPT PROJ TNote EDP must be relocated to the site of fragment EDP . But fragment EDP is located at site s1 , which is also the site where fragment DEPT T PROJ is located, thus implying that there T is no need to relocate subfragmentT DEPT PROJ EDP . Therefore, cluster(DEPT PROJ EDP ) and locate((DEPT PROJ EDP ); s1) are not listed in the operations which are executed by routines Divide and Relocate. 1. break(DEPT PROJ ); 2. break(SAL SCHOOL2); T 3. cluster(DEPT PROJ EDPS ); T 4. locate((DEPT PROJ EDPS ), s3 ); T 5. cluster(SAL SCHOOL2 EDPS ); T 6. cluster(SAL SCHOOL2 ESD1);
T
7. locate((SAL SCHOOL2 EDPS ), s3 ); T 8. locate((SAL SCHOOL2 ESD1), s2 ); Similarly at Sites s2 , s3 , and s4 , we have: Old design: The fragment EMP SCHOOL1 is located at site s2 , and the fragment SAL SCHOOL3 is located at site s3 . New design: The fragment ESD1 is allocated to site s2 , and the fragment EDPS is allocated to 22
site s3 . int(EMP SCHOOL1) = fESB; ESD1g. int(SAL SCHOOL3) = fEDPS; ESD2g. sub frgs(EMP SCHOOL1) = fSAL SCHOOL2TT ESB; SAL SCHOOL2 TTESD1g. sub frgs(SAL SCHOOL3) = fSAL SCHOOL3 EDPS; SAL SCHOOL3 ESD2g. The fragments EMP SCHOOL1 2 F and ESD1 2 F 0 are located at the same site s2 , and fragments SAL SCHOOL3 2 F and EDPS 2 F 0 are located at the same site s3 . Therefore the operations executed by routines Divide and Relocate at sites s2 and s3 are: Site s2 1. break(EMP SCHOOL1); T 2. cluster(EMP SCHOOL1 ESB ); T 3. locate(cluster(EMP SCHOOL1 ESB ), s3 ); Site s3 1. break(SAL SCHOOL3); T 2. cluster(SAL SCHOOL3 ESD2); T 3. locate(cluster(SAL SCHOOL3 ESD2), s5 ); After that the set of form operations as executed by the Conquer routine at the sites where fragments of new design are allocated are listed below.
1. form(EDP ) at site s1 as fragment EDP is allocated to site s1 2. form(ESD1) at site s2 as fragment ESD1 is allocated to site s2 . 3. form(EDPS ) and form(ESB ) at site s3 as fragments EDPS and ESB are allocated to site s3 . 4. form(ESD2) at site s4 as fragment ESD2 is allocated to site s4 . After this, changes need to be made in the system directory to provide users and applications access to the new design. Once these set of routines are executed, the new fragmentation scheme will be materialized. This transfer from an existing distribution of data to a new distribution of data can be done automatically.
5 Incorporating Change in Grid Structure In the last section, algorithms had been developed to materialize the redesigned distributed databases using either SQL statements or using materialization operations on the fragments. But it was assumed that there is no change in the grid structure. That is, the fragments of the current design and the new design are de ned on the same vertical and horizontal fragmentation schemes. In this section we relax this assumption and consider the case where the fragments of the current fragmentation scheme F and the new fragmentation scheme F 0 are de ned on dierent of vertical and horizontal fragmentation schemes. 23
Let (F; A) be the current design based on vertical fragments (1; 2; : : :; n) and horizontal fragments (a; b; : : :; x) and denoted as f(f1; 1); (f2; 2); : : :; (fp; p)g. Let (F , A ) be the new design based on vertical fragments V = (1 ,2 ,: : : , m ) and horizontal fragments H = (a , b ,: : : , y ) and denoted as f(f1, 1 ), (f2 , 2),: : : ,(fq ,q )g. De ne a set of virtual vertical fragments V 00 = fc(v 00) = c(k) \ c(k ) j c(k) \ c(k ) 6= ;, where k 2 V and k 2 V g, and a set of virtual horizontal fragments H 00 = fp(h00) = p(u) V p(u ) j p(u) V p(u ) is non-contradictory5 , where u 2 H and u 2 H g Let G be the set of grid cells formed by vertical fragmentation scheme (V) and horizontal fragmentation scheme (H), G be the set of grid cells formed by vertical fragmentation scheme (V ) and horizontal fragmentation scheme (H ) and G00 be the set of virtual grid cells generated by the virtual vertical fragmentation scheme (V 00) and virtual horizontal fragmentation scheme (H 00). The grid cells G are mapped to grid cells G00 and the grid cells G0 are mapped to grid cells G00, so that the fragmentation schemes F and F are now de ned based on the common set of virtual grid cells G00. Then the algorithms presented in the last section can be applied to materialize new design (F , A ) from the current design (F; A) as both the fragmentation schemes are de ned on the same set of virtual grid cells G00. 0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Theorem 2 [Mapping of Grid Cells] Any grid cell 2 G can be mapped to a regular mixed fragment f 00 de ned on grid cells G00.
Proof: Consider the grid cell , i.e., is the vertical fragment in f1; 2; : : :; ng, and is the horizontal fragment in fa; b; : : :; xg.
The grid cell shall be mapped to the virtual grid cells de ned by G00. Let ? = fv 00 j c() \ c(v 00) 6= ; where v 00 2 V 00 g. Let = fh00 j p( ) \ p(h00) is non-contradictory where h00 2 H 00 g. Because of the way in which V 00 and H 00 are de ned, c() \ c(v 00) = c(v 00) for all v 00 2 ? and V 00 p( ) p(h ) ) p(h00) for all h00 2 . This implies that the set of vertical fragments ? and horizontal fragments generate complete and non-overlapping vertical and horizontal fragments of and respectively. S W That is, v 2?c(v 00) = , and h 2 p(h00) = . Thus, the set of grid cells formed by vertical fragments in ? and horizontal fragments in completely de ne grid cell belonging to G. The mixed fragment formed by the grid cells generated by ? and represents the grid cell belonging to G. Hence any grid cell in G can be represented by a mixed fragment f 00 formed by grid cells in G00 whose representation is r(f 00) = fv 00h j v 00 2 ?; h00 2 g. Moreover, it is a regular fragment because for each virtual vertical fragment v 00 all the virtual horizontal fragments h00 2 de ne the grid cell . 2 00
00
00
By the above theorem, there is a mapping from each of the grid cells in G and G0 to mixed fragments de ned on grid cells in G00. Similarly, the fragments in fragmentation scheme F can be mapped to mixed fragments on grid cells in G00 and fragments in fragmentation scheme F 0 to mixed fragments on grid cells in G00. Thus the fragments in fragmentation schemes F and F 0 are de ned Non-contradictory means that conjunction of the two predicates p(u) and p(u ) makes sense, unlike conjunction of predicates, say, (SAL < 20k) and (SAL > 40K) which is contradictory because it does not make sense as there will be no tuple that satis es both the predicates. 5
0
24
on a common set of grid cells G00; hence the materialization algorithms developed in last section can be used for materializing the redesigned distributed databases. Note that the grid cells in G00 are virtual grid cells which are not materialized, it is just the fragments of the fragmentation schemes F 0 that are materialized. The concept of virtual grid cells is a tool to facilitate materialization of new design when there is a dierence in the grid structures of old fragmentation scheme and the new fragmentation scheme.
6 Cost Model In this section a cost model for materializing a redesigned distributed database is developed. In order to provide a uniform cost model we do not consider use of any database system dependent costs (like buer management) while materializing the new design. This cost model will be based on the amount of time taken to materialize the redesigned distributed database. Since the size of the distributed database is going to be large, only the time spent on data access and communication delay incurred to materialize the new design will be considered while developing the cost model. Both the costs considered are size on the size of the fragments of the distributed database design. As the fragmentation scheme is generated by the mixed fragmentation approach, an algorithm to calculate the size of the mixed fragments given the sizes of the vertical and horizontal fragments is developed. Let f1; 2; 3; : : :; ng be the set of vertical fragments of a relation R. Let K be the key attribute(s) of the relation. Let l(1); l(2); : : :; l(n) be the lengths (or, average lengths if the vertical fragments have variable length attribute) of the vertical fragments excluding the length of the primary key attribute and l(K ) be the length of the key attribute. Let fa; b; : : :; xg be the set of horizontal fragments of the relation. Let ft(a); t(b); : : :; t(x)g be the number of tuples in each of the horizontal fragments respectively. Then the size of a regular mixed fragment is calculated iteratively as follows: 1. The size of the grid cell represented by is size( ) = fl() + l(K )g t( ). 2. The size of the fragment represented by [ is size( [ ) = fl() + l(K )g ft( ) + t( 0))g. 3. The size of the fragment represented by k0 is size( k0 ) = fl()+ l(0)+ l(K )g t( ). 0
0
4. The size of aPregular mixed fragmentPf with its representation r(f ) as (1 A ; 2A ; : : :; pA ) is size(f ) = f( ii==1p (l(i)) + l(K )g f 2A(t( ))g. The above set of iterative rules provide us with an algorithm to calculate the size of each of the fragments of fragmentation schemes F and F 0 . Similarly, the sizes of the sub-fragments of fragment based on intersect and cover operations is calculated. The sizes of fragments and its sub-fragments enable us to calculate the time taken to materialize a distributed database. Given (F; A) and (F 0 ; A0) as the current and new distributed database designs. Let f 2 F be a fragment that is needed by fragments f 2 int(f ). This requires fragment f to be split into card(int(f )) sub-fragments. The cost of doing this is: 0
size(f ) e + fd pagesize
X
(f \ f ) eg fDataAccessTimeg d size pagesize 0
f 2int(f ) 0
25
(1)
where pagesize is the size of the page in bytes holding the tuples of the relation, DataAccessTime is the time taken on an average to retrieve page of data from the stable storage to the main memory. The above cost formula gives the time taken to divide a fragment f 2 F to its sub-fragments based on fragmentation scheme F 0 . The data in each page of fragment f that is read into the main memory is placed in the pages corresponding to the fragments de ned by the intersecting subfragments int(f ). A fragment f 2 F 0 that needs data from fragments f 2 m(f ) (i.e. minimal cover), requires f to be merged from card(m(f )) sub-fragments. The cost of doing this is: 0
0
0
0
size(f ) e + fd pagesize
X
0
(f \ f ) eg DataAccessTime d size pagesize 0
f 2m(f ) 0
(2)
The above formula gives us the cost of merging the sub-fragments of the new fragment f 0 based on the initial fragmentation scheme F . It is assumed that the sub-fragments are (in the worst case) stored as temporary tables in the site where f is to be located. This is a valid assumption because these sub-fragments are mostly transferred from other sites of the distributed database environment and may be too large to be accommodated in the main memory. The cost of moving the sub-fragments according to the new allocation scheme is given by: 0
X
size(f \ f ) ; DataTransferRate f 2m(f )&s(f )6=s(f ) 0
0
(3)
0
where the DataTransferRate is the number of bytes that can be transferred on an average from one site to another in the distributed database environment. Only those sub-fragments of a fragment f needed at other sites are transferred. Finally, the cost of materializing the redesigned distributed database de ned by (F 0 ; A0) based on the current design (F; A) by using the operator method in the worst case is given by:
X X f fd size(fi) e +
size(fi \ f ) eg d Co = pagesize f 2F pagesize f 2int(f ) X fd size(fj ) e + X d size(f \ fj ) egg fDataAccessTimeg + pagesize f 2m(f ) pagesize f 2F X size(f \ f ) g + f DataTransferRate f 2F &f 2m(f )&s(f )6=s(f ) 0
0
0
0
0
0
0
0
0
0
0
(4)
0
The rst part of the equation gives the cost of the Divide phase, the second part gives the cost of the Conquer phase and the last part gives the cost of the Relocate phase. This cost function has been generalized from the equations 1, 2 and 3. Given the length of the tuple of each vertical fragment, the cardinality of each horizontal fragment, the data access time and the data transfer rate, Equation 4 gives the cost of materializing the design (F 0 ; A0) from the design (F; A) by using the operator method. In case of the query generator approach, a set of SQL queries are executed at each of the sites in the distributed database environment. These set of SQL statements basically perform the operations of Divide and Conquer. Although it is not obvious from evaluating the SQL statements executed at each site, it becomes clear when all statements generated by the Query generator() are 26
considered together. The distributed query processor of the distributed database system does not have the capability of processing a set of SQL statements. Hence it evaluates and processes each of the SQL statements to its completion sequentially. Thus a set of SQL statements that divide a fragment f into its sub-fragments int(f ) would require a complete scan of the fragment f for each of the sub-fragments generated. But when merging the sub-fragments to form a new fragment, the sub-fragments (which already exist either as temporary tables or relations) are loaded into the main memory and the operations union and/or join are performed. Thus the cost of merging the sub-fragments to form a new fragment is exactly the same as that for the operator method. The sub-fragments of a fragment f are located at sites where they are needed to form the new fragments. Again the cost of doing this will be the same as that for the operator method. Therefore the cost of materializing the redesigned distributed database by query generator method in the worst case is given by:
X X f
size(fi) e + d size(fi \ f ) eg fd pagesize Cq = pagesize f 2F f 2int(f ) X fd size(fj ) e + X d size(f \ fj ) egg fDataAccessTimeg + pagesize f 2m(f ) pagesize f 2F X size(f \ f ) g f DataTransferRate + f 2F &f 2m(f )&s(f )6=s(f ) 0
0
0
0
0
0
0
0
0
0
0
(5)
0
Note that the query generator method incurs the additional cost of scanning the entire fragment to generate each of its sub-fragments. Here it is assumed that there are no indexes that can be used to reduce the number of pages that need to be accessed. Even if for one sub-fragment generation an index cannot be used, the cost of materialization for the query generator method will be larger than the operator method. While for the operator method the complete fragment is scanned only once to generate all its sub-fragments, for the query generator method it may require multiple scans of a fragment to generate all its sub-fragments. It is also not practical to have indexes to bene t the generation of the sub-fragments because of lack of disk space, and also index generation is a costly and time consuming job. Therefore, the operator method will outperform query generator method almost always.
7 Materialization by using Multiple Query Optimization The query generator approach to materialize the redesigned distributed database initiates a set of SQL statements to be executed at various sites in the distributed database environment. Since all these SQL statements are going to be executed concurrently, there is a scope for optimizing them together. In this section, it will be shown that applying multiple query optimization will enhance the eciency of the query generator approach to that of the operator method. The SQL statements to materialize the fragments of the new distributed database design are all sent to a coordinator site (which could be potentially the site where the materialization process was initiated). This coordinator site uses the multiple query optimization technique (as described below) to generate a set of SQL statements that need to be executed at various sites of the distributed database environment. This process of enhancing the eciency of the query generator approach by using 27
the multiple query optimization is described by the following steps. A detailed example illustrating the working of this MQO algorithm is given in the Appendix 2. 1. Each SQL statement to materialize the fragment f 0 of the new design is rewritten to specify which fragments (data localization step [10, (see chapter 8)]) f 2 m(f 0) (i.e. the minimal cover) of the current fragmentation scheme are accessed. After this each SQL statement is parsed and a query tree is generated. 2. This query tree is modi ed by adding get and combine operations (these operations are de ned in the next section). The leaf nodes of the query tree are the fragments of the current fragmentation schema. The root nodes are the fragments of the new fragmentation schema. The rest of the nodes denote get and combine operations on the fragments. 3. These enhanced query trees are then merged into one query graph by merging the common leaf nodes. This merged query graph contains as its leaf nodes (i.e. nodes with only out going edges) corresponding to the fragments of current distributed database design, and as the (multiple) root nodes (i.e. nodes with only out going edges) corresponding to the fragments of the new distributed database design. 4. Finally, an SQL statement is generated for each leaf node (emulating divide operation) and each root node (emulating conquer operation). There will be additional commands for transferring the data from one site to another site based on the dierences between current and new allocation schemes. The execution of these optimized SQL statements will materialize the new design.
7.1 MQO Algorithm to Materialize Distributed Database Design
The objective is to use the MQO technique in materializing design (F 0 ; A0) from the design (F; A). The algorithm consists of the following steps:
MQO materialization(SQL in, SQL out)
Input: SQL in is a set of SQL statements generated by the query generator algorithm. Output: SQL out is an optimized set of SQL statements to be executed at various sites of the distributed database environment. Step 1 Select a site s in the distributed database environment to be the coordinator site for the MQO (it can be the site where the materialization process was initiated). All the query modi cation steps will be performed at the site s . Step 2 For each f 0 2 F 0 , the Query Generator generates the following SQL statement. Note that c(r(f 0)) denotes the columns and p(r(f 0)) denotes the disjunction of the predicates speci ed in the representation of fragment f 0 and (f 0 ) gives the locations of the sites where f 0 is to be allocated.
INSERT INTO f c(r(f 0)) SELECT c(r(f 0)) FROM R WHERE p(r(f 0)); 0
28
The speci cation \insert@(f 0 )f 0" describes the operation of inserting tuples into the relation f at site (f 0 ). Let the function combine de ne the merging of a set of grid cells to form a mixed fragment and the function get de ne the retrieval of a set of grid cells from a mixed fragment. Rewrite the SQL statement using the fragment representations, combine function, and the get function. ff @(f ) j f 2 m(f 0)g lists the fragments in the minimal cover of the fragment f 0. The FROM clause of an SQL statement is replaced by the combine function, the SELECT and WHERE clauses are replaced by the get function. The motivation for doing this is to have a concise speci cation of the SQL statement and to facilitate the description of the multiple query optimization algorithm to be elaborated in Steps 3, and 4. In step 5 this notational speci cation using get and combine functions is translated back to a set of SQL statements. insert@(f 0 ) f 0 combine(r(f 0)) get(r(f 0)) from ff @(f ) j f 2 m(f 0 )g 0
Step 3 In order to reduce the amount of irrelevant grid cells accessed from the fragments of the current fragmentation scheme the get function needs to be propagated. Modify the above statement to specify only the relevant set of grid cells that need to be retrieved from each of the fragments of the current fragmentation scheme. insert@(f 0 ) f 0 combine(r(f 0)) for each f 2 m(f 0 ) generate: get(r(f 0) \ r(f )) from f @(f ) In this step the sub-fragments to be retrieved are given by the intersection of the fragment (f 0) in the get expression and the representation of the fragment (f ) being accessed. This speci cation is represented as a query tree (see Figures 12 and 13) with leaf nodes as the fragments of the current fragmentation scheme and the root node as the fragment of the new fragmentation scheme. Step 4 In this step all the query trees are merged together (see Figure 14), the common leaf nodes form the fragments of the current design and the multiple root nodes form the fragments of the newTdesign. The intermediate nodes are for the get and combine operations. The set of grid cells r(f ) r(f 0 ), f 0 2 int(f ) are extracted from each fragment f of the current design, and for each fragment f 0 of T the new design the sets of grid cells in r(f ) r(f 0), f 2 m(f 0) are combined. We now generate two sets of statements one for each fragment of the current design and one for each fragment of the new design. This is done by considering the minimal cover of the fragment f 0 2 F 0 and the intersect of the fragment f 2 F . That is, the materialization of the new distributed database design is done in two phases, the rst phase consists of extracting the relevant set of grid cells get(r(f ) \ r(f 0)) from the fragments of current design and sending it to the sites where they are required. The notational operation send to(site id), sends a set of grid cells (as a temporary table) to the site site id. The second phase consists of combining these set of grid cells (i.e. temporary tables) to form the new fragments. The rst phase which retrieves the relevant grid cells from the fragments in the minimal cover of the current design and stores them in the temporary tables is described below.
29
for each f 2 F for each f 0 2 int(f ) get(r(f ) \ r(f 0 )) ^ send to((f 0 )) as TempTable(f;f ) from f @(f ) 0
The following statements describe the formation of the fragments of the new fragmentation scheme by merging the grid cells (i.e. temporary tables). That is, combinef 2m(f ) (TempTable(f;f ))), for each fragment f 0 of the new design combine all the grid cells given by TempTable(f;f ) where f 2 m(f 0). Note that TempTable(f;f ) is formed by merging the set of grid cells represented by r(f ) \ r(f 0). 0
0
0
0
for each f 0 2 F 0 generate insert@(f 0 )f 0 combinef 2m(f ) (TempTable(f;f )) 0
0
Step 5 The modi ed query expressions speci ed above are translated into SQL statements to de ne the temporary tables that are generated, and to merge (by using the relational algebra operations union and join to combine the temporary tables to form a regular fragment) these temporary tables to form the fragments of the new design. The set of SQL statements enclosed between constructs beginfconcurrentg and endfconcurrentg are executed concurrently (this is where the bene t of MQO is achieved). Thus this optimization procedure generates a set of SQL statements that need to be executed at the sites of the fragments of the current and new design.
for each site s 2 S do: de ne frg (s) = ff j s 2 (f )&(f; (f )) 2 F g
beginfconcurrentg for each f 2 frg (s) for each f 0 2 int(f ) generate INSERT INTO TempTable(f;f )c(r(f ) \ r(f 0)) SELECT c(r(f ) \ r(f 0)) FROM f WHERE p(r(f ) \ r(f 0)); endfconcurrentg for each f 2 frg (s) ^ for each f 0 2 int(f ) ^ for each s 2 (f ) ^ for each s0 2 (f 0) if s = 6 s0 generate send to((f 0); TempTable(f;f )) 0
0
Following are the set of SQL statements that need to be executed at each of the sites where the fragments of the new distributed database design need to be materialized. for each site s 2 S do: de ne frg 0(s) = ff 0 j s 2 (f 0 )& f 0 2 F 0 g 30
for each f 0 2 frg 0(s) form sql(f 0) The function form sql(f 0) generates the SQL statement to materialize the fragment f 0 . All the fragments in the current distributed database design and the new distributed database design are regular fragments. Also all the elements of the set fr(f ) \ r(f 0 ) j f 2 m(f 0)g are regular. The fragments represented by grid cells r(f ) \ r(f 0) de ne the TempTable(f;f ) . These temporary tables were created in the previous step. Hence to materialize fragment f 0 we need to merge these temporary tables. There are two ways of merging tables, one is the join operation and the other is the union operation of tables. So in order to automate this merging process, the correct order of these merging operations needs to be generated. Let r(f 0) = f1 A ; 2A ; : : :; nA g, and A be set of horizontal fragments de ned by predicates p(a); p(b); : : :; p(s). The SQL statement to materialize the fragment f 0 is: form sql(f 0) 0
INSERT INTO f 0 c(f 0) SELECT c(f 0) FROM TempTable(f;f ) WHERE p(a) and joinff ja2r(f ) and f 2m(f )g(TempTable(f;f )) UNION SELECT c(f 0) FROM TempTable(f;f ) WHERE p(a) and joinff jb2r(f ) and f 2m(f )g(TempTable(f;f )) UNION 0
0
0
0
0
0
.. .
UNION SELECT c(f 0) FROM TempTable(f;f ) WHERE p(a) and joinff js2r(f ) and f 2m(f )g(TempTable(f;f )) 0
0
0
The function join takes as its input a set of temporary tables that satisfy the subscripted condition ( like, ff j b 2 r(f ) and f 2 m(f 0)g) and generates a set of join conditions to join these tables. For joining k tables, (k ? 1) join conditions are generated. The above algorithm rst joins all the tables to generate the horizontal fragments of the fragment f 0 and then it performs a UNION of all these horizontal fragments. Only those tuples that satisfy the predicate de ning the horizontal fragment from the temporary tables are joined as speci ed in the WHERE clause of the SQL statement. This is a brute force method to materialize the fragment f 0 , but any other method would be much more complex as dierent cases need to be considered. The SQL statement to materialize a fragment would depend on the cardinality, shape and size of the temporary tables required by its minimal cover. If the minimal cover has only one temporary table, then the above SQL statement is a simple select + project operation. If it has two temporary tables, then the statement uses either a join or an union operation. But as the number of temporary tables in the minimal cover increases, it is not straight forward to correctly specify the optimal order of join and union operations for merging these temporary tables. In some special cases a simpler SQL statement can be constructed by the 31
distributed database designer, but in order to provide a general, correct and automated solution we took the above approach. Moreover the local database system may use the query optimizer to optimize this SQL statement so as to execute it eciently. The above set of modi ed SQL statements are then sent from site s to all other relevant sites to be initiated. Once these SQL statements are completely executed, the new distributed database design is completely materialized. 2
7.2 Performance Improvement
The multiple query optimization assures that the performance of the query generator method to be the same as that of the operator method. The main dierence between the two was that the operator would generate all the sub-fragments de ned by int(f ) of a fragment f 2 F by reading in the fragment only once. Whereas in case of the query generator method, if there were no suitable indexes the relation would be scanned completely more than once to generate all the sub-fragments. This implied that the query generator method could perform worse than the operator method. In this multiple query optimization algorithm all the sub-fragments of a fragment are generated at once by executing the rst part of the Step 5. This would essentially mean that a fragment would be read only once to generate all its sub-fragments. Of course, this depends on the implementation query processor of the relational database system. But most commercial database systems use buering schemes that could enable a table to be read only once. This is the reason that all the temporary tables are generated concurrently. As a page of a relation read can be written into pages of all the temporary tables that require that data at once. This would require an optimal allotment of some buers to all the temporary tables which is beyond the scope of this paper. However, with the amount of main memory that current systems have, it should be feasible to do this. Similarly when merging the temporary tables to form the fragment of the new distributed database design, all the temporary tables can be read only once. There has been work done in general multiple query optimization [11, 12, 13] wherein some heuristic algorithms have been developed, and in [12] it was shown that the multiple query optimization problem is NP-hard. But in the case of the materialization problem, the common grid cells (i.e. temporary tables) accessed by multiple queries can be determined by comparing the representation of the fragments of current and new fragmentation schemes in a polynomial time. Therefore the materialization problem is a straight forward problem which has a simple and elegant solution as shown in this section.
8 Conclusions Two automated approaches (Query Generator method and Operator method) to materialize the new designs were described. The algorithms were presented and their correctness proved for both the methods. By means of an example we illustrated how the materialization procedure is automatically done by both approaches. The query generator method depends on the distributed database system's processing capabilities for eciently materializing the new designs. Thus the eciency of materialization in this approach depends on the global query optimization, decomposition and execution. In case of small sized distributed databases (possibly up to 10's of gigabyte in size) in a LAN environment, the query generator method will be useful. But, in case of large databases with hundreds of gigabyte of data, the distributed database system may not be able to handle the 32
volume of data transferred. On the other hand, the operator method is independent of the distributed database system functionality. The operator method is more suitable for large distributed databases in a WAN environment. The major advantage of the operator method is that it can initiate the routines divide, relocate and conquer in background with minimum interaction from the distributed database system. Moreover, the operations split, merge and move can be implemented so as to be ecient, it is this control which is lacking in the query generator method (which is very system dependent). It is this exibility in implementation of the primitive operations that makes the operator method more general and powerful than the query generator method. The materialization methodology was extended to take into consideration the change in grid structure during the redesign process. This extension is done by de ning both the current and new design by means of a common set of virtual grid cells. This enables us to use the materialization algorithms that were developed for the case when there is no change in grid structure. A cost model based on the sizes of the horizontal and vertical grid fragments is developed to estimate the time taken to materialize the distributed database by using both methods. It was shown that the query generator approach is in general more costly than the operator approach. The queries generated by the query generator were analyzed and optimized using multiple query optimization techniques to generate a modi ed set of queries to be executed at the sites of the current and new design of the distributed database environment. The performance of this MQO set of queries was argued to be that of the operator method. The operator method consists of three phases Divide, Relocate and Conquer, similarly, the modi ed set of SQL statements generated by the multiple query optimization also materialize the redesigned distributed database by (1) generating the temporary tables, (2) moving them to dierent sites (if necessary) and (3) merging them to form the new fragments. Thus this MQO set of SQL statements when executed emulate the operator method and use the distributed database system to materialize the new distributed database design. The operator method can still be used if the distributed database system cannot handle redistribution of large sets of data and if the materialization has to be done o line or incrementally. There are lot of open problems which need to addressed in order to evaluate the performance and implementation details of the algorithms to materialize distributed databases. Some of these problems are:
The operator method uses the primitive operations which are speci ed at higher level like
\split a fragment", \merge two fragments", etc. But detailed algorithms specifying how to implement these operations at the storage manager level have not been developed. A particular distributed database management system (like client-server versions of Oracle or Sybase) needs to be considered to design and evaluate the algorithms to perform the primitive operations split, merge and move operations. The design and development of these materialization algorithms should also take into consideration the issues of incremental materialization of redesigned distributed databases while the transactions are accessing the database. This would require scheduling the materialization operations on the fragments of the relations such that a minimal number of transactions that update the database are blocked. In a well de ned application processing center scenario it would be feasible to know in advance which transactions are going to be executed. This would enable us to schedule the materialization of fragments that are not being updated by these transactions. 33
Caching of the temporary tables created during materialization process will increase the eciency of the materialization algorithms. Hence a topic of future research is to study the eect of caching and buer management on the eciency of the materialization algorithms.
Finally, these algorithms need to be tested on large distributed databases and experimented
with various parameters such as size of relations, number of replications, number of sites, number of fragments per relation, etc. This experimentation will give us an insight into the materialization and maintenance problems of large distributed relational databases.
References [1] P. M. G. Apers. Data allocation in distributed database systems. ACM Transactions on Database Systems, 13(3):263{304, September 1988. [2] S. Ceri and S. B. Navathe. A comprehensive approach to fragmentation and allocation of data in distributed databases. In Proceedings of the IEEE COMPCON Conference, pages 426{431, 1983. [3] S. Ceri, S. B. Navathe, and G. Wiederhold. Distribution design of logical database schemas. IEEE Transactions on Software Engineering, SE-9(5):487{504, July 1983. [4] S. Ceri, M. Negri, and G. Pelagatti. Horizontal data partitioning in database design. In Proceedings of ACM SIGMOD International Conference on Management of Data, pages 128{ 136, 1982. [5] S. Ceri and G. Pelagatti. Distributed Databases: Principles and Systems. McGraw Hill, New York, 1984. [6] J. Muthuraj, S. Chakravarthy, R. Varadarajan, and S. B. Navathe. A formal approach to the vertical partitioning problem in distributed database design. In Proc. of Second International Conference on Parallel and Distributed Information Systems, San Diego, California, 1993. [7] S. B. Navathe, S. Ceri, G. Wiederhold, and J. Dou. Vertical partitioning algorithms for database design. ACM Transactions on Database Systems, 9(4):680{710, 1984. [8] S. B. Navathe, K. Karlapalem, and M. Ra. A mixed fragmentation approach for initial distributed database design. Journal of Computers and Software Engineering, To appear in 1994. [9] S. B. Navathe and M. Ra. Vertical partitioning for database design: A graphical algorithm. In Proceedings of ACM SIGMOD International Conference on Management of Data, 1989. [10] T. Ozsu and P. Valduriez. Principles of Distributed Database Systems. Printice-Hall Inc., 1991. [11] Timos K. Sellis. Multiple-query optimization. ACM Transactions of Database Systems, 13(1):23{52, March 1988. [12] Timos K. Sellis and Subrata Ghosh. On the multiple-query optimization problem. IEEE transcations on Knowledge and Data Engineering, 2(2):262{266, June 1990. [13] Kyuseok Shim, Timos K. Sellis, and Dana Nau. Improvements on a heuristic algorithm for multiple-query optimization. Data & Knowledge Engineering, 12:197{222, 1994. 34
[14] B. Wilson and S. B. Navathe. An analytical framework for the redesign of distributed databases. In Proceedings of the 6th Adavanced Database Symposium, Tokyo, Japan., 1986.
35
1 Break, Form and Cluster Operations Note that the fragments in the fragmentation schemes are all regular. A regular fragment is represented by f1 A ; 2A ; : : :; nA g. Where 1 ; 2; : : :; n are the basic vertical fragments, and A = fa1; a2; : : :; ang denote the set of basic horizontal fragments. Then the operation break is de ned by the following algorithm.
break(f ) begin repeat for each i 2 r(f ) split v(f; iA; f ); v = iA; repeat for each a 2 A split h(v; ia; f ); v=f ; end repeat f =f ; end repeat end begin 0
00
00
0
The form procedure is analogous to the break procedure. It is given below.
form(f ) begin repeat for each i 2 r(f ) v=; repeat for each a 2 A merge h(v; ia; f ); v=f ; end repeat merge v(v; v ; f ) v =f end repeat f =v end begin 0
0
0
0
00
00
0
The function cluster(f ) is de ned as follows
cluster(f ) begin Cl = ; 36
repeat for each i 2 r(f ) repeat for each a 2 A Cl = Cl Sfia g end repeat end repeat Clust(f ) = Cl end begin
2 MQO Example Consider the Employee relation given in Figure 2. Figures 4 and 7 illustrate the current design (F; A) and the new design (F 0 ; A0) respectively. The objective is to use the MQO technique in materializing design (F 0 ; A0) from the design (F; A) by using the query generator algorithm. Following are the set of SQL statements initiated to materialize the new design by the Query Generator algorithm: At site s1 :
INSERT INTO EDP Emp#, Name, City, Dept#, Proj# SELECT Emp#, Name, City, Dept#, Proj# FROM Employee WHERE Dept# < 10 AND 10 Dept# < 20; At site s : INSERT INTO EDPS Emp#, Name, City, Proj#, Dept#, Salary, Bonus SELECT Emp#, Name, City, Proj#, Dept#, Salary, Bonus FROM Employee WHERE Dept# 30 AND 20 Dept# < 30; At site s : INSERT INTO ESB Emp#, Salary, Bonus SELECT Emp#, Salary, Bonus FROM Employee WHERE Dept# < 10 AND 10 Dept# < 20; At site s : INSERT INTO ESD1 Emp#, School, Degree SELECT Emp#, School, Degree FROM Employee WHERE Dept# < 10 AND 10 Dept# < 20 AND 20 Dept# < 30; At site s : INSERT INTO ESD2 Emp#, School, Degree SELECT Emp#, School, Degree FROM Employee WHERE Dept# 30; 3
3
2
4
After step 3 of the MQO algorithm the modi ed speci cation of SQL statements are given below. Figure 12 shows the 5 modi ed query graphs corresponding to the above 5 initial SQL statements. Note that all the fragments being retrieved are regular fragments. 1). insert@s1 EDP combine((1( ); 2( ))) a;b
a;b
37
insert@s EDP
insert@s
1
combine
( ( 1( a
,b ) ,
get ((1( a , b ) , 2 (a
3
EDPS
combine (( 1 (c,d ) , 2
2 ( a , b )) )
get (( 1 (c, d ) ,2
, b ))
(c, d) ,3 ( c, d )
SAL_SCHOOL2
DEPT_PROJ ESB
combine ( 3
get ( 3
(a , b
(a , b
insert@s 2 ESD1
))
combine ( 4
get ( 4 (
)
))
SAL_SCHOOL3
DEPT_PROJ
3
))
2 1
insert@s
(c, d ), 3 (c,d )
(a , b,c )
a ,b , c )
insert@s 4 ESD2
)
combine (( 4 d ))
)
get ( 4 d ) 4
3 SAL_SCHOOL1
5 SAL_SCHOOL1
SAL_SCHOOL2
Figure 12: Initial Query Graphs
38
SAL_SCHOOL3
get((1( ) ; 2( ))) from DEPT PROJ@s1 2). insert@s3 EDPS combine((1( ); 2( ); 3( ))) get((1( ) ; 2( ); 3( ) )) from DEPT PROJ@s1 , SAL SCHOOL2@s1 , SAL SCHOOL3@s3 3). insert@s3 ESB combine((3( ))) get((3( ) ) from SAL SCHOOL2@s1 4). insert@s2 ESD1 combine((4( ))) get((4( ))) from SAL SCHOOL1@s2 ,SAL SCHOOL2@s1 5). insert@s4 ESD2 combine((4( ))) get((4( ) )) from SAL SCHOOL3@s3 a;b
a;b
c;d
c;d
c;d
c;d
c;d
c;d
a;b
a;b
a;b;c
a;b;c
d
d
After step 4 the modi ed query speci cations are given below. For example, the set of grid cells to be retrieved from DEPT PROJ is (1(c;d); 2(c;d); 3(c;d)), and the representation of DEPT PROJ is (1(a;b;c;d); 2(a;b;c;d)), therefore T the actual set of grid cells retrieved is (1(c;d); 2(c;d); 3(c;d)) (1(a;b;c;d); 2(a;b;c;d)) = (1(c;d); 2(c;d)) This optimization step is illustrated in the Figure 13 and the modi ed query descriptions are given below. 1). insert@s1 EDP combine((1( ); 2( ))) get((1( ) ; 2( ))) from DEPT PROJ@s1 2). insert@s3 EDPS combine((1( ); 2( ); 3( ))) (i) get((1( ) ; 2( ))) from DEPT PROJ@s1 (ii)get((3 )) from SAL SCHOOL2@s1 (iii)get((3 )) from SAL SCHOOL3@s3 3). insert@s3 ESB combine((3( ))) get((3( ) ) from SAL SCHOOL1@s2 4). insert@s2 ESD1 combine((4( ))) (i) get((4( ))) from SAL SCHOOL1@s2 (ii)get((4 )) from SAL SCHOOL2@s1 5). insert@s4 ESD2 combine((4( ))) get((4( ) )) from SAL SCHOOL3@s3 a;b
a;b
a;b
a;b
c;d
c;d
c;d
c;d
c;d
c
d
a;b
a;b
a;b;c
a;b
c
d
d
39
insert@s
3
EDPS 2
combine (( 1 (c,d ) , 2
(c, d ), 3 (c,d )
))
get (( 3 ( c ) )) get (( 1 (c, d ) ,2
(c, d)
get (( 3 ( d ) ))
))
SAL_SCHOOL3
SAL_SCHOOL2
DEPT_PROJ
insert@s 2 ESD1 4
combine (( 4
(a , b,c )
))
get (( 4 (a , b ) ))
get (( 4
SAL_SCHOOL1
(c )
))
SAL_SCHOOL2
Figure 13: Step 2 of Multiple Query Optimization
40
Figure 14 shows the combined optimized query graph. The intermediate nodes of the graph are the get and combine nodes. The get nodes retrieve the relevant set of grid cells from the fragments of current design to be used in materializing a fragment of the new design. Hence there are arrows coming in from the fragments of the current design showing which grid cells are being accessed, and there are arrows going into the combine statements which specify the grid cells being used to form the fragments of the new design. The combine nodes merge the temporary tables generated by the get nodes to form the fragments of the new design. After step 5 the modi ed sets of query speci cation are given below. 1). get((1( ) ; 2( ))) ^ send to(s1 ) as TempTable1 , get((1( ) ; 2( ))) ^ send to(s3 ) as TempTable2 from DEPT PROJ@s1 2). get((4( ) )) ^ send to(s2 ) as TempTable3 , get((3( ) )) ^ send to(s3 ) as TempTable4 from SAL SCHOOL1@s2 3). get((3 )) ^ send to(s3 ) as TempTable5 , get((4 )) ^ send to(s2 ) as TempTable6 from SAL SCHOOL2@s1 4). get((3 )) ^ send to(s3 ) as TempTable7 , get((4( ) )) ^ send to(s4 ) as TempTable8 from SAL SCHOOL3@s3 a;b
a;b
c;d
c;d
a;b
a;b
c
c
d
d
1 ). insert@s1 EDP combine(TempTable1) 2 ). insert@s3 EDPS combine(TempTable2, TempTable5 , TempTable7 ) 3 ). insert@s3 ESB combine(TempTable4) 4 ). insert@s2 ESD1 combine(TempTable3,TempTable6 ) 5 ). insert@s4 ESD2 combine(TempTable8) 0
0
0
0
0
Finally, after step 6 following sets of SQL statements are executed to materialize the new design. At site s1 : beginfconcurrentg
INSERT INTO TempTable Emp#, Name, City, Dept t#, Proj# SELECT Emp#, Name, City, Dept#, Proj# FROM DEPT PROJ WHERE Dept# < 10 AND 10 Dept# < 20; INSERT INTO TempTable Emp#, Name, City, Dept#, Proj# SELECT Emp#, Name, City, Dept#, Proj# FROM DEPT PROJ WHERE Dept# 30 AND 20 Dept# < 30; INSERT INTO TempTable Emp#, Salary, Bonus SELECT Emp#, Salary, Bonus FROM SAL SCHOOL2 1
2
5
41
insert@s EDP
insert@s ESD3
1
insert@s EDPS
3
combine (( 4d ))
combine ( (1 ( a ,b ) , 2 ( a , b )) ) insert@s ESB
3
combine (( 1 (c,d ) ,2
combine (( 3
DEPT_PROJ @s 1
(a , b)
(a , b)
))
(c, d ), 3 (c,d )
))
insert@s ESD1
2
combine (( 4 (a ,b,c ) ))
))
get (( 4 (a ,b) ))
get((1(a , b ) , 2 (a , b) )) get (( 3
4
get(( 4
get (( 1(c,d ) ,2 (c,d )))
SAL_SCHOOL1 @s 2
c
))
get (( 3 c ))
SAL_SCHOOL2 @s 1
get ((4 d ))
get (( 3d ))
SAL_SCHOOL3 @s 3
Figure 14: Combined optimized query graph
42
WHERE 20 Dept# < 30; INSERT INTO TempTable Emp#, School, Degree SELECT Emp#, School, Degree FROM SAL SCHOOL2 WHERE 20 Dept# < 30; 6
endfconcurrentg send to(s3 ) TempTable2 send to(s3 ) TempTable6 send to(s2 ) TempTable5 At site s2 : beginfconcurrentg INSERT INTO TempTable3 Emp#, School, Degree SELECT Emp#, School, Degree FROM SAL SCHOOL1 WHERE Dept# < 10 AND 10 Dept# < 20; INSERT INTO TempTable4 Emp#, Salary, Bonus SELECT Emp#, Salary, Bonus FROM SAL SCHOOL1 WHERE Dept# < 10 AND 10 Dept# < 20; endfconcurrentg send to(s3 ) TempTable4 At site s3 : beginfconcurrentg INSERT INTO TempTable7 Emp#, Salary, Bonus SELECT Emp#, Salary, Bonus FROM SAL SCHOOL3 WHERE Dept# 30; INSERT INTO TempTable8 Emp#, School, Degree SELECT Emp#, School, Degree FROM SAL SCHOOL3 WHERE Dept# 30; endfconcurrentg send to(s4 ) TempTable8 At site s1 :
INSERT INTO EDP Emp#, Name, City, Dept#, Proj# SELECT Emp#, Name, City, Dept#, Proj# FROM TempTable ; At site s : INSERT INTO ESD1 Emp#, School, Degree SELECT Emp#, School, Degree FROM TempTable , TempTable WHERE TempTable .Emp# = TempTable .Emp#; At site s : INSERT INTO EDPS Emp#, Name, City, Proj#, Dept#, Salary, Bonus SELECT Emp#, Name, City, Proj#, Dept#, Salary, Bonus FROM TempTable , TempTable WHERE TempTable .Emp# = TempTable .Emp# UNION SELECT Emp#, Name, City, Proj#, Dept#, Salary, Bonus FROM TempTable ; INSERT INTO ESB Emp#, Salary, Bonus SELECT Emp#, Salary, Bonus 1
2
3
6
3
6
3
5
7
5
7
2
43
FROM TempTable ; INSERT INTO ESD2 Emp#, School, Degree SELECT Emp#, School, Degree FROM TempTable ;
At site s4 :
4
8
44