Future Generation Computer Systems 23 (2007) 536–546 www.elsevier.com/locate/fgcs

Complete and fragmented replica selection and retrieval in Data Grids

Ruay-Shiung Chang∗, Po-Hung Chen

Department of Computer Science and Information Engineering, National Dong Hwa University, Shoufeng, Hualien, 974, Taiwan

Received 1 February 2006; received in revised form 9 July 2006; accepted 16 September 2006. Available online 2 November 2006.

Abstract

Data Grids support data-intensive applications in wide-area Grid systems. They utilize local storage systems as distributed data stores by replicating datasets. Replication is a commonly used technique in distributed environments; it can improve data availability, data access performance, and load balancing. Usually a complete file is copied to many Grid sites for local access. However, a site may only need parts of a replica. Therefore, to use the storage systems efficiently, it is necessary for a Grid site to be able to store only parts of a replica. In this paper, we propose a concept called fragmented replicas: when replicating, a site stores only the partial contents needed locally. This greatly reduces the storage space wasted in storing unused data. We also propose a block mapping procedure to determine the distribution of blocks on every available server for later replica retrieval. With this procedure, a server can offer its available partial replica contents to other members of the Grid system, and a client can retrieve a fragmented replica directly. After the block mapping procedure, co-allocation schemes can be used to retrieve data sets from the available servers. Simulations show that the co-allocation schemes also improve download performance in a fragmented replication system.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Data Grid; Replication; Dynamic self-adaptive replica location; Fragmented replication

1. Introduction

The computational problems arising from many scientific disciplines are growing more and more complicated. They cannot be solved by a single computer, and the alternatives of using a supercomputer or a dedicated distributed system prove costly or inconvenient. Furthermore, the data sets may be huge, and these scientific applications may need to transfer terabytes or petabytes of information across wide-area, distributed computing environments. Experimental analyses and simulations in several scientific disciplines need this kind of service. To solve the problems mentioned above, Grid technologies have been proposed. The term Grid is borrowed from the electrical grid [1]. Users can obtain computing power through the Internet by using a Grid, just like getting electric power from any wall socket. By connecting their computers to the Grid, users can get the needed computing power and storage space.

∗ Corresponding author. E-mail address: [email protected] (R.-S. Chang).

0167-739X/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.future.2006.09.006

Grids will connect computers, workstations, servers, and storage elements, and provide the necessary mechanisms for everyone to use them. Users do not have to worry about where the resources and applications are; the Grid will assign the most appropriate ones to a user's job. The basic concept of Grids is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations. This sharing defines clearly and carefully just what is shared, who is permitted to access it, and the conditions under which sharing occurs. A set of institutions and individuals that follow such sharing rules forms a virtual organization (VO) [2]. Fig. 1 is an example. There are three actual organizations and two VOs in this example; every actual organization can participate in more than one VO to share its resources. The sharing rules are formulated according to the resource owners, the actual organizations, and the VOs. Some basic concepts of a Grid can be found in [3]; for other background on Grids, please refer to [4–8]. Among the many Grid applications, the Data Grid is one of the most important. For example, high-energy physics experiments need to perform many analyses on large amounts of data.


Fig. 1. Sharing relationships within virtual organizations.

CERN is the world's largest particle physics center [9]. The LHC (Large Hadron Collider) at CERN will start running in 2007. The biggest scientific instrument in the world will produce data sets of up to several petabytes, about 10 PB, per year. CERN has chosen Grid technology to solve this challenging data storage and computing problem; it promotes the Grid in several ways and leads some of the most important Grid projects in the world, hoping that Grid technology will be a practical approach by the time the LHC starts running. DataGrid is a project funded by the European Union [10]. Its objective is to build the next-generation computing infrastructure, providing intensive computation and analysis of shared large-scale databases, from hundreds of terabytes to petabytes, across widely distributed institutions. In Data Grids, data can be stored at multiple heterogeneous storage systems such as DPSS (Distributed Parallel Storage System) or HPSS (High Performance Storage System). Data Grids also provide a basic architecture to serve data-intensive applications. Examples of these applications include handling data generated by supercomputers, performing experimental analyses on large data sets, and running simulations for scientific discoveries. These applications all share the characteristic of accessing and handling an extraordinary amount of data, and a single computer can hardly meet all their requirements. Because of the geographic distribution of the cooperating sites in a Data Grid, it is reasonable to make copies, or replicas, of a data set and store these replicas at multiple sites. Replication has been used in distributed computing for a long time. Its benefits are reduced access latency and bandwidth consumption, improved data availability, and better load balancing. For example, Fig. 2 shows the scheme by which the high-energy physics data sets generated at CERN are replicated in a hierarchical structure. All data sets are stored at the original location (CERN) and partial data sets are stored at the lower layers of this structure. When scientists at lower-layer sites request a data set, it may be transferred from the server (CERN) to the client. The server would bear a heavy overhead if every client requested the file from the server.

Fig. 2. The scheme for hierarchical replication of data in CERN.

If there are replicas, the scientists can transfer the needed file from other sites. Thus, services for replicating data, selecting replicas from available sites, and locating existing replicas are necessary. One popular class of applications with data distribution ability is the peer-to-peer (P2P) system. Basically, a Grid is about sharing resources, including computing power, storage space, I/O devices, and programs, while a P2P system mainly focuses on sharing files. Grid participants usually have to be verified and trusted, while P2P is based on anonymity and untrusted assumptions. Please refer to [11] for a special issue on the interaction between P2P computing and Grids. In Grid data replication, several issues exist: (1) replica management, (2) replica selection, and (3) replica location. Replica management is the process of creating or deleting replicas at a storage site [12]. To create a replica, we have to select a Grid site to place it; this is the replica placement problem. Most often, these replicas are exact copies of the original files, created to harness certain performance benefits. A replica manager typically maintains a replica catalog containing replica site addresses and the file instances. Replica selection is the process of choosing a replica from among those spread across the Grid, based on some characteristics specified by the application [13,14]; one common selection criterion is access speed. Finally, replica location is the challenging problem of efficiently finding the physical locations of the multiple replicas of a desired data set in a large-scale, wide-area Data Grid system [15,16].


A complete replica can be located at many Grid sites. As stated above, it may happen that only partial contents of a replica will ever be needed by a local application. To save storage space, it may be necessary to delete the useless and unnecessary contents from the replica, resulting in a fragmented replica. Based on this concept, we discuss replica fragmentation and its selection problem in this paper. Fragmentation in other distributed file systems, especially distributed database systems [17], is also common. However, those systems are usually tightly controlled and their management methods are mostly centralized; therefore, their methods cannot be directly applied to highly distributed Grid systems. Another point that makes the Data Grid stand out is the sheer size of its datasets, which makes the methods of distributed file systems inapplicable.
The rest of this paper is organized as follows. Section 2 reviews related work on replica problems. In Section 3, we introduce the DSRL (Dynamic Self-adaptive Replica Location) architecture and our approaches. Simulations are conducted in Section 4. Finally, Section 5 concludes this paper and suggests some possible future research.

2. Related work

There are many different sites in a Data Grid environment. These sites may have various performance characteristics, such as different storage system architectures, network connectivity, and system load. The datasets which we want to access are probably located at some of these sites. The information infrastructure is handled by MDS (Monitoring and Discovery System) within the Globus Toolkit [7]; we can query MDS for information about replica locations. The storage broker architecture [12,18] provides a replica selection mechanism to find a suitable replica for access using a high-speed file transfer protocol, GridFTP. But this protocol just accesses a file from the best-matched server; it does not perform the operation in parallel. Multiple replicas can improve data retrieval performance in a Grid environment by allowing parallel downloading of the desired data sets. However, since network conditions and server speeds differ, scheduling the downloading job efficiently is very important. In [19,20], an approach to predict new file download performance based on observations of past transfers is proposed. The goal is to obtain an accurate prediction of the time required to transfer a file. A predictive framework that combines three main parts is developed. First, information about every data transfer is recorded: the GridFTP service is adopted and a mechanism is added to log performance information for every file transfer. The second step makes predictions of future behavior from the past performance information using regression analysis. The third step integrates the prediction with a resource provider and allows the prediction to be discovered through an information service. Replicating popular content on multiple servers is a widely used technique.
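To make the regression step concrete, the following toy sketch (our illustration, not the framework of [19,20]) fits a linear model to a hypothetical log of past transfers and uses it to predict the time of a new transfer.

```python
import numpy as np

# Hypothetical transfer log: (file size in MB, observed transfer time in s).
# In the framework of [19,20] such records would come from GridFTP logs.
log = np.array([
    [100.0, 12.1],
    [250.0, 27.9],
    [500.0, 57.3],
    [800.0, 90.2],
])

sizes, times = log[:, 0], log[:, 1]

# Fit time ~ a * size + b by least squares, a crude stand-in for the
# regression analysis described above.
a, b = np.polyfit(sizes, times, deg=1)

def predict_transfer_time(size_mb: float) -> float:
    """Predict the transfer time for a new file of the given size."""
    return a * size_mb + b

print(predict_transfer_time(400.0))  # estimated seconds for a 400 MB file
```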

In [14], the author employs the prediction techniques and develops several co-allocation mechanisms to establish connections between servers and a client. There is only one stream in a typical Internet download between a server and a client. This may cause problems when downloading large datasets in a DataGrid environment, as the achievable bandwidth is limited by several bottlenecks. One is congestion on the links connecting the server and the client to the Internet; another is the output bandwidth of the server's network connection, which may be constrained by the server's CPU speed, disk storage speed, etc. One way to improve download speeds is to download data from multiple locations in parallel in order to minimize these problems. Based on the Globus resource management architecture [4], a co-allocation architecture is proposed. It has three components:
1. Application: when a user executes an application to access the data, the job of the application is to present a description of the data to the broker.
2. Broker/co-allocator: the user needs a broker to identify the available and appropriate resources in the Grid system for his tasks.
3. Local storage systems: the storage systems provide basic mechanisms for accessing and managing the data. The data is located in systems such as High Performance Storage Systems (HPSS) and Distributed Parallel Storage Systems (DPSS).
The authors in [14] developed several co-allocation mechanisms to enable parallel downloading. The most interesting one is called Dynamic Co-Allocation; a sketch of its behavior follows below. The dataset that the client wants is divided into k disjoint blocks of equal size. Each available server is assigned to deliver one block in parallel. When a server finishes delivering a block, another block is requested, and so on, until the entire file is downloaded. Faster servers deliver their blocks quickly and thus serve larger portions of the requested file than slower servers. This approach exploits the partial copy feature of GridFTP within the Globus Toolkit. One drawback is that faster servers must wait for the slowest server to deliver the final block.
Although the Dynamic Co-Allocation proposed in [14] reduces the total transfer time, the faster servers have to wait for the slowest server. In [21], an improvement on Dynamic Co-Allocation is proposed. The work is based on the co-allocation architecture and the prediction technique. The paper proposes two techniques: (1) abort and retransfer, and (2) one by one co-allocation. These techniques increase the volume of data requested from faster servers and reduce the volume of data fetched from slower servers; they are aimed directly at the drawback of Dynamic Co-Allocation. The Abort and Retransfer scheme aborts the transfer at the slowest server and moves that work to faster servers; it can change the allocation according to the condition of the transfer. When all data blocks have been assigned, the procedure checks the remaining transfer time of the slowest server. If the remaining time is longer than the time to transfer the last data block from the fastest server, the final data block is reassigned to the fastest server. Thus, this technique prevents the drawback of [14].
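The Dynamic Co-Allocation behavior described above can be sketched as a simple work-queue simulation. The following is a minimal illustration under assumed predicted server rates; it is not the implementation of [14], which drives real GridFTP partial-copy transfers.

```python
import heapq

def dynamic_co_allocation(num_blocks, server_rates):
    """Simulate Dynamic Co-Allocation with equal-size blocks.

    server_rates maps a server name to its (hypothetical) predicted
    throughput in blocks per time unit.  Every server starts with one
    block; whenever a server finishes, it is handed the next block,
    until all blocks are assigned.
    """
    block_time = {s: 1.0 / r for s, r in server_rates.items()}
    heap = []          # (time the server finishes its current block, server)
    assigned = 0
    counts = {s: 0 for s in server_rates}

    for s in server_rates:             # initial round: one block per server
        if assigned < num_blocks:
            heapq.heappush(heap, (block_time[s], s))
            counts[s] += 1
            assigned += 1

    completion = 0.0
    while heap:
        t, s = heapq.heappop(heap)
        completion = max(completion, t)
        if assigned < num_blocks:      # server is free: request another block
            heapq.heappush(heap, (t + block_time[s], s))
            counts[s] += 1
            assigned += 1
    return counts, completion

# Hypothetical example: the fastest server ends up carrying most blocks,
# but everyone still waits for the slowest server's last block.
print(dynamic_co_allocation(10, {"S1": 4.0, "S2": 2.0, "S3": 1.0}))
```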


The other technique is called One by One Co-Allocation, whose main purpose is to prevent allocation to the slowest servers. One by One Co-Allocation is a pre-scheduling method for allocating the data blocks to the available servers. Using the prediction technique, the transfer time of each server can be calculated. In every round, a data block is assigned to the server with the lowest accumulated transfer time; if a server has been assigned one or more data blocks in earlier rounds, its total transfer time is accumulated accordingly. Thus, allocation to slower servers can be avoided. In a huge Data Grid system, one data set may have many replicas owing to its popularity. In [15], the challenging problem of efficiently finding the physical locations of the multiple replicas of desired data in a large-scale, wide-area Data Grid system is discussed. The authors propose a Dynamic Self-adaptive Replica Location method (DSRL) to solve this problem. It describes how to get one or more corresponding Physical Replica Names (PRN) from a logical name. DSRL has many advantages, such as low query latency, scalability, reliability, and self-adaptiveness.

3. Fragmented replica selection and retrieval in Data Grids

Fig. 3. Architecture of DSRL.

3.1. DSRL architecture

In a Data Grid, data sets are stored on multiple heterogeneous storage systems: distributed file systems such as the Andrew File System (AFS) or the Network File System (NFS), and mass storage systems such as the High Performance Storage System (HPSS) or UniTree. These storage systems differ greatly in their basic mechanisms. In order to achieve a uniform access interface, every data set has a unique logical name in a Data Grid system, called its Logical Data Name (LDN) [15]. Thus, finding desired data sets becomes easier in a Data Grid: the client just needs to know the LDN of the desired data set. The client gets one or more corresponding PRNs by giving the LDN of the desired data set to DSRL. The HTTP URL (HyperText Transfer Protocol Uniform Resource Locator) is one kind of PRN that can be seen everywhere. Fig. 3 shows the architecture of DSRL [15]. DSRL adopts a three-layer decentralized architecture consisting of a physical storage layer, a local location layer, and a global location layer. The physical data sets are stored on Storage Sites (S) in the storage layer. The Local Location Nodes (LLN) provide mappings between LDNs and PRNs in the local location layer, as shown in Fig. 4. In the global location layer, the Global Home Nodes (GHN) maintain only the mapping between LDNs and LLNs, to minimize storage and update overhead. This mapping information is called home directory information. An LLN caches recently accessed home directory information to reduce the load on the GHNs; cache replacement adopts timeout and LRU (Least Recently Used) mechanisms. In a Data Grid, maintaining strong consistency is very hard because data are constantly being changed at various nodes across the Grid. Some nodes may leave the Data Grid suddenly without notifying the GHN; thus, the GHN will never know that some replica information of the departed nodes is outdated.
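As an illustration of how such a lookup might proceed, the sketch below resolves an LDN to PRNs through toy in-memory home directory and local replica tables, with an LRU cache on the LLN side. All node names, the LDN, and the URLs are hypothetical, and the MD5-based home-node selection anticipates the dynamic mapping technique described in the next paragraph.

```python
import hashlib
from collections import OrderedDict

GHNS = ["GHN0", "GHN1", "GHN2"]   # hypothetical global home nodes

def home_node(ldn: str) -> str:
    """Select the GHN responsible for an LDN via an MD5 hash,
    in the spirit of DSRL's dynamic mapping technique."""
    digest = hashlib.md5(ldn.encode()).hexdigest()
    return GHNS[int(digest, 16) % len(GHNS)]

LDN = "ldn://example/dataset42"

# Home directory information on the GHNs: LDN -> LLNs that know it.
home_directory = {home_node(LDN): {LDN: ["LLN-A", "LLN-B"]}}

# Local replica information on the LLNs: LLN -> {LDN: [PRNs]}.
local_replicas = {"LLN-A": {LDN: ["gsiftp://siteA/dataset42"]},
                  "LLN-B": {LDN: ["gsiftp://siteB/dataset42.part"]}}

cache = OrderedDict()   # LLN-side LRU cache of home directory entries
CACHE_SIZE = 128

def resolve(ldn: str) -> list:
    """Resolve an LDN to its PRNs: consult the cache, else the GHN."""
    if ldn in cache:
        cache.move_to_end(ldn)               # LRU refresh on a hit
        llns = cache[ldn]
    else:
        llns = home_directory.get(home_node(ldn), {}).get(ldn, [])
        cache[ldn] = llns
        if len(cache) > CACHE_SIZE:
            cache.popitem(last=False)        # evict least recently used
    prns = []
    for lln in llns:
        prns.extend(local_replicas.get(lln, {}).get(ldn, []))
    return prns

print(resolve(LDN))   # both PRNs of the hypothetical data set
```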

Fig. 4. Local replica information on LLNs.

Storage sites, LLNs, and GHNs are all logical concepts; they can be located at the same node in a real system. In addition, home directory information should not be concentrated in only one or a few GHNs. Otherwise, queries would be directed to these few nodes and they would become system bottlenecks. Thus, in [15], the authors propose a dynamic mapping technique to achieve load balance by selecting home nodes for data elements using the MD5 hash algorithm. When a home node joins or departs, the home directory information is migrated among GHNs according to a prefix-matching principle so that the architecture remains in a balanced state. In this way, DSRL can scale with the growth of the whole Data Grid system.

3.2. Complete replicas and their retrieval

In the dynamic co-allocation scheme [14], an original file may be copied to n different sites. When processing a complete replica retrieval, the data set is divided into k disjoint blocks of equal size. Each available server is assigned to deliver one block initially. Once a server finishes a delivery, another block is assigned, and so on, until all blocks are assigned. Thus, the client can download the file it wants from different sites in parallel. A faster server will transfer more blocks than slower servers. This reduces the download completion time and improves Grid system performance. Fig. 5 is a diagram of this approach. In Fig. 5, OF is the original file and CR is a complete replica. Each replica is divided into k blocks, b1, b2, ..., bk.


Fig. 5. The Dynamic Co-allocation approach.

Fig. 7. The procedure of the One by One Co-allocation approach.

Fig. 8. Extra local replica information on LLNs of DSRL.

Fig. 6. The procedure of the abort and retransfer approach.

The drawback is that faster servers must wait for the slowest server to deliver the final block. In [21], two techniques are proposed to improve the dynamic co-allocation method. The first is the abort and retransfer scheme. It adopts dynamic co-allocation to allocate the data blocks. When all data blocks have been assigned, it checks the completion time of the slowest server and allows the client to abort the slowest server's delivery. Then, it allocates the block to a faster server and checks recursively whether the completion time can be further improved. The procedure is shown in Fig. 6, where T is the time for the fastest available server to deliver one block and R is the time for the slowest server to deliver the remaining portion of the last block. Even though the abort and retransfer approach improves the performance of data transfer, it still cannot prevent allocation to the slowest server. The other method is the one by one approach. We can calculate the time needed to transmit one data block as Ti = M/Bi for server Si, where M is the data block size and Bi is the predicted transfer rate. Let Ci = Ti × Ai, where Ai is the number of blocks assigned to Si; Ci is the completion time of transmitting Ai blocks from Si. Let Ei = Ti × (Ai + 1) be the estimated completion time if one more block is assigned. R is the number of blocks remaining; the client divides the file into k blocks and the initial value of R is k. In the co-allocation architecture, the client presents the description of the k blocks to the broker. The broker identifies the file locations, which can be fetched from the information services, and informs the co-allocator agent. Then, the agent runs the recursive algorithm to assign the data blocks to faster servers, and the client downloads the data in parallel using GridFTP.

The co-allocator agent assigns one block to the fastest server, i.e., the one with the minimum estimated completion time, in every round, and the accumulated transfer time of each server is updated accordingly. Fig. 7 shows the procedure. The approach avoids fetching data blocks from slower servers. It not only decreases the completion time of data transfers, but also reduces the workload of slower servers.

3.3. Fragmented replicas and their retrieval

A replica is a duplicate of an original file (OF) from the source site. The source site places replicas at the chosen sites according to the replication strategies [22]. Not all contents of a file are needed by every Grid site or application. Therefore, we can save local storage space and a great deal of bandwidth by allowing fragmented replicas. For a data set, a fragmented replica consists of the parts of the data set's content copied to a Grid site. For example, a fragmented replica could be the section of a video or music that users specifically want. In this paper, based on the DSRL architecture, we allow a site to retain only the partial contents of an OF that are really needed locally and to delete the unnecessary parts. To achieve this capability, we need extra local replica information on the local location nodes of the DSRL architecture, as depicted in Fig. 8. C/F, FO, and FRS are three new fields added to the local replica information table. C/F indicates whether the replica is complete or fragmented; a single bit is sufficient, and it helps a client search for suitable sites to download from. FO stands for Front Offset and FRS stands for Fragmented Replica Size; their meanings are shown in Fig. 9. To allow maximum flexibility, although data transfers are always done in units of blocks, we allow arbitrary fragmented replica sizes at a Grid site. There is also no restriction on how many fragmented replicas a site may possess; only complete recording of the local fragmented replica information is needed. When processing replica selection, each client maps the requested blocks onto the Grid sites using the LLN table. The mapping algorithm is described below, and a sketch of the extended replica record follows.
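As a concrete rendering of the extended table entry, one might represent a record as follows. The field names mirror Fig. 8, while the class layout and values are our hypothetical illustration.

```python
from dataclasses import dataclass

@dataclass
class ReplicaRecord:
    """One row of the extended LLN local replica information table."""
    ldn: str          # Logical Data Name of the data set
    prn: str          # Physical Replica Name (e.g. a GridFTP URL)
    complete: bool    # the C/F bit: True = complete, False = fragmented
    fo: int           # Front Offset of the fragment in bytes (0 if complete)
    frs: int          # Fragmented Replica Size in bytes

# A hypothetical fragment: bytes [2500, 7300) of the original file.
rec = ReplicaRecord("ldn://example/dataset42",
                    "gsiftp://siteB/dataset42.part",
                    complete=False, fo=2500, frs=4800)
print(rec)
```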


Fig. 9. FO and FRS illustration.

Fig. 11. A fragmented replica system.

Fig. 10. Fragmented replica selection.

Fig. 10 shows the fragmented replica retrieval concept. A file or data element may be stored at more than one storage site. To save storage space, some storage sites delete the unnecessary partial contents of a replica and then inform the LLNs (local location nodes) to update the local replica information and the home directory information. When a client requests a data element, whether complete or fragmented, it must find all sites which have the desired data blocks for parallel downloading. First, the client queries its LLN to check whether there is local replica information on the LDN in question, say LDN1. If there is, the LLN returns the corresponding PRNs of LDN1. If the fragmented replicas stored at the local site cannot satisfy the request, the client uses the dynamic mapping technique to obtain the global home node of LDN1 and fetches the home directory information of LDN1 from it. The home directory information is the mapping between LDN1 and the LLNs which hold local replica information on LDN1, such as (LDN1, LLN1), (LDN1, LLN2), ..., (LDN1, LLNi). The client looks up the returned LLNs sequentially to find the servers that have the needed fragmented replicas. After the information is collected completely, the parallel downloading can commence. Let there be m Grid sites, denoted S1, S2, ..., Sm. Also assume that a complete replica can be divided into n blocks, denoted B1, B2, ..., Bn. In a fragmented replica system, each site may store some parts of a complete replica and each block may be found at several sites. This situation is shown in Fig. 11. To access a fragmented replica, we can remove from the block set in Fig. 11 all blocks that are not needed. Then we want to assign blocks to sites such that the parallel downloading time is minimized. In a high-speed Internet, the block size may be relatively small compared to the speed of the network.

Therefore, we can assume the downloading time is the same for each block, regardless of its downloading site. Then an algorithm from [23] can be used to find an optimal schedule for parallel downloading. However, if the downloading time of a block depends on where it is downloaded from, the problem becomes NP-complete [23]. Even in this case, the algorithm in [23] can be used as a heuristic to obtain a feasible solution. In the following, we modify the algorithms for complete replicas to access fragmented replicas. A client first informs the broker to find the available servers which have the desired fragmented replica. Since only whole blocks can be transferred and each record in the replica location information table represents a contiguous fragmented replica, a block mapping technique is used to find the exact relation between the blocks and the fragmented replica that each site owns. When the broker queries the replica location information a client wants, there are four cases, shown in Fig. 12.
Case 1: The fragmented replica starts at the beginning of one block and finishes at the end of one block.
Case 2: The fragmented replica starts at the beginning of one block and finishes inside one block.
Case 3: The fragmented replica starts inside one block and finishes at the end of one block.
Case 4: The fragmented replica starts inside one block and finishes inside one block.
The formulas are shown in Fig. 12, where α is the block size (BS), β is the fragmented replica size of the site (FRS), γ is the front offset of the fragmented replica (FO; the front offset of a complete replica is zero), δ is the start block number, and N is the number of blocks that a site has. The procedure of the block mapping technique is shown in Fig. 13. First, the broker calculates the front address of the fragmented replica, Front = γ/α. If Front is an integer, the fragmented replica starts exactly at the beginning of a block; otherwise, it starts inside a block. End = (γ + β)/α is the ending address of the fragmented replica. If End is an integer, the fragmented replica finishes exactly at the end of a block; otherwise, it finishes inside a block. Thus, the broker can distinguish the four block mapping cases to calculate the start block number (δ) and the number of blocks (N) that each site owns.


Fig. 12. The mapping relation between the fragmented replica and the blocks.

Fig. 14. Block selection.

Fig. 13. Block mapping procedure.

For each available server Sj, 1 ≤ j ≤ m, let Gi be the set of sites that contain block Bi, 1 ≤ i ≤ n. Then Sj is included in every set Gi with δ ≤ i ≤ δ + N − 1, where δ and N are the values obtained for Sj by the block mapping procedure.

First, the broker uses the procedure of Fig. 13 to determine which case applies to a site. Then, the formulas in Fig. 12 give the precise extent of the fragmented replica at that site. Thus, the broker can decide whether the site has the desired fragmented replica according to the obtained information. Let S = {Si | i = 1, 2, ..., k} be the set of available servers, and let G = {Gi | i = 1, 2, ..., n}, where Gi is the set of sites that have block Bi.
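The block mapping computation can be sketched as follows. Since the exact formulas of Figs. 12 and 13 are not reproduced in the text, this is one consistent reconstruction from the definitions of α, β, γ, δ, and N above, with 1-based block numbering; floor and ceiling operations cover all four cases uniformly.

```python
import math

def block_mapping(alpha: int, beta: int, gamma: int):
    """Map a contiguous fragment onto block numbers.

    alpha: block size (BS); beta: fragmented replica size (FRS);
    gamma: front offset (FO, zero for a complete replica).
    Returns (delta, n): the 1-based start block number and the number
    of blocks the site holds at least partially.
    """
    front = gamma / alpha              # starting address in block units
    end = (gamma + beta) / alpha       # ending address in block units
    # Cases 1-4 of Fig. 12 reduce to whether front/end fall exactly on
    # block boundaries; floor/ceil handle all four uniformly.
    delta = math.floor(front) + 1      # first block touched (1-based)
    last = math.ceil(end)              # last block touched
    return delta, last - delta + 1

# Hypothetical example with block size 10: a fragment of size 25
# starting at offset 5 touches blocks 1, 2, and 3 (Case 4).
print(block_mapping(10, 25, 5))        # -> (1, 3)
```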

Fig. 15. The modified One by One Co-allocation for fragmented replicas.


After the block mapping procedure, the broker adopts a replica selection mechanism to select a download site Sj from Gi for each needed block Bi, as shown in Fig. 14. The replica selection and transfer strategy can be adapted from the Dynamic Co-Allocation, Abort and Retransfer, or One by One Co-Allocation methods. We present the One by One Co-Allocation algorithm for fragmented replica selection; the other methods can be modified similarly. The algorithm is shown in Fig. 15, where it is assumed that all blocks are needed. For downloading fragmented replicas, the unwanted blocks are simply skipped in the outer loop.
We present an example of the algorithm. The size of the data set in question is 40. We divide the data set into 4 blocks with block size 10. There are four Grid sites, S1–S4, in this example. Fig. 16 shows the local replica information of the desired data set collected from every LLN. The group information, G1–G4, is obtained by the block mapping procedure and shown in Fig. 17. The result of the One by One Co-allocation is shown in Fig. 18; the transfer time in this example is 15. Here the One by One Co-allocation does not produce an optimal solution for parallel downloading: Fig. 19 shows an optimal solution, where assigning B2 to S2 instead of S1 reduces the transfer time to 11.

Fig. 16. The local replica information.

Fig. 17. The group information.
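A minimal sketch of One by One Co-Allocation for fragmented replicas, reconstructed from the description above: in each round a needed block is assigned to the site in its group with the smallest estimated completion time Ei = Ci + Ti. The group sets and predicted times below are hypothetical inputs, and the real algorithm of Fig. 15 may differ in detail.

```python
def one_by_one_co_allocation(needed_blocks, groups, block_time):
    """Pre-schedule blocks onto sites, one block per round.

    needed_blocks: block indices the client actually wants (unwanted
        blocks are simply skipped, as for fragmented replicas).
    groups: dict block index -> set of sites holding that block (Gi).
    block_time: dict site -> predicted time Ti to transfer one block.
    Returns (assignment, completion_time).
    """
    accumulated = {s: 0.0 for s in block_time}   # Ci per site
    assignment = {}
    for b in needed_blocks:                      # outer loop over blocks
        # Ei = Ci + Ti: estimated completion if this site takes block b.
        best = min(sorted(groups[b]),
                   key=lambda s: accumulated[s] + block_time[s])
        assignment[b] = best
        accumulated[best] += block_time[best]
    return assignment, max(accumulated.values())

# Hypothetical inputs loosely following the 4-block example above.
groups = {1: {"S1", "S3"}, 2: {"S1", "S2"}, 3: {"S2", "S4"}, 4: {"S3", "S4"}}
times = {"S1": 5.0, "S2": 3.0, "S3": 4.0, "S4": 6.0}
print(one_by_one_co_allocation([1, 2, 3, 4], groups, times))
```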

4. Simulations

4.1. The Grid environment



We construct a basic Grid environment to implement our approach. We adopt the Globus Toolkit Version 3.0.2 (GT3) as the Grid middleware to provide the basic operating mechanisms. The Scalable Cluster Management System (SCMSWeb) is a cluster monitoring tool that monitors our Grid sites to obtain resource information immediately, such as CPU utilization, memory usage, and load average. Our fragmented replication system is implemented with MySQL and PHP (PHP: Hypertext Preprocessor). MySQL is a relational database management system (RDBMS) used to record the home directory information and local replica information of the DSRL. PHP provides a web-based query interface and the ability to cooperate with MySQL. The Globus Toolkit provides the basic Grid services such as GridFTP. Fig. 20 is the flowchart of the query process. The user transmits an LDN to the broker. On receiving a request, the broker starts to collect the necessary information about this LDN. It requests the home directory information from the GHN to get the LLNs that know this LDN. Then, the broker fetches the local replica information on this LDN from every LLN that the GHN reports. The broker runs the block mapping procedure to determine whether each LLN has the required blocks. The result is returned to the user for replica selection.

Fig. 18. The result of One by One Co-allocation approach.


Fig. 19. A better (in this case optimal) solution compared to Fig. 18.

Fig. 20. The flowchart of the query.

4.2. Performance evaluation and analysis

After block mapping, we can get the group information (Gi) of every block. The broker can then do replica selection according to the group information. We evaluate three co-allocation approaches: (1) Dynamic Co-allocation (DCA), (2) Abort and Retransfer (AR), and (3) One by One Co-allocation (OOCA). Fig. 21 shows the simulation environment. There are six replica sites (a1–a6) and one user (U) in this environment. The black node is a router, and the numbers between nodes are network bandwidths. Every node in this environment has a complete replica or a fragmented one. The client downloads a complete data set from the six replica sites in parallel.

We evaluate the achievable download throughput of the three co-allocation approaches and the influence of different data set sizes: 50 MB, 100 MB, 500 MB, and 1 GB. The distribution of fragmented replicas is produced randomly. The result is shown in Fig. 22. We can see that, as the data set size grows, the throughput increases, since more parallel downloading is possible. But when it grows too large, many blocks must be transferred between sites, leading to network congestion; therefore, the throughput decreases for the 1 GB case. Comparing the transfer methods, the OOCA approach usually performs better than the others. Because of its pre-scheduling, it efficiently reduces the transfer time by preventing the assignment of a block to a slower server.


Fig. 21. The simulation environment.


Fig. 23. The achievable bandwidth with different numbers of blocks.

Fig. 24. Comparison of accessing time for a complete replica.

Fig. 22. The achievable throughput with different data sizes.

When transferring a huge data set, this advantage becomes very obvious. The gain of the OOCA over the AR is not obvious: because not every site has a complete replica, the choice of replica sites is not always optimal in a fragmented replication system, so its performance is similar to that of the AR approach. The DCA has the drawback that faster servers must wait for slower servers to deliver the last block, so the DCA performs poorly owing to waiting for deliveries from sites a2 or a6; the other two approaches perform better in this respect. We next discuss another factor that may affect the transfer time: the number of data blocks. The number of blocks ranges from 5 to 15 and the size of the data set is fixed at 500 MB. The result is shown in Fig. 23. It is obvious that the block number does not affect the performance very much for the AR and the OOCA. On the contrary, the DCA performs much better when the data set is divided into more blocks: with more blocks, the faster servers can transfer more data blocks. We run another simulation using the topology in Fig. 21 to compare with systems of complete replicas. For a dataset of 50 MB, 100 MB, 500 MB, or 1 GB, we assume the following two cases. In case 1, the user requires the complete replica. The simulation result is shown in Fig. 24: the performances of the complete and fragmented replica systems are almost the same; of course, there is a little more processing in the fragmented system. In case 2, the user requires only a randomly generated fragmented replica (about 20%–50% of the dataset). Fig. 25 shows the simulation results: the fragmented system clearly outperforms the complete one.

Fig. 25. Comparison of accessing time for a fragmented replica.

5. Conclusions

In this paper, we introduce the concept of fragmented replicas and propose algorithms for their retrieval. However, we have not touched the problem of fragmented replica updates. Plausibly, each Grid site can maintain a bitmap of blocks with timestamps; see the sketch below. Each time a block of a fragmented replica is modified, the corresponding bit is set along with a timestamp. The bitmap is then announced to the other sites holding the replica, and each site determines when and where the fragmented replica update should take place. Still, it will be interesting future research to find efficient ways to update fragmented replicas. Furthermore, we assume that the blocks in a fragmented replica are contiguous. If they were not, the data structure representing the fragmented replica and the algorithm for retrieval would be more complicated; this is further research that should be done in the future. Finally, as the example in Section 3 shows, the proposed algorithms do not always find an optimal solution. Another interesting future research direction is to determine whether a worst-case performance bound exists for the algorithms.
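To make this update idea concrete, here is a minimal sketch of a per-replica modification bitmap with timestamps. Since the mechanism is only outlined above as future work, everything in the sketch is a hypothetical illustration.

```python
import time

class UpdateBitmap:
    """Track which blocks of a fragmented replica were modified, and when."""

    def __init__(self, num_blocks: int):
        self.dirty = [False] * num_blocks   # one bit per block
        self.stamp = [0.0] * num_blocks     # modification timestamps

    def mark_modified(self, block: int) -> None:
        """Set the block's bit and record the modification time."""
        self.dirty[block] = True
        self.stamp[block] = time.time()

    def announcement(self):
        """The (block, timestamp) pairs to announce to other replica sites."""
        return [(i, t) for i, (d, t) in
                enumerate(zip(self.dirty, self.stamp)) if d]

bm = UpdateBitmap(num_blocks=4)
bm.mark_modified(2)
print(bm.announcement())   # e.g. [(2, 1690000000.0)]
```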


Acknowledgements

We thank the anonymous reviewers for their comments that greatly improved the quality of this paper. This research is supported in part by NSC under contract numbers 94-2213-E-259-013 and 94-2213-E-259-014. The authors would also like to thank the National Center for High-Performance Computing for providing resources under the Unigrid project.

References

[1] Ian Foster, Carl Kesselman, The Grid: Blueprint for a New Computing Infrastructure, 2nd edition, Morgan Kaufmann, San Francisco, 2004.
[2] Ian Foster, Carl Kesselman, Steven Tuecke, The anatomy of the Grid: Enabling scalable virtual organizations, International Journal of Supercomputer Applications 15 (3) (2001) 200–222.
[3] Grid Café. http://www.twgrid.org/gridcafe.
[4] The Globus Alliance. http://www.globus.org.
[5] Ian Foster, Carl Kesselman, Jeffrey Nick, Steven Tuecke, The physiology of the Grid: An open Grid services architecture for distributed systems integration, in: Open Grid Service Infrastructure Working Group, Global Grid Forum, June 22, 2002.
[6] Ian Foster, Carl Kesselman, The Globus project: A status report, in: Proceedings of the IPPS/SPDP '98 Heterogeneous Computing Workshop, Orlando, FL, USA, 1998, pp. 4–18.
[7] The Globus Toolkit version 4. http://www-unix.globus.org/toolkit/docs/4.0/.
[8] Luis Ferreira, Viktors Berstis, Jonathan Armstrong, Mike Kendzierski, Andreas Neukoetter, Masanobu Takagi, Richard Bing-Wo, Adeeb Amir, Ryo Murakawa, Olegario Hernandez, James Magowan, Norbert Bieberstein, Introduction to Grid Computing with Globus, 2nd edition, IBM International Technical Support Organization, 2003.
[9] CERN. http://public.web.cern.ch/public/.
[10] The DataGrid project. http://eu-datagrid.web.cern.ch/eu-datagrid/.
[11] Adriana Iamnitchi, Domenico Talia (Eds.), P2P Computing and Interaction With Grids, Future Generation Computer Systems 21 (3) (2005), special issue.
[12] Heinz Stockinger, Asad Samar, Bill Allcock, Ian Foster, Koen Holtman, Brian Tierney, File and object replication in data Grids, Journal of Cluster Computing 5 (3) (2002).
[13] Sudharshan Vazhkudai, Steven Tuecke, Ian Foster, Replica selection in the Globus data Grid, in: Proceedings of the IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGRID 2001, Brisbane, Australia, May 2001, pp. 106–113.
[14] Sudharshan Vazhkudai, Enabling the co-allocation of Grid data transfers, in: Proceedings of the International Workshop on Grid Computing, Phoenix, Arizona, USA, 17 November 2003, pp. 44–51.
[15] Dongsheng Li, Nong Xiao, Xicheng Lu, Yijie Wang, Kai Lu, Dynamic self-adaptive replica location method in data Grids, in: Proceedings of the IEEE International Conference on Cluster Computing, Hong Kong, China, December 2003, pp. 442–445.
[16] Matei Ripeanu, Ian Foster, A decentralized, adaptive, replica location service, in: Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing, HPDC-11, Edinburgh, Scotland, July 2002, pp. 24–32.
[17] Salvatore T. March, Sangkyu Rho, Allocating data and operations to nodes in distributed database design, IEEE Transactions on Knowledge and Data Engineering 7 (2) (1995) 305–317.
[18] Storage Resource Broker, Version 3.0, SDSC. http://www.sdsc.edu/srb.
[19] Sudharshan Vazhkudai, Jennifer M. Schopf, Predicting sporadic Grid data transfers, in: Proceedings of the IEEE International Symposium on High Performance Distributed Computing, HPDC-11, Edinburgh, Scotland, July 2002, pp. 188–196.
[20] Sudharshan Vazhkudai, Jennifer M. Schopf, Ian Foster, Predicting the performance of wide area data transfers, in: Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS 2002, Fort Lauderdale, FL, USA, April 2002, pp. 34–43.
[21] Ruay-Shiung Chang, Chih-Min Wang, Po-Hung Chen, Replica selection on co-allocation data Grids, in: Proceedings of the Second International Symposium on Parallel and Distributed Processing and Applications, ISPA 2004, Hong Kong, China, December 2004, pp. 584–593.
[22] Kavitha Ranganathan, Ian Foster, Identifying dynamic replication strategies for high performance data Grids, in: Proceedings of the Second International Workshop on Grid Computing, Denver, CO, USA, November 2001, pp. 75–86.
[23] Ruay-Shiung Chang, Richard C.T. Lee, On a scheduling problem where a job can be executed only by a limited number of processors, Computers & Operations Research 15 (5) (1988) 471–478.

Ruay-Shiung Chang received his B.S.E.E. degree from National Taiwan University in 1980 and his Ph.D. degree in Computer Science from National Tsing Hua University in 1988. He is now Dean of Academic Affairs and a professor in the Department of Computer Science and Information Engineering, National Dong Hwa University. His research interests include the Internet, wireless networks, and grid computing. Dr. Chang is a member of ACM and IEICE, a senior member of IEEE, and a founding member of the ROC Institute of Information and Computing Machinery. He also serves on the advisory council of the Public Interest Registry (http://www.pir.org).

Po-Hung Chen received his M.S. degree from the Department of Computer Science and Information Engineering, National Dong Hwa University, Taiwan, in 2005. He is an engineer in the Science Park, Hsinchu, Taiwan. His research interests include wireless networks and grid computing.
