Dynamic Data Replication Scheme in the Cloud Computing Environment

2012 IEEE Second Symposium on Network Cloud Computing and Applications

Myunghoon Jeon, Kwang-Ho Lim, Hyun Ahn, Byoung-Dai Lee*
Department of Computer Science, Kyonggi University, Suwon, Korea
{mhjeon, khlim, hyunahn, blee}@kyonggi.ac.kr

various DRSs that have been proposed in previous studies are inadequate in that they ensure optimal performance only for a well-defined data access pattern [5-11]. To compensate for this limitation, we propose an algorithm that dynamically selects an optimal DRS and responds flexibly to the different data access patterns observed in the cloud computing environment. The algorithm analyzes past data access patterns, perceives changes in a user’s pattern, and identifies an optimal strategy to maintain the quality of the cloud computing service.

Abstract—In the cloud computing environment, a data replication strategy (DRS) is used to improve data access. Related studies have proposed a number of such strategies, but the performance of each is closely tied to the users’ access patterns, and each works optimally only for a particular data access pattern. As data access patterns become more flexible and unpredictable, they are difficult to handle with traditional replication strategies. This paper therefore proposes an algorithm that detects changes in a user’s data access pattern and dynamically applies an optimal replication strategy. The proposed algorithm maintains optimal performance by responding to various data access patterns. We tested the proposed algorithm and validated its effectiveness.

Section II reviews related work, and Section III describes the proposed algorithm. Section IV presents an experiment that evaluates the performance and validity of the algorithm and describes the results. Lastly, Section V provides a conclusion and directions for follow-up research.

Keywords—Clouds, Data Access Pattern, Dynamic Switching, Replication

II. RELATED WORK

In the cloud computing environment, data resources are geographically scattered, and network delay is a major obstacle to quick data access. For this reason, studies have been undertaken to replicate data across several physically distributed storage sites in order to reduce the amount of long-distance data transmission over the network. Data replication strategies can be categorized by type, unit, and criteria of replication [3]. In terms of replication type, there are two kinds: static and dynamic. The former is hardly fit for a large-scale cloud data service because it carries out and manages data replication passively and cannot respond to varying network conditions or to changes in the data access pattern. For this reason, various dynamic data replication strategies have been actively studied [8][9][11-18].

I. INTRODUCTION

Cloud computing involves technology that virtually integrates and supplies computing resources located in remote places [1]. The physical location of requested data affects the data access method, the transmission rate, and the service quality. When the stored data is located nearby, it can be quickly retrieved and processed. However, in the cloud computing environment, it is hard to secure sufficient space to store a vast amount of service data. Thus, efficient data storage at local sites is essential for good performance of a cloud computing service, and this is currently an area of active research. In particular, a data replication strategy (DRS) aims to improve data access by replicating data from remote places to local sites. Many related studies are underway, and various strategies have been proposed, including LRU (Least Recently Used), LFU (Least Frequently Used), and LDRS (Locality-based Data Replication Strategy) [5][11].

K. Ranganathan et al. [5] suggested replication strategies to reduce network bandwidth consumption and access delay. They also compared the performance of the strategies under data access patterns categorized by temporal and spatial locality. By considering network capacity and the file access pattern, H. Sato et al. [7] proposed a file replication algorithm that improved on simple replication methods. Similarly, R.S. Chang et al. [9] proposed the Latest Access Largest Weight (LALW) method, which uses data access history to determine a dynamic replication strategy by applying a greater weight to more recent accesses. W. Zhao et al. [16] proposed a dynamic optimal replication strategy (DORS), which evaluates a file’s value based on the file’s

A data replication strategy identifies frequently used data and replicates it at local sites for better accessibility. In this process, frequently requested data should be stored first, based on the users’ data access patterns. A data access pattern describes how a user accesses data to perform jobs, and there are two types: sequential and random. The former indicates that a user accesses a particular data set frequently and repeatedly, and the latter means that the requested data is unpredictable and the access is irregular. The

* Corresponding author

978-0-7695-4943-9/12 $26.00 © 2012 IEEE DOI 10.1109/NCCA.2012.10


Figure 1. System architecture.

access history, the file size, and the network condition in order to determine target files to replicate and replace.

replication scheme that dynamically selects an optimal data replication strategy to maintain optimal data service.

In terms of cost savings, strategies for optimal data replication were proposed in [6] and [10]. M. Carman et al. [6] proposed a service that applies a market economy model in order to optimize replication cost and data access on a data grid. H. Lamehamedi et al. [10] suggested a data replication strategy based on a cost-estimate model that considers both the cost of data access and the performance of replication.

III. DYNAMIC DATA REPLICATION SCHEME

A. System Architecture

Fig. 1 illustrates the system architecture and operating mode of the proposed Dynamic Data Replication Scheme. Broadly speaking, the system consists of a Job Broker (JB) and multiple sites. The JB receives a user’s job request and dispatches the job to the site with the best transmission rate. Jobs are completed at the sites, and each site consists of Storage, Work Nodes, and a Replication Manager (RM). Storage holds the data required to carry out the job, and Work Nodes actually perform the job. The RM is a software component that determines how to perform the requested job, manages the data access history, and controls inter-site data flows.

Depending on whether the required data is in Storage, the RM decides whether to perform the job locally or remotely. If the data is already in the Storage of the local site, the RM immediately orders Work Nodes to perform the job. If not, the RM communicates with other RMs, searches for the required data, and carries out data acquisition. The RM consists of a Pattern Recognizer (PR) and a Replication Optimizer (RO). The PR recognizes changes in the data access pattern, and the RO optimizes the performance of the system by dynamically choosing, from the set of replication algorithms employed by the system, the algorithm that showed the best performance under the current data access pattern.
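The local-versus-remote decision made by the RM can be illustrated with a minimal sketch. The `Site` class, its `locate` method, and the rule of asking peers in list order are hypothetical names and simplifications introduced here for illustration; the paper does not specify how RMs search for data.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One site: a name plus the set of files held in its local Storage."""
    name: str
    storage: set = field(default_factory=set)

    def locate(self, file_name, peers):
        """Return 'local' if the file is in this site's Storage, otherwise
        the name of the first peer that holds it, or None if no peer does.
        Asking peers in list order is an assumption of this sketch."""
        if file_name in self.storage:
            return "local"
        for peer in peers:
            if file_name in peer.storage:
                return peer.name
        return None


a = Site("A", {"f1", "f2"})
b = Site("B", {"f3"})
print(a.locate("f1", [b]))  # file is in A's own Storage
print(a.locate("f3", [b]))  # file must be fetched from site B
```

A fuller model would also record each lookup in the access history that the PR inspects, which is omitted here.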

In order to ensure efficient resource management in an unpredictable data grid environment, B.D. Lee et al. [4] developed a model that dynamically selects a resource management strategy suited to a particular workload type based on the performance history. This model demonstrated that the real-time workload type is an important factor in resource management and in the selection of a data replication strategy. Similarly, A.R. Abdurrab et al. [11] proposed FIRE, a File Reunion based Data Replication Strategy for Data Grids, which refers to the file access history of nearby data storage in order to determine and optimize data replication. FIRE divided data access patterns into sequential and random patterns, and tested the performance of the LDRS and LRU algorithms for both. The LDRS performed better than the LRU for a sequential pattern with spatial locality. However, the outcome was reversed for a random data access pattern, where the LRU performed relatively better.

B. Dynamic Switching Algorithm

In what follows, we describe the Dynamic Switching Algorithm, which perceives changes in a user’s data access pattern and determines an optimal data replication strategy. Two main considerations ensure proper operation of the Dynamic Switching Algorithm:

These studies demonstrated that the traditional replication strategies cannot ensure optimal performance of cloud computing service for all data access patterns and that these strategies are also unable to accommodate changing data access patterns. To overcome this limitation, we propose a data


Figure 3. Impact of change in user’s data access pattern on performance of data replication strategies.

Figure 2. Correlation between data access pattern and job performance.

replication strategy. Suppose a change of performance is detected, as shown in A of Fig. 3. In this case, switching the replication strategy immediately may be problematic, since the change is not due to a change in the data access pattern but is merely a temporary change in service performance. Fig. 3 illustrates how two different data replication strategies {DR1, DR2} affect job performance over time. Depending on a user’s data access pattern, DR1 and DR2 show markedly different performance. As a result, when the performance of a DRS deteriorates, it can be assumed that there has been a change in the data access pattern, calling for a change of DRS. For example, performance of DR2 fell in Section B,

(1) “How to detect a change in a data access pattern?”
(2) “When to change a data replication strategy?”

To answer the first question, we suggest a periodic measurement of service performance. Fig. 2 depicts the correlation between a data access pattern and the performance of data replication strategies obtained from [11]. As it shows, a change in the pattern leads to proportional changes in the performance of both the LDRS and the LRU strategy. According to [11], LDRS outperforms LRU in a sequential access pattern, whereas in a random access pattern LRU shows better performance. That is, changes in a data access pattern and changes in service performance are closely interrelated, and based on this, we attempt to predict changes in a user’s data access pattern. ENU (Effective Network Usage) [2] and TAT (Total Access Time) are used as indicators of service performance. ENU measures efficiency of the network usage as follows:

$N_{\text{remote file accesses}}$ is the number of times Work Nodes accessed remote files, $N_{\text{file replications}}$ is the number of replications, and $N_{\text{local file accesses}}$ is the number of accesses to local files by Work Nodes. A lower ENU value therefore means more efficient use of the network. TAT measures the average time taken to perform a requested job and is calculated as follows:

$$\mathrm{TAT} = \frac{T_{\text{local file accesses}} + T_{\text{remote file accesses}} + T_{\text{file replications}}}{N_{\text{total file accesses}}}$$

$T_{\text{local file accesses}}$, $T_{\text{remote file accesses}}$, and $T_{\text{file replications}}$ are the total times taken to access local files, to access remote files, and to replicate designated files, respectively. $N_{\text{total file accesses}}$ denotes the number of file accesses. A higher TAT value therefore means a longer processing time. The second consideration is equally important for the algorithm to decide an exact timing of changing a data

Figure 4. Pseudo-code of switching algorithm.


at 100 means that ENU and TAT are measured over every 100 data accesses. ORS is a parameter for observing changes in the network efficiency and service performance of PMWs in carrying out a requested job. An ORS value of three indicates that changes of ENU and TAT will be observed over the next three PMWs in sequence. The observed changes, depending on their types, become the basis for deciding when to switch the data replication strategy. Changes in network efficiency and service performance of PMWs within the ORS range can be classified into four types (Fig. 5). In this figure, the symbols (+) and (-) respectively represent an increase and a decrease in the value of ENU and TAT. According to the aforementioned formulas, a higher value indicates lower network efficiency and a longer processing time. Thus, a measured value in Region C indicates improvement in both network efficiency and processing time. In comparison, Region B indicates deterioration of both ENU and TAT. Region A indicates deterioration in ENU (a higher value) together with an improvement in TAT (a lower value), and Region D the opposite. Thus, when measured values fall under Region A or D, it is unclear whether the service performance has actually deteriorated, as either TAT or ENU could be improving.
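The four-way classification described above can be written down as a small function. This is a minimal sketch: the name `classify_region` is hypothetical, and treating a zero change as "not worse" is an assumption made here, since the text only discusses increases and decreases.

```python
def classify_region(d_enu, d_tat):
    """Map the change in ENU and TAT observed over an ORS to Regions A-D.

    A positive delta means the metric got worse (higher ENU, higher TAT).
    Assumption of this sketch: a zero change counts as 'not worse'.
    """
    enu_worse = d_enu > 0
    tat_worse = d_tat > 0
    if enu_worse and tat_worse:
        return "B"  # both deteriorated: candidate for switching
    if not enu_worse and not tat_worse:
        return "C"  # both improved (or unchanged)
    if enu_worse:
        return "A"  # ENU worse, TAT better
    return "D"      # ENU better, TAT worse


# ENU changed by -5 and TAT by +5: network usage improved, time got worse.
print(classify_region(-5, +5))
```

Only Region B is unambiguous evidence of deterioration, which is why it is the one case that leads to further observation before a switch.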

Figure 5. Categorization of changes in ENU and TAT values.

and the data access pattern is regarded as having changed. The strategy is therefore switched to DR1 for the subsequent jobs to maintain good service quality. However, in Section A, the declining performance of DR2 is only temporary. If the algorithm perceives this as a change in the data access pattern and changes the data replication strategy accordingly, it might further deteriorate the performance. For this reason, criteria are needed to identify temporary changes in performance. Fig. 4 shows the pseudo-code of the proposed algorithm.

When the changes of ENU and TAT values within the ORS range fall under Region B, this indicates declining network efficiency and processing speed in performing a requested job, and suggests a possible change in the user’s access pattern. This might call for a change of data replication strategy, but it is also possible that the erratic performance is only temporary. Thus, it is necessary to monitor further changes in service performance. To do so, we apply EORS to decide whether the change in service performance is temporary. When the change of network efficiency and service

As shown in Fig. 4, the main parameters of the proposed algorithm include PMW (Performance Measurement Window), ORS (Observation Range for Switching), and EORS (Extra Observation Range for Switching). PMW determines a range of measurement for ENU and TAT. For example, setting PMW

Figure 7. Performance evaluation by data access patterns.


performance falls under Region B, performance of future jobs is observed for the same ORS range.

access pattern. Still, there is a probability that the change might be temporary, and, for this reason, EORS is applied to further observe changes in the performance of future jobs. In Fig. 6, EORS1 is triggered by ORS3 as its changes fall under Region B, and performance changes are observed for a range that is two times larger than ORS3.
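The ORS/EORS decision can be sketched as follows. This is not the pseudo-code of Fig. 4, which is not recoverable from the text. All names are hypothetical, and three points are assumptions: the change over a range is taken as the last PMW minus the first, the EORS check is made over the ORS plus an equally long extension (one reading of "two times larger"), and observation resumes after the extended range.

```python
def region(d_enu, d_tat):
    """Regions of Fig. 5; a zero change is treated as 'not worse' (assumed)."""
    if d_enu > 0 and d_tat > 0:
        return "B"
    if d_enu <= 0 and d_tat <= 0:
        return "C"
    return "A" if d_enu > 0 else "D"


def delta(window):
    """Change over a run of (ENU, TAT) pairs: last minus first (assumed)."""
    return window[-1][0] - window[0][0], window[-1][1] - window[0][1]


def switch_points(metrics, ors, eors=None):
    """Return the PMW indices at which the DRS would be switched.

    metrics: list of (ENU, TAT) per PMW, in time order.
    ors:     number of PMWs in one observation range.
    eors:    number of extra PMWs observed after a Region B result
             (defaults to ors, giving a range twice the size of the ORS).
    """
    eors = ors if eors is None else eors
    switches = []
    start = 0
    while start + ors <= len(metrics):
        end = start + ors
        if region(*delta(metrics[start:end])) != "B":
            start += 1  # slide the ORS forward by one PMW
            continue
        extended = metrics[start:end + eors]
        if len(extended) < ors + eors:
            break  # not enough future PMWs yet to confirm the change
        if region(*delta(extended)) == "B":
            switches.append(end + eors)  # deterioration persisted: switch
        start = end + eors  # otherwise it was temporary: keep the DRS
    return switches


steady = [(0.2, 1.0)] * 3
persistent = steady + [(0.3, 1.5), (0.4, 2.0), (0.5, 2.5)] + [(0.6, 3.0)] * 3
blip = [(0.2, 1.0), (0.3, 1.5), (0.4, 2.0)] + [(0.2, 1.0)] * 4
print(switch_points(persistent, ors=3))  # deterioration persists
print(switch_points(blip, ors=3))        # deterioration was temporary
```

With these inputs the persistent case yields one switch point and the temporary case yields none, which mirrors the EORS2 and EORS1 outcomes described for Fig. 6.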

That is, by applying EORS, changes of performance observed within the ORS range can be compared with those for an equivalent range in future jobs to determine whether the change is temporary. Fig. 6 illustrates the overall mechanism of the Dynamic Switching Algorithm, which measures service performance, perceives changes in a data access pattern, and applies a suitable DRS accordingly. In the figure, PMW is set at 1,000; a PMW is generated sequentially for each set of 1,000 user data accesses. For each PMW, TAT and ENU are measured over its 1,000 accesses and shown as sum values. In the case of PMW1, the sum values of ENU and TAT are 55 and 50 respectively for the first 1,000 data accesses; for PMW2, the same values are 55 and 55 for the data accesses numbered between 1,001 and 2,000. ORS is set to three. This means that network efficiency and processing time among three PMWs are observed and compared to detect changes in the data access pattern. As shown in Fig. 5, the observed changes are classified to decide whether to apply EORS. For example, ORS1 measures the network efficiency and processing time of {PMW1, PMW2, PMW3}. The results show a slower processing time, as TAT scored +5, and better network efficiency, as ENU scored -5. This is categorized as Region D, and it is unclear whether the service performance has actually deteriorated. ORS1 is then switched to ORS2, and changes are once again observed for {PMW2, PMW3, PMW4}.
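Per-PMW measurement can be sketched as below. The `Access` record and function names are hypothetical. The ENU expression used here, (remote accesses + replications) / (remote + local accesses), is an assumption, because the paper's ENU equation is missing from this text. The sketch also reports per-window ratios rather than the summed values shown in Fig. 6.

```python
from dataclasses import dataclass


@dataclass
class Access:
    local: bool       # True if served from the local Storage
    replicated: bool  # True if this access triggered a file replication
    seconds: float    # total time spent, including any replication


def pmw_metrics(window):
    """Return (ENU, TAT) for one PMW.

    ENU form is assumed: (remote accesses + replications) / all accesses.
    TAT is the total time divided by the number of file accesses.
    """
    n_total = len(window)
    n_remote = sum(1 for a in window if not a.local)
    n_repl = sum(1 for a in window if a.replicated)
    enu = (n_remote + n_repl) / n_total
    tat = sum(a.seconds for a in window) / n_total
    return enu, tat


def measure(accesses, pmw):
    """Split the access log into complete PMWs and measure each one."""
    return [pmw_metrics(accesses[i:i + pmw])
            for i in range(0, len(accesses) - pmw + 1, pmw)]


log = [Access(True, False, 1.0)] * 3 + [Access(False, True, 5.0)]
print(measure(log, pmw=2))  # one (ENU, TAT) pair per window of 2 accesses
```

The resulting list of pairs is the kind of input over which successive ORS ranges {PMW1, PMW2, PMW3}, {PMW2, PMW3, PMW4}, and so on would be compared.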

The measurement results show that changes among PMWs in the EORS1 range fall under Region A. The Dynamic Switching Algorithm therefore perceives only a temporary change in the user’s data access pattern. Accordingly, it determines that maintaining the current DRS is proper and continues to do so. After measurement for EORS1 is complete, ORS3 is switched to ORS4 to observe changes in performance. In the case of ORS6, changes of performance fall under Region B, triggering EORS2. The changes of performance measured in the EORS2 range also fall under Region B. Consequently, the Dynamic Switching Algorithm judges that the user’s data access pattern has changed, and switches the data replication strategy. It switches from DRS A, which had been applied up to EORS2, to DRS B, and then observes changes in performance for ORS7.

IV. PERFORMANCE EVALUATION

A. Experiment Setup

The simulator for the experiment was implemented in Java, and the variables of the network environment were excluded in order to keep the experiment’s results compact and consistent. Table I shows the parameters that we used for the experiment. The LRU and the LDRS were the replication strategies employed in the proposed system, and the switching algorithm was implemented in the RM.
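To make the LRU strategy concrete, the following minimal sketch (in Python, not the authors' Java simulator) models a local Storage that replicates a file on a miss and evicts the least recently used file when full. The file count and sizes follow Table I. The two workloads are hypothetical stand-ins for the sequential and random patterns, shortened to 9,000 accesses, and the LDRS is not modeled because the text does not specify it.

```python
import random
from collections import OrderedDict


class LRUStore:
    """Local Storage that evicts the least recently used file when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()

    def access(self, name):
        """Return True on a local hit. On a miss, replicate the file and
        evict the least recently used one if the store is over capacity."""
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            return True
        self.files[name] = True
        if len(self.files) > self.capacity:
            self.files.popitem(last=False)  # drop least recently used
        return False


def hit_ratio(workload, capacity):
    """Fraction of accesses served from the local store."""
    store = LRUStore(capacity)
    hits = sum(store.access(name) for name in workload)
    return hits / len(workload)


FILES = [f"file{i:02d}" for i in range(26)]  # 26 files (Table I)
CAPACITY = (10 * 1024) // 700                # 10 GB store, 700 MB files

# Hypothetical workloads: a repeated subset versus uniform random picks.
sequential = [FILES[i % 10] for i in range(9000)]
rng = random.Random(0)
rand = [rng.choice(FILES) for _ in range(9000)]

print(hit_ratio(sequential, CAPACITY))
print(hit_ratio(rand, CAPACITY))
```

Because the repeated subset fits entirely in the store, only its first pass misses, whereas the random workload keeps touching files that have already been evicted.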

This time, changes among PMWs in ORS2 range fall under Region A, and, as a result, ORS2 is switched to ORS3. In the case of ORS3, changes in performance among PMWs fall under Region B, which clearly indicates a decline in performance and a possibility of a change in the user’s data

Figure 6. Mechanism of switching algorithm.


Figure 8. Performance evaluation by different ORS values.

The experiment tested whether the RM can detect deterioration in performance due to changes of workload and switch the DRS at the proper time. The workload was assumed to follow a sequential data access pattern, a random data access pattern, or a compound data access pattern in which the workload type changes over time. The compound pattern follows the order sequential-random-sequential. Each workload contains 90,000 data accesses.

experiment tested performance of the DRS under sequential, random and compound data access patterns respectively. The performance of ENU and TAT for a sequential data access pattern is shown in Fig. 7(a). In this pattern, the proposed algorithm and the LDRS both performed very well. The LDRS performed better than the LRU, and accordingly, the proposed algorithm applied the LDRS and yielded equally optimal results. The performance of ENU and TAT for a random data access pattern is shown in Fig. 7(b). In this case, the LRU showed the best performance; the performance of the proposed algorithm fell between that of the LRU and that of the LDRS. This is because the LDRS is the default data replication strategy of the dynamic selection algorithm. If the LRU is used as the default instead of the LDRS, the performance of the proposed algorithm would be equal to that of the LRU (Fig. 7(c)). In other words, the default DRS determines the performance level of the overall algorithm. When a change occurs in a data access pattern, it must be detected and a decision must be made whether to switch the data replication strategy. In this process, overhead may occur due to time delay. Notwithstanding the time delay, the proposed algorithm still performed satisfactorily (Fig. 7(d)). The figure shows the performance of ENU and TAT for a compound data access pattern (sequential-random-sequential). The results show that, despite the overhead from the time taken to detect a change in the data access pattern, the dynamic selection algorithm performed best. This can be attributed to the fact that the algorithm selects an optimal DRS and thus maintains optimal performance.

TABLE I. PARAMETERS USED FOR THE EXPERIMENT

Parameter                Value
Number of servers        3
Number of files          26
Single file size (MB)    700
File storage size (GB)   10
PMW (Access)             100
ORS (PMW)                10, 30, 50, 70

B. Experiment Result

The objective of the experiment is to validate the algorithm that dynamically selects a data replication strategy for a given data access pattern. We conducted a comparative analysis between the traditional DRS and the proposed dynamic DRS. The traditional DRS relies on the LRU and the LDRS, which are suitable for particular data access patterns. The


Figure 9. Changes in data replication strategy by different ORS values.

In the second experiment, performance under different ORS values was analyzed to examine how the ORS affects job performance. Fig. 8 demonstrates how ORS affects the dynamic selection algorithm. Identifying an optimal ORS is essential to maximize the performance of the dynamic selection algorithm. Fig. 8(a) shows the performance of the algorithm for various ORS values under a sequential data access pattern. When ORS is 10, the algorithm reacts sensitively to even a temporary change in performance, regards it as a change in the data access pattern, and may change the DRS accordingly. This results in poor performance. In comparison, when the ORS value was 30 or more, the algorithm took time to determine whether a change was temporary and correctly recognized temporary changes in performance. This helped to avoid performance deterioration due to an unnecessary change of the DRS. Fig. 8(b) shows the performance of the algorithm for a random data access pattern, where the best result was obtained at an ORS value of 10, a highly sensitive level. In other words, under random and unpredictable conditions, the algorithm with a smaller ORS value responds swiftly to changes in performance and switches to a more suitable DRS. Lastly, Fig. 8(c) shows the performance of the algorithm for various ORS values under a compound data access pattern. In this case, the optimal performance was achieved at an ORS value of 30. Taken together, the results of the three experiments indicated that an ORS value of 30 gave the best overall performance of the algorithm.

factors that affect data service. As data services in the cloud computing environment expand rapidly, a vast amount of data is stored in multiple sites. Service users are also geographically dispersed, and there is a growing demand for new DRSs that work well for cloud data services. The traditional DRS relies heavily on particular data access patterns and is incapable of effectively managing new data access patterns in the cloud computing environment. To address this issue, we analyzed data access patterns and proposed an algorithm that selects a DRS dynamically. We conducted an experiment with a simulator to test the validity and effectiveness of the algorithm. The study applied only two data replication strategies to three types of data access patterns, and many improvements are needed to yield more reliable, practical outcomes. More data replication strategies and a wider variety of data access patterns should be examined in follow-up research, and efforts will be made to develop an algorithm that dynamically identifies and applies an optimal ORS value.

ACKNOWLEDGMENT

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2012-0003228).

Based on the experiment’s results, we applied an ORS value of 30 in testing performance of the dynamic DRS algorithm. Fig. 9 shows the timing of switching the DRS according to different ORS values for a compound data access pattern. At an ORS value of 10, the algorithm sensitively responded to even a temporary change in performance, and accordingly, the DRS was switched frequently. In comparison, when the ORS value is set at 70, the algorithm responded to changes in performance less sensitively, thus changing the DRS less frequently. At an ORS value of 30, the algorithm properly responded to changes in a data access pattern and changed the DRS more effectively than in other cases.

V. CONCLUSION

The purpose of a data replication strategy is to enhance data service for the users by deciding the optimal timing, location, and method of data replication. Thus, it is essential to consider

REFERENCES

[1] M. Armbrust, et al., "Above the Clouds: A Berkeley View of Cloud Computing," Technical Report No. UCB/EECS-2009-28, University of California at Berkeley, 2009.
[2] D.G. Cameron, et al., "UK Grid Simulation with OptorSim," UK e-Science All Hands Meeting, 2003.
[3] S. Venugopal, R. Buyya, R. Kotagiri, "A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing," ACM Computing Surveys, Vol. 38, pp. 1-53, 2006.
[4] B.D. Lee, J.B. Weissman, Y.K. Nam, "Adaptive Middleware Supporting Scalable Performance for High-End Network Service," Network and Computer Applications, Vol. 32, pp. 510-524, 2009.
[5] K. Ranganathan, I. Foster, "Design and Evaluation of Dynamic Replication Strategies for a High-Performance Data Grid," International Conference on Computing in High Energy and Nuclear Physics, 2001.
[6] M. Carman, K. Stockinger, "Toward an Economy-based Optimization of File Access and Replication on a Data Grid," International Symposium on Cluster Computing and the Grid, pp. 340-345, 2002.
[7] H. Sato, et al., "Access-Pattern and Bandwidth Aware File Replication Algorithm in a Grid Environment," International Conference on Grid Computing, pp. 250-257, 2008.
[8] K. Sashi, A.S. Thanamani, "Dynamic Replication in a Data Grid Using a Modified BHR Region Based Algorithm," Future Generation Computer Systems, Vol. 27, No. 2, pp. 202-210, 2011.
[9] R.S. Chang, H.P. Chang, "A Dynamic Data Replication Strategy Using Access-Weights in Data Grids," Supercomputing, Vol. 45, No. 3, pp. 277-295, 2008.
[10] H. Lamehamedi, et al., "Data Replication Strategies in Grid Environments," International Conference on Algorithms and Architectures for Parallel Processing, pp. 378-383, 2002.
[11] A.R. Abdurrab, T. Xie, "FIRE: A File Reunion Based Data Replication Strategy for Data Grids," International Conference on Clustering Computing and the Grid, pp. 215-223, 2010.
[12] T. Amjad, M. Sher, A. Daud, "A Survey of Dynamic Replication Strategies for Improving Data Availability in Data Grids," Future Generation Computer Systems, Vol. 28, No. 2, pp. 337-349, 2012.
[13] Y. Yuan, et al., "Dynamic Data Replication based on Local Optimization Principle in Data Grid," International Conference on Grid and Cooperative Computing, pp. 815-822, 2007.
[14] F. Jolfaei, A.T. Haghighat, "Improvement of Job Scheduling and Tow Level Data Replication Strategies in Data Grid," Mobile Network Communications & Telematics, Vol. 2, No. 3, 2012.
[15] L.M. Khanli, A. Isazadeh, T.N. Shishavanc, "PHFS: A Dynamic Replication Method, to Decrease Access Latency in The Multi-tier Data Grid," Future Generation Computer Systems, Vol. 27, No. 3, pp. 233-244, 2011.
[16] W. Zhao, et al., "A Dynamic Optimal Replication Strategy in Data Grid Environment," International Conference on Internet Technology and Applications, pp. 1-4, 2010.
[17] M. Bsoul, et al., "A Threshold-based Dynamic Data Replication Strategy," Supercomputing, Vol. 60, No. 3, pp. 301-310, 2010.
[18] N. Saadat, A.M. Rahmani, "PDDRA: A New Pre-fetching Based Dynamic Data Replication Algorithm in Data Grids," Future Generation Computer Systems, Vol. 28, No. 4, pp. 666-681, 2012.
