Performance Evaluation of a Heterogeneous Disk Array Architecture

Alexander Thomasian and Bogdan A. Branzoi, New Jersey Institute of Technology
Chunqi Han, Hitachi

Abstract. A Heterogeneous Disk Array (HDA) architecture is proposed that allows device heterogeneity as well as RAID-level heterogeneity: disks of different types may be incorporated into a single HDA, and multiple RAID levels may coexist in the same array while sharing disk space. The goal of this architecture is to utilize the resources of all its disks to the maximum possible extent by using appropriate RAID levels to meet the varying availability and performance requirements of different applications. An improved best-fit allocation algorithm is proposed for HDA to meet this goal.

1. Introduction

Disk arrays are usually shared by multiple applications, each of which accesses a multitude of datasets. Each dataset has its own requirements for capacity, throughput, and reliability. It is preferable to store data according to the Redundant Arrays of Independent Disks (RAID) paradigm [1], since it provides protection against disk failures as well as load balancing. RAID organizations meet various requirements, but no single RAID level can meet all of them, hence the dilemma of selecting an appropriate RAID level [2]. The HDA architecture is proposed as an example of a self-managed storage system with a very valuable characteristic: heterogeneity. In this paper we describe the HDA architecture and propose an improved best-fit allocation algorithm to optimize the data allocation process. We touch upon system tuning and scalability issues, compare various allocation algorithms, and conduct a performance study on a simulation of an HDA system. Finally, we discuss related work and future work, and finish with the conclusions of our study.

2. HDA Architecture

The heterogeneity of the proposed architecture originates from allowing disks of different capacity, bandwidth, and make, as well as different RAID levels, to coexist in one disk array. Desirable properties of the HDA architecture include meeting diverse application requirements for performance and reliability, optimizing the utilization of disk access bandwidth and capacity, and automated adaptive reconfiguration by data migration, whether for system tuning in response to hot spots or to accommodate disk additions and removals.

2.1. HDA Structure and Design

[Figure 1, described here in place of the original diagram: read and update requests (by logical block address) and allocation requests (size, expected access rate, read/write ratio, desired availability) enter the array controller; an assigned logical address is returned for allocations. The controller comprises a scheme selector (chooses the RAID level and parity group size), a splitter (breaks a large request into smaller equal-size requests), a distributor (selects the appropriate hard drive to store the data), a system directory (logical-to-physical address mapping and data-parity block mapping), a performance monitor (maximum and used throughput and capacity), and an optional system tuner (moves blocks to achieve optimally balanced utilization). Data and control channels connect the controller to the hard drives.]

Figure 1. Architecture of the Heterogeneous Disk Array (HDA) system.

The HDA architecture is depicted in Figure 1. The scheme selector uses file attributes to select a suitable RAID level for an incoming allocation request, with the help of a reliability model. The data is striped into equal-size stripe units (SUs) by the splitter, and parity SUs are added as required. The distributor takes a batch of equal-size blocks from the splitter and assigns them to disks in such a way that the bandwidth and capacity utilizations of all disks are roughly equal and any required constraints are met; this is treated as a vector-scheduling problem. The system directory stores and retrieves the logical-to-physical address mapping and the data-parity relations between blocks, keeps track of the hotness of blocks for performance-tuning purposes, and manages the free space. The system tuner carries out data reallocation based on access statistics; load balancing is performed according to a greedy algorithm called disk cooling [3]. The addressable entities in the HDA system are depicted in Figure 2.
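To make the splitter and parity handling concrete, the following is a minimal sketch assuming an SU size of 64 sectors and a simple parity-group layout; the paper gives no code, so the names (split_request, SU_SECTORS, parity_group_size) and the layout are illustrative, not HDA's actual interface:

```python
import math

SU_SECTORS = 64  # stripe unit size in sectors; an assumed value, not from the paper

def split_request(logical_addr, size_sectors, raid_level, parity_group_size=4):
    """Break a large request into equal-size stripe units (SUs).

    For RAID5, one parity SU is added per (parity_group_size - 1) data SUs;
    for RAID1, every data SU is mirrored. Layout is illustrative only.
    """
    n_data = math.ceil(size_sectors / SU_SECTORS)
    sus = [("data", logical_addr + i * SU_SECTORS) for i in range(n_data)]
    if raid_level == 5:
        # one parity SU per full or partial group of data SUs
        n_parity = math.ceil(n_data / (parity_group_size - 1))
        sus += [("parity", None)] * n_parity
    elif raid_level == 1:
        sus += [("mirror", addr) for _, addr in sus]
    return sus
```

The batch of SUs returned here is what the distributor would then assign to disks.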

Figure 2. Entities in the HDA system and their relationship.

2.2. Balancing Throughput and Capacity

Both disks and allocation requests are modeled as two-dimensional vectors, in which the first dimension is the maximum throughput (respectively, access rate) and the second dimension is the disk capacity (respectively, file size). The problem of balancing throughput and capacity utilization may be defined as follows.

Problem Definition: The $i$th disk of a set of $n$ disks is represented by $d_i = (X_i, C_i)$, where $X_i$ denotes the maximum throughput and $C_i$ the capacity of the $i$th disk. Given a set $J$ of allocation requests, each allocation request is represented by a two-dimensional vector $p_j = (x_j, c_j)$, where $x_j$ is its anticipated access rate and $c_j$ is the size of its data.

Problem Solution: A partition of the set $J$ into $n$ subsets $J_1, \ldots, J_n$ such that the sums over each subset do not exceed the corresponding limits set by $X_i$ and $C_i$:

$$\sum_{j \in J_i} x_j \le X_i \quad \text{and} \quad \sum_{j \in J_i} c_j \le C_i, \qquad 1 \le i \le n. \tag{1}$$

Given the utilizations of disk throughput and capacity,

$$U_i^x = \frac{1}{X_i} \sum_{j \in J_i} x_j \quad \text{and} \quad U_i^c = \frac{1}{C_i} \sum_{j \in J_i} c_j, \tag{2}$$

one possible objective function $F$ to minimize is

$$F = \mathrm{Var}\{U_i^x\} + \alpha\, \mathrm{Var}\{U_i^c\}, \tag{3}$$

where $\mathrm{Var}\{x_i\}$ is the variance over a set of numbers $\{x_i \mid 1 \le i \le n\}$. The weight $\alpha$ is chosen between 0 and 1 to put more emphasis on throughput than on capacity, because balanced throughputs are more important than balanced disk capacities. Another reason is that the mean disk response time is proportional to $(1 - \rho)^{-1}$, where the disk utilization $\rho$ is the product of the arrival rate of requests and the mean service time per request, so response time grows steeply as a disk approaches saturation. HDA uses a best-fit allocation algorithm (BESTFIT) to find the index of the disk that minimizes the objective function $F$.
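A minimal sketch of the BESTFIT rule follows, assuming per-disk running sums of allocated access rate and capacity; the paper names the algorithm but gives no listing, so the function signature and the tie-breaking (first minimum wins) are our assumptions:

```python
def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

def best_fit(disks, x_alloc, c_alloc, x_j, c_j, alpha=0.5):
    """Return the index of the disk minimizing F = Var{Ux} + alpha * Var{Uc}.

    disks:   list of (X_i, C_i) = (max throughput, capacity)
    x_alloc: per-disk sums of allocated access rates
    c_alloc: per-disk sums of allocated capacities
    (x_j, c_j): the new request's access rate and size
    Returns None if no disk can accept the request.
    """
    best_i, best_F = None, float("inf")
    for i, (X, C) in enumerate(disks):
        # skip disks that would exceed a throughput or capacity limit (Eq. 1)
        if x_alloc[i] + x_j > X or c_alloc[i] + c_j > C:
            continue
        # utilizations (Eq. 2) if the request were placed on disk i
        Ux = [(x_alloc[k] + (x_j if k == i else 0)) / disks[k][0]
              for k in range(len(disks))]
        Uc = [(c_alloc[k] + (c_j if k == i else 0)) / disks[k][1]
              for k in range(len(disks))]
        F = variance(Ux) + alpha * variance(Uc)  # objective (Eq. 3)
        if F < best_F:
            best_i, best_F = i, F
    return best_i
```

Evaluating $F$ for each candidate placement keeps both utilization vectors as flat as possible, which is exactly what Table 1's $\sigma_x$ and $\sigma_c$ columns measure.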

2.3. System Tuning, Upgrade, and Scalability

The initial allocation is based on estimated access rates and read-to-write (R:W) ratios, which tend to be inaccurate. The system therefore records the heat, i.e., the measured access rates, of RBs, and the system tuner relocates the hottest blocks from hot drives to cold drives [3]. The problem of data migration also arises when new disks are added or old disks are removed. The vector repositioning problem is stated as follows.

Given: a disk array specified by vectors $d_i = (X_i, C_i)$, $1 \le i \le n$, as before; the physical position and heat index of the RBs on each disk drive; and the characteristics of the new drives $d_j = (X_j, C_j)$, $n+1 \le j \le n+m$.

Result: a set of RBs, $R = \{r_1, r_2, \ldots\}$, with each $r_i$ assigned to one of the new disks $d_j$, $n+1 \le j \le n+m$, such that the target function $F$ is minimized, while the number of RBs being moved is also minimized.
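As a rough illustration of the tuner, the sketch below performs one greedy cooling step in the spirit of [3], moving the hottest block that fits from the hottest disk to the coldest one; the data layout and heat bookkeeping are our assumptions, and the cited paper is the authoritative source for disk cooling:

```python
def cool_once(disks, blocks):
    """One greedy disk-cooling step.

    disks:  list of dicts {"heat": float, "free": int}
    blocks: list of dicts {"disk": int, "heat": float, "size": int}
    Returns True if a block was moved, False otherwise.
    """
    hot = max(range(len(disks)), key=lambda i: disks[i]["heat"])
    cold = min(range(len(disks)), key=lambda i: disks[i]["heat"])
    if hot == cold:
        return False
    # hottest block on the hot disk that fits on the cold disk
    candidates = [b for b in blocks
                  if b["disk"] == hot and b["size"] <= disks[cold]["free"]]
    if not candidates:
        return False
    b = max(candidates, key=lambda blk: blk["heat"])
    disks[hot]["heat"] -= b["heat"]
    disks[cold]["heat"] += b["heat"]
    disks[cold]["free"] -= b["size"]
    disks[hot]["free"] += b["size"]  # space freed on the hot disk
    b["disk"] = cold
    return True
```

Repeating this step until no move improves the balance gives the greedy behavior described above.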

2.4. Specifications of the HDA Simulation

A six-disk HDA is considered in this study: two IBM 18ES, one Atlas 10k, two Barracuda, and one Cheetah 4LP, in that order [4]. The arrival process to the HDA system is Poisson for both allocation and access (i.e., read/update) requests, and the request size follows an exponential distribution. The mean request size is 100 sectors, or 50 KB, with a cutoff threshold of 4096 sectors and a minimum of one sector. The access rate is exponentially distributed, with a mean of $8 \times 10^{-7}$ accesses/second and a maximum of 10 accesses/second. Two RAID levels, RAID1 and RAID5, coexist in the HDA. The building blocks of the disk array simulator DASIM [5] were used in this study. Multiple simulation runs were executed with various read/write ratios for data blocks, but only the results for a read/write ratio of 3:1 are reported here.
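For reference, the stated workload can be generated along the following lines (Poisson arrivals, exponentially distributed sizes truncated to [1, 4096] sectors, exponentially distributed access rates capped at the maximum); this is a sketch of the stated distributions, not DASIM's actual generator, and the constant names are ours:

```python
import random

MEAN_SIZE = 100          # sectors (50 KB at 512 B/sector)
MIN_SIZE, MAX_SIZE = 1, 4096
MEAN_RATE = 8e-7         # accesses/second
MAX_RATE = 10.0          # accesses/second

def next_interarrival(lam):
    """Poisson arrival process: exponential interarrival time with rate lam."""
    return random.expovariate(lam)

def next_allocation():
    """One allocation request: (size in sectors, anticipated access rate)."""
    size = min(max(round(random.expovariate(1 / MEAN_SIZE)), MIN_SIZE), MAX_SIZE)
    rate = min(random.expovariate(1 / MEAN_RATE), MAX_RATE)
    return size, rate
```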

2.5. Studies of Data Allocation Methods

To test the effectiveness of the best-fit allocation algorithm, studies were conducted on the HDA system described above. Four data allocation methods (round-robin, random, proportional to maximum throughput, and proportional to capacity) were compared against the proposed allocation algorithm, which uses the objective function $F$ defined in Equation (3).

Table 1. Simulation results for an HDA system with six disks. Ux(i), Uc(i): utilization of throughput and capacity for disk i. Alloc Req: number of allocation requests processed; σx, σc: standard deviations of the Ux(i) and Uc(i), respectively.

Placement strategy   Alloc Req   Ux(0)/Uc(0)   Ux(1)/Uc(1)   Ux(2)/Uc(2)   Ux(3)/Uc(3)   Ux(4)/Uc(4)   Ux(5)/Uc(5)   σx/σc
Best Fit             332953      0.999/0.881   0.988/0.881   0.969/0.881   0.968/0.887   0.986/0.896   0.980/0.888   0.011/0.005
Round-Robin          260130      0.979/0.688   0.999/0.688   0.724/0.689   0.414/0.694   0.407/0.696   0.650/0.691   0.259/0.003
Random               256639      0.999/0.679   0.950/0.679   0.734/0.680   0.431/0.686   0.433/0.686   0.641/0.682   0.245/0.003
Prop. to Max Xput    262853      0.986/0.694   0.999/0.694   0.758/0.696   0.436/0.696   0.425/0.705   0.650/0.695   0.253/0.004
Prop. to Capacity    260319      0.954/0.687   0.999/0.685   0.743/0.681   0.393/0.681   0.402/0.692   0.624/0.690   0.260/0.004

The simulation stops when either the bandwidth or the capacity utilization of any disk exceeds 99%, i.e., when the array is very close to saturation. The results of the simulation are given in Table 1. It is apparent that the best-fit method with objective function $F$ uses the resources of the system to a greater extent, while simultaneously balancing the workload across the system's disks better than the other methods do.
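For comparison, the four baseline policies of Table 1 reduce to simple disk selectors; the sketch below interprets the two proportional policies as weighted random choices, which is our assumption since the paper does not specify whether they are randomized or deterministic:

```python
import random

def round_robin(state, disks):
    """Cycle through disks; `state` carries the last index between calls."""
    state["next"] = (state.get("next", -1) + 1) % len(disks)
    return state["next"]

def random_choice(_state, disks):
    return random.randrange(len(disks))

def prop_to_max_throughput(_state, disks):
    # choose disk i with probability X_i / sum of all X (assumed interpretation)
    return random.choices(range(len(disks)), weights=[X for X, _ in disks])[0]

def prop_to_capacity(_state, disks):
    return random.choices(range(len(disks)), weights=[C for _, C in disks])[0]
```

None of these selectors looks at the current utilizations, which is why their σx in Table 1 is more than an order of magnitude larger than that of best fit.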

2.6. Performance of the HDA

The best-fit allocation algorithm described in Section 2.2 is used on the HDA system. The goal of HDA is to balance capacity and bandwidth across all of its disks. To observe the behavior of the system closely, the utilizations of both bandwidth and capacity for all disks are plotted over the entire simulation time frame in Figures 3 and 4. It is apparent that the bandwidth and capacity utilizations of all disks remain very close throughout the entire simulation run, indicating that the workload is well balanced: the system's resources are fully exploited, and both capacity and bandwidth are utilized in a balanced way.

Figure 3. Utilization of bandwidth over time (one curve Ux(i) per disk, over roughly 35 hours).

Figure 4. Utilization of capacity over time (one curve Uc(i) per disk, over roughly 35 hours).

Figure 5. Individual response time over time (one curve Resp(i) per disk, in ms, over roughly 35 hours).


Figure 5 depicts the behavior of the read response time for the individual disks that make up the storage system. The curves follow the same trend, and the response time remains steady over a wide range of time, which implies that the system operates in a stable manner. The write response time, i.e., the time to complete the writing of both data and parity blocks, is not plotted here, since writes do not affect application response time. In summary, the simulation study shows that HDA balances the utilization of both bandwidth and capacity over all of its disks, thus exploiting the system resources to the greatest possible extent. The read response times of the disks differ, as the disks have different access times, but their utilizations are very close and lie within a narrow range.

3. Related work

One aspect of HDA is addressed by the file placement problem, which attempts to balance disk workloads by eliminating disk access skew. Other related techniques include AdaptRaid0 and AdaptRaid5 [6], algorithms that fully utilize disk capacity but do not take load balancing into account. Minerva [7] and Ergastulum [2] are two storage system design tools that address the attribute mapping problem [8], but they make static decisions on the RAID structure and data layout based on complete information about all datasets to be stored.

4. Future work

Future work will consider simulations of HDA systems in degraded mode as well as the tuning and scalability of HDA, and the current allocation algorithm will be extended to take into account the load increase that occurs due to disk failures. Rebuilding in HDA may be applied selectively to the currently most active virtual arrays.

5. Conclusions

The design of HDA is motivated by the high cost of storage management and the need to automate it. HDA is a self-managed storage system with the following features: 1) it allows multiple disks of various makes and characteristics to coexist; 2) it uses capacity and bandwidth to the maximum extent possible; 3) it allows multiple RAID schemes to coexist on the same device; and 4) it balances the disk loads in terms of both bandwidth and capacity utilization. Simulation results showed that it is possible to balance the utilization of bandwidth and capacity at the same time, and therefore to make efficient use of the available resources in a heterogeneous disk environment.

Acknowledgements

This research was sponsored by NSF through Grant 0105485 in Computer Systems Architecture. Thanks go to Mr. Gang Fu, who developed a large portion of the code used in this simulation.

References

[1] P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson. "RAID: High-performance, reliable secondary storage", ACM Computing Surveys, 26(2), June 1994, pp. 145-185.
[2] E. Anderson, et al. "Ergastulum: Quickly finding near-optimal storage system designs", Technical Report HPL-SSP-2001-05, HP Laboratories, 2001.
[3] P. Scheuermann, G. Weikum, and P. Zabback. "Data partitioning and load balancing in parallel disk systems", The VLDB Journal, 7(1), 1998, pp. 48-66.
[4] DiskSim validated disk parameters. http://www.pdl.cmu.edu/DiskSim/diskspecs.html
[5] A. Thomasian, C. Han, G. Fu, and C. Liu. "A performance tool for RAID disk arrays", Proc. Quantitative Evaluation of Systems (QEST'04), Enschede, The Netherlands, September 2004.
[6] T. Cortes and J. Labarta. "Extending heterogeneity to RAID level 5", Proc. 2001 USENIX Annual Technical Conference, Boston, MA, 2001, pp. 119-132.
[7] G. A. Alvarez, et al. "Minerva: An automated resource provisioning tool for large-scale storage systems", ACM Trans. Computer Systems, 19(4), 2001, pp. 483-518.
[8] E. Borowsky, et al. "Using attribute-managed storage to achieve QoS", Proc. 5th Int'l Workshop on Quality of Service, New York, 1997.


