Reliability and Performance of Hierarchical RAID with Multiple Controllers*

Sung Hoon Baek, Bong Wan Kim, Eui Joung Joung and Chong Won Park
Electronics and Telecommunications Research Institute
Yuseong-gu P.O.Box 106, Daejeon, Korea

ABSTRACT
Redundant arrays of inexpensive disks (RAID) offer fault tolerance against disk failures. However, a storage system with more disks suffers from lower reliability and performance. A RAID architecture that tolerates multiple disk failures shows severe performance degradation in comparison to RAID Level 5 because of its implementation complexity. We present a new RAID architecture that tolerates at least three disk failures and offers throughput similar to that of RAID Level 5. We call it the hierarchical RAID, which is hierarchically composed of RAID Levels. Furthermore, we formally derive the mean-time-to-data-loss (MTTDL) of traditional RAID and of the hierarchical RAID using a Markov process for detailed comparison.

Categories and Subject Descriptors
C.4 [Computer Systems Organization]: Performance of Systems--Fault Tolerance

General Terms
Reliability

Keywords
High reliability, Markov process, three-failure-tolerant array, hierarchical RAID

*More detailed information of this paper can be obtained from http://moogok.etri.re.kr/shbaek/

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PODC 01 Newport Rhode Island USA. Copyright ACM 2001 1-58113-383-9/01/08...$5.00

1. INTRODUCTION
In the last few years, we have experienced a huge and steadily growing disparity between I/O subsystem performance and the processing power of computer systems. Myers has reported that processor power has doubled every 2.25 years since 1978 [10]; however, I/O performance has not kept pace with the gains in processing power. As the gap between the performance of processors and I/O systems becomes large, the overall performance of a computer system will depend on the I/O bottleneck [8], [16]. Therefore, it is essential to balance the I/O bandwidth and the computational power. Improving I/O performance by means of data declustering and disk striping in disk array systems [16], [17], [13] has been one of the main research topics for computer architects in recent years.

Patterson et al. have proposed Redundant Arrays of Inexpensive Disks (RAID), defined by five different levels (Levels 1 to 5) depending on the data and parity placement scheme [13], [14]. RAID offers large capacity and good performance using a number of disks, and is a reliable storage system that prevents data loss by means of single-disk redundancy, even if a disk fails.

Nowadays, the demand for huge data-storing capacity from video on demand, internet data centers, data warehousing, digital imaging, nonlinear video editing, and so on increases the number of disks in a RAID. As these trends accelerate, traditional RAID cannot protect against the simultaneous loss of more than one disk [4]. As a result, much research has arisen on disk array systems that will not lose data even when multiple disks fail simultaneously [2], [3], [7], [11], [12]. Previous works [2], [3], [11] have presented fault tolerant schemes for a disk array that tolerates two disk failures, and other works [7], [12] have presented three-disk-tolerant schemes offering better reliability. These schemes at least double the complexity of implementation in comparison to RAID Level 5; thus a disk array offering better reliability suffers from severe performance degradation, and practical engineers hesitate to adopt these schemes in commercial RAID systems. RAID Level 3+1 and Level 5+1 have been introduced to resolve the performance degradation while attaining high reliability [15].
RAID Level 3+1 is a mirror of RAID Level 3, and RAID Level 5+1 is a mirror of RAID Level 5. However, these Levels have the drawback that they require too many redundant disks: their disk utilization is lower than 50%. Fig.1 shows the structure of RAID Level 3+1 and 5+1. The above architecture is a kind of hierarchical RAID that mixes multiple RAID Levels. Wilkes et al. [18] have presented the hierarchical AutoRAID, another kind of hierarchical storage architecture, in which a two-level storage hierarchy is implemented. In the upper level of this hierarchy, two copies of active data are stored to provide full redundancy and better performance. In the lower level, RAID Level 5 parity protection is used to provide lower storage cost. Accordingly, AutoRAID shows better performance than RAID Level 5.

Figure 1: Structure of RAID Level 3+1 and 5+1. Panels: (a) RAID Level 3+1, (b) RAID Level 5+1. Labels in each panel: Host Computer, RAID Controller, Mirroring.
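As a quick check of the utilization drawback noted for RAID Level 3+1 and 5+1, assume each mirrored half is a RAID Level 3 or Level 5 group of $n$ disks, so that one disk's worth of capacity per group holds parity and the second group is a full copy. The symbol $n$ and the example value are illustrative assumptions, not taken from the original text.

```latex
\[
U_{3+1} = U_{5+1} = \frac{n-1}{2n} = \frac{1}{2} - \frac{1}{2n} < \frac{1}{2},
\qquad \text{e.g. } n = 5:\ \frac{4}{10} = 40\%.
\]
```

Under this assumption the utilization approaches, but never reaches, 50% as the groups grow.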

Therefore, the hierarchical approach used in AutoRAID [18], RAID Level 3+1, and RAID Level 5+1 enhances RAID technology. However, current hierarchical RAID technologies are limited to mixing RAID Level 1 with another RAID Level, with performance bound to a single disk array controller.

We present a new hierarchical RAID, which consists of multiple arbitrary RAID Levels with multiple disk array controllers and is not limited to the above scope. It can offer good performance and reliability without requiring an excessive number of redundant disks, and thus eliminates the drawbacks of traditional disk arrays. For example, our hierarchical RAID Level 5 x 5 consists of an upper level RAID Level 5 built from virtual disks, each of which is a lower level RAID Level 5 inside an individual disk array controller. Hierarchical RAID tolerates at least three disk failures, and its performance is similar to that of RAID Level 5 because its operations are dispersed across the disk array controllers.

For a fair comparison of the reliability of hierarchical RAID and RAID, we present a detailed analysis of the mean-time-to-data-loss (MTTDL) for various hierarchical RAID Levels and RAID Levels. Previous works [14], [5], [4], [7] have tried to show the reliability of RAID analytically; however, they did not provide a detailed analysis. Gibson et al. [7] have presented the reliabilities by simulation. Patterson et al. [14], [5] have only approximated their analyses. Burkhard et al. [4] have shown the reliability of the two-disk-fault tolerant disk array using a Markov process, but have approximated the reliabilities of other disk arrays. Our analysis uses neither approximation nor simulation; we use an exact Markov process for all RAID Levels.

Figure 2: HiRAID Level X x Y: The host computer is connected to the upper level RAID Level X, comprised of virtual disks, each of which is composed of a lower level RAID Level Y.

Figure 3: HiRAID Level 1 x 5 x 5: it is mirroring of the HiRAID Level 5 x 5. Each lower level RAID controller has two RAID groups. Four RAID groups compose the RAID Level 5, two of which compose the RAID Level 1 that is provided by software of the host computer.

2. HIRAID: OUR HIERARCHICAL RAID

Traditional RAID consists of only disks and a single array controller, but HiRAID is RAID over RAID. In other words, HiRAID consists of virtual disks, each of which is itself a RAID. Fig.2 shows the block diagram of one embodiment of a HiRAID comprised of multiple RAID controllers. The RAID controller of RAID Level X, which is connected to the host computer, is connected to RAID controllers that each implement RAID Level Y, where X and Y denote conventional RAID Levels such as 0, 1, 3, 5, 6. Hereinafter, this is referred to as HiRAID Level X x Y. For example, X x Y can be 0 x 3, 1 x 3, 3 x 1, 0 x 5, 5 x 1, 1 x 5, 5 x 5, and so on. HiRAID Levels 1 x 3, 1 x 5, 0 x 3, and 0 x 5 are equal to RAID Levels 3+1, 5+1, 3+0, and 5+0 [15], respectively. Each HiRAID Level offers different characteristics such as performance, reliability, and disk utilization; hence HiRAID Level 1 x 5 is not equal to HiRAID Level 5 x 1, that is, X x Y is not commutative.

HiRAID does not need to be composed of multiple RAID controllers, nor does its depth need to be 2. HiRAID can be built entirely within a single controller, or from several controllers each containing some of the RAID groups. Fig.3 shows an exemplary diagram of a more complex HiRAID, HiRAID Level 1 x 5 x 5, whose depth is 3. Fig.3 shows that each lower level RAID controller has two RAID groups. The upper level mirroring is provided by software of the host computer.

HiRAID Level 5 x 5 uses the concept of product codes, which are commonly used in communications and which appeared in Mann's architecture [9]. In that architecture, data is stored in the members of a cluster using RAID Level 5 striping and parity techniques, allowing data to be recovered when a cluster member is down. Each member possesses local disk drive devices, so RAID Level 5 technology is employed twice to achieve a computer system with high reliability and lower cost. Mann's architecture shows wider bandwidth because each member has a local I/O channel, whereas HiRAID provides a single I/O path to the host computer. However, HiRAID operates as a general disk drive; accordingly, it can be a general storage device or a storage component of a complex storage structure such as Mann's architecture.
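The claim that a two-level HiRAID tolerates at least three disk failures can be checked by brute force on small configurations. The following is a minimal sketch, not the authors' method: the function names, the `TOLERANCE` table, and the group sizes are hypothetical, and Level 1 is modeled as a single two-way mirror set. A lower level group counts as a failed virtual disk once it has more failed disks than its Level tolerates, and data is lost once the upper level has more failed virtual disks than it tolerates.

```python
from itertools import combinations

# Failures a single group survives; Level 1 is modeled as one two-way mirror
# set (an assumption of this sketch, not a statement from the paper).
TOLERANCE = {0: 0, 1: 1, 3: 1, 5: 1, 6: 2}


def survives(x, y, groups, disks_per_group, failed):
    """Return True if HiRAID Level x x y keeps its data.

    `failed` is a set of (group, disk) pairs naming the failed physical disks.
    """
    dead_groups = 0
    for g in range(groups):
        down = sum((g, d) in failed for d in range(disks_per_group))
        if down > TOLERANCE[y]:
            dead_groups += 1  # this virtual disk is lost to the upper level
    return dead_groups <= TOLERANCE[x]


def guaranteed_tolerance(x, y, groups, disks_per_group):
    """Largest k such that every combination of k failed disks is survivable."""
    disks = [(g, d) for g in range(groups) for d in range(disks_per_group)]
    for k in range(1, len(disks) + 1):
        for failed in combinations(disks, k):
            if not survives(x, y, groups, disks_per_group, set(failed)):
                return k - 1
    return len(disks)


# Hypothetical small configurations: (X, Y, virtual disks, disks per virtual disk).
for config in [(5, 5, 3, 3), (1, 5, 2, 3), (5, 1, 3, 2), (0, 5, 2, 3)]:
    print(config, guaranteed_tolerance(*config))
```

The exhaustive search is only practical for a handful of disks, but it makes the composition rule explicit: the smallest fatal failure pattern must take down enough lower level groups to defeat the upper level.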

HiRAID is suited to a system, such as a tertiary storage device of a huge data center, that requires a single I/O channel and a large number of disks. A system requiring fewer than several tens of disks is not suitable for HiRAID. HiRAID increases the reliability of a data storage system and delivers reasonable performance, but it is not appropriate for every storage application.

The main concept of HiRAID is to use a plurality of controllers to increase reliability while retaining performance. Many controllers are required to achieve this; accordingly, it is important for a system adopting HiRAID to obtain controllers at a low price. Recent commercial products range in price from much more expensive than a disk to almost the same price as a disk.

3. RELIABILITY

Each HiRAID Level offers different reliability and performance characteristics, which are better than those of traditional RAID. In this section, we compare the reliability of HiRAID with that of RAID by means of a Markov process. The reliability of RAID can be measured in mean-time-to-data-loss (MTTDL). The following subsections formally derive the MTTDL of all RAID Levels and HiRAID Levels without any approximation or simulation. Our approach is based on reliability engineering [1].

3.1 The MTTDL of the traditional RAID

If the failure rate and repair rate of the disk are characterized by the exponential distribution and disks are independent, the Markov process approach for the failure/repair model can be used [1]. The Markov diagram for RAID is shown in Fig.4. Each branch of a state denotes the transition rate to the next state. For example, $(N-2)\lambda$ at the two-disk-down state is a failure rate (the transition rate to the three-disk-down state) when two disks have failed, and $2\mu$ at the two-disk-down state is a repair rate (the transition rate to the single-disk-down state) when two disks have failed. Here $N$ is the total number of disks, the repair rate per disk $\mu$ is 1/MTTR (mean-time-to-repair), and the failure rate per disk $\lambda$ equals 1/MTTF (mean-time-to-failure).

The mean-time-to-data-loss (MTTDL) can be obtained not only from the differential equation, but also from the fundamental matrix $M$, which is defined by Equation (1) [1]:

\[ M = [I - Q]^{-1} \tag{1} \]

where $Q$ is the truncated matrix [1]. The truncated stochastic transitional probability matrices $Q_5$, $Q_6$, and $Q_7$ for the single-disk, two-disk, and three-disk fault tolerant RAID (an imaginary RAID that has 3 redundant disks for tolerating three disk failures), respectively, are

\[ Q_5 = \begin{bmatrix} 1 - N\lambda & N\lambda \\ \mu & 1 - (N-1)\lambda - \mu \end{bmatrix} \tag{2} \]

\[ Q_6 = \begin{bmatrix} 1 - N\lambda & N\lambda & 0 \\ \mu & 1 - (N-1)\lambda - \mu & (N-1)\lambda \\ 0 & 2\mu & 1 - (N-2)\lambda - 2\mu \end{bmatrix} \tag{3} \]

\[ Q_7 = \begin{bmatrix} 1 - N\lambda & N\lambda & 0 & 0 \\ \mu & 1 - (N-1)\lambda - \mu & (N-1)\lambda & 0 \\ 0 & 2\mu & 1 - (N-2)\lambda - 2\mu & (N-2)\lambda \\ 0 & 0 & 3\mu & 1 - (N-3)\lambda - 3\mu \end{bmatrix} \tag{4} \]

From Equation (1) and Equation (2), the fundamental matrix $M_5$ of RAID Level 5 is

\[ M_5 = \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix} = \frac{1}{N(N-1)\lambda^2} \begin{bmatrix} (N-1)\lambda + \mu & N\lambda \\ \mu & N\lambda \end{bmatrix} \tag{5} \]
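The fundamental-matrix computation of Equation (1) can be reproduced with exact rational arithmetic. The sketch below is an illustration under stated assumptions rather than the authors' code: the function names and the example values of N, MTTF, and MTTR are hypothetical. It builds $I - Q$ for an array that tolerates a given number of failures, inverts it by Gauss-Jordan elimination, and sums the first row of $M$, which is the mean time to data loss when the process starts in the all-disk-up state.

```python
from fractions import Fraction


def i_minus_q(n_disks, tolerated, lam, mu):
    """Return I - Q for an array of n_disks that survives `tolerated` failures.

    Row/column k (0-based) is the state with k disks down; the absorbing
    data-loss state is truncated away, as in the matrices Q_5, Q_6, Q_7.
    """
    size = tolerated + 1
    a = [[Fraction(0)] * size for _ in range(size)]
    for k in range(size):
        fail = (n_disks - k) * lam   # rate at which one more disk fails
        repair = k * mu              # rate at which one failed disk is repaired
        a[k][k] = fail + repair
        if k + 1 < size:
            a[k][k + 1] = -fail
        if k > 0:
            a[k][k - 1] = -repair
    return a


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination (exact for Fractions)."""
    n = len(a)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mttdl(n_disks, tolerated, lam, mu):
    """Sum of the first row of M = [I - Q]^{-1}."""
    return sum(invert(i_minus_q(n_disks, tolerated, lam, mu))[0])


# Hypothetical example values: 10 disks, MTTF = 100,000 h, MTTR = 24 h.
N, LAM, MU = 10, Fraction(1, 100_000), Fraction(1, 24)
for tolerated in (1, 2, 3):
    print(tolerated, float(mttdl(N, tolerated, LAM, MU)))
```

Because the arithmetic is exact, the output of `invert` can be compared for equality against the closed form of Equation (5), and `mttdl` against the closed-form MTTDL expressions, for any rational choice of the rates.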

The element $m_{ij}$ of $M_5$ is the average time spent in state $j$, before reaching the data-loss state, given that the process starts in state $i$. It is assumed that the system starts in state 1 (all-disk-up), and therefore the MTTDL is the sum of the average times spent in all states $j$ given that the process starts in state 1:

\[ \mathrm{MTTDL} = \sum_{j=1}^{d+1} m_{1j} \tag{6} \]

where $d + 1$ is the column size of the fundamental matrix. The MTTDL of RAID Level 5, $\mathrm{MTTDL}_5$, is

\[ \mathrm{MTTDL}_5 = m_{11} + m_{12} = \frac{(2N-1)\lambda + \mu}{N(N-1)\lambda^2} = \frac{1}{N\lambda} + \frac{N\lambda + \mu}{N(N-1)\lambda^2} = \mathrm{MTTDL}_0 + \frac{N\lambda + \mu}{N(N-1)\lambda^2} \tag{7} \]

Similarly, the MTTDL of the three-disk fault tolerant RAID, $\mathrm{MTTDL}_7$, is

\[ \mathrm{MTTDL}_7 = \frac{(4N^3 - 18N^2 + 22N - 6)\lambda^3 + (6N^2 - 14N + 6)\lambda^2\mu + (8N - 6)\lambda\mu^2 + 6\mu^3}{N(N-1)(N-2)(N-3)\lambda^4} = \mathrm{MTTDL}_6 + \frac{N(N-1)(N-2)\lambda^3 + 3N(N-1)\lambda^2\mu + 6N\lambda\mu^2 + 6\mu^3}{N(N-1)(N-2)(N-3)\lambda^4} \tag{9} \]

Figure 4: Markov diagram of the RAID Level 5, 6, 1, and three disk failures tolerant RAID. Panels: (a) single disk fault tolerance (RAID Level 5: N-1), (b) two disks fault tolerance (RAID Level 6: N-2), (c) three disks fault tolerance (N-3), (d) mirroring (RAID Level 1: N/2).

The MTTDL of the single-disk, two-disk, and three-disk fault tolerant RAID can be obtained simply, whereas the MTTDL of RAID Level 1 is somewhat complex. The transition probability from the single-disk-down state to the two-disk-down state is neither zero nor one, because there is no data loss if the second failed disk is not the mirror of the first failed disk. Fig.4 (d) shows the Markov diagram of RAID Level 1. The transitions from the single-disk-down state through the (d-1)-disk-down state each carry the probability $P'(n)$ that one more disk fails without data loss in the $(n-1)$-disk-down state. $(N-k)P'(k+1)$ is the expected number of disks whose failure can be tolerated without data loss in the $k$-disk-down state. $P'(n)$ is defined by the following equation.

\[ P'(n) = P(n \mid n-1) = \frac{P(n \cap n-1)}{P(n-1)} = \begin{cases} 1, & \text{if } n = 1 \\ P(n)/P(n-1), & \text{if } n \ge 2 \end{cases} \tag{10} \]

where $P(n)$ is the safety probability function, that is, the probability that there is no data loss when $n$ disks fail simultaneously. In mirroring, the data is safe if each of $n$ mirroring sets out of the $N/2$ mirroring groups has only one faulty disk. Hence the safety probability function $P_1(n)$ of RAID Level 1 is

\[ P_1(n) = \frac{{}_{N/2}C_n \cdot 2^n}{{}_N C_n}, \qquad \text{where } {}_nC_r = \begin{cases} \binom{n}{r}, & \text{if } n \ge r \\ 0, & \text{otherwise} \end{cases} \tag{11} \]

From Equation (1), Equation (10), Equation (11), and Fig.4 (d), the fundamental matrix $M_1$ of RAID Level 1 can be written as Equation (12).

\[ M_1^{-1} = [I - Q_1] = \begin{bmatrix}
N\lambda & -N\lambda & 0 & \cdots & 0 \\
-\mu & (N-1)\lambda + \mu & -(N-1)P'(2)\lambda & \cdots & 0 \\
0 & -2\mu & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & (N-d+1)\lambda + (d-1)\mu & -(N-d+1)P'(d)\lambda \\
0 & \cdots & 0 & -d\mu & (N-d)\lambda + d\mu
\end{bmatrix} \tag{12} \]

Equation (12) can be generalized to Equation (13).

\[ M_1^{-1} = [I - Q_1] = [a_{m,n}]_{(d+1)\times(d+1)} \]
\[ a_{k+1,k} = -k\mu, \quad \forall k \text{ such that } 1 \le k \le d \]
\[ a_{k,k} = (N-k+1)\lambda + (k-1)\mu, \quad \forall k \text{ such that } 1 \le \dots \]
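The RAID Level 1 construction of Equations (10) to (12) can also be sketched numerically. This is a minimal illustration, assuming $d = N/2$ as labeled in Fig.4 (d) and using hypothetical function names and example rates. It evaluates $P_1(n)$ and $P'(n)$, fills in $I - Q_1$ entry by entry, and solves $[I - Q_1]\,t = \mathbf{1}$, whose first component equals the first-row sum of $M_1$, that is, the MTTDL.

```python
from fractions import Fraction
from math import comb


def safety(n_disks, n_failed):
    """P_1(n): probability that n simultaneous failures lose no data (Eq. 11)."""
    pairs = n_disks // 2
    if n_failed > pairs:
        return Fraction(0)
    return Fraction(comb(pairs, n_failed) * 2 ** n_failed, comb(n_disks, n_failed))


def conditional_safety(n_disks, n):
    """P'(n): probability that the n-th failure is survivable (Eq. 10)."""
    if n == 1:
        return Fraction(1)
    return safety(n_disks, n) / safety(n_disks, n - 1)


def i_minus_q1(n_disks, lam, mu):
    """Build I - Q_1 with the 1-based state index k used in Eq. (12)."""
    d = n_disks // 2                      # assumed: at most N/2 tolerable failures
    size = d + 1
    a = [[Fraction(0)] * size for _ in range(size)]
    for k in range(1, size + 1):
        a[k - 1][k - 1] = (n_disks - k + 1) * lam + (k - 1) * mu
        if k <= d:
            a[k - 1][k] = -(n_disks - k + 1) * conditional_safety(n_disks, k) * lam
            a[k][k - 1] = -k * mu
    return a


def solve(a, b):
    """Solve a t = b by Gauss-Jordan elimination (exact for Fractions)."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def mttdl_raid1(n_disks, lam, mu):
    """MTTDL of RAID Level 1 starting from the all-disk-up state."""
    a = i_minus_q1(n_disks, lam, mu)
    return solve(a, [Fraction(1)] * len(a))[0]


# Hypothetical example values: MTTF = 100,000 h, MTTR = 24 h.
LAM, MU = Fraction(1, 100_000), Fraction(1, 24)
for n_disks in (2, 4, 8):
    print(n_disks, float(mttdl_raid1(n_disks, LAM, MU)))
```

For a single mirrored pair ($N = 2$) the matrix reduces to the two-state single-fault-tolerant model, which gives a simple consistency check against Equation (7).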
