Performance of Two Disk Failure Tolerant Disk Arrays
Chunqi Han
Computer Science Department
New Jersey Institute of Technology (NJIT)
Newark, NJ 07102, USA
Background
Online disk storage cheaper and faster than paper
Magnetic disks dominant storage medium
Users adding nearly 100% storage capacity per year
Downtime can be very expensive: hourly cost for
  Retail brokerage         $6,500,500
  Credit card authorizer   $2,600,000
  Pay-per-view media       $1,150,000
  Home shopping            $113,000
  Airline reservations     $89,500
Increase data availability by introducing redundancy
RAID5 -- Single disk failure tolerant array
Rotated block interleaved parity (Left-Symmetric)
P0-4 = D0 ⊕ D1 ⊕ D2 ⊕ D3 ⊕ D4 (definition)
P0-4new = D1new ⊕ D1old ⊕ P0-4old (update, 4 accesses)
D0 = D1 ⊕ D2 ⊕ D3 ⊕ D4 ⊕ P0-4 (reconstruct)
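The three parity relations above can be checked with a minimal Python sketch. The helper name `xor_blocks`, the 4-byte block size, and the byte values are illustrative assumptions, not taken from the slides.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Five data blocks D0..D4 of one stripe (arbitrary illustrative contents).
data = [bytes([value] * 4) for value in (0x11, 0x22, 0x33, 0x44, 0x55)]

# Definition: P0-4 = D0 xor D1 xor D2 xor D3 xor D4
parity = xor_blocks(*data)

# Small write to D1: read old data and old parity, write new data and new
# parity (4 accesses), without touching D0, D2, D3, D4.
d1_new = bytes([0xAB] * 4)
parity_new = xor_blocks(d1_new, data[1], parity)
data_new = data[:1] + [d1_new] + data[2:]

# Reconstruction: D0 = D1 xor D2 xor D3 xor D4 xor P0-4
d0_rebuilt = xor_blocks(*data_new[1:], parity_new)
```

The incremental update yields the same parity as recomputing it from all five blocks, which is why the small write needs only four accesses.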
RAID5 operating modes:
Normal mode:
Write/update – requires updating the parity block (“small write penalty”).
Degraded mode (with one disk failure)
To access data on the failed disk, read and XOR all the corresponding blocks from surviving disks.
Doubles the load on surviving disks.
Rebuild mode
Reasons why RAID5 not enough
RAID5 is vulnerable to data loss after a single disk failure, until its contents are reconstructed, due to:
Uncorrectable errors
Hidden faults on surviving disks
A second disk failure before reconstruction is completed
Increasing disk capacity makes the problem worse.
Solution: two disk failure tolerant arrays
Double disk failure tolerant arrays
RAID6: using Reed-Solomon code
  StorageTek: Iceberg
  Hewlett-Packard: RAID5 DP (Double Parity)
EVENODD: proposed by M. Blaum et al. ’95
  Using parity only, minimal redundancy
RM2: proposed by C.I. Park
  Using parity only
Performance considerations
No free lunch:
higher reliability
higher redundancy (and cost)
more blocks to be updated for a write request
higher overhead for writes
lower performance in degraded mode
Motivations for the study
Compare RAID0, RAID5, RAID6, EVENODD and RM2:
Overhead for tolerating single and double disk failures (number of accessed blocks).
Performance in normal mode and in degraded mode with single or double disk failures (response times).
Accomplishments
Device independent cost functions derived for various schemes.
Maximum throughputs obtained for a given workload and disk characteristics.
A queuing model developed to obtain mean response times.
Queuing model validated by simulation results.
Performance and scalability of small, intermediate, and large disk array configurations compared.
Methodology: Enumerate various cases and weight them with appropriate frequencies.
A sample case breakdown graph (RAID6 with one disk failure)

Incoming requests, RAID6 degraded mode (1 failed disk)
fr = fraction of reads, fw = fraction of writes

Read (fr)
  Read from normal disk ((N-1)/N): 1 simple read
  Read from failed disk (1/N): fork/join read of the surviving (N-2) disks, (F/J)(N-2)read
Write (fw)
  Both data and parity are on non-failed disks ((N-3)/N): 3 RMW on data and parity disks, RMW + (F/J)2RMW
  Data on failed disk, parities on non-failed disks (1/N): reconstruct write, read the surviving (N-3) data blocks, then write the 2 parity blocks, (F/J)(N-3)read + (F/J)2write
  Data on non-failed disk, one of the two parities on failed disk (2/N): 2 RMW
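The case breakdown above can be turned into a mean number of disk accesses per request by weighting each case with its frequency. The sketch below assumes that one RMW counts as two accesses (a read plus a write), in line with the "4 accesses" quoted for the RAID5 single-block update. The function name and the use of exact fractions are illustrative choices, not part of the original study.

```python
from fractions import Fraction

def raid6_one_failure_accesses(n_disks: int, f_read: Fraction) -> Fraction:
    """Mean disk accesses per request for RAID6 with one failed disk.

    Assumption: each read-modify-write (RMW) costs 2 accesses.
    """
    n = n_disks
    f_write = 1 - f_read

    # Reads: simple read, or fork/join read of the (N-2) surviving blocks.
    read_cost = Fraction(n - 1, n) * 1 + Fraction(1, n) * (n - 2)

    # Writes, by location of the failed disk relative to the stripe blocks.
    write_cost = (
        Fraction(n - 3, n) * (3 * 2)          # 3 RMW: data, P and Q all alive
        + Fraction(1, n) * ((n - 3) + 2)      # reconstruct write: N-3 reads + 2 writes
        + Fraction(2, n) * (2 * 2)            # 2 RMW: data and the surviving parity
    )
    return f_read * read_cost + f_write * write_cost

# Example: 19 disks and 75% reads, the configuration used in the plots.
mean_accesses = raid6_one_failure_accesses(19, Fraction(3, 4))
print(float(mean_accesses))
```

The same weighting pattern applies to the other schemes and failure modes once their case trees are enumerated.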
RAID5 cost of operation
RAID6 Overview
RAID6 uses the Reed-Solomon code
Parity layout similar to left symmetric in RAID5

D0      D1      D2      D3      D4      P0-4    Q0-4
D6      D7      D8      D9      P5-9    Q5-9    D5
D12     D13     D14     P10-14  Q10-14  D10     D11
D18     D19     P15-19  Q15-19  D15     D16     D17
D24     P20-24  Q20-24  D20     D21     D22     D23
P25-29  Q25-29  D25     D26     D27     D28     D29
Q30-34  D30     D31     D32     D33     D34     P30-34
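The seven-disk layout above can be regenerated with a short sketch. The placement rule used here (P on disk (N-2-s) mod N for stripe s, Q on the next disk, data continuing after Q with wraparound) is inferred from the example layout and is an assumption, not a rule stated in the slides. The function name `raid6_layout` is hypothetical.

```python
def raid6_layout(n_disks: int, n_stripes: int) -> list[list[str]]:
    """Return one row of block labels per stripe (inferred rotation rule)."""
    k = n_disks - 2                      # data blocks per stripe
    rows = []
    for s in range(n_stripes):
        p = (n_disks - 2 - s) % n_disks  # P shifts one disk left per stripe
        q = (p + 1) % n_disks            # Q sits right after P, with wraparound
        first, last = s * k, s * k + k - 1
        row = [""] * n_disks
        row[p] = f"P{first}-{last}"
        row[q] = f"Q{first}-{last}"
        for i in range(k):
            # Data blocks follow Q and wrap around the array.
            row[(q + 1 + i) % n_disks] = f"D{first + i}"
        rows.append(row)
    return rows

for row in raid6_layout(7, 7):
    print("  ".join(f"{label:<7}" for label in row))
```

Rotating both check blocks spreads the parity update load evenly over all disks, as in left-symmetric RAID5.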
RAID6 cost of operation
EVENODD organization
Two kinds of parity:
Horizontal parities P (same as in RAID5)
Diagonal parities Q
[Figure: EVENODD layout on (m+2) disks. Each "segment" consists of (m-1) "symbols", where m is a prime number; S = ⊕ …]
Figure extracted from M. Blaum et al., “EVENODD: An optimal scheme for tolerating double disk failures in RAID architectures”, with minor modification.
RM2
Sample RM with M=3 (33.3% redundancy) and T=7:
The Redundancy Matrix
Corresponding disk layout
Each data block di,j is protected by parities pi and pj.
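The double protection of each data block can be illustrated with a toy sketch. It does not reproduce the redundancy matrix from the slide: the three parity groups, the block contents, and the helper names are all hypothetical, chosen only to show that a block d(i,j) can be rebuilt through either of its two parity groups.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# Toy data blocks keyed by the two parity groups (i, j) that protect them.
data = {
    (0, 1): b"\x0a\x0b",
    (0, 2): b"\x1c\x1d",
    (1, 2): b"\x2e\x2f",
}

def group_parity(group: int) -> bytes:
    """Parity p_group = XOR of every data block whose key contains group."""
    result = bytes(2)
    for key, block in data.items():
        if group in key:
            result = xor_bytes(result, block)
    return result

parities = {g: group_parity(g) for g in range(3)}

def recover(lost: tuple[int, int], via_group: int) -> bytes:
    """Rebuild a lost block from one of its two parity groups."""
    result = parities[via_group]
    for key, block in data.items():
        if key != lost and via_group in key:
            result = xor_bytes(result, block)
    return result

# d(0,1) can be rebuilt through p0 or through p1.
rebuilt_via_p0 = recover((0, 1), 0)
rebuilt_via_p1 = recover((0, 1), 1)
```

Having two independent recovery paths per block is what allows the array to survive a second failure that disables one of the paths.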
RM2 cost of operation
Queuing model
Mean response times estimated by M/G/1 model.
Accuracy of M/G/1 model verified with a detailed simulation of a single disk, error < 1%.
Extreme-value distribution used to approximate the n-way fork-join response time $R_n^{F/J}$.
Simulation results show $R_n^{\max}$ is a tight upper bound to $R_n^{F/J}$.
Matching the first two moments of read response time with the extreme-value distribution, we get
$R_n^{\max} = R_r + (\sqrt{6}/\pi)\,\sigma_r \ln(n)$
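The two ingredients of the queuing model can be sketched numerically. The first function is the standard Pollaczek-Khinchine mean response time for an M/G/1 queue. The second evaluates the extreme-value upper bound for an n-way fork-join, assuming the constant is sqrt(6)/pi as in the moment-matched Gumbel distribution. Function names and the sample numbers are illustrative assumptions, not values from the study.

```python
import math

def mg1_mean_response(arrival_rate: float, mean_service: float,
                      second_moment_service: float) -> float:
    """Pollaczek-Khinchine mean response time of an M/G/1 queue."""
    rho = arrival_rate * mean_service          # server utilization
    if rho >= 1.0:
        raise ValueError("queue is unstable (utilization >= 1)")
    waiting = arrival_rate * second_moment_service / (2.0 * (1.0 - rho))
    return mean_service + waiting

def fork_join_upper_bound(mean_read: float, sigma_read: float, n: int) -> float:
    """Extreme-value approximation: R_r + (sqrt(6)/pi) * sigma_r * ln(n)."""
    return mean_read + (math.sqrt(6.0) / math.pi) * sigma_read * math.log(n)

# Sanity check with exponential service (mean 10 ms), where M/G/1 reduces to M/M/1.
r_mm1 = mg1_mean_response(arrival_rate=50.0, mean_service=0.01,
                          second_moment_service=2 * 0.01 ** 2)

# Hypothetical read response time statistics for a 17-way fork-join read.
r_max_17 = fork_join_upper_bound(mean_read=0.020, sigma_read=0.015, n=17)
```

For n = 1 the bound reduces to the plain read response time, and it grows only logarithmically with the fork-join degree.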
Normal mode read response time
[Figure: read response time (ms) versus arrival rate (1/s); analytical and simulation curves for RAID0, RAID5, RAID6, and RM2 (RAID6 and RM2 share one analytical curve). Normal mode, 19 disks, 75% read, 4KB blocks, IBM18ES disk.]
Read response time with single disk failure
[Figure: read response time (ms) versus arrival rate (1/s); analytical and simulation curves for RAID5, RAID6, and RM2. 19 disks, 75% read.]
Read response time with two disk failures
[Figure: read response time (ms) versus arrival rate (1/s); analytical and simulation curves for RAID6 and RM2. 19 disks, 75% read.]
Throughputs in degraded modes with respect to normal mode
We can find the performance degradation of schemes from the table below:
With a single failure, RM2 performs better due to the declustering effect.
With two disk failures, RAID6 and EVENODD attain higher maximum throughput.
Conclusions
Performance with one disk failure is similar to RAID5 for all schemes.
With one and two disk failures, RAID6 and EVENODD achieve 70% and 50%, respectively, of the maximum throughput in normal mode.
RM2 can achieve about 90% of the throughput of RAID6 (and EVENODD) with two disk failures. It can be used sparingly to recover from localized faults.