Reliability and Performance of Hierarchical RAID with Multiple Controllers

Sung Hoon Baek, Bong Wan Kim, Eui Joung Joung and Chong Won Park
Dept. of Computer System, Electronics and Telecommunications Research Institute
Yuseong-gu P.O.Box 106, Daejeon, Korea
shbaek|kimbw|euijoung|cwpark@etri.re.kr

(More detailed information on this paper can be obtained from http://moogok.etri.re.kr/shbaek/)

ABSTRACT

Redundant arrays of inexpensive disks (RAID) offer fault tolerance against disk failures. However, a storage system with more disks suffers from lower reliability and performance. A RAID architecture tolerating multiple disk failures shows severe performance degradation in comparison to RAID Level 5 due to the complexity of implementation. We present a new RAID architecture that tolerates at least three disk failures and offers throughput similar to RAID Level 5. We call it the hierarchical RAID; it is hierarchically composed of RAID Levels. Furthermore, we formally derive the mean-time-to-data-loss (MTTDL) of traditional RAID and the hierarchical RAID using Markov processes for a detailed comparison.

Categories and Subject Descriptors

C.4 [Computer Systems Organization]: Performance of Systems - Fault Tolerance

General Terms

Reliability

Keywords

High reliability, Markov process, three-failure-tolerant array, hierarchical RAID

1. INTRODUCTION

In the last few years, we have seen a huge disparity between I/O subsystem performance and the steadily growing processing power of computer systems. Myers has reported that processor power has doubled every 2.25 years since 1978 [10]; however, I/O performance has not kept pace with the gains in processing power. As the gap between the performance of processors and of I/O systems becomes large, the overall performance of a computer system will depend on the I/O bottleneck [8], [16]. Therefore, it is essential to balance the I/O bandwidth and the computational power.
Improving I/O performance through data declustering and disk striping in disk array systems [16], [17], [13] has been one of the main research topics for computer architects in recent years. Patterson et al. proposed Redundant Arrays of Inexpensive Disks (RAID), defined by five different levels (Levels 1-5) depending on the data and parity placement scheme [13], [14]. RAID offers large capacity and good performance using a number of disks, and is a reliable storage system that prevents data loss by means of single-disk redundancy, even if a disk fails.

Nowadays, the demand for huge storage capacity required by video on demand, internet data centers, data warehousing, digital imaging, nonlinear video editing, and so on increases the number of disks in a RAID. As these trends accelerate, traditional RAID cannot protect against the simultaneous loss of more than one disk [4]. As a result, much research has arisen on disk array systems that will not lose data even when multiple disks fail simultaneously [2], [3], [7], [11], [12].

Previous works [2], [3], [11] have presented fault-tolerant schemes for a disk array tolerating two disk failures, and other works [7], [12] have presented three-disk-tolerant schemes offering better reliability. These schemes at least double the complexity of implementation in comparison to RAID Level 5, so a disk array offering better reliability suffers from severe performance degradation; consequently, practical engineers hesitate to adopt these schemes in a commercial RAID system.

RAID Level 3+1 and Level 5+1 have been introduced to resolve the performance degradation while attaining high reliability [15]. RAID Level 3+1 is a mirror of RAID Level 3, and RAID Level 5+1 is a mirror of RAID Level 5. However, these Levels have the drawback that they require too many redundant disks, and their disk utilization is lower than 50%.
Fig. 1 shows the structure of RAID Level 3+1 and 5+1. These architectures are a kind of hierarchical RAID that mixes multiple RAID Levels.

Figure 1: Structure of RAID Level 3+1 (a) and Level 5+1 (b): a host computer connects to two RAID controllers (each a RAID Level 3 or Level 5 array) combined by mirroring.

Wilkes et al. [18] have presented AutoRAID, another kind of hierarchical storage architecture, in which a two-level storage hierarchy is implemented. In the upper level of this hierarchy, two copies of active data are stored to provide full redundancy and better performance. In the lower level, RAID Level 5 parity protection is used to provide lower storage cost. Accordingly, AutoRAID shows better performance than RAID Level 5. The hierarchical approach used in AutoRAID [18], RAID Level 3+1, and Level 5+1 thus enhances RAID technology. However, current hierarchical RAID technologies are limited to mixing RAID Level 1 with one other RAID Level, and their performance is bound to a single disk array controller.

We present a new hierarchical RAID that consists of multiple arbitrary RAID Levels with multiple disk array controllers, beyond the above scope. It offers good performance and reliability without sacrificing disk utilization, thus eliminating the drawbacks of traditional disk arrays. For example, our hierarchical RAID Level 5×5 consists of an upper-level RAID Level 5 of virtual disks, each of which is composed of a lower-level RAID Level 5 inside an individual disk array controller. Hierarchical RAID tolerates at least three disk failures, and its performance is similar to that of RAID Level 5 because its operations are dispersed across the disk array controllers.
For a fair comparison of the reliability of hierarchical RAID and RAID, we present a rigorous analysis of mean-time-to-data-loss (MTTDL) for various hierarchical RAID Levels and RAID Levels. Previous works [14], [5], [4], [7] have tried to show the reliability of RAID analytically, but they did not provide a rigorous analysis. Gibson et al. [7] presented reliabilities obtained by simulation. Patterson et al. [14], [5] approximated their analyses in a vague manner. Burkhard et al. [4] showed the reliability of the two-disk-fault-tolerant disk array using a Markov process, but approximated the reliabilities of the other disk arrays. Our analysis uses no approximation and no simulation; we use an exact Markov process for all RAID Levels.

2. HIRAID: OUR HIERARCHICAL RAID

Figure 2: HiRAID Level X×Y: the host computer is connected to the upper-level RAID Level X comprised of virtual disks, each of which is composed of a lower-level RAID Level Y (a RAID controller with its member disks, reached through upper and lower channels).

Figure 3: HiRAID Level 1×5×5: a mirror of HiRAID Level 5×5. Each lower-level RAID controller has two RAID groups. Four RAID groups compose the RAID Level 5, two of which compose the RAID Level 1 that is provided by software of the host computer (e.g., a banking computer mirrored between a primary site and a disaster recovery site).

Traditional RAID consists of only disks and a single array controller, but HiRAID is RAID over RAID. In other words, HiRAID consists of virtual disks, each of which is itself composed of a RAID.
Fig. 2 shows the block diagram of one embodiment of a HiRAID comprised of multiple RAID controllers. The RAID controller of RAID Level X connected to the host computer is connected to the RAID controllers composing RAID Level Y, where X and Y denote conventional RAID Levels such as 0, 1, 3, 5, or 6. Hereinafter, this is referred to as HiRAID Level X×Y. For example, X×Y can be 0×3, 1×3, 3×1, 0×5, 5×1, 1×5, 5×5, and so on. HiRAID Levels 1×3, 1×5, 0×3, and 0×5 are equal to RAID Levels 3+1, 5+1, 3+0, and 5+0 [15], respectively. Each HiRAID Level offers different characteristics in performance, reliability, and disk utilization; hence HiRAID Level 1×5 is not equal to HiRAID Level 5×1, and X×Y is not commutative.

HiRAID does not need to be composed of multiple RAID controllers, nor does its depth need to be 2. HiRAID can be made up of a single controller or of several controllers including partial RAID groups. Fig. 3 shows an exemplary diagram of a more complex HiRAID, HiRAID Level 1×5×5, whose depth is 3. Each lower-level RAID controller has two RAID groups, and the upper-level mirroring is provided by software of the host computer.

HiRAID Level 5×5 uses the concept of product codes, which are commonly used in communications and which appeared in Mann's architecture [9], where data is stored in the members of a cluster using RAID Level 5 striping and parity techniques, allowing data to be recovered when a cluster member is down. Each member possesses local disk drives employing RAID Level 5 technology, so the technique is applied twice to achieve a computer system with high reliability and lower cost. Mann's architecture shows wider bandwidth than HiRAID, which provides a single I/O path to the host computer, because each member of the cluster has a local I/O channel.
However, HiRAID operates as a general disk drive; accordingly, it can be a general storage device or a storage component of a complex storage structure such as Mann's architecture. HiRAID is adequate for a system such as a tertiary storage device of a huge data center that requires a single I/O channel and a large number of disks. A system requiring less than several tens of disks is not suitable for HiRAID: HiRAID increases reliability and offers acceptable performance, but it is not adequate for every storage application.

The main concept of HiRAID is to use a plurality of controllers to increase reliability while retaining performance. Since many controllers are required, it is important for a system adopting HiRAID to obtain controllers at a low price. Prices of recent commercial commodity controllers range from very expensive down to roughly the price of a disk.

3. RELIABILITY

Each HiRAID Level offers reliability and performance characteristics that are better than those of traditional RAID. In this section, we compare the reliability of HiRAID with that of RAID by means of Markov processes. The reliability of RAID can be measured as mean-time-to-data-loss (MTTDL). The following subsections formally derive the MTTDL of all RAID Levels and HiRAID Levels without any approximation or simulation. Our approach is based on reliability engineering [1].

3.1 The MTTDL of the traditional RAID

If the failure rate and repair rate of a disk are characterized by the exponential distribution and disks fail independently, the Markov-process approach for the failure/repair model can be used [1]. The Markov diagram for RAID is shown in Fig. 4. Each branch of a state denotes the transition rate to the next state. For example, (N-2)λ of the two-disk-down state is the failure rate (the transition rate to the three-disk-down state) when two disks have failed.
Similarly, 2μ of the two-disk-down state is the repair rate (the transition rate to the single-disk-down state) when two disks have failed, where N is the number of total disks, the repair rate per disk is μ = 1/MTTR (mean-time-to-repair), and the failure rate per disk is λ = 1/MTTF (mean-time-to-failure). The mean-time-to-data-loss (MTTDL) can be obtained not only from the differential equations but also from the fundamental matrix M, defined by Equation (1) [1]:

M = [I - Q]^{-1}   (1)

where Q is the truncated matrix [1]. The truncated stochastic transitional probability matrices Q_5, Q_6, and Q_7 for the single-disk, two-disk, and three-disk fault-tolerant RAID (the last an imaginary RAID that has three redundant disks for tolerating three disk failures), respectively, are

Q_5 = \begin{bmatrix} 1-N\lambda & N\lambda \\ \mu & 1-(N-1)\lambda-\mu \end{bmatrix}   (2)

Q_6 = \begin{bmatrix} 1-N\lambda & N\lambda & 0 \\ \mu & 1-(N-1)\lambda-\mu & (N-1)\lambda \\ 0 & 2\mu & 1-(N-2)\lambda-2\mu \end{bmatrix}   (3)

Q_7 = \begin{bmatrix} 1-N\lambda & N\lambda & 0 & 0 \\ \mu & 1-(N-1)\lambda-\mu & (N-1)\lambda & 0 \\ 0 & 2\mu & 1-(N-2)\lambda-2\mu & (N-2)\lambda \\ 0 & 0 & 3\mu & 1-(N-3)\lambda-3\mu \end{bmatrix}   (4)

From Equations (1) and (2), the fundamental matrix M_5 of RAID Level 5 is

M_5 = [m_{ij}] = \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix} = \frac{1}{N(N-1)\lambda^2} \begin{bmatrix} (N-1)\lambda+\mu & N\lambda \\ \mu & N\lambda \end{bmatrix}   (5)

Figure 4: Markov diagrams of (a) single-disk fault tolerance (RAID Level 5: N-1), (b) two-disk fault tolerance (RAID Level 6: N-2), (c) three-disk fault tolerance (N-3), and (d) mirroring (RAID Level 1: N/2). States progress from all-disks-up through k-disks-down to data loss, with failure transitions (N-k)λ (weighted by P'(k+1) in the mirroring case) and repair transitions kμ.

The element m_{ij} of M_5 is the average time spent in state j, given that the process starts in state i, before reaching the data-loss state.
It is assumed that the system starts in state 1 (all disks up); therefore the MTTDL is the sum of the average times spent in every state j given that the process starts in state 1:

MTTDL = \sum_{j=1}^{d+1} m_{1j}   (6)

where d+1 is the column size of the fundamental matrix. The MTTDL of RAID Level 5 is

MTTDL_5 = m_{11} + m_{12} = \frac{(2N-1)\lambda + \mu}{N(N-1)\lambda^2} = \frac{1}{N\lambda} + \frac{1}{(N-1)\lambda} + \frac{\mu}{N(N-1)\lambda^2} = MTTDL_0 + \frac{1}{(N-1)\lambda} + \frac{\mu}{N(N-1)\lambda^2}   (7)

The MTTDL of RAID Level 6 and of the three-disk-failure-tolerant RAID can be derived in the same way:

MTTDL_6 = \frac{(3N^2-6N+2)\lambda^2 + (3N-2)\lambda\mu + 2\mu^2}{N(N-1)(N-2)\lambda^3} = MTTDL_5 + \frac{N(N-1)\lambda^2 + 2N\lambda\mu + 2\mu^2}{N(N-1)(N-2)\lambda^3}   (8)

MTTDL_7 = \frac{(4N^3-18N^2+22N-6)\lambda^3 + (6N^2-14N+6)\lambda^2\mu + (8N-6)\lambda\mu^2 + 6\mu^3}{N(N-1)(N-2)(N-3)\lambda^4} = MTTDL_6 + \frac{N(N-1)(N-2)\lambda^3 + 3N(N-1)\lambda^2\mu + 6N\lambda\mu^2 + 6\mu^3}{N(N-1)(N-2)(N-3)\lambda^4}   (9)

The MTTDLs of the single-disk, two-disk, and three-disk fault-tolerant RAID can be obtained simply, whereas the MTTDL of RAID Level 1 is somewhat more complex. The transitional probability from the single-disk-down state to the two-disk-down state is neither zero nor one, because there is no data loss if the second failed disk is not the mirror of the first failed disk. Fig. 4(d) shows the Markov diagram of RAID Level 1. Transitions from the single-disk-down state up to the (d-1)-disk-down state carry the probability P'(n) that one more disk fails without data loss in the (n-1)-disk-down state; (N-k)P'(k+1)λ is thus the effective no-loss failure rate in the k-disk-down state. P'(n) is defined by

P'(n) = P(n \mid n-1) = \frac{P(n \cap n-1)}{P(n-1)} = \frac{P(n)}{P(n-1)} = \begin{cases} 1, & n = 1 \\ P(n)/P(n-1), & n \ge 2 \end{cases}   (10)

where P(n) is the safety probability that there is no data loss when n disks fail simultaneously. In mirroring, the data is safe if each of the N/2 mirroring groups contains at most one faulty disk.
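The fundamental-matrix computation of Equations (1)-(9) can be checked mechanically. The sketch below is illustrative (the parameter values are our own choices; exact rational arithmetic avoids round-off): it builds A = I - Q for the d-fault-tolerant chain, solves A t = 1 so that t[0] equals the row sum of Equation (6), and verifies the closed forms of Equations (7) and (8).

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact Fractions."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def mttdl(N, lam, mu, d):
    """MTTDL of a d-fault-tolerant array of N disks via the fundamental
    matrix of Eq. (1): with A = I - Q (Eqs. (2)-(4)), the vector t of
    expected times to data loss satisfies A t = 1, and MTTDL = t[0]."""
    n = d + 1
    A = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        A[k - 1][k - 1] = (N - k + 1) * lam + (k - 1) * mu  # diagonal of I - Q
        if k <= d:
            A[k - 1][k] = -(N - k + 1) * lam               # failure transition
            A[k][k - 1] = -k * mu                          # repair transition
    return solve_linear(A, [Fraction(1)] * n)[0]

# example rates (per day): MTTF = 5 years, MTTR = 1 week
lam, mu = Fraction(1, 5 * 365), Fraction(1, 7)
N = 10
assert mttdl(N, lam, mu, 1) == ((2 * N - 1) * lam + mu) / (N * (N - 1) * lam**2)
assert mttdl(N, lam, mu, 2) == (
    (3 * N * N - 6 * N + 2) * lam**2 + (3 * N - 2) * lam * mu + 2 * mu**2
) / (N * (N - 1) * (N - 2) * lam**3)
```

Because the arithmetic is exact, the asserts compare the chain solution and the closed forms as identical rationals, not merely to within floating-point tolerance.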
Hence the safety probability function P_1(n) of RAID Level 1 is

P_1(n) = \frac{\hat{C}(N/2,\,n)\,2^n}{\hat{C}(N,\,n)}, \quad \hat{C}(n,r) = \begin{cases} \binom{n}{r}, & n \ge r \\ 0, & \text{otherwise} \end{cases}   (11)

From Equations (1), (10), and (11) and Fig. 4(d), the inverse of the fundamental matrix M_1 of RAID Level 1 can be written as Equation (12):

M_1^{-1} = [I - Q] = \begin{bmatrix} N\lambda & -N P'_1(1)\lambda & 0 & \cdots & 0 \\ -\mu & (N-1)\lambda+\mu & -(N-1)P'_1(2)\lambda & \cdots & 0 \\ 0 & -2\mu & \ddots & & \vdots \\ \vdots & & -(d-1)\mu & (N-d+1)\lambda+(d-1)\mu & -(N-d+1)P'_1(d)\lambda \\ 0 & \cdots & 0 & -d\mu & (N-d)\lambda+d\mu \end{bmatrix}   (12)

Equation (12) generalizes to Equation (13):

M_1^{-1} = I - Q_1 = [a_{m,n}]_{(d+1)\times(d+1)}
a_{k+1,k} = -k\mu, for all k such that 1 ≤ k ≤ d
a_{k,k} = (N-k+1)\lambda + (k-1)\mu, for all k such that 1 ≤ k ≤ d+1
a_{k,k+1} = -(N-k+1)P'_1(k)\lambda, for all k such that 1 ≤ k ≤ d
a_{m,n} = 0, for all m, n such that n ≥ m+2 or n ≤ m-2   (13)

where d is the maximum number of fault-tolerable disks; for RAID Level 1, d = N/2. A symbolic expression for the inverse of M_1^{-1} cannot be simply derived from Equation (13), so we compute the MTTDL numerically.

Figure 5: Maximum tolerable failures of the HiRAID Levels 5×0 (a), 0×5 (b), and 5×5 (c). The number of virtual disks comprising the upper-level RAID is 3 and the number of disks comprising a lower-level RAID is 4 (i = 3, j = 4).

Figure 6: Maximum tolerable failures of HiRAID Level 1×5 (i = 2, j = 6) and Level 5×1 (i = 6, j = 2), which are the special cases of HiRAID Level 5×5 with i = 2 and j = 2, respectively.

3.2 The MTTDL of the HiRAID

HiRAID Level 5×5 tolerates at least three and at most i+j-1 disk failures, where i is the number of virtual disks comprising the upper-level RAID and j is the number of physical disks comprising each lower-level RAID. The total number of disks, including check and information disks, equals i·j. Fig. 5 and Fig. 6 show the cases of maximum tolerable failures of HiRAID. The derivation of the MTTDL of HiRAID Level 5×5 is also complex.
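The safety probability of Equation (11) can be sanity-checked by exhaustive enumeration over all failure patterns. The sketch below is illustrative (the pairing of disks 2k and 2k+1 into mirror pairs is our own encoding, not from the paper):

```python
from itertools import combinations
from math import comb

def p1(N, n):
    """Safety probability of RAID Level 1 (Eq. (11)): n simultaneous
    failures lose no data iff every mirror pair has at most one failure."""
    if n > N // 2:
        return 0.0
    return comb(N // 2, n) * 2**n / comb(N, n)

def p1_bruteforce(N, n):
    """Count safe n-failure patterns directly over N/2 mirror pairs."""
    safe = 0
    for failed in combinations(range(N), n):
        pairs = {d // 2 for d in failed}   # mirror pair of disk d
        if len(pairs) == n:                # all failures hit distinct pairs
            safe += 1
    return safe / comb(N, n)

for N in (4, 6, 8):
    for n in range(1, N // 2 + 2):
        assert abs(p1(N, n) - p1_bruteforce(N, n)) < 1e-12
```

Any single failure is always safe (p1(N, 1) = 1), and more than N/2 failures can never be safe, matching d = N/2 in the text.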
The derivation is similar to that for RAID Level 1 if P_1(n) in Equation (13) is replaced by P_{5×5}(n), which can be written as follows. First,

P_{5×5}(1) = P_{5×5}(2) = P_{5×5}(3) = 1

P_{5×5}(4) is the sum of: the probability that four disks fail in four separate groups without data loss; the probability that two disks fail in the same group and each of the other two fails in a different group without data loss; the probability that three disks fail in the same group and the fourth fails in a different group without data loss; and the probability that all four disks fail in the same group without data loss:

P_{5×5}(4) = \frac{\binom{i}{4}\binom{j}{1}^4 + \binom{i}{3}\binom{3}{1}\binom{j}{2}\binom{j}{1}^2 + \binom{i}{2}\binom{2}{1}\binom{j}{3}\binom{j}{1} + \binom{i}{1}\binom{j}{4}}{\binom{ij}{4}}   (14)

Table 1: The safety probability P(n) and the maximum number of fault-tolerable disks d of the HiRAID Levels (C denotes the truncated binomial Ĉ of Equation (11))

Level | P(n) | d
0×5 | C(i,n)·j^n / C(ij,n) | i
5×0 | i·C(j,n) / C(ij,n) | j
1×5 | (2·C(j,n) + 2·C(j,n-1)·j) / C(2j,n) | j+1
5×1 | (C(i,n)·2^n + C(i,n-1)·(n-1)·2^{n-2}) / C(2i,n) | i+1

By similar reasoning,

P_{5×5}(5) = \frac{\binom{i}{5}\binom{j}{1}^5 + \binom{i}{4}\binom{4}{1}\binom{j}{2}\binom{j}{1}^3 + \binom{i}{3}\binom{3}{1}\binom{j}{3}\binom{j}{1}^2 + \binom{i}{2}\binom{2}{1}\binom{j}{4}\binom{j}{1} + \binom{i}{1}\binom{j}{5}}{\binom{ij}{5}}   (15)

From Equations (14) and (15), we can generalize P_{5×5} as Equation (16):

P_{5×5}(n) = \frac{\binom{i}{n} j^n + \sum_{k=1}^{n-1} \binom{i}{n-k}\,(n-k)\,\binom{j}{k+1}\, j^{\,n-k-1}}{\binom{ij}{n}}   (16)

If the internal structure of the lower-level RAID is unknown to the upper level: P_{0×5}(n) is the probability that n disks fail in n separate groups without data loss; P_{5×0}(n) is the probability that all n disks fail in the same group without data loss; P_{1×5}(n) is obtained by setting i = 2 in Equation (16); and P_{5×1}(n) is obtained by setting j = 2 in Equation (16). Table 1 summarizes these probability functions. P_1(n) is derived by setting i = N/2 and j = 2 in P_{5×5}(n). HiRAID Level 0×5 is a group of RAID Level 5 arrays.
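Equation (16) can likewise be checked against a direct enumeration of failure patterns. The sketch below is illustrative (the group layout, disk d belonging to group d // j, is our own encoding); data survives iff at most one lower-level group contains two or more failed disks, since the upper-level RAID 5 tolerates one dead virtual disk.

```python
from itertools import combinations
from math import comb

def p55(i, j, n):
    """Eq. (16): probability that n simultaneous failures cause no data
    loss in HiRAID 5x5 (i RAID-5 groups of j disks each)."""
    num = comb(i, n) * j**n + sum(
        comb(i, n - k) * (n - k) * comb(j, k + 1) * j**(n - k - 1)
        for k in range(1, n))
    return num / comb(i * j, n)

def p55_bruteforce(i, j, n):
    """Count safe patterns: at most one group with >= 2 failed disks."""
    safe = 0
    for failed in combinations(range(i * j), n):
        per_group = [0] * i
        for d in failed:
            per_group[d // j] += 1
        if sum(1 for c in per_group if c >= 2) <= 1:
            safe += 1
    return safe / comb(i * j, n)

# up to three failures are always survivable, as the text states
assert p55(4, 4, 3) == 1.0
# cross-check the closed form against enumeration on small arrays
for i, j in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    for n in range(1, i + j):
        assert abs(p55(i, j, n) - p55_bruteforce(i, j, n)) < 1e-12
```

For example, p55(3, 2, 4) = 0.8 and p55(2, 3, 4) = 0.4, matching the corresponding entries of Table 2 and illustrating that i > j gives the larger safety probability.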
However, the MTTDL of HiRAID Level 0×5 is not the MTTDL of RAID Level 5 divided by the number of groups, because the reliability function of RAID Level 5, R_5(t), is not exponentially distributed. R_5(t) is defined as the number of arrays surviving at time t divided by the total number surviving or failed:

R_5(t) = \frac{s_1 e^{s_2 t} - s_2 e^{s_1 t}}{s_1 - s_2}

where

s_1 = \frac{1}{2}\left\{ -\left[(2N-1)\lambda + \mu\right] + \sqrt{\lambda^2 + (4N-2)\lambda\mu + \mu^2} \right\}
s_2 = \frac{1}{2}\left\{ -\left[(2N-1)\lambda + \mu\right] - \sqrt{\lambda^2 + (4N-2)\lambda\mu + \mu^2} \right\}   (17)

This equation is derived in the appendix.

The MTTDL of HiRAID is obtained from Equations (6), (11), and (13) by substituting the corresponding safety probability function P(n) and d = i+j-1. The inverse fundamental matrix M^{-1} of HiRAID has a very large dimension, and the MTTDL is derived from the inverse of M^{-1}. It is hard to derive this inverse symbolically, hence we use a numerical method.

3.3 Comparison

Fig. 7 shows the MTTDLs of RAID and HiRAID; the MTTR is taken as one week.

Figure 7: The MTTDLs of RAID and HiRAID (Levels 5×1, 5×5, N-4, 1, 1×5, N-3, 0×5, 6, 5, 5×0, and 0) versus the number of total disks and the number of information disks, when the MTTR is one week, the MTTF is five years, and i = j for HiRAID Levels 5×0, 0×5, and 5×5. "N-3" and "N-4" denote imaginary RAIDs tolerating three and four disk failures, respectively, whose MTTDLs are obtained from Equation (13) with d = 3 or d = 4 and P(n) = 1 for all n. Level 0 is a disk array without redundant disks, whose MTTDL equals MTTF/N.

It takes just a few hours for a failed disk to be reconstructed if spare disks are prepared. However, if the system has no spare disks and no daily service, it can take from several days to several weeks to obtain an available new disk module. We have chosen the MTTF as five years.
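The numerical evaluation used for these comparisons can be sketched as follows. The parameter values are illustrative choices of ours; the solver simply implements the tridiagonal system of Equation (13), with the failure branch from each state split between the safe transition (weight P'(k), Equation (10)) and absorption into data loss, and with the safety probability of Equation (16) for HiRAID Level 5×5.

```python
from math import comb

def p55(i, j, n):
    """Eq. (16) safety probability of HiRAID Level 5x5."""
    num = comb(i, n) * j**n + sum(
        comb(i, n - k) * (n - k) * comb(j, k + 1) * j**(n - k - 1)
        for k in range(1, n))
    return num / comb(i * j, n)

def mttdl_numeric(N, lam, mu, d, P=lambda n: 1.0):
    """Numerical MTTDL from Eq. (13): solve (I - Q) t = 1, return t[0]."""
    n = d + 1
    A = [[0.0] * n for _ in range(n)]
    for k in range(1, n + 1):
        A[k - 1][k - 1] = (N - k + 1) * lam + (k - 1) * mu
        if k <= d:
            Pp = 1.0 if k == 1 else P(k) / P(k - 1)   # P'(k) of Eq. (10)
            A[k - 1][k] = -(N - k + 1) * Pp * lam
            A[k][k - 1] = -k * mu
    # Thomas algorithm: forward elimination, then back substitution
    b = [1.0] * n
    for r in range(1, n):
        f = A[r][r - 1] / A[r - 1][r - 1]
        A[r][r] -= f * A[r - 1][r]
        b[r] -= f * b[r - 1]
    t = [0.0] * n
    t[n - 1] = b[n - 1] / A[n - 1][n - 1]
    for r in range(n - 2, -1, -1):
        t[r] = (b[r] - A[r][r + 1] * t[r + 1]) / A[r][r]
    return t[0]

i = j = 4                            # a 4x4 HiRAID 5x5 of 16 disks (example)
lam, mu = 1 / (5 * 365), 1 / 7       # per-day rates: MTTF 5 years, MTTR 1 week
raid5 = mttdl_numeric(i * j, lam, mu, 1)
hiraid = mttdl_numeric(i * j, lam, mu, i + j - 1, lambda n: p55(i, j, n))
assert hiraid > 100 * raid5          # HiRAID 5x5 far outlives plain RAID 5
```

With d = 1 and P(n) = 1 this reduces exactly to the RAID Level 5 chain, which gives a convenient consistency check against Equation (7).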
Disks in a heavily loaded system fail more frequently, even though disk manufacturers claim an MTTF of about 23 years. Fig. 8 shows the MTTDLs for a fairer comparison, with an MTTR of one day and an MTTF of 23 years.

HiRAID Level 5×1 offers the best reliability. HiRAID Level 5×5 offers better reliability than the three-disk-failure-tolerant RAID, which can hardly be implemented in a real system due to its complexity. As the number of disks increases, the reliability of HiRAID decreases more slowly than that of the RAIDs. HiRAID Level 5×5 tolerates disk errors that occur in any arbitrary three disks without regard to their locations, and in addition tolerates more than three disk faults when they occur in restricted disk locations. Accordingly, HiRAID Level 5×5 always offers better reliability than the three-disk-failure-tolerant RAID. We can also see that Level 5+1 (Level 1×5) is less reliable than Level 1 and Level 5×1, so the reciprocal configuration of Level 1×5 is more reliable. Level 0×5 is likewise more reliable than Level 5×0.

Figure 8: The MTTDLs of RAID and HiRAID when the MTTR is one day, the MTTF is 23 years, and i = j for HiRAID Levels 5×0, 0×5, and 5×5.

In this comparison of MTTDLs, we have not considered support components such as controllers and communication lines. HiRAID Level 5×5 tolerates errors in any lower-level controller, in any line between the upper-level controller and a lower-level controller, and in any line between a lower-level controller and a disk. It cannot tolerate an upper-level controller error; however, the redundant-controller schemes of current RAID techniques can resolve that problem.

4. PERFORMANCE

While the disk utilization of HiRAID Level 5×1 is poor (less than 50%), HiRAID Level 5×5 offers reasonable disk utilization, (i-1)(j-1)/(ij), which is 81% when i = j = 10. Furthermore, its performance is similar to that of RAID Level 5. Fig. 9 and Fig. 10, drawn in UML [6], show the small-write operations of RAID Level 5 and HiRAID Level 5×5. The complexity of HiRAID Level 5×5 is about three times that of RAID Level 5; however, the operations of HiRAID are dispersed across three RAID controllers. From Fig. 9(b), the operation of RAID Level 5, O_5, can be written as

O_5 = 3·W.tr + 2·R.tr + 2·R.c + XOR

where W.tr, R.tr, R.c, and XOR denote write-transfer, read-transfer, read-command, and exclusive-OR operations, respectively.

Figure 9: The small-write operation of RAID Level 5 between the host computer and the RAID controller over the upper channel (host interface) and the lower channel (disk interface), with concurrent internal transfers of two disks. Diagram (b) is a sequence diagram matching Diagram (a).

From Fig. 10(b), the operations of controller1, controller2, and controller3 of HiRAID Level 5×5 can be written, respectively, as

O_c1 = 3·W.tr + 2·R.tr + 2·R.c + XOR
O_c2 = 3·W.tr + 3·R.tr + 3·R.c + XOR
O_c3 = 3·W.tr + 3·R.tr + 3·R.c + XOR

Controller2 and controller3 each have one more read-transfer and read-command, which together comprise a single additional read transaction. This additional read transaction resides in the upper channel interfacing the upper-level RAID, but it does not bottleneck overall performance because the lower channel carries more read/write transactions.
Also, the large-write and read operations of HiRAID Level 5×5 are similar to those of RAID Level 5, except for the slight extra latency it takes for commands to pass through the upper-level RAID controller.

Latency can be defined as the elapsed time between a command from the host computer and the completion response to that command, because the next command can be transmitted only after the completion response of the storage controller. The small-write latency of HiRAID is the elapsed time from the write command (W1.c) of the host computer to the completion response (W1.completion) of RAID controller1 (refer to Fig. 10). After the completion response, residual operations of the lower-level controllers still remain. Even though controller1 receives the next command from the host computer after the completion response, the next command can be executed promptly without data inconsistency. Accordingly, from Fig. 10, the small-write latency of HiRAID Level 5×5 is "R1.c + R2.c + Int + R3.tr + R1.tr + R2.tr + W2.tr + XOR + W3.tr" and the small-write latency of RAID Level 5 is "R1.c + R2.c + Int + R1.tr + R2.tr + W2.tr + XOR + W3.tr", where "Int" denotes the internal transfer time of a disk drive. The small-write latency of HiRAID Level 5×5 has just one more read-transfer (R3.tr) than that of RAID Level 5.

Figure 10: The small-write operation of HiRAID Level 5×5 across RAID controller1, controller2, and controller3. W.tr, R.tr, R.c, and XOR denote write-transfer, read-transfer, read-command, and exclusive-OR operations, respectively. Diagram (b) is a sequence diagram matching Diagram (a).

5. PROPERTY

Table 2: The probability P(4) that there is no data loss when 4 disks fail simultaneously

i \ j | 2     | 3     | 4     | 5     | 6
2     | 0     | 0.400 | 0.486 | 0.524 | 0.545
3     | 0.800 | 0.786 | 0.782 | 0.780 | 0.779
4     | 0.914 | 0.891 | 0.881 | 0.876 | 0.873
5     | 0.952 | 0.934 | 0.926 | 0.921 | 0.918
6     | 0.970 | 0.956 | 0.949 | 0.945 | 0.943

HiRAID Level 5×5 has excellent reliability and good performance, and it has several remarkable properties. In HiRAID Level 5×5, i > j generally offers better reliability than i < j. Fig. 7 shows that the MTTDL of HiRAID Level 5×1 is always larger than that of HiRAID Level 1×5; Level 5×5 reduces to Level 5×1 when j = 2 and to Level 1×5 when i = 2. We can also see this property from the safety probability function P(n): the larger P(n) is, the larger the MTTDL. For example, Table 2 shows the probabilities for n = 4 (the entries with i·j = 12 were highlighted in the original table). P(4) is always larger for i > j than for the transposed configuration, and we find this property of P(n) for all n. According to these characteristics, i > j generally offers better reliability than i < j.

Lemma 1. The disk utilization of HiRAID Level 5×5 is maximized when the dimension parameters i and j are equal.

Proof. If the number of total disks N equals ij and i = j, the utilization U_1 is

U_1 = \frac{(i-1)(j-1)}{ij} = \frac{N - 2\sqrt{N} + 1}{N} = 1 - \frac{2}{\sqrt{N}} + \frac{1}{N}

When i ≠ j, the utilization U_2 can be written as

U_2 = \frac{(i-1)(N/i - 1)}{N} = 1 - \frac{i}{N} - \frac{1}{i} + \frac{1}{N}

U_1 - U_2 = -\frac{2}{\sqrt{N}} + \frac{i}{N} + \frac{1}{i} = \frac{(i - \sqrt{ij})^2}{iN} \ge 0

Hence U_1 is always larger than U_2 when i ≠ j.

Lemma 2. X×Y is not commutative.

Proof. HiRAID Level 5×1 shows better reliability than HiRAID Level 1×5 in Fig. 7; thus X×Y is not commutative.

6. CONCLUSIONS

We have formally derived the mean-time-to-data-loss (MTTDL) of traditional RAID and HiRAID using Markov processes.
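Lemma 1 can also be checked numerically. The sketch below (N = 144 is an arbitrary example of ours) searches all factorizations N = i·j and confirms that the square layout maximizes utilization:

```python
def utilization(i, j):
    """Disk utilization of HiRAID Level 5x5: one parity disk per lower-level
    group (j-1 usable of j) and one parity virtual disk at the upper level
    (i-1 usable of i), giving (i-1)(j-1)/(i*j)."""
    return (i - 1) * (j - 1) / (i * j)

# For a fixed total N = i*j, utilization is maximal at i = j = sqrt(N) (Lemma 1)
N = 144
best = max(((i, N // i) for i in range(2, N) if N % i == 0),
           key=lambda ij: utilization(*ij))
assert best == (12, 12)

# the 81% figure quoted for i = j = 10
assert abs(utilization(10, 10) - 0.81) < 1e-12
```

The search also makes the asymmetry of Lemma 2 visible in a small way: utilization(3, 4) equals utilization(4, 3), so the i > j preference comes from reliability (Table 2), not from utilization.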
We have also presented a hierarchical RAID offering high reliability and good performance, whereas a traditional RAID offering good reliability shows bad performance. HiRAID has the following advantages over traditional RAID and the other methods proposed for improving reliability.

1. HiRAID Level 5×5 tolerates disk errors that occur in any arbitrary three disks. Its MTTDL is about 10^6 times that of RAID Level 5 when the MTTF equals 23 years, the MTTR is one day, and the number of information disks equals 100.

2. Its performance is similar to that of RAID Level 5, except for the slight extra latency it takes for commands to pass through the upper-level RAID controller.

3. It can be composed of traditional RAID controllers without modification, requiring no additional algorithms.

4. It offers reasonable disk utilization, (i-1)(j-1)/(ij), which is 81% when i = j = 10.

HiRAID shows the above merits; however, it is not suitable for every storage application. An application that requires many disks, moderate bandwidth, and sufficient reliability is appropriate for it. Also, it requires several controllers, so the controller price should be low for HiRAID to be applied in a practical system.

Meanwhile, we have introduced a rigorous methodology for obtaining the correct reliabilities of various disk array systems using Markov processes. Our analysis also handles the probability of surviving disk failures that occur only in restricted disk locations of a disk array, as in RAID Level 1.

7. REFERENCES

[1] R. Billinton and R. N. Allan. Reliability Evaluation of Engineering Systems: Concepts and Techniques. Perseus Publishing, 1992.

[2] M. Blaum, J. Brady, J. Bruck, and J. Menon. EVENODD: An efficient scheme for tolerating double disk failures in RAID architectures. IEEE Trans. Computers, 44(2):192-202, February 1995.

[3] M. Blaum, J. Bruck, and A. Vardy. MDS array codes with independent parity symbols. IEEE Trans. Information Theory, 42(2):529-542, March 1996.

[4] W. A. Burkhard and J. Menon. Disk array storage system reliability. In 23rd Int. Symp. on Fault-Tolerant Computing, pages 432-441. IEEE Comput. Soc., August 1993.

[5] P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson. RAID: High-performance, reliable secondary storage. ACM Computing Surveys, 26(2):145-185, June 1994.

[6] B. P. Douglass. Real-Time UML: Developing Efficient Objects for Embedded Systems. Addison Wesley, 1998.

[7] G. A. Gibson, L. Hellerstein, R. M. Karp, R. H. Katz, and D. A. Patterson. Failure correction techniques for large disk arrays. In Third Int. Conf. on Architectural Support for Programming Languages and Operating Systems, pages 123-132. ACM, 1989.

[8] R. H. Katz, G. Gibson, and D. A. Patterson. Disk system architectures for high performance computing. In Proceedings of the IEEE, pages 1842-1858. IEEE, December 1989.

[9] B. E. Mann, P. J. Trasatti, M. D. Carlozzi, J. A. Ywoskus, and E. J. McGrath. Loosely coupled mass storage computer cluster. United States Patent 5,862,312, January 1999.

[10] G. Myers, A. Yu, and D. House. Microprocessor technology trends. In Proceedings of the IEEE, pages 1605-1622. IEEE, December 1986.

[11] C. I. Park. Efficient placement of parity and data to tolerate two disk failures in disk array systems. IEEE Trans. on Parallel & Distributed Systems, 6(11):177-184, November 1995.

[12] C. W. Park and Y. Y. Han. Efficient placement of parity and data to tolerate two disk failures in disk array systems. Lecture Notes in Computer Science (ASIAN 2000), 1961:58-68, November 2000.

[13] D. A. Patterson, P. Chen, and R. H. Katz. Introduction to redundant arrays of inexpensive disks (RAID). In 34th IEEE Computer Society International Conference, pages 112-117. IEEE Comput. Soc., 1989.

[14] D. A. Patterson, G. A. Gibson, and R. H. Katz. A case for redundant arrays of inexpensive disks (RAID). In SIGMOD Int. Conf. on Data Management, pages 109-116. ACM, September 1988.
[15] RAB. The RAIDbook: A Source Book for RAID Technology, sixth edition. The RAID Advisory Board, Lino Lakes, MN, 1999.

[16] A. L. N. Reddy and P. Banerjee. An evaluation of multiple-disk I/O systems. IEEE Trans. Computers, 38(12):1680-1690, December 1989.

[17] K. Salem and H. Garcia-Molina. Disk striping. In Int. Conf. on Data Engineering, pages 336-342. IEEE Comput. Soc., 1986.

[18] J. Wilkes, R. Golding, C. Staelin, and T. Sullivan. The HP AutoRAID hierarchical storage system. ACM Trans. Computer Systems, 14(1):108-136, February 1996.

APPENDIX

A. THE RELIABILITY FUNCTION OF THE RAID LEVEL 5

Using continuous Markov processes [1], we can derive the reliability function R_5(t) of RAID Level 5. We first assume independent and exponentially distributed failure and repair rates:

[\dot{P}_1(t),\; \dot{P}_2(t),\; \dot{P}_3(t)] = [P_1(t),\; P_2(t),\; P_3(t)] \begin{bmatrix} -N\lambda & N\lambda & 0 \\ \mu & -\{(N-1)\lambda + \mu\} & (N-1)\lambda \\ 0 & 0 & 0 \end{bmatrix}   (18)

P_1(t) is the probability that all disks are operative at time t, P_2(t) is the probability that exactly one disk has failed at time t, and P_3(t) is the probability that the system has failed due to two disk failures. Assume that the system starts in state 1; then P_1(0) = 1 and P_2(0) = P_3(0) = 0. The set of differential equations (18) can be solved using Laplace transforms:

s P_1(s) - P_1(0) = -N\lambda P_1(s) + \mu P_2(s)
s P_2(s) - P_2(0) = N\lambda P_1(s) - \{(N-1)\lambda + \mu\} P_2(s)

From the above equations,

P_1(t) = \frac{[s_1 + (N-1)\lambda + \mu]\, e^{s_1 t} - [s_2 + (N-1)\lambda + \mu]\, e^{s_2 t}}{s_1 - s_2}
P_2(t) = \frac{N\lambda\, e^{s_1 t} - N\lambda\, e^{s_2 t}}{s_1 - s_2}

where

s_1 = \frac{1}{2}\left\{ -\left[(2N-1)\lambda + \mu\right] + \sqrt{\lambda^2 + (4N-2)\lambda\mu + \mu^2} \right\}
s_2 = \frac{1}{2}\left\{ -\left[(2N-1)\lambda + \mu\right] - \sqrt{\lambda^2 + (4N-2)\lambda\mu + \mu^2} \right\}

The reliability of the system is the sum of the probabilities of the two safe states:

R_5(t) = P_1(t) + P_2(t) = \frac{s_1 e^{s_2 t} - s_2 e^{s_1 t}}{s_1 - s_2}

The MTTDL can also be derived from the reliability function:

MTTDL_5 = \int_0^\infty R_5(t)\,dt = \frac{(2N-1)\lambda + \mu}{N(N-1)\lambda^2}   (19)

The MTTDL of HiRAID Level 0×5 can be derived not only from Equation (1) but also from the reliability function.
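Equations (17) and (19) can be cross-checked by crude numerical integration. The sketch below uses illustrative rate values of our own, and the step size and integration horizon are arbitrary choices (the slow decay constant here is a few thousand days, so a 10^6-day horizon captures essentially the whole integral):

```python
from math import sqrt, exp

def r5(t, N, lam, mu):
    """Reliability function of RAID Level 5, Eq. (17)."""
    b = (2 * N - 1) * lam + mu
    disc = sqrt(lam**2 + (4 * N - 2) * lam * mu + mu**2)
    s1, s2 = (-b + disc) / 2, (-b - disc) / 2
    return (s1 * exp(s2 * t) - s2 * exp(s1 * t)) / (s1 - s2)

N, lam, mu = 10, 1 / (5 * 365), 1 / 7    # per-day rates: MTTF 5 y, MTTR 1 wk
mttdl_closed = ((2 * N - 1) * lam + mu) / (N * (N - 1) * lam**2)  # Eq. (19)

# left Riemann sum of R5 over [0, T] as a sanity check of Eq. (19)
dt, T = 10.0, 1e6
approx = sum(r5(k * dt, N, lam, mu) for k in range(int(T / dt))) * dt
assert abs(approx - mttdl_closed) / mttdl_closed < 1e-2
```

The check also makes the non-exponential shape visible: R5'(0) = 0, so R5(t) stays flat near t = 0, unlike a pure exponential with the same mean.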
HiRAID Level 0×5 is a series system of RAID Level 5 arrays; hence the MTTDL of a HiRAID Level 0×5 comprised of i groups of RAID Level 5 can be written as Equation (20), where N equals j. The MTTDL of mirroring can also be calculated from Equation (20), where i equals the number of information disks and N equals 2:

R_{0×5}(t) = \{R_5(t)\}^i, \quad MTTDL_{0×5} = \int_0^\infty R_{0×5}(t)\,dt   (20)
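Equation (20) can be evaluated numerically. The sketch below uses illustrative values of i, j, and the rates; it confirms the point made in the appendix that the series system's MTTDL is smaller than a single group's, yet is not simply MTTDL5 divided by i, because R5(t) is not exponential:

```python
from math import sqrt, exp

def r5(t, N, lam, mu):
    """RAID Level 5 reliability function, Eq. (17)."""
    b = (2 * N - 1) * lam + mu
    disc = sqrt(lam**2 + (4 * N - 2) * lam * mu + mu**2)
    s1, s2 = (-b + disc) / 2, (-b - disc) / 2
    return (s1 * exp(s2 * t) - s2 * exp(s1 * t)) / (s1 - s2)

def mttdl_0x5(i, j, lam, mu, dt=10.0, T=1e6):
    """Eq. (20): MTTDL of i series RAID-5 groups of j disks each,
    by numerically integrating R5(t)**i (left Riemann sum)."""
    return sum(r5(k * dt, j, lam, mu) ** i for k in range(int(T / dt))) * dt

lam, mu = 1 / (5 * 365), 1 / 7       # per-day rates: MTTF 5 years, MTTR 1 week
single = mttdl_0x5(1, 12, lam, mu)   # one RAID-5 group of 12 disks
series = mttdl_0x5(4, 12, lam, mu)   # HiRAID 0x5 with i = 4 such groups

# the series system fails sooner than one group, but not exactly i times sooner
assert series < single
assert 0.2 < series / single < 0.3
```

For an exponential R5 the ratio would be exactly 1/i = 0.25; the flat start of R5(t) (its derivative vanishes at t = 0) nudges the ratio slightly above that value here.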