Efficient parity placement schemes for tolerating up to two disk failures in disk arrays
Nam-Kyu Lee a, Sung-Bong Yang a,*, Kyoung-Woo Lee b
a Department of Computer Science, Yonsei University, 134 Shinchon-dong, Seodaemun-ku, Seoul 120-749, South Korea
b LG Electronics Inc., Seoul, South Korea
Received 13 May 1998; received in revised form 10 June 2000; accepted 1 September 2000
In order to achieve high reliability in disk array systems, two new schemes using dual parity placement, called DH1 (diagonal–horizontal) and DH2 schemes, are presented. Both DH schemes can tolerate up to two disk failures by using two types of parity information placed in the diagonal and the horizontal directions, respectively, in a matrix of disk partitions. DH1 scheme can reduce the occurrences of the bottleneck problem significantly because the parity blocks are evenly distributed throughout the disk array. DH2 scheme uses one more disk than DH1 scheme in order to store the horizontal parities, while the diagonal parities are placed in the same way as in DH1 scheme with a minor change. Even though both DH schemes use almost optimal disk space for storing the redundant information, the encoding algorithms for them are quite simple and efficient. Moreover, both DH schemes can recover rapidly from any two disk failures.
© 2000 Elsevier Science B.V. All rights reserved.
Keywords: Disk arrays; DH schemes; Parity placement methods; Fault tolerance; Two disk failures
An optimal storage system should provide both high performance and high reliability. A disk array with a group of disks is able to improve I/O performance by accessing several disks in parallel [1–3]. To achieve further performance improvement in disk arrays, much research has been done on topics such as the optimal stripe unit size, data striping and parity placement [5,6], and data prefetching and caching. However, the reliability issue should be taken even more seriously than I/O performance, because even a single disk failure may cause a catastrophic result for the overall disk array system. In order to increase the reliability of the system, some redundant information, called parities, on the data in a disk array should be encoded, and the parities should be placed in the disk array along with the data for recovery from disk failures.
Nowadays, a disk array system usually has a large group of disks, each of which has an enlarged capacity
owing to the current technology. However, as the number of disks in a system increases, so does the
Journal of Systems Architecture 46 (2000) 1383–1402
* Corresponding author. Tel.: +82-02-3612716; fax: +82-02-3652579.
E-mail address: email@example.com (S.-B. Yang).
1383-7621/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.
PII: S1383-7621(00)00031-X
probability of the occurrence of a disk failure. Therefore, disk failures may occur more frequently in a disk array system with a large number of disks [1,9–11]; that is, once a disk fails in the system, the next failure on another disk is highly likely to occur in the very near future. Thus, the failed disk should be rapidly reconstructed (or decoded) to prevent catastrophic failures in the system. For a fixed amount of information to recover, the lower the complexity of a decoding algorithm gets, the shorter the reconstruction time a disk array takes.
Much research has been done on designing a reliable disk array system with high I/O performance [9,11–15]. MacWilliams and Sloane gave two methods exploiting the extended Hamming code and the Reed–Solomon code. The extended Hamming code has a capability of double-erasure correcting as well as triple-erasure correcting; however, this code shows a higher update penalty. The Reed–Solomon code can attain low check disk overhead by using only two check disks. However, highly integrated encoding and decoding chip sets should be used due to the complexity of the Reed–Solomon code. In addition, the method suffers from the high probability of infinite error propagation and has a redundancy overhead at the end of the data because it uses a convolutional type of code. Gibson [12,13] presented a method using the 2D parity scheme to decrease the update penalty. Even though the 2D parity scheme changes only two check disks for updating a data block, this scheme has higher check disk overheads. Moreover, the three methods mentioned above have a common drawback: a severe bottleneck problem can occur when updating a check disk, since they use designated check disks to store the redundant information.
In order to cope with frequent disk failures, a few schemes tolerating up to two disk failures at the same time have recently been suggested in [9,11,15]. Park has given a method which alleviates the bottleneck problem by distributing the parity blocks over the disk array space. However, the overhead for the management of the redundancy matrix that holds the redundant information cannot be ignored, since the redundancy matrix should be searched for the positions of the associated blocks whenever a data block is updated or failed disks are reconstructed. Blaum et al. have developed a scheme called EVENODD which has better performance than the methods described so far. EVENODD scheme uses optimal redundant storage while tolerating up to two disk failures. In EVENODD scheme, however, when the data blocks contained in diagonal S are to be updated, every diagonal parity block should also be updated. Moreover, Alvarez proposed a method that can tolerate multiple disk failures in a disk array. However, the method has a defect in that a large amount of redundant information should be included in the disk array.
In this paper, we present two new efficient parity placement methods called DH1 (diagonal–horizontal) and DH2 schemes. Both are dual parity schemes based on the diagonal and horizontal parities for tolerating up to two disk failures in a disk array. DH1 scheme places the horizontal parities in the antidiagonal direction and the diagonal parities in the horizontal direction. DH1 scheme does not use any dedicated parity disks. Therefore, the bottleneck problem occurring when updating the parities can be avoided in DH1 scheme. DH2 scheme uses one more disk than DH1 scheme to store the horizontal parities in the vertical direction; that is, the horizontal parities are stored in a single disk. This allows DH2 scheme to reduce the overhead in updating a certain parity block. Therefore, when small writes are frequently required, DH2 scheme performs better than DH1 scheme.
DH1 and DH2 schemes can be implemented purely in software. In other words, they use only exclusive-OR (XOR) operations, which are already used in existing disk array systems for computing parities, and can be implemented without modifying the hardware. Moreover, they are simpler and more efficient than other schemes that tolerate two disk failures. If two controllers are available in the disk array system, the reconstruction time for any pair of failed disks can be reduced significantly, since the decoding algorithms for both DH schemes allow two different directions of recovery at the same time. Hence, the probability of an occurrence of catastrophic failures in the system can be lowered.
Gibson showed that the 2D scheme performs better than the extended Hamming code even though it has a higher check disk overhead. Blaum et al. also showed that EVENODD scheme is more efficient than both the 2D and Reed–Solomon code schemes. The redundancy matrix scheme requires more parity space than EVENODD scheme, and the management overhead for the redundancy matrix is not negligible. Therefore, we have compared our DH schemes only with EVENODD scheme for the performance analysis. The analysis has shown that the encoding schemes for both DH schemes are more efficient than that of EVENODD scheme. It has also shown that reconstruction using either DH1 or DH2 scheme takes a shorter time for any two failed disks than EVENODD scheme. The ratio of the number of parity blocks to that of the overall blocks (data and parity blocks) in a given disk array for each scheme is also compared. It has been shown that both DH schemes use near optimal disk space for storing the redundant information.
The rest of this paper is organized as follows. In Section 2, the encoding method of DH1 scheme for the parity placement is described. The decoding algorithm for recovering from disk failures is also explained. Section 3 illustrates both the encoding and decoding methods of DH2 scheme. In Section 4, the performance of DH1 and DH2 schemes is evaluated by comparison with EVENODD scheme.
2. DH1 scheme
In order to tolerate up to two disk failures, DH1 scheme places the horizontal parities in the diagonal direction and the diagonal parities in the horizontal direction. When a single disk failure occurs, DH1 scheme reconstructs it easily by using either the horizontal or the diagonal parities. In the case of two disk failures, the faulty blocks can be sequentially reconstructed by using both the diagonal and the horizontal parities alternately.
2.1. Encoding method
For DH1 scheme, we assume that there are N disks in a disk array, where N is a prime number, as in [9,15]. N must be prime in order for the decoding algorithm to reconstruct two failed disks successfully. In DH1 scheme a disk array can be considered as an (N − 1) × N matrix, A, where each disk contains N − 1 logical blocks. Let A(i, j) refer to the block in the ith row and the jth column of the matrix A, where 0 ≤ i ≤ N − 2 and 0 ≤ j ≤ N − 1; that is, A(i, j) denotes the ith logical block of disk_j. In order to simplify the description of the encoding and decoding algorithms of both DH schemes, the following notations and definitions are used. Let A(i, j)%N be A(i mod N, j mod N), and let (k1, k2, . . .)%N denote (k1 mod N, k2 mod N, . . .). We also define an error correcting group as a group of blocks such that if any block in the group has a fault, the block can be reconstructed by XORing the rest of the blocks in the group.
Each row in the matrix A can then be defined as a horizontal error correcting group (HECG). The ith row of A, 0 ≤ i ≤ N − 2, can be denoted as HECG_i. The horizontal parity H_i of HECG_i can be obtained with Eq. (1) and is placed into A(i, N − 2 − i) as shown in Fig. 1, where N = 7. In this figure, row i holds N − 1 i's and an H_i in order to indicate that each element (block) of the row belongs to HECG_i.
Fig. 1. Placement of the horizontal parities and HECGs for DH1 scheme.
H_i = ⊕_{j=0}^{N−1} A(i, j), where 0 ≤ i ≤ N − 3 and j ≠ N − 2 − i.   (1)
That is, H_i is computed by XORing the ith data block of each disk except the ith block of disk_{N−2−i}. Observe that there are N − 1 data blocks and a horizontal parity block in each HECG.
Another type of error correcting group, called a diagonal error correcting group (DECG), is also required for tolerating two disk failures. Let DECG_j, 0 ≤ j ≤ N − 1, be the blocks that are located in the antidiagonal direction as shown in Fig. 2. The diagonal parity D_j of DECG_j can be calculated with Eq. (2) and is placed into A(N − 2, j). In this figure, DECG_j consists of the elements holding j or D_j.
D_j = ⊕_{i=0}^{N−3} A(N − 3 − i, j + 1 + i)%N, where 0 ≤ j ≤ N − 1.   (2)
That is, D_j is computed by XORing the data blocks in DECG_j. Observe that the last block of disk_j holds D_j, and the number of blocks in each DECG is N − 1, while each HECG has N blocks.
Fig. 3 shows the locations of the horizontal and the diagonal parities along with the HECGs and DECGs in DH1 scheme by combining Figs. 1 and 2, where N = 7. In Fig. 3 the last block of disk_0 holds D_0, which is the diagonal parity for the H_i's, where 0 ≤ i ≤ N − 3. And D_0 happens to be the horizontal parity for the D_j's, where 1 ≤ j ≤ N − 1. Hence D_0 also plays the role of H_{N−2}. The following lemma proves that D_0 = H_{N−2}.
Lemma 1. The diagonal parity D_0 and the horizontal parity H_{N−2} are always the same.
Proof. By Eq. (1), H_{N−2} is obtained as follows:
H_{N−2} = ⊕_{j=1}^{N−1} A(N − 2, j).
The above equation can be unfolded, and then each term of the right-hand side can be replaced by the terms using Eq. (2).
Fig. 2. Placement of the diagonal parities and DECGs for DH1 scheme.
Fig. 3. The diagonal and horizontal parity placements in DH1 scheme.
H_{N−2} = A(N − 2, 1) ⊕ A(N − 2, 2) ⊕ A(N − 2, 3) ⊕ · · · ⊕ A(N − 2, N − 1)
= D_1 ⊕ D_2 ⊕ D_3 ⊕ · · · ⊕ D_{N−1}
= A(N − 3, 2) ⊕ A(N − 4, 3) ⊕ · · · ⊕ A(0, N − 1)   /* for D_1 */
⊕ A(N − 3, 3) ⊕ A(N − 4, 4) ⊕ · · · ⊕ A(0, 0)   /* for D_2 */
⊕ A(N − 3, 4) ⊕ A(N − 4, 5) ⊕ · · · ⊕ A(0, 1)   /* for D_3 */
· · ·
⊕ A(N − 3, 0) ⊕ A(N − 4, 1) ⊕ · · · ⊕ A(0, N − 3).   /* for D_{N−1} */
When the above terms are rearranged in row major order, H_{N−2} can be rewritten as in the following:
H_{N−2} = A(N − 3, 0) ⊕ A(N − 3, 2) ⊕ A(N − 3, 3) ⊕ · · · ⊕ A(N − 3, N − 1)
⊕ A(N − 4, 0) ⊕ A(N − 4, 1) ⊕ A(N − 4, 3) ⊕ · · · ⊕ A(N − 4, N − 1)
· · ·
⊕ A(0, 0) ⊕ A(0, 1) ⊕ A(0, 2) ⊕ · · · ⊕ A(0, N − 3) ⊕ A(0, N − 1)
= A(N − 3, 1) ⊕ A(N − 4, 2) ⊕ · · · ⊕ A(0, N − 2)   /* by Eq. (1), since row i excludes only column N − 2 − i */
= H_{N−3} ⊕ H_{N−4} ⊕ · · · ⊕ H_0
= D_0.   /* by Eq. (2) with j = 0 */  □
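To make Eqs. (1) and (2) and Lemma 1 concrete, the encoding can be modelled in a few lines. This is a hypothetical Python sketch (not from the paper): `A` is a list of N − 1 rows of N one-bit blocks, and `encode_dh1` is our name for the procedure.

```python
import random

def encode_dh1(A, N):
    """Fill the parity blocks of an (N-1) x N bit matrix A in place (DH1)."""
    # Eq. (1): horizontal parity H_i at A[i][N-2-i] for 0 <= i <= N-3.
    for i in range(N - 2):
        p = N - 2 - i
        A[i][p] = 0
        for j in range(N):
            if j != p:
                A[i][p] ^= A[i][j]
    # Eq. (2): diagonal parity D_j at A[N-2][j], XOR along the antidiagonal.
    for j in range(N):
        A[N - 2][j] = 0
        for i in range(N - 2):
            A[N - 2][j] ^= A[N - 3 - i][(j + 1 + i) % N]
    return A

random.seed(0)
N = 7
A = [[random.randint(0, 1) for _ in range(N)] for _ in range(N - 1)]
encode_dh1(A, N)
# Every HECG (a full row) now XORs to zero; for the last row this is Lemma 1.
for row in A:
    x = 0
    for b in row:
        x ^= b
    assert x == 0
# Lemma 1 directly: D_0 at A[N-2][0] equals H_{N-2}.
h = 0
for j in range(1, N):
    h ^= A[N - 2][j]
assert A[N - 2][0] == h
```

Note that the horizontal parities must be computed first, because D_0 XORs exactly the blocks where the H_i's are stored.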
In the following example the encoding of DH1 scheme is illustrated when the number of disks is 7.
Example 1. In the figure below, each data block contains an arbitrary binary digit for simplicity. Empty blocks (elements) in the figure are the places where the horizontal and diagonal parities are to be placed.
Using Eqs. (1) and (2), the horizontal and the diagonal parities are calculated and deposited in the predefined locations as follows.
2.2. Decoding method
If a single disk fails in a disk array encoded with the encoding algorithm of DH1 scheme, the disk can be recovered easily using either the horizontal or the diagonal parities. When two disk failures have occurred in the disk array at the same time, DH1 scheme performs error correction using the diagonal and the horizontal parities alternately. First, the recovery procedure is initiated with a certain DECG. Among the DECGs, there exist two groups each of which has only one faulty block. This fact will be proved in the following subsection. Thus, one of the two DECGs can be selected arbitrarily for initiating the recovery procedure. The faulty block can then be reconstructed with the rest of the blocks in the DECG. After the reconstruction of the faulty block, the HECG to which the recovered block belongs now has only one faulty block. The unfaulty blocks in the HECG can be used to reconstruct that faulty block. The block just recovered with the HECG allows the recovery of the next faulty block using another DECG. Using DECGs and HECGs repeatedly in this way, all the remaining faulty blocks of the two failed disks can be entirely recovered. The following algorithm shows the error recovery procedure of DH1 scheme when two disks are faulty at the same time.
Decoding algorithm 1. Assume that two disks, disk_{j1} and disk_{j2}, are faulty at the same time, where 0 ≤ j1 < j2 ≤ N − 1.
In the following example the above algorithm is applied to a disk array with 7 disks. Each block in the disk array holds an arbitrary bit for simplicity.
Example 2. Assume that disk_1 and disk_3 have failed at the same time, when a disk array has 7 disks. Then the following table can be viewed as a snapshot before decoding.
Step 1: k = j2 − j1;
for (m = 0; m ≤ N − 2; m++) {
Step 2: /* r finds the row index such that A(r, j2) can be recovered in Step 3. */
r = (N − 1 − k(m + 1))%N;
Step 3: /* reconstruct A(r, j2) using the DECG to which A(r, j2) belongs */
A(r, j2) = ⊕_{l=0}^{N−2} A(N − 2 − l, j1 + 1 − mk + l)%N, where (j1 + 1 − mk + l)%N ≠ j2;
Step 4: /* reconstruct A(r, j1) using the HECG to which A(r, j2) belongs */
A(r, j1) = ⊕_{l=0}^{N−1} A(r, l), where l ≠ j1;
}
In the above figure, ?'s indicate that disk_1 and disk_3 are faulty. In Step 1 of Algorithm 1 we get k = j2 − j1 = 2. Then in the for-loop, the following are executed varying m from 0 to N − 2; that is, Steps 3 and 4 can be given as the following equations after r has been replaced:
A(4 − 2m, 3)%7 = ⊕_{l=0}^{5} A(5 − l, 2 − 2m + l)%7, where (2 − 2m + l)%7 ≠ 3;
A(4 − 2m, 1)%7 = ⊕_{l=0}^{6} A(4 − 2m, l)%7, where l ≠ 1.
The following table shows the sequence of the recovery of each faulty block in disk_1 and disk_3 as the for-loop proceeds. Observe that the figure below is the same as the original disk array in the previous example.
m | Calculations | Results
0 | A(4, 3) = A(5, 2) ⊕ A(3, 4) ⊕ A(2, 5) ⊕ A(1, 6) ⊕ A(0, 0) | 1
  | A(4, 1) = A(4, 0) ⊕ A(4, 2) ⊕ A(4, 3) ⊕ A(4, 4) ⊕ A(4, 5) ⊕ A(4, 6) | 1
1 | A(2, 3) = A(5, 0) ⊕ A(4, 1) ⊕ A(3, 2) ⊕ A(1, 4) ⊕ A(0, 5) | 1
  | A(2, 1) = A(2, 0) ⊕ A(2, 2) ⊕ A(2, 3) ⊕ A(2, 4) ⊕ A(2, 5) ⊕ A(2, 6) | 1
2 | A(0, 3) = A(5, 5) ⊕ A(4, 6) ⊕ A(3, 0) ⊕ A(2, 1) ⊕ A(1, 2) | 1
  | A(0, 1) = A(0, 0) ⊕ A(0, 2) ⊕ A(0, 3) ⊕ A(0, 4) ⊕ A(0, 5) ⊕ A(0, 6) | 0
3 | A(5, 3) = A(4, 4) ⊕ A(3, 5) ⊕ A(2, 6) ⊕ A(1, 0) ⊕ A(0, 1) | 1
  | A(5, 1) = A(5, 0) ⊕ A(5, 2) ⊕ A(5, 3) ⊕ A(5, 4) ⊕ A(5, 5) ⊕ A(5, 6) | 1
4 | A(3, 3) = A(5, 1) ⊕ A(4, 2) ⊕ A(2, 4) ⊕ A(1, 5) ⊕ A(0, 6) | 1
  | A(3, 1) = A(3, 0) ⊕ A(3, 2) ⊕ A(3, 3) ⊕ A(3, 4) ⊕ A(3, 5) ⊕ A(3, 6) | 0
5 | A(1, 3) = A(5, 6) ⊕ A(4, 0) ⊕ A(3, 1) ⊕ A(2, 2) ⊕ A(0, 4) | 0
  | A(1, 1) = A(1, 0) ⊕ A(1, 2) ⊕ A(1, 3) ⊕ A(1, 4) ⊕ A(1, 5) ⊕ A(1, 6) | 1
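The alternating DECG/HECG recovery of Algorithm 1 can also be exercised end to end in code. The sketch below is hypothetical Python, not from the paper; `encode_dh1` restates the encoder of Eqs. (1) and (2) so the block is self-contained, and `decode_dh1` transcribes the four steps of Algorithm 1.

```python
import random

def encode_dh1(A, N):
    # Eqs. (1) and (2): horizontal then diagonal parities (see Section 2.1).
    for i in range(N - 2):
        p = N - 2 - i
        A[i][p] = 0
        for j in range(N):
            if j != p:
                A[i][p] ^= A[i][j]
    for j in range(N):
        A[N - 2][j] = 0
        for i in range(N - 2):
            A[N - 2][j] ^= A[N - 3 - i][(j + 1 + i) % N]

def decode_dh1(A, N, j1, j2):
    """Algorithm 1: rebuild columns j1 < j2 of the (N-1) x N matrix A."""
    k = j2 - j1                                   # Step 1
    for m in range(N - 1):                        # m = 0 .. N-2
        r = (N - 1 - k * (m + 1)) % N             # Step 2
        v = 0                                     # Step 3: DECG of A[r][j2]
        for l in range(N - 1):
            col = (j1 + 1 - m * k + l) % N
            if col != j2:
                v ^= A[N - 2 - l][col]
        A[r][j2] = v
        v = 0                                     # Step 4: HECG_r for A[r][j1]
        for l in range(N):
            if l != j1:
                v ^= A[r][l]
        A[r][j1] = v

random.seed(2)
N = 7
A = [[random.randint(0, 1) for _ in range(N)] for _ in range(N - 1)]
encode_dh1(A, N)
saved = [row[:] for row in A]
for row in A:            # wipe disks 1 and 3, as in Example 2
    row[1] = row[3] = None
decode_dh1(A, N, 1, 3)
assert A == saved
```

Because the blocks are wiped to `None`, any attempt by the sketch to read a not-yet-recovered block would fail immediately; the run succeeding illustrates the ordering argument of Section 2.3.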
2.3. Correctness of the decoding method
Let a disk array have N disks, where N is prime. Also, let the two failed disks be disk_{j1} and disk_{j2}, and k = j2 − j1, where j1 < j2. If the encoding method of DH1 scheme has been applied to the disk array, then the following lemmas and theorem prove that the decoding algorithm for DH1 scheme always recovers successfully from two disk failures.
Lemma 2. Among the DECGs of the disk array described above, there exist two DECGs such that each of these DECGs has only one faulty block.
Proof. When two disks have failed, the total number of faulty blocks is 2(N − 1), since there are N − 1 blocks in each disk. Observe that there are N DECGs in the disk array and each DECG consists of N − 1 blocks. It can be easily seen that each DECG has at most two faulty blocks, since each block in a DECG is placed in a different disk. Therefore, there are at most 2N faulty blocks. But we know that there are exactly 2(N − 1) faulty blocks. Hence there should exist some DECGs that have fewer than two faulty blocks. However, it is not possible that a DECG has no faulty blocks while each of the rest of the DECGs has two faulty blocks, because each block in a DECG is placed in a different disk, and the blocks in each DECG span N − 1 disks. This implies that each DECG has at least one faulty block. Therefore, there exist two DECGs each of which has only one faulty block. □
The following lemma shows that DECG_{j1+1} is one of the two DECGs which have only one faulty block.
Lemma 3. DECG_{j1+1} has only one faulty block, which is in A(N − 1 − k, j2)%N.
Proof. If we look into disk_{j1+1}, which is the next disk to disk_{j1}, the last block of disk_{j1+1} is supposed to contain D_{j1+1}. Recall that D_{j1+1} was obtained during the encoding by XOR operations over the rest of the blocks in DECG_{j1+1} (refer to Eq. (2)). Since each DECG has N − 1 blocks and each block in a DECG is located in a different disk in the antidiagonal direction as shown in Fig. 2, none of the blocks in DECG_{j1+1} is in disk_{j1}, and only one faulty block b in DECG_{j1+1} is in disk_{j2}.
We claim that the block b is in A(N − 1 − k, j2)%N. Since the blocks in DECG_{j1+1} are placed from A(N − 2, j1 + 1) in the antidiagonal direction, if j2 = j1 + 1, the block b is in A(N − 2, j2). In general, the block b is in A(N − 2 − (k − 1), j1 + k)%N, since j2 = j1 + k and the blocks in DECG_{j1+1} are placed in the antidiagonal direction. Therefore, the block b is in A(N − 1 − k, j1 + k)%N = A(N − 1 − k, j2)%N. □
Note that the other DECG that has only one faulty block can similarly be found to be DECG_{(j2+1)%N}. In the decoding algorithm of DH1 scheme, the row index r is obtained to reconstruct the faulty block A(r, j2) in the beginning of the recovery. After the block has been recovered, the faulty block of disk_{j1} in the same row can be fixed. Then the next row index is calculated to reconstruct the faulty block of disk_{j2} in another row, and so on. If we enumerate the row numbers that the decoding algorithm traverses during the recovery, the enumeration sequence r can be given as follows:
r = (N − 1 − k, N − 1 − 2k, N − 1 − 3k, . . . , N − 1 − (N − 1)k)%N, where k = j2 − j1.
In the following lemma we prove that the row numbers are unique and are within the range between 0 and N − 2. For the proof of the lemma, we use the following well-known fact: the N numbers in r′ are distinct from each other when N and k are relatively prime, where
r′ = (0 mod N, k mod N, 2k mod N, . . . , (N − 2)k mod N, (N − 1)k mod N)
= (0, k, 2k, 3k, . . . , (N − 2)k, (N − 1)k)%N.
Therefore, the numbers in r′ can be rewritten as follows, since each number has been obtained by mod N and they are distinct from each other:
0, 1, 2, . . . , N − 2, N − 1.
Lemma 4. Each number in r is unique, and the numbers in r can be reordered as follows: 0, 1, 2, . . . , N − 2.
Proof. The sequence r′ can be converted into the following sequence while the numbers in this new sequence remain distinct from each other:
r″ = (N − 0, N − k, N − 2k, . . . , N − (N − 2)k, N − (N − 1)k)%N.
The numbers in the above sequence are merely shifted in their positions within the sequence r′. We now shift each number in r″ once more, as follows:
r‴ = (N − 1 − 0, N − 1 − k, N − 1 − 2k, . . . , N − 1 − (N − 2)k, N − 1 − (N − 1)k)%N.
The numbers in the sequence r‴ are still distinct from each other, because each number has merely been decreased by 1 and mod N has been applied to each number. Hence the numbers in r‴ can be reordered from 0 to N − 1. The sequence r is, therefore, equivalent to r‴ except for the first term of r‴, i.e., (N − 1 − 0)%N, whose value is N − 1. Therefore, the numbers in r can be rewritten as 0, 1, 2, . . . , N − 2. □
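Lemma 4 can also be checked numerically. The following is an illustrative sketch (not part of the paper): for prime N, the rows r_m = (N − 1 − k(m + 1)) mod N, m = 0, . . . , N − 2, visited by Algorithm 1 form a permutation of 0, . . . , N − 2 for every difference k.

```python
def recovery_rows(N, k):
    # The sequence r from Section 2.3: rows visited by Algorithm 1.
    return [(N - 1 - k * (m + 1)) % N for m in range(N - 1)]

# For N = 7 and every possible k = j2 - j1, the rows permute 0..5.
for k in range(1, 7):
    assert sorted(recovery_rows(7, k)) == list(range(6))
# The k = 2 case matches Example 2: rows 4, 2, 0, 5, 3, 1.
assert recovery_rows(7, 2) == [4, 2, 0, 5, 3, 1]
```

The check relies on N being prime, exactly as the fact quoted above requires N and k to be relatively prime.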
We now prove the following theorem for the correctness of the decoding algorithm in DH1 scheme.
Theorem 1. Let a disk array have N disks, where N is prime. If the disk array is constructed with the encoding
method of DH1 scheme, then Algorithm 1 can recover from any two disk failures successfully.
Proof. When two disks are faulty at the same time, each row of the disk array has exactly two faulty blocks. Lemma 3 has shown that DECG_{j1+1} has only one faulty block, A(N − 1 − k, j2)%N. Therefore, this faulty block can be reconstructed by XORing the remaining blocks in DECG_{j1+1}. Then the faulty block A(N − 1 − k, j1)%N of disk_{j1} in the same row can be reconstructed by using the unfaulty blocks in HECG_{N−1−k}. It can be seen that the reconstructed block A(N − 1 − k, j1)%N belongs to DECG_{(j1+1−k)%N}. This DECG contains only one faulty block, in disk_{j2}, since each DECG has at most two faulty blocks and one of the two has just been recovered. So this block in disk_{j2} can be reconstructed with DECG_{(j1+1−k)%N}. Then the faulty block of disk_{j1} in the row at which the just-recovered block of disk_{j2} is located can now be recovered using the rest of the blocks in the corresponding HECG. When we continue this recovery process as in the decoding algorithm, using both DECGs and HECGs alternately, the sequence of row numbers visited by Algorithm 1 becomes r as illustrated before this theorem. In Lemma 4 we showed that r can be rewritten as
0, 1, 2, . . . , N − 2.
This illustrates that Algorithm 1 visits each row exactly once and traverses all the rows of the disk array. Whenever the algorithm visits a row, it uses the corresponding DECG to reconstruct the faulty block in disk_{j2}, and then recovers the faulty block of disk_{j1} in the same row. Therefore, all the blocks in the two failed disks can be recovered entirely. □
2.4. Two sequences in the recovery procedure
A recovery procedure generally proceeds in a predetermined sequence. Therefore, if some block of a disk is not accessible during the recovery due to some bad sectors on the disk, the recovery procedure cannot proceed further. In order to enhance the reliability, an alternative sequence in a recovery procedure may be provided. DH1 scheme is able to provide an alternative sequence, which is the reverse of the sequence r. Fig. 4 shows the sequence of the reconstructed blocks in the recovery procedure of the decoding algorithm for DH1 scheme.
The recovery procedure can proceed either from (1) to (12) (the forward direction) or from (12) down to (1) (the backward direction). For example, if a certain faulty block cannot be recovered while the recovery procedure follows the forward direction of the sequence, the recovery using the backward direction can be processed, so that the number of faulty blocks that cannot be recovered is reduced. If two controllers are available in the disk array system, the recovery procedure can proceed using both directions of the sequence at the same time. As a result, the reconstruction time for any pair of failed disks can be reduced significantly. Also, the probability of an occurrence of catastrophic failures in the system can be lowered.
3. DH2 scheme
DH2 scheme also uses both DECGs and HECGs, as in DH1 scheme. In DH2 scheme, however, the horizontal parities are placed in the vertical direction using an extra parity disk; i.e., N + 1 disks are used, where N is prime. We assume that each disk has N − 1 logical blocks, as in DH1 scheme. By placing the horizontal parities vertically, the computation for the parity block D_0 (or H_{N−2}) in DH1 scheme can be avoided.
In order to reduce the occurrences of the bottleneck problem that may occur when updating some horizontal parities, the horizontal parities can be stored rotationally as shown in Fig. 5, since a disk array can be viewed, without loss of generality, as a multiple of (N − 1) × (N + 1) matrices.
In Fig. 5 each disk consists of a sequence of N − 1 logical blocks. Hence the horizontal parities can be distributed over different disks with minor changes in the encoding and decoding methods for DH2 scheme. However, for the sake of simplicity, we assume from now on that a disk array is viewed as only one (N − 1) × (N + 1) matrix.
Fig. 4. The recovery sequence of the decoding algorithm for DH1 scheme, where disk_1 and disk_3 are faulty and N = 7.
3.1. Encoding method
In DH2 scheme a disk array can be viewed as an (N − 1) × (N + 1) matrix, since there are N + 1 disks and each disk is assumed to have N − 1 logical blocks. We call this matrix B. In DH2 scheme the last disk, disk_N, holds all the horizontal parities.
In the matrix B, the blocks in each row are grouped into an HECG as in DH1 scheme. Let HECG_i denote the ith HECG, 0 ≤ i ≤ N − 2. The parity H_i in HECG_i can be obtained with Eq. (3) and is placed into B(i, N) as shown in Fig. 6, where N = 7. In this figure each block in the ith row holds i or H_i to indicate that the blocks in the row belong to the same HECG_i.
H_i = ⊕_{j=0}^{N−1} B(i, j), where 0 ≤ i ≤ N − 3.   (3)
That is, H_i can be obtained by XORing the ith block of each disk except the ith block of the last disk. Observe that the number of data blocks in each HECG is N.
Fig. 5. Rotational placement of the horizontal parities in DH2 scheme (the 1st, 2nd, . . . , ith (N − 1) × (N + 1) matrices).
Fig. 6. Placement of the horizontal parities and their corresponding HECGs in DH2 scheme.
Let a DECG be a group of blocks located in the antidiagonal direction, as in DH1 scheme. The diagonal parity D_j in DECG_j is obtained with Eq. (4) and is placed into B(N − 2, j), the last block of disk_j.
D_j = ⊕_{i=0}^{N−3} B(N − 3 − i, j + 1 + i)%N, where 0 ≤ j ≤ N − 1.   (4)
That is, D_j is computed by XORing the data blocks in DECG_j. Fig. 7 shows the locations of the diagonal parities. In this figure, the element holding j or D_j indicates that it belongs to DECG_j. Note that there are N − 1 blocks in each DECG, including its parity block.
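Eqs. (3) and (4) can be sketched in the same style as before. This is hypothetical Python (not from the paper); `B` is an (N − 1) × (N + 1) bit matrix whose last column models the parity disk disk_N.

```python
import random

def encode_dh2(B, N):
    """Fill the parities of an (N-1) x (N+1) bit matrix B in place (DH2)."""
    # Eq. (3): H_i on the extra disk, B[i][N], for 0 <= i <= N-3.
    for i in range(N - 2):
        B[i][N] = 0
        for j in range(N):
            B[i][N] ^= B[i][j]
    # Eq. (4): D_j at B[N-2][j], along the antidiagonal, as in DH1.
    for j in range(N):
        B[N - 2][j] = 0
        for i in range(N - 2):
            B[N - 2][j] ^= B[N - 3 - i][(j + 1 + i) % N]
    B[N - 2][N] = 0          # the empty block E of Fig. 8
    return B

random.seed(3)
N = 7
B = [[random.randint(0, 1) for _ in range(N + 1)] for _ in range(N - 1)]
encode_dh2(B, N)
# Each HECG_i with a stored parity (i <= N-3) XORs to zero across all N+1 columns.
for i in range(N - 2):
    x = 0
    for j in range(N + 1):
        x ^= B[i][j]
    assert x == 0
# The last row's implicit horizontal parity equals the XOR of all stored H_i's
# (this is what Step 4 of Algorithm 2-1 exploits when r = N-2).
x = 0
for j in range(N):
    x ^= B[N - 2][j]
y = 0
for i in range(N - 2):
    y ^= B[i][N]
assert x == y
```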
Fig. 8 illustrates the placement of the horizontal and diagonal parities along with the HECGs and DECGs by combining Figs. 6 and 7.
The block indicated by `E' in Fig. 8 represents an empty block and remains empty after the encoding. However, this block may be used for temporary storage such as a swap space during runtime. UNIX, for example, allocates some disk space for temporary storage which does not require any redundancy for reliability, since it holds valid data only during runtime. The block E may also be used as a substitute for a physically defective block, since such defects may occur when the disk drive is newly manufactured. The following example illustrates the encoding of DH2 scheme when N is 7.
Example 3. The following matrix depicts a disk array in which each data block is assumed to hold an arbitrary bit for simplicity.
Fig. 8. The diagonal and horizontal parity placements in DH2 scheme.
Fig. 7. Placement of the diagonal parities and their corresponding DECGs in DH2 scheme.
For the encoding of the disk array using DH2 scheme, the horizontal and diagonal parities are calculated with Eqs. (3) and (4), respectively. The following matrix shows the disk array after placing the parities.
3.2. Decoding method
A single disk failure can be recovered easily by using either DECGs or HECGs. However, in the case when any pair of disks are faulty at the same time, DH2 scheme uses two different recovery procedures. One procedure takes care of the case when the last disk, holding the horizontal parities, has no failure. The other procedure is used when the last disk is one of the two failed disks. Algorithms 2-1 and 2-2 illustrate these two procedures, respectively. For the algorithms, we assume that disk_{j1} and disk_{j2} are faulty, where 0 ≤ j1 < j2 ≤ N.
Decoding algorithm 2-1. In the case when the last disk has no failure, i.e., 0 ≤ j1 < j2 ≤ N − 1:
Step 1: k = j2 − j1;
for (m = 0; m ≤ N − 2; m++) {
Step 2: /* r finds the row at which the next faulty block of disk_{j2} is located. */
r = (N − 1 − k(m + 1))%N;
Step 3: /* The block B(r, j2) of disk_{j2} is recovered using DECG_{(j1+1−mk)%N}. */
B(r, j2) = ⊕_{l=0}^{N−2} B(N − 2 − l, j1 + 1 − mk + l)%N, where (j1 + 1 − mk + l)%N ≠ j2;
Step 4: if r ≠ N − 2 /* The faulty block B(r, j1) of disk_{j1} is recovered using HECG_r */
B(r, j1) = ⊕_{l=0}^{N} B(r, l), where l ≠ j1;
else /* The block B(N − 2, j1) is recovered using the H_i's and HECG_{N−2} */
B(r, j1) = ⊕_{h=0}^{N−3} B(h, N) ⊕ ⊕_{l=0}^{N−1} B(N − 2, l), where l ≠ j1;
}
Algorithm 2-1 reconstructs the faulty blocks of disk_{j1} and disk_{j2} as in the decoding algorithm of DH1 scheme, except for the last block of disk_{j1}. Step 1 finds the difference k between the indices of the two failed disks. In Step 2 the row number r is computed for finding the faulty block of disk_{j2}. In Step 3, the rth block of disk_{j2} can be reconstructed by XORing the rest of the unfaulty blocks of DECG_{(j1+1−mk)%N}. Next, the rth block of disk_{j1} can be recovered with the blocks in HECG_r, if r is not N − 2. But when r is N − 2, the last block of disk_{j1} should be recovered by XORing all the horizontal parities as well as the unfaulty blocks of the last row, i.e., HECG_{N−2}, because the horizontal parity for the last row is not stored in the array. As in the decoding algorithm for DH1 scheme, every faulty block of the two failed disks can be sequentially recovered using DECGs and HECGs alternately.
Decoding algorithm 2-2. In case when one of the two failed disks is the last disk, i.e., 06j1 6 N ÿ 1 and
Algorithm 2-2 describes the decoding method when the last disk is one of the two failed disks, i.e.,
j2 N. This decoding algorithm is simpler than Algorithm 2-1, since in this case every DECG has only one
faulty block which is in diskj1 . Algorithm 2-2 shows the recovery process that uses DECGs and HECGs
alternately as in Algorithm 2-1 or the decoding algorithm of DH1 scheme. However, another recovery
can be proceeded by recovering each faulty block of diskj1 by XORing of the unfaulty blocks of the corresponding
DECG ®rst. And then each faulty block of diskj2 can be reconstructed by the blocks
in the corresponding HECG. The following example shows a recovery of a disk array of size 7 for
Example 4. The following matrix shows the disk array when two disks, disk1 and disk3, are faulty,
In Step 1 of Algorithm 2-1, we obtain k = j2 − j1 = 2. In Step 2, r = (N − 1 − k(m + 1)) mod N = (4 − 2m) mod 7.
Then in Steps 3 and 4 the following equations can be obtained, where 0 ≤ m ≤ 5:

B((4 − 2m) mod 7, 3) = ⊕_{l=0..5} B(5 − l, (2 − 2m + l) mod 7), where (2 − 2m + l) mod 7 ≠ 3;

B((4 − 2m) mod 7, 1) = (⊕_{h=0..4} B(h, 7)) ⊕ (⊕_{l=0..6} B(5, l)), where l ≠ 1, if m = 3;
B((4 − 2m) mod 7, 1) = ⊕_{l=0..7} B((4 − 2m) mod 7, l), where l ≠ 1, otherwise.

Using the above equations and varying m from 0 to 5, the two failed disks can be sequentially recovered.
for (r = 0; r ≤ N − 3; r++) {
Step 1: /* Find the DECG_c to recover B(r, j1). */
  c = (j1 + 2 + r) mod N;
Step 2: /* Reconstruct B(r, j1) using the blocks in DECG_c. */
  B(r, j1) = ⊕_{l=0..N−2} B(N − 2 − l, (c + l) mod N), where (c + l) mod N ≠ j1;
Step 3: /* Reconstruct B(r, j2) using the blocks in HECG_r. */
  B(r, j2) = ⊕_{l=0..N−1} B(r, l);
}
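Both steps rest on the same property of an XOR parity group: the XOR over every block of a DECG or HECG is zero, so any single erased member equals the XOR of the survivors. A minimal self-contained sketch of that property (not tied to the DH2 layout):

```python
import random
from functools import reduce

def xor_all(blocks):
    """XOR-fold a list of integer-coded blocks."""
    return reduce(lambda a, b: a ^ b, blocks)

# An erasure-coding group: some data blocks plus their XOR parity.
# The XOR over the whole group is 0, so any one missing member is
# the XOR of all the others -- the fact used by Steps 2 and 3 above.
random.seed(1)
data = [random.getrandbits(8) for _ in range(6)]
group = data + [xor_all(data)]

for erased in range(len(group)):
    survivors = [b for i, b in enumerate(group) if i != erased]
    assert xor_all(survivors) == group[erased]
```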
1396 N.-K. Lee et al. / Journal of Systems Architecture 46 (2000) 1383±1402
We omit a decoding example for Algorithm 2-2, since it is immediate. The decoding algorithms for DH2
scheme can also be verified, but we omit the proof of correctness of Algorithms 2-1 and 2-2 because it
is very similar to the one shown for DH1 scheme.
4. Performance analysis
In [9], the performance of EVENODD scheme was compared with those of the 2D scheme and the Reed–Solomon code
scheme in recovering from two disk failures, and EVENODD scheme outperformed both. Hence, in
this section we compare our DH schemes only with EVENODD scheme. For the analysis, we assume
that the disk arrays for EVENODD, DH1, and DH2 schemes have the dimensions given in Table 1. Note that
N is a prime number, since each scheme works only if N is prime.
The efficiency of each scheme can be measured by the number of XOR operations required for encoding
and decoding a given disk array, and by the number of XOR operations needed for a small write. The
ratio of the number of parity blocks to that of data blocks in each scheme is also compared in this section.

4.1. Encoding efficiencies
The encoding efficiency of each scheme is compared using the number of XOR operations. In order to
perform fair comparisons, the ratio R_s defined below is used, where s is a scheme, since each scheme uses an
array of a different size:

R_s = (the number of XOR operations for encoding a disk array) / (the number of data blocks in the array).

That is, R_s indicates the number of XOR operations needed to encode a single data block using scheme s. We now
calculate R_DH1, R_DH2, and R_EVENODD for their encoding efficiencies.
For DH1 scheme, each horizontal parity can be calculated with N − 2 XOR operations using Eq. (1).
Since there are N − 2 horizontal parities in a disk array, the number of XOR operations for all the
horizontal parities is (N − 2)². Similarly, all the diagonal parities can be computed with (N − 3)N XOR
operations, as shown in Eq. (2). Therefore, the total number of XOR operations to encode a disk array of
size (N − 1) × N becomes (N − 2)² + (N − 3)N = 2N² − 7N + 4. Hence, R_DH1 can be found as follows:
Table 1
The dimensions of disk arrays for EVENODD, DH1, and DH2 schemes

                      EVENODD            DH1                 DH2
Total array size      (N − 1) × (N + 2)  (N − 1) × N         (N − 1) × (N + 1)
No. of data blocks    (N − 1) × N        (N − 2) × (N − 1)   (N − 2) × N
R_DH1 = (2N² − 7N + 4) / ((N − 2)(N − 1)).
In DH2 scheme, all the horizontal parities can be obtained with (N − 1)(N − 2) XOR operations by
Eq. (3), and all the diagonal parities require (N − 3)N XOR operations by Eq. (4). The total number of
XOR operations is, therefore, (N − 1)(N − 2) + (N − 3)N = 2N² − 6N + 2. We now get R_DH2 as in the following:

R_DH2 = (2N² − 6N + 2) / ((N − 2)N).
In the encoding of EVENODD scheme, the diagonal S, the horizontal parities, and the diagonal parities
should be calculated. The equations for computing these parities are given below [9], where a block in the
imaginary row N − 1 is taken to be zero:

S = ⊕_{l=1..N−1} A(N − 1 − l, l), (5)

A(i, N) = ⊕_{l=0..N−1} A(i, l), where 0 ≤ i ≤ N − 2, (6)

A(i, N + 1) = S ⊕ (⊕_{l=0..N−1} A((i − l) mod N, l)), where 0 ≤ i ≤ N − 2. (7)
In order to obtain the diagonal S, N − 2 XOR operations are required, because there are N − 1 diagonal
blocks. In calculating the diagonal parities using Eq. (7), we need N(N − 1) XOR operations, since finding
each diagonal parity needs N XOR operations and the number of all the diagonal parities is N − 1. For
the horizontal parities, we need (N − 1)² operations, because there are N − 1 rows in a disk array and each
row needs N − 1 XOR operations. The total number of XOR operations for encoding in EVENODD
scheme is, therefore, (N − 2) + N(N − 1) + (N − 1)² = 2N² − 2N − 1. R_EVENODD can then be given as follows:

R_EVENODD = (2N² − 2N − 1) / ((N − 1)N).
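The three closed-form ratios above can be evaluated directly; the sketch below (helper names are ours, not the authors') checks the ordering reported in the text for small prime N:

```python
def r_dh1(N):
    # (N-2)^2 horizontal + (N-3)N diagonal XORs over (N-2)(N-1) data blocks
    return (2 * N**2 - 7 * N + 4) / ((N - 2) * (N - 1))

def r_dh2(N):
    # (N-1)(N-2) horizontal + (N-3)N diagonal XORs over (N-2)N data blocks
    return (2 * N**2 - 6 * N + 2) / ((N - 2) * N)

def r_evenodd(N):
    # (N-2) for S, N(N-1) diagonal, (N-1)^2 horizontal over (N-1)N data blocks
    return (2 * N**2 - 2 * N - 1) / ((N - 1) * N)

# For the primes below 17, DH2 < DH1 < EVENODD, consistent with the text.
for N in (5, 7, 11, 13):
    assert r_dh2(N) < r_dh1(N) < r_evenodd(N)
```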
In Fig. 9, R_s for each scheme s is compared as N increases. DH2 scheme requires a smaller number of
XOR operations than the others. This result comes mainly from the fact that EVENODD scheme needs to
calculate the diagonal S for the diagonal parities, and DH1 scheme must calculate the parity of the parities,
i.e., D_0 (or H_{N−2}). DH1 scheme also shows better performance than EVENODD scheme when N is less than 17.
The figure shows only small differences among the schemes, because we have used only 1 bit as the size of a data
block. When a data block gets a realistic size, the differences should be much greater.
4.2. Decoding efficiencies
Even though the reliability of a disk array can be increased by using an encoding scheme that tolerates
two disk failures, a catastrophic failure, such as another disk failure during a recovery from two disk
failures, may still occur. Thus, if we can reduce the reconstruction time, the probability of failures of
this type can be lowered. In a decoding process, the unfaulty blocks associated with the block to be
reconstructed are read into memory and XORed; the reconstructed block is then stored into its place in
the disk array. Therefore, if the I/O time of a block is assumed to be the same for every scheme, we could say
that the smaller the number of XOR operations a scheme needs for decoding, the shorter its reconstruction
time for two disk failures.
In order to compare the decoding efficiencies of DH1, DH2, and EVENODD schemes, we classify the
disks in a disk array into three types: H, D, and I. Let H denote a dedicated disk storing all the horizontal
parities, D a dedicated disk holding all the diagonal parities, and I a disk that is neither H nor D; an I disk
may hold data and some parities together, but not all parities. In DH1 scheme, any pair of failed disks
is of type (I, I), since both the horizontal and the diagonal parities are distributed over the
disk array, i.e., there is no dedicated parity disk. In DH2 scheme, since there exists a dedicated disk holding
all the horizontal parities, two failed disks have type (I, I) or type (I, H). In EVENODD scheme, however,
there are two dedicated parity disks, one for the horizontal and one for the diagonal parities; therefore,
(H, D), (I, H), (I, D), and (I, I) are all the possible types of a failed-disk pair. Table 2 shows the
expected number of XOR operations for recovering two failed disks, the number of occurrences of each
pair type, and the probability of each pair type, under the assumption that every pair of disks in the
disk array is equally likely to fail.
As in the comparisons for encoding, the expected number of XOR operations of each scheme is divided by
the number of data blocks used in the scheme for a fair comparison. These results are drawn in
Fig. 10 as N varies.
As shown in Fig. 10, the decoding in DH2 scheme can be done faster than in the other schemes. Also, DH1
scheme is better than EVENODD scheme, since EVENODD scheme suffers from the computation overheads
involving the diagonal S. This figure again shows only small differences among the schemes, because
Table 2
Comparison of the three schemes for decoding two disk failures

Scheme     Type of failed pair   Expected no. of XORs   No. of pairs    Probability of the pair type
DH1        (I, I)                2N² − 6N + 2           N(N − 1)/2      1.0
DH2        (I, H)                2N² − 5N + 2           N               2/(N + 1)
           (I, I)                2N² − 5N + 1           N(N − 1)/2      (N − 1)/(N + 1)
EVENODD    (H, D)                2N² − 2N − 1           1               2/((N + 2)(N + 1))
           (I, H)                N² + N                 N               2N/((N + 2)(N + 1))
           (I, D)                2N² − 2N − 1           N               2N/((N + 2)(N + 1))
           (I, I)                2N² + N − 7            N(N − 1)/2      N(N − 1)/((N + 2)(N + 1))
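The probability column can be sanity-checked: under the equally-likely-pair assumption, the probabilities of a scheme's pair types must sum to 1 (e.g., for DH2, 2/(N + 1) + (N − 1)/(N + 1) = 1). A small sketch in exact arithmetic (the helper name is ours):

```python
from fractions import Fraction

def table2_probs(scheme, N):
    """Pair-type probabilities from Table 2, in exact arithmetic."""
    if scheme == "DH1":
        return [Fraction(1)]                      # (I, I) only
    if scheme == "DH2":
        return [Fraction(2, N + 1),               # (I, H)
                Fraction(N - 1, N + 1)]           # (I, I)
    # EVENODD: (H, D), (I, H), (I, D), (I, I) out of (N+2)(N+1)/2 pairs
    d = (N + 2) * (N + 1)
    return [Fraction(2, d), Fraction(2 * N, d),
            Fraction(2 * N, d), Fraction(N * (N - 1), d)]

# For any prime N, each scheme's pair-type probabilities sum to 1.
for N in (5, 7, 11, 13, 17):
    for scheme in ("DH1", "DH2", "EVENODD"):
        assert sum(table2_probs(scheme, N)) == 1
```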
Fig. 9. The number of XOR operations to encode a single data block (x-axis: number of disks N; y-axis: expected number of XOR operations per data block).
we have used only 1 bit as the size of a block. When the block size gets realistic, the differences should be much greater.
4.3. Efficiencies in small writes
Whenever a data block is updated, the associated parity blocks should also be updated. In DH1 scheme
the number of block accesses for such an update is 8, since there are three parity blocks related to the block
to be updated: the corresponding horizontal and diagonal parity blocks, along with the special
block D_0 (or H_{N−2}). Each of these blocks, as well as the data block itself, must be read and written. Therefore,
DH1 scheme needs four reads and four writes for updating a block, regardless of the number of disks used in the disk array.
DH2 scheme requires only three reads and three writes, i.e., six block accesses, since it does not keep the
special parity block D_0. Recall that DH2 scheme spreads the diagonal parities throughout the disk
array as in DH1 scheme, but uses a dedicated disk for all the horizontal parities.
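Each per-parity update is the usual XOR read-modify-write: a parity stays consistent if it is XORed with both the old and the new value of the changed data block. A minimal sketch of this rule (standard practice, not specific to the DH layouts):

```python
from functools import reduce

def updated_parity(old_parity, old_data, new_data):
    # new_parity = old_parity XOR old_data XOR new_data
    return old_parity ^ old_data ^ new_data

# The rule preserves the group invariant: parity == XOR of its data blocks.
data = [0b1010, 0b0110, 0b1111]
parity = reduce(lambda a, b: a ^ b, data)

old = data[1]
data[1] = 0b0001                      # a small write to one block
parity = updated_parity(parity, old, data[1])

assert parity == reduce(lambda a, b: a ^ b, data)
```

This is why each affected parity costs one read and one write in the access counts above.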
However, in EVENODD scheme, if a data block is not included in the diagonal S, the number of block
accesses is only 6. Otherwise, the number of block accesses becomes 2(N + 1), where N is the number of
disks. When N ranges from 5 to 43, the average number of block accesses for EVENODD scheme is 7.2–7.9,
so the expected number of block accesses for updating a block in EVENODD scheme is smaller than
that of DH1 scheme in this range. However, when two or more data blocks must be updated simultaneously,
EVENODD scheme suffers from updating the parity blocks sequentially, because all the
diagonal and the horizontal parities are placed on the two dedicated disks. Moreover, when a data block
contained in the diagonal S is updated, the waiting time is much longer, because every diagonal
parity should also be updated.
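The 7.2–7.9 range can be reproduced under one assumption of ours (not stated explicitly above): a fraction 1/N of the data blocks lies on the diagonal S, since S contains N − 1 of the (N − 1)N data blocks. The average is then 6(1 − 1/N) + 2(N + 1)(1/N) = 8 − 4/N:

```python
def avg_accesses_evenodd(N):
    # Assumes N - 1 of the (N - 1) * N data blocks lie on diagonal S.
    frac_on_s = (N - 1) / ((N - 1) * N)        # = 1/N
    return 6 * (1 - frac_on_s) + 2 * (N + 1) * frac_on_s

# 8 - 4/N rises from 7.2 at N = 5 toward 8, staying below 7.91 at N = 43.
assert abs(avg_accesses_evenodd(5) - 7.2) < 1e-9
```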
4.4. Parity block ratio
In a disk array, the blocks holding the parities cannot be used to hold actual data, so it is desirable
to keep the number of parity blocks in a given disk array as small as possible. Theoretically, any scheme that
can tolerate up to two simultaneous disk failures needs at least two redundant disks; this is known as the
Singleton bound [14]. EVENODD scheme uses exactly two redundant disks for the parities. Both DH1 and
DH2 schemes also use approximately two disks for them. Fig. 11 shows the ratio of the number of the
Fig. 10. The expected numbers of XOR operations required for decoding (x-axis: number of disks N; y-axis: number of XOR operations per data block).
parity blocks to that of the overall blocks for each scheme. It can be seen that DH1 and DH2 schemes use
near-optimal disk space for storing the redundant information.
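These ratios follow directly from Table 1: the number of parity blocks is the total array size minus the number of data blocks. A sketch (helper names are ours) comparing each scheme's ratio:

```python
def parity_ratio(total, data):
    """Fraction of all blocks in the array that hold parity."""
    return (total - data) / total

def ratios(N):
    # Total array sizes and data-block counts taken from Table 1.
    return {
        "DH1":     parity_ratio((N - 1) * N,       (N - 2) * (N - 1)),
        "DH2":     parity_ratio((N - 1) * (N + 1), (N - 2) * N),
        "EVENODD": parity_ratio((N - 1) * (N + 2), (N - 1) * N),
    }

# DH1's parity fills exactly 2 of N disks' worth of blocks (ratio 2/N);
# EVENODD's fills 2 of N + 2 (ratio 2/(N + 2)); DH2's ratio,
# (2N - 1)/(N^2 - 1), sits just above the two-disk ideal 2/(N + 1).
r = ratios(7)
```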
5. Conclusions

In this paper, two efficient parity placement schemes, called DH1 and DH2 schemes, are presented for
tolerating up to two disk failures. These schemes use both the diagonal and the horizontal parities for
encoding a disk array. DH1 scheme places all the parity blocks evenly throughout the disk array, while
DH2 scheme places the horizontal parity blocks on a dedicated disk and distributes the diagonal parity blocks
evenly as in DH1 scheme. DH1 and DH2 schemes can be implemented purely in software; in
other words, they use only the exclusive-OR operations that are already used in existing disk array systems
for computing parities, and can be implemented without modifying the hardware.
The algorithms of both DH schemes can be implemented with lower complexities than those of
EVENODD scheme. Moreover, DH1 scheme can reduce the occurrences of the bottleneck problems when
updating parities, because the data and parity blocks are evenly distributed throughout the disk array. In
DH1 and DH2 schemes, the recovery procedure can be overlapped by proceeding in two
opposite directions, and it can also be processed in parallel with some additional hardware. Therefore, the
reconstruction time for a disk array with two disk failures can be reduced significantly; as a result, the
probability of a catastrophic failure in a disk array system can also be lowered.
It has been shown that both DH schemes require fewer XOR operations than EVENODD scheme, both
for encoding a disk array and for recovering from two disk failures. DH2 scheme is also more efficient for
small writes than EVENODD scheme, while both DH schemes use near-optimal disk space for the
redundant information.
References

[1] D.A. Patterson, G.A. Gibson, R.H. Katz, A case for redundant arrays of inexpensive disks (RAID), in: Proceedings of the ACM SIGMOD Conference, 1988, pp. 109–116.
[2] R.H. Katz, G.A. Gibson, D.A. Patterson, Disk system architecture for high performance computing, Proc. IEEE 77 (12) (1989).
Fig. 11. The ratio of parity blocks to the overall blocks in a disk array (x-axis: number of disks N).
[3] G.R. Ganger, B.L. Worthington, R.Y. Hou, Y.N. Patt, Disk arrays: high-performance, high-reliability storage subsystems, IEEE Comput. 27 (3) (1994) 30–36.
[4] P.M. Chen, D.A. Patterson, Maximizing performance in a striped disk array, Berkeley Technical Report, 1989.
[5] P. Shenoy, H.M. Vin, Efficient striping techniques for variable bit rate continuous media file servers, University of Massachusetts, Amherst, UM-CS-1998-053, December 1998.
[6] E.K. Lee, R.H. Katz, The performance of parity placements in disk arrays, IEEE Trans. Comput. 42 (6) (1993) 651–664.
[7] R. Karedla, J.S. Love, B.G. Wherry, Caching strategies to improve disk system performance, IEEE Comput. 27 (3) (1994) 38–46.
[8] M. Schulze, G.A. Gibson, R.H. Katz, D.A. Patterson, How reliable is a RAID, IEEE COMPCON (Spring) (1989) 118–123.
[9] M. Blaum, J. Brady, J. Bruck, J. Menon, EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures, IEEE Trans. Comput. 44 (2) (1995) 192–202.
[10] P.M. Chen, E.K. Lee, G.A. Gibson, R.H. Katz, D.A. Patterson, RAID: high-performance, reliable secondary storage, ACM Comput. Surveys 26 (2) (1994) 145–185.
[11] C. Park, Efficient placement of parity and data to tolerate two disk failures in disk array systems, IEEE Trans. Parallel Distribut. Syst. 6 (11) (1995) 1177–1184.
[12] G.A. Gibson, Redundant disk arrays: reliable, parallel secondary storage, Ph.D. Thesis, Department of Computer Science, University of California, Berkeley, 1990.
[13] G.A. Gibson, L. Hellerstein, R.M. Karp, R.H. Katz, D.A. Patterson, Coding techniques for handling failures in large disk arrays, Computer Science Technical Report CSD-88-477, University of California, Berkeley, 1988.
[14] F.J. MacWilliams, N.J.A. Sloane, The Theory of Error-Correcting Codes, vol. 16, Elsevier, Amsterdam, 1977.
[15] G.A. Alvarez, W.A. Burkhard, L.J. Stockmeyer, F. Cristian, Declustered disk array architectures with optimal and near-optimal parallelism, in: Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998, pp. 109–120.
[16] R.L. Graham, D.E. Knuth, O. Patashnik, Concrete Mathematics: A Foundation for Computer Science, second ed., Addison-Wesley, Reading, MA, 1994, pp. 129–130.
Nam-Kyu Lee received his B.S. and M.S. degrees in computer science from Yonsei University, Seoul,
Korea, in 1988 and 1995, respectively. He was employed at the Samsung Advanced Institute of Technology from
1988 to 1995, and at Samsung Electronics Co. Ltd. in Korea from 1995 to 1997. He is currently a Ph.D. student
in computer science at Yonsei University. His research interests are high-performance I/O architecture,
advanced computer architecture design, human-computer interaction (HCI), and mobile computing.
Sung-Bong Yang received his B.S. degree from Yonsei University, Seoul, Korea, in 1981, and his M.S. and
Ph.D. degrees from the University of Oklahoma, in 1986 and 1992, respectively. He was an adjunct assistant
professor at the University of Oklahoma for the Fall semester of 1992. He is currently an associate professor of
computer science at Yonsei University, Seoul, Korea. His research interests include parallel processing, 3D
graphics, and GIS.
Kyoung-Woo Lee received his B.S. and M.S. degrees in computer science from Yonsei University, Seoul,
Korea, in 1995 and 1997, respectively. Since 1997, he has been working at the Digital TV Research Laboratory,
LG Electronics Inc., as a research engineer. His research interests include high-performance parallel architecture,
disk arrays, multimedia server design, VOD (Video On Demand), SMIL (Synchronized Multimedia
Integration Language), and mobile computing.