A Disk RAID array Volume Manager for Disk

the primary dimension is achieved by striping contiguous
VLBNs across quadrangles on all disks. Much like
ordinary disk arrays, which map LBNs across individual
stripe units, each quadrangle row contains a contiguous
run of VLBNs covering a contiguous run of a single
disk’s DLBNs on a single track. Hence, sequential access
naturally exploits the high efficiency of track-based
access explained in Section 2.2. For example, in Figure
3, an access to 16 sequential blocks starting at VLBN
0 will be broken into four disk I/Os executing in parallel
and fetching full tracks: VLBNs 0–3 from disk 0, VLBNs
4–7 from disk 1, VLBNs 8–11 from disk 2, and VLBNs
12–15 from disk 3.
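The splitting of a sequential request into per-disk I/Os can be sketched as follows. This is a minimal illustration, not the Atropos implementation: the function name `split_sequential` is hypothetical, and it assumes plain round-robin placement of w-VLBN stripe units across p disks, as in the first quadrangle row of Figure 3.

```python
def split_sequential(start, count, w, p):
    """Split a run of VLBNs into one I/O per stripe unit touched.

    Assumption: stripe units of w VLBNs are placed round-robin on p disks.
    Returns a list of (disk, first_vlbn, last_vlbn) tuples.
    """
    ios = []
    vlbn = start
    end = start + count              # one past the last requested VLBN
    while vlbn < end:
        unit = vlbn // w             # index of the stripe unit holding vlbn
        unit_end = (unit + 1) * w    # one past the last VLBN of that unit
        last = min(end, unit_end) - 1
        ios.append((unit % p, vlbn, last))
        vlbn = last + 1
    return ios


# The example from the text: 16 sequential blocks starting at VLBN 0.
print(split_sequential(0, 16, w=4, p=4))
# [(0, 0, 3), (1, 4, 7), (2, 8, 11), (3, 12, 15)]
```

Each tuple corresponds to one full-track I/O in the example, and the four I/Os can be issued in parallel because they target different disks.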
Efficient access to the secondary dimension is
achieved by mapping it to semi-sequential VLBNs. Figure
3 indicates the semi-sequential VLBNs with a dashed
line. Requests to the semi-sequential VLBNs in a single
quadrangle are all issued together in a batch. The disk’s
internal scheduler then chooses the request that will incur
the smallest positioning cost (the sum of seek and
rotational latency) and services it first. Once the first request
is serviced, servicing all other requests will incur
only a track switch to the adjacent track. Thanks to the
semi-sequential layout, no rotational latency is incurred
for any of the subsequent requests, regardless of which
request was serviced first.
Naturally, the sustained bandwidth of semi-sequential
access is smaller than that of sequential access. However,
semi-sequential access is more efficient than reading
d effectively-random VLBNs spread across d tracks,
as would be the case in a normal striped disk array. Accessing
random VLBNs will incur rotational latency, averaging
half a revolution per access. In the example of
Figure 3, the semi-sequential access, depicted by the arrow,
proceeds across VLBNs 0, 16, 32, ..., 240 and occurs
on all p disks, achieving the aggregate semi-sequential
bandwidth of the disk array.
3.2 Quadrangle layout parameters
The values that determine efficient quadrangle layout depend
on disk characteristics, which can be described by
two parameters. The parameter N describes the number
of sectors, or DLBNs, per track. The parameter H
describes the track skew in the mapping of DLBNs to
physical sectors. The layout and disk parameters are
summarized in Table 1.
Track skew is a property of disk data layouts as a consequence
of track switch time. When data is accessed
sequentially on a disk beyond the end of a track, the
disk must switch to the next track to continue accessing.
Switching tracks takes some amount of time, during
which no data can be accessed. While the track switch is
in progress, the disk continues to spin, of course. Therefore,
sequential LBNs on successive tracks are physically
skewed so that when the switch is complete, the
head will be positioned over the next sequential LBN.
This skew is expressed as the parameter H, which is the
number of DLBNs that the head passes over during the
track switch time.

Symbol  Name                     Units
Quadrangle layout parameters
p       Parallelism              # of disks
d       Quadrangle depth         # of tracks
b       Block size               # of DLBNs
w       Quadrangle width         # of VLBNs
Disk physical parameters
N       Sectors per track
H       Head switch in DLBNs
Table 1: Parameters used by Atropos.

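The definition of H can be illustrated with a short calculation. This is only a sketch of the definition: Atropos obtains H from automatically extracted disk characteristics, whereas the function `track_skew`, the rounding up, and the rotation speed and switch time used below are hypothetical values chosen for illustration.

```python
import math


def track_skew(n_sectors, rpm, switch_ms):
    """Number of sectors that pass under the head during one track switch.

    Assumption: the skew is rounded up so that the next sequential LBN
    has not yet passed the head when the switch completes.
    """
    revolution_ms = 60_000.0 / rpm           # time for one full revolution
    return math.ceil(switch_ms / revolution_ms * n_sectors)


# Hypothetical disk: 21 sectors per track, 10,000 RPM, 0.8 ms track switch.
print(track_skew(n_sectors=21, rpm=10_000, switch_ms=0.8))  # 3
```

With these made-up numbers, 2.8 sectors rotate past during the switch, so the skew is 3 DLBNs.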
Figure 4 shows a sample quadrangle layout and its parameters.
Figure 4(a) shows an example of how quadrangle
VLBNs map to DLBNs. Along the x-axis, a quadrangle
contains w VLBNs, each of size b DLBNs. In the
example, one VLBN consists of two DLBNs, and hence
b = 2. As illustrated in the example, a quadrangle does
not always use all DLBNs when the number of sectors
per track, N, is not divisible by b. In this case, there are
R residual DLBNs that are not assigned to quadrangles.
Figure 4(b) shows the physical locations of each b-sized
VLBN on individual tracks, accounting for track skew,
which equals 3 sectors (H= 3 DLBNs) in this example.
3.2.1 Determining layout parameters
To determine a suitable quadrangle layout at format
time, Atropos uses as its input parameters the automatically
extracted disk characteristics, N and H, and the
block size, b, which are given by higher level software.
Based on these input parameters, the other quadrangle
layout parameters, d and w, are calculated as described
below.
To explain the relationship between the quadrangle
layout parameters and the disk physical parameters, let’s
assume that we want to read one block of b DLBNs from
each of d tracks. This makes the total request size, S,
equal to db. As illustrated in Figure 4(b), the locations
of the b-DLBN blocks on each track are chosen to ensure the
most efficient access. Accessing the block on the next track can
commence as soon as the disk head finishes reading on
the previous track and repositions itself above the new
track. During the repositioning, H sectors pass under
the heads.
To bound the response time for reading the S sectors,
we need to find suitable values for b and d to ensure that
the entire request, consisting of db sectors, is read in at
most one revolution. Hence,
\[
\frac{db}{N} + \frac{(d-1)H}{N} \le 1 \tag{1}
\]
where db/N is the media access time needed to fetch
the desired S sectors and (d-1)H/N is the fraction of
time spent in head switches when accessing all d tracks.
Then, as illustrated at the bottom of Figure 4(b), reading
db sectors is going to take the same amount of time as if
we were reading db+(d-1)H sectors on a single track
of a zero-latency access disk.

Figure 4: Single quadrangle layout. In this example, the quadrangle layout parameters are b=2 (a single VLBN consists of two DLBNs), w=10
VLBNs, and d=4 tracks. The disk physical parameters are H=3 DLBNs and N=21 DLBNs. Given these parameters, R=1.
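As a quick check of the one-revolution constraint in Equation 1, the sketch below evaluates it for the Figure 4 parameters. The function names are hypothetical; exact fractions are used so that the boundary case is not affected by floating-point rounding.

```python
from fractions import Fraction


def revolution_fraction(d, b, N, H):
    """Fraction of one revolution needed to read b DLBNs from each of d tracks."""
    media = Fraction(d * b, N)            # time spent transferring db sectors
    switches = Fraction((d - 1) * H, N)   # time spent in d-1 head switches
    return media + switches


def fits_one_revolution(d, b, N, H):
    """True if the request satisfies the one-revolution bound of Equation 1."""
    return revolution_fraction(d, b, N, H) <= 1


# Figure 4 parameters: b=2, d=4, N=21, H=3.
print(revolution_fraction(4, 2, 21, 3))   # 17/21
print(fits_one_revolution(4, 2, 21, 3))   # True
print(fits_one_revolution(5, 2, 21, 3))   # False: 10 + 12 = 22 > 21
```

The 17/21 result is the same time as reading db+(d-1)H = 17 sectors on a single 21-sector track.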
The maximal number of tracks, d, from which at least
one sector each can be read in a single revolution is
bounded by the number of head switches that can be done
in a single revolution, so
\[
d \le \frac{N}{H} - 1 \tag{2}
\]
If we fix d, the number of sectors, b, that yields the
most efficient access (i.e., reading as many sectors on a
single track as possible before switching to the next one)
can be determined from Equation 1 to get
\[
b \le \frac{N+H}{d} - H \tag{3}
\]
Alternatively, if we fix b, the maximal depth, called
Dmax, can be expressed from Equation 1 as
\[
D_{max} \le \frac{N+H}{b+H} \tag{4}
\]
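Both bounds follow from rearranging Equation 1 into d(b+H) <= N+H. The sketch below applies them to the Figure 4 parameters; the function names are hypothetical, and integer floor division is used on the assumption that d and b must be whole numbers of tracks and DLBNs.

```python
def max_block_size(N, H, d):
    """Largest integer b with d*b + (d-1)*H <= N for a fixed depth d."""
    return (N + H) // d - H


def max_depth(N, H, b):
    """Largest integer depth with d*b + (d-1)*H <= N for a fixed block size b."""
    return (N + H) // (b + H)


# Figure 4 disk parameters: N=21, H=3.
print(max_depth(N=21, H=3, b=2))        # 4
print(max_block_size(N=21, H=3, d=4))   # 3
```

With b=2 the depth d=4 used in Figure 4 is the maximum; at that depth a block of up to 3 DLBNs would still fit in one revolution (12 + 9 = 21).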
For certain values of N, db sectors do not span a full
track. In that case, db+(d-1)H < N and there are
R residual sectors, where R < b, as illustrated in Figure
4. The number of residual DLBNs on each track not
mapped to quadrangle blocks is R = N mod w, where
\[
w = \left\lfloor \frac{N}{b} \right\rfloor \tag{5}
\]
Hence, the fraction of disk space that is wasted with
Atropos' quadrangle layout is R/N; these sectors are
skipped to maintain the invariant that db sectors can be
accessed in at most one revolution. Section 5.2.4 shows
that this number is less than 2% of the total disk capacity.
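The width and residual for the Figure 4 example can be computed as sketched below. The function names are hypothetical, and the residual is computed directly as the DLBNs left over after w blocks of b DLBNs, which gives R=1 for Figure 4. The toy track of 21 sectors is far shorter than a real one, so the wasted fraction here is much larger than on an actual disk.

```python
def quadrangle_width(N, b):
    """Number of whole b-DLBN blocks (VLBNs) that fit on one track."""
    return N // b


def residual(N, b):
    """DLBNs per track left unassigned after placing w blocks of b DLBNs."""
    return N - quadrangle_width(N, b) * b


def wasted_fraction(N, b):
    """Fraction of each track's capacity not mapped to quadrangle blocks."""
    return residual(N, b) / N


# Figure 4 parameters: N=21, b=2.
print(quadrangle_width(21, 2))            # 10
print(residual(21, 2))                    # 1
print(f"{wasted_fraction(21, 2):.1%}")    # 4.8%
```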
While it may seem that relaxing the one revolution
constraint might achieve better efficiency, Appendix B
shows that this intuition is wrong. Accessing more than
Dmax tracks is detrimental to the overall performance unless
d is some multiple of Dmax. In that case, the service
time for such access is a multiple of one-revolution time.
3.2.2 Mapping VLBNs to quadrangles
Mapping VLBNs to the DLBNs of a single quadrangle
is straightforward. Each quadrangle is identified
by DLBNQ, which is the lowest DLBN of the quadrangle
and is located at the quadrangle’s top-left corner.
The DLBNs that can be accessed semi-sequentially
are easily calculated from the N and b parameters. As
illustrated in Figure 4, given DLBNQ = 0 and b = 2,
the set {0, 24, 48, 72} contains blocks that can be accessed
semi-sequentially. To maintain rectangular appearance
of the layout to an application, these DLBNs
are mapped to VLBNs {0, 10, 20, 30} when b = 2, p = 1,
and VLBNQ = DLBNQ = 0.
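The VLBN side of this mapping can be sketched in a few lines. The function name is hypothetical, and the stride of wp VLBNs between consecutive quadrangle rows is inferred from the two examples in the text (VLBNs 0, 10, 20, 30 for w=10, p=1, and VLBNs 0, 16, 32, ... in Figure 3); the DLBN side depends on the physical layout and is not modeled here.

```python
def semi_sequential_vlbns(vlbn_q, w, p, d):
    """VLBNs in one quadrangle that are semi-sequential to its first VLBN.

    vlbn_q is the quadrangle's lowest VLBN; each of the d rows starts
    w*p VLBNs after the previous one.
    """
    return [vlbn_q + row * w * p for row in range(d)]


# Figure 4: w=10, p=1, d=4.
print(semi_sequential_vlbns(0, w=10, p=1, d=4))   # [0, 10, 20, 30]
# First rows of Figure 3, assuming w=4 and p=4.
print(semi_sequential_vlbns(0, w=4, p=4, d=4))    # [0, 16, 32, 48]
```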
With no media defects, Atropos only needs to know
the DLBNQ of the first quadrangle. The DLBNQ for all
other quadrangles can be calculated from the N, d, and
b parameters. With media defects handled via slipping
(e.g., the primary defects that occurred during manufacturing),
certain tracks may contain fewer DLBNs. If the
number of such defects is less than R, that track can
be used; if it is not, the DLBNs on that track must be
skipped. If any tracks are skipped, the starting DLBN of
each quadrangle row must be stored.
To avoid the overhead of keeping a table to remember
the DLBNs for each quadrangle row, Atropos could reformat
the disk and instruct it to skip over any tracks that
contain one or more bad sectors. By examining twelve
Seagate Cheetah 36ES disks, we found there were, on
average, 404 defects per disk; eliminating all tracks with
defects wastes less than 5% of the disk’s total capacity.
The techniques for handling grown defects still apply.

Figure 5: Atropos quadrangle layout for different RAID levels.
3.3 Practical system integration
Building an Atropos logical volume out of p disks is not
difficult thanks to the regular geometry of each quadrangle.
Atropos collects a set of disks with the same basic
characteristics (e.g., the same make and model) and
selects a disk zone with the desired number of sectors
per track, N. The VLBN size, b, is set according to application
needs, specifying the access granularity. For
example, it may correspond to a file system block size
or database page size. With b known, Atropos uses disk
parameters to determine the resulting d ≤ Dmax.
In practice, volume configuration can be accomplished
in a two-step process. First, higher-level software
issues a FORMAT command with desired values of
volume capacity, level of parallelism p, and block size
b. Second, Atropos internally selects appropriate disks (out of a
pool of disks it manages), and formats the logical volume
by implementing a suitable quadrangle layout.
3.3.1 Zoned disk geometries
With zoned-disk geometries, the number of sectors per
track, N, changes across different zones, which affects
both the quadrangle width, w, and depth, d. The latter
changes because the ratio of N to H may be different for
different zones; the track switch time does not change,
but the number of sectors that rotate by in that time does.
By using disks with the same geometries (e.g., same
disk models), we opt for the simple approach: quadrangles
with one w can be grouped into one logical volume
and those with another w (e.g., quadrangles in a different
zone) into a different logical volume. Since modern
disks have fewer than 8 zones, the size of a logical volume
stored across a few 72 GB disks would be tens of
GBs.
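The effect of zoning on w and d can be illustrated with a small calculation. Everything numeric below is hypothetical (zone sizes, rotation time, switch time, block size), and the function name is an assumption; the sketch only shows that with a fixed track switch time, H scales with N, so both the width and the maximum depth can differ between zones.

```python
def zone_layout(N, rev_us, switch_us, b):
    """Return (H, w, d_max) for a zone with N sectors per track.

    H is the number of sectors passing under the head during a track
    switch, rounded up using integer arithmetic.
    """
    H = -(-N * switch_us // rev_us)   # ceiling division
    w = N // b                        # quadrangle width in VLBNs
    d_max = (N + H) // (b + H)        # depth bound derived from Equation 1
    return H, w, d_max


# Hypothetical disk: 6 ms per revolution, 0.9 ms track switch, b = 64 DLBNs.
for name, n_sectors in [("outer", 700), ("middle", 560), ("inner", 420)]:
    print(name, zone_layout(n_sectors, rev_us=6000, switch_us=900, b=64))
```

Quadrangles from zones that yield the same w would then be candidates for grouping into the same logical volume.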
3.3.2 Data protection
Data protection is an integral part of disk arrays and the
quadrangle layout lends itself to the protection models
of traditional RAID levels. Analogous to the parity unit,
a set of quadrangles with data can be protected with a
parity quadrangle. To create a RAID 5 homologue of a
parity group with quadrangles, there is one parity quadrangle
unit for every p-1 quadrangle stripe units, and the parity unit
rotates through all disks. Similarly, a RAID 1 homologue
can also be constructed, where each quadrangle
has a mirror on a different disk. Both protection schemes
are depicted in Figure 5.
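The rotation of the parity quadrangle can be sketched as below. The specific rotation direction is an assumption borrowed from one common RAID 5 convention and is not taken from Figure 5; the function names are hypothetical.

```python
def parity_disk(stripe, p):
    """Disk holding the parity quadrangle of a stripe.

    Assumption: parity starts on the last disk and moves one disk
    to the left with each successive stripe.
    """
    return (p - 1 - stripe) % p


def data_disks(stripe, p):
    """The p-1 disks holding data quadrangles of the same stripe."""
    parity = parity_disk(stripe, p)
    return [disk for disk in range(p) if disk != parity]


# With p = 4 disks, parity visits every disk once per four stripes.
for stripe in range(4):
    print(stripe, parity_disk(stripe, 4), data_disks(stripe, 4))
```

Each stripe thus has one parity quadrangle for every p-1 data quadrangles, as described in the text.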
3.3.3 Explicit information to applications
To allow applications to construct efficient streaming access
patterns, Atropos needs to expose the parameter w,
denoting the stripe unit size. I/Os aligned and sized to
stripe unit boundaries can be executed most efficiently
thanks to track-based access and rotating stripe units
through all p disks. Applications with one-dimensional
access (e.g., streaming media servers) then exercise access
patterns consisting of w-sized I/Os that are aligned
on disk track boundaries.
For applications that access two-dimensional data
structures, and hence want to utilize semi-sequential access,
Atropos also needs to expose the number of disks,
p. Such applications then choose the primary order for
data and allocate wp blocks of this data, corresponding
to a portion of column 1 {a1, ..., h1} in Figure 2.
They allocate to the next wp VLBNs the corresponding
data of the other-major order (e.g., the {a2, ..., h2} portion
of column 2) and so on, until all are mapped. Thus,
the rectangular region {a1, ..., h4} would be mapped to
4wp contiguous VLBNs.
Access in the primary-major order (columns in Figure
2) consists of sequentially reading wp VLBNs. Access
in the other-major order is straightforward; the application
simply accesses every wp-th VLBN to get the
data of the desired row. Atropos need not expose to applications
the parameter d. It is computed and used internally
by Atropos.
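From the application's side, the allocation rule above amounts to a simple index calculation. The sketch below is illustrative only: the function names and the choice of w=4, p=2 (so that wp=8 matches the eight cells a through h of a column in Figure 2) are assumptions.

```python
def cell_to_vlbn(i, j, w, p, base=0):
    """VLBN of the i-th cell (0-based) of column j within one wp-sized chunk."""
    wp = w * p
    if not 0 <= i < wp:
        raise ValueError("i must lie within one chunk of w*p blocks")
    return base + j * wp + i


def primary_run(j, w, p, base=0):
    """Sequential VLBNs holding column j (primary-major access)."""
    wp = w * p
    return list(range(base + j * wp, base + (j + 1) * wp))


def secondary_run(i, n_cols, w, p, base=0):
    """Every wp-th VLBN holding row i (other-major access)."""
    return [cell_to_vlbn(i, j, w, p, base) for j in range(n_cols)]


# Region {a1, ..., h4}: 8 cells per column, 4 columns, wp = 8.
print(primary_run(1, w=4, p=2))              # column 2: VLBNs 8..15
print(secondary_run(2, n_cols=4, w=4, p=2))  # row c: [2, 10, 18, 26]
```

Only w and p appear in these calculations, which is consistent with the point that d does not need to be exposed to applications.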