the primary dimension is achieved by striping contiguous VLBNs across quadrangles on all disks. Much like ordinary disk arrays, which map LBNs across individual stripe units, each quadrangle row contains a contiguous run of VLBNs covering a contiguous run of a single disk’s DLBNs on a single track. Hence, sequential access naturally exploits the high efficiency of track-based access explained in Section 2.2. For example, in Figure 3, an access to 16 sequential blocks starting at VLBN 0 will be broken into four disk I/Os executing in parallel and fetching full tracks: VLBNs 0–3 from disk 0, VLBNs 4–7 from disk 1, VLBNs 8–11 from disk 2, and VLBNs 12–15 from disk 3.

Efficient access to the secondary dimension is achieved by mapping it to semi-sequential VLBNs. Figure 3 indicates the semi-sequential VLBNs with a dashed line. Requests to the semi-sequential VLBNs in a single quadrangle are all issued together in a batch. The disk’s internal scheduler then chooses the request that will incur the smallest positioning cost (the sum of seek and rotational latency) and services it first. Once the first request is serviced, servicing each of the remaining requests incurs only a track switch to the adjacent track. Thanks to the semi-sequential layout, no rotational latency is incurred for any of the subsequent requests, regardless of which request was serviced first.

Naturally, the sustained bandwidth of semi-sequential access is smaller than that of sequential access. However, semi-sequential access is more efficient than reading d effectively random VLBNs spread across d tracks, as would be the case in a normal striped disk array. Accessing random VLBNs incurs rotational latency, averaging half a revolution per access. In the example of Figure 3, the semi-sequential access, depicted by the arrow, proceeds across VLBNs 0, 16, 32, ..., 240 and occurs on all p disks, achieving the aggregate semi-sequential bandwidth of the disk array.
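The striping described above can be sketched in a few lines. This is a minimal sketch, not Atropos code: the helpers `split_sequential` and `semi_sequential` are hypothetical, and the sketch assumes the plain (non-parity) striping of Figure 3 with w = 4 VLBNs per quadrangle row and p = 4 disks. It also assumes d = 16, so that the semi-sequential column ends at VLBN 240.

```python
from typing import List, Tuple


def split_sequential(start: int, count: int, w: int, p: int) -> List[Tuple[int, int, int]]:
    """Split a sequential VLBN request into per-disk I/Os.

    Hypothetical helper assuming Figure 3-style striping without parity:
    each quadrangle row holds w consecutive VLBNs on one track of one
    disk, and successive rows rotate across the p disks.
    Returns (disk, first_vlbn, num_vlbns) tuples.
    """
    ios = []
    v, end = start, start + count
    while v < end:
        row = v // w                        # global quadrangle-row index
        disk = row % p                      # rows rotate through the disks
        run_end = min((row + 1) * w, end)   # stop at the row (track) boundary
        ios.append((disk, v, run_end - v))
        v = run_end
    return ios


def semi_sequential(first: int, d: int, w: int, p: int) -> List[int]:
    """VLBNs in one quadrangle column: every (w * p)-th VLBN, d of them."""
    return [first + k * w * p for k in range(d)]


print(split_sequential(0, 16, w=4, p=4))
# [(0, 0, 4), (1, 4, 4), (2, 8, 4), (3, 12, 4)]
print(semi_sequential(0, d=16, w=4, p=4)[-1])
# 240
```

The first call reproduces the four full-track I/Os of the Figure 3 example; the second shows that, under these assumptions, the semi-sequential VLBNs of one quadrangle are spaced w·p apart.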
3.2 Quadrangle layout parameters

The values that determine efficient quadrangle layout depend on disk characteristics, which can be described by two parameters. The parameter N describes the number of sectors, or DLBNs, per track. The parameter H describes the track skew in the mapping of DLBNs to physical sectors. The layout and disk parameters are summarized in Table 1.

Track skew is a property of disk data layouts that arises from track switch time. When data is accessed sequentially on a disk beyond the end of a track, the disk must switch to the next track to continue accessing. Switching tracks takes some amount of time, during which no data can be accessed. While the track switch is in progress, the disk continues to spin, of course. Therefore, sequential LBNs on successive tracks are physically skewed so that when the switch is complete, the head will be positioned over the next sequential LBN. This skew is expressed as the parameter H, which is the number of DLBNs that the head passes over during the track switch time.

Table 1: Parameters used by Atropos.

Symbol   Name                 Units
Quadrangle layout parameters
  p      Parallelism          # of disks
  d      Quadrangle depth     # of tracks
  b      Block size           # of DLBNs
  w      Quadrangle width     # of VLBNs
Disk physical parameters
  N      Sectors per track    # of DLBNs
  H      Head switch          # of DLBNs

Figure 4 shows a sample quadrangle layout and its parameters. Figure 4(a) shows an example of how quadrangle VLBNs map to DLBNs. Along the x-axis, a quadrangle contains w VLBNs, each of size b DLBNs. In the example, one VLBN consists of two DLBNs, and hence b = 2. As illustrated in the example, a quadrangle does not always use all DLBNs: when the number of sectors per track, N, is not divisible by b, there are R residual DLBNs that are not assigned to quadrangles. Figure 4(b) shows the physical locations of each b-sized VLBN on individual tracks, accounting for track skew, which equals 3 sectors (H = 3 DLBNs) in this example.
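To make the definition of H concrete, the sketch below computes where a DLBN lands physically. It is a hedged illustration, not part of Atropos: `physical_sector` is a hypothetical helper, and it assumes a zone in which every track holds exactly N DLBNs (no defects or slipping), with physical positions measured as angular offsets from DLBN 0.

```python
def physical_sector(dlbn: int, N: int, H: int) -> tuple:
    """Return (track, physical sector) of a DLBN under track skew H.

    Assumes every track in the zone holds exactly N DLBNs and that the
    first DLBN of each track is rotated H sectors past the first DLBN of
    the previous track, so a sequential reader finds it under the head
    as soon as the track switch completes. Sector positions are angular
    offsets from DLBN 0.
    """
    track, offset = divmod(dlbn, N)
    return track, (track * H + offset) % N


N, H = 21, 3  # the example disk of Figure 4
print([physical_sector(x, N, H) for x in (20, 21, 42)])
# [(0, 20), (1, 3), (2, 6)]
```

After reading DLBN 20, the head is at angular position 0; the switch lets H = 3 sectors pass, so DLBN 21 is placed at position 3 and sequential access continues without rotational latency.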
3.2.1 Determining layout parameters

To determine a suitable quadrangle layout at format time, Atropos uses as its input parameters the automatically extracted disk characteristics, N and H, and the block size, b, which is supplied by higher-level software. Based on these input parameters, the other quadrangle layout parameters, d and w, are calculated as described below.

To explain the relationship between the quadrangle layout parameters and the disk physical parameters, let’s assume that we want to read one block of b DLBNs from each of d tracks. This makes the total request size, S, equal to db. As illustrated in Figure 4(b), the locations of the b-DLBN blocks on each track are chosen to ensure the most efficient access. Accessing the block on the next track can commence as soon as the disk head finishes reading on the previous track and repositions itself above the new track. During the repositioning, H sectors pass under the head. To bound the response time for reading the S sectors, we need to find suitable values for b and d to ensure that the entire request, consisting of db sectors, is read in at most one revolution. Hence,

$$\frac{db}{N} + \frac{(d-1)H}{N} \le 1 \tag{1}$$

where $db/N$ is the media access time needed to fetch the desired S sectors and $(d-1)H/N$ is the time spent in head switches when accessing all d tracks, both expressed as fractions of one revolution. Then, as illustrated at the bottom of Figure 4(b), reading db sectors is going to take the same amount of time as if we were reading $db + (d-1)H$ sectors on a single track of a zero-latency access disk.

Figure 4: Single quadrangle layout. In this example, the quadrangle layout parameters are b = 2 (a single VLBN consists of two DLBNs), w = 10 VLBNs, and d = 4 tracks. The disk physical parameters are H = 3 DLBNs and N = 21 DLBNs. Given these parameters, R = 1.
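Equation 1 can be checked mechanically. The following is a minimal sketch with hypothetical function names; it uses exact rational arithmetic so that the boundary case of exactly one revolution is not disturbed by floating-point rounding.

```python
from fractions import Fraction


def revolutions_needed(d: int, b: int, N: int, H: int) -> Fraction:
    """Left-hand side of Equation 1: (db + (d - 1)H) / N revolutions.

    Counts only media transfer of the d blocks and the d - 1 head
    switches; the initial seek and rotational latency to the first
    block are not included, as in Equation 1.
    """
    return Fraction(d * b + (d - 1) * H, N)


def fits_one_revolution(d: int, b: int, N: int, H: int) -> bool:
    """True if reading one b-DLBN block from each of d tracks fits in one revolution."""
    return revolutions_needed(d, b, N, H) <= 1


# Figure 4 example: N = 21, H = 3, b = 2.
print(revolutions_needed(4, 2, 21, 3))   # 17/21
print(fits_one_revolution(5, 2, 21, 3))  # False: 10 + 12 = 22 > 21
```

With the Figure 4 parameters, d = 4 needs 17/21 of a revolution, while a fifth track would overflow it.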
The maximal number of tracks, d, from which at least one sector each can be read in a single revolution is bounded by the number of head switches that can be done in a single revolution, so

$$d \le \frac{N}{H} - 1 \tag{2}$$

If we fix d, the number of sectors, b, that yields the most efficient access (i.e., reading as many sectors on a single track as possible before switching to the next one) can be determined from Equation 1 to get

$$b \le \frac{N+H}{d} - H \tag{3}$$

Alternatively, if we fix b, the maximal depth, called Dmax, can be expressed from Equation 1 as

$$D_{\max} = \left\lfloor \frac{N+H}{b+H} \right\rfloor \tag{4}$$

For certain values of N, the db sectors do not span a full track. In that case, $db + (d-1)H < N$ and there are R residual sectors, where R < b, as illustrated in Figure 4. The number of residual DLBNs on each track not mapped to quadrangle blocks is

$$R = N - wb = N \bmod b, \quad \text{where } w = \left\lfloor \frac{N}{b} \right\rfloor \tag{5}$$

Hence, the fraction of disk space that is wasted with Atropos’ quadrangle layout is R/N; these sectors are skipped to maintain the invariant that db sectors can be accessed in at most one revolution. Section 5.2.4 shows that this fraction is less than 2% of the total disk capacity.

While it may seem that relaxing the one-revolution constraint might achieve better efficiency, Appendix B shows that this intuition is wrong. Accessing more than Dmax tracks is detrimental to the overall performance unless d is some multiple of Dmax. In that case, the service time for such an access is a multiple of the one-revolution time.

3.2.2 Mapping VLBNs to quadrangles

Mapping VLBNs to the DLBNs of a single quadrangle is straightforward. Each quadrangle is identified by DLBNQ, which is the lowest DLBN of the quadrangle and is located at the quadrangle’s top-left corner. The DLBNs that can be accessed semi-sequentially are easily calculated from the N and b parameters. As illustrated in Figure 4, given DLBNQ = 0 and b = 2, the set {0, 24, 48, 72} contains blocks that can be accessed semi-sequentially.
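The layout parameters and the VLBN side of the quadrangle mapping can be derived with a few integer operations. This is a hedged sketch, not the Atropos implementation: all names are hypothetical, it assumes a single zone with no defects and p = 1, and it computes the residual R as N - wb, the DLBNs left over after w whole blocks. With the Figure 4 parameters it reproduces w = 10, R = 1, and d = 4.

```python
from typing import NamedTuple, Tuple


class QuadrangleGeometry(NamedTuple):
    w: int       # VLBNs per quadrangle row: floor(N / b), Equation 5
    R: int       # residual DLBNs per track: N - w*b
    d_max: int   # maximal depth: floor((N + H) / (b + H)), Equation 4


def quadrangle_geometry(N: int, H: int, b: int) -> QuadrangleGeometry:
    """Derive w, R, and Dmax from the disk parameters and block size."""
    w = N // b
    return QuadrangleGeometry(w=w, R=N - w * b, d_max=(N + H) // (b + H))


def max_block_for_depth(N: int, H: int, d: int) -> int:
    """Largest integer b allowed by Equation 3 for a fixed depth d."""
    return (N + H) // d - H


def locate_vlbn(vlbn: int, w: int, d: int) -> Tuple[int, int, int]:
    """(quadrangle, row, column) of a VLBN for p = 1 and no defects,
    assuming each quadrangle holds w * d consecutive VLBNs row by row."""
    quadrangle, rest = divmod(vlbn, w * d)
    row, column = divmod(rest, w)
    return quadrangle, row, column


g = quadrangle_geometry(N=21, H=3, b=2)
print(g)                                # QuadrangleGeometry(w=10, R=1, d_max=4)
print(max_block_for_depth(21, 3, d=4))  # 3
print([locate_vlbn(v, g.w, g.d_max) for v in (0, 10, 20, 30)])
# [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 3, 0)]
```

Under these assumptions, VLBNs 0, 10, 20, and 30 share column 0 of the first quadrangle, which is the VLBN view of the semi-sequential DLBN set; note also that b = 3 would still satisfy Equation 1 exactly at d = 4.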
To maintain the rectangular appearance of the layout to an application, the semi-sequential DLBNs {0, 24, 48, 72} are mapped to VLBNs {0, 10, 20, 30} when b = 2, p = 1, and VLBNQ = DLBNQ = 0. With no media defects, Atropos only needs to know the DLBNQ of the first quadrangle. The DLBNQ for all other quadrangles can be calculated from the N, d, and b parameters.

With media defects handled via slipping (e.g., the primary defects that occurred during manufacturing), certain tracks may contain fewer DLBNs. If the number of such defects on a track is less than R, that track can be used; if it is not, the DLBNs on that track must be skipped. If any tracks are skipped, the starting DLBN of each quadrangle row must be stored. To avoid the overhead of keeping a table to remember the DLBNs for each quadrangle row, Atropos could reformat the disk and instruct it to skip over any tracks that contain one or more bad sectors. By examining twelve Seagate Cheetah 36ES disks, we found there were, on average, 404 defects per disk; eliminating all tracks with defects wastes less than 5% of the disk’s total capacity. The techniques for handling grown defects still apply.

Figure 5: Atropos quadrangle layout for different RAID levels.

3.3 Practical system integration

Building an Atropos logical volume out of p disks is not difficult thanks to the regular geometry of each quadrangle. Atropos collects a set of disks with the same basic characteristics (e.g., the same make and model) and selects a disk zone with the desired number of sectors per track, N. The VLBN size, b, is set according to application needs, specifying the access granularity. For example, it may correspond to a file system block size or database page size. With b known, Atropos uses the disk parameters to determine the resulting d ≤ Dmax.

In practice, volume configuration can be accomplished in a two-step process. First, higher-level software issues a FORMAT command with desired values of volume capacity, level of parallelism p, and block size b.
Internally, Atropos selects appropriate disks (out of a pool of disks it manages) and formats the logical volume by implementing a suitable quadrangle layout.

3.3.1 Zoned disk geometries

With zoned-disk geometries, the number of sectors per track, N, changes across different zones, which affects both the quadrangle width, w, and depth, d. The latter changes because the ratio of N to H may be different for different zones; the track switch time does not change, but the number of sectors that rotate past the head in that time does. By using disks with the same geometries (e.g., the same disk models), we opt for a simple approach: quadrangles with one w can be grouped into one logical volume and those with another w (e.g., quadrangles in a different zone) into a different logical volume. Since modern disks have fewer than 8 zones, the size of a logical volume stored across a few 72 GB disks would be tens of GBs.

3.3.2 Data protection

Data protection is an integral part of disk arrays, and the quadrangle layout lends itself to the protection models of traditional RAID levels. Analogous to the parity unit, a set of quadrangles with data can be protected with a parity quadrangle. To create a RAID 5 homologue of a parity group with quadrangles, there is one parity quadrangle unit for every p − 1 quadrangle stripe units, and the parity quadrangle rotates through all disks. Similarly, a RAID 1 homologue can also be constructed, where each quadrangle has a mirror on a different disk. Both protection schemes are depicted in Figure 5.

3.3.3 Explicit information to applications

To allow applications to construct efficient streaming access patterns, Atropos needs to expose the parameter w, denoting the stripe unit size. I/Os aligned and sized to stripe unit boundaries can be executed most efficiently thanks to track-based access and rotating stripe units through all p disks.
Applications with one-dimensional access (e.g., streaming media servers) then exercise access patterns consisting of w-sized I/Os that are aligned on disk track boundaries. For applications that access two-dimensional data structures, and hence want to utilize semi-sequential access, Atropos also needs to expose the number of disks, p. Such applications then choose the primary order for data and allocate wp blocks of this data, corresponding to a portion of column 1, {a1, ..., h1}, in Figure 2. They allocate to the next wp VLBNs the corresponding data of the other-major order (e.g., the {a2, ..., h2} portion of column 2) and so on, until all data are mapped. Thus, the rectangular region {a1, ..., h4} would be mapped to 4wp contiguous VLBNs. Access in the primary-major order (columns in Figure 2) consists of sequentially reading wp VLBNs. Access in the other-major order is straightforward; the application simply accesses every wp-th VLBN to get the data of the desired row. Atropos need not expose the parameter d to applications; it is computed and used internally by Atropos.
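The application-side mapping just described can be sketched as follows. This is an illustrative sketch under stated assumptions, not an Atropos interface: the helper names are hypothetical, each element is assumed to occupy exactly one VLBN, columns are the primary order, indices are 0-based (Figure 2 labels are 1-based), and wp = 8 is a hypothetical value matching the eight elements a to h of one column segment.

```python
from typing import List


def vlbn_of(row: int, col: int, wp: int) -> int:
    """VLBN of element (row, col) of a 2-D structure.

    Assumes one element per VLBN, columns as the primary order, and
    wp rows per column segment, so each column segment occupies wp
    contiguous VLBNs (0-based indices; Figure 2 labels are 1-based).
    """
    if not 0 <= row < wp:
        raise ValueError("row must be within one wp-sized column segment")
    return col * wp + row


def column_vlbns(col: int, wp: int) -> List[int]:
    """Primary-major access: wp sequential VLBNs."""
    return list(range(col * wp, (col + 1) * wp))


def row_vlbns(row: int, ncols: int, wp: int) -> List[int]:
    """Other-major access: every wp-th VLBN, one per column."""
    return [vlbn_of(row, col, wp) for col in range(ncols)]


wp = 8  # hypothetical: rows a..h of one column segment in Figure 2
print(column_vlbns(1, wp))   # [8, 9, 10, 11, 12, 13, 14, 15]
print(row_vlbns(0, 4, wp))   # [0, 8, 16, 24]
```

Under these assumptions, the region {a1, ..., h4} occupies VLBNs 0 to 31, i.e., 4wp contiguous VLBNs, and reading row a touches every wp-th VLBN, which Atropos maps to semi-sequential locations.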