To alleviate the inefficiency of column-major access with NSM, a decomposition storage model (DSM) vertically partitions a table into individual columns [5]. Each DSM page thus contains a single attribute for a fixed number of records. However, fetching full records requires n accesses to single-attribute pages and n − 1 joins on the record ID to reconstruct the entire record.

The stark difference between row-major and column-major efficiencies for the two layouts described above is so detrimental to database performance that some have even proposed maintaining two copies of each table to avoid it [15]. This solution requires twice the capacity and must propagate updates to each copy to maintain consistency. With Atropos’s data layout, which offers efficient access in both dimensions, database systems do not have to compromise.

2.5 A more explicit storage interface

Virtually all of today’s disk arrays use an interface (e.g., SCSI or ATA) that presents the storage device as a linear space of equally-sized blocks. Each block is uniquely addressed by an integer, called a logical block number (LBN). This linear abstraction hides non-linearities in storage device access times. Applications and storage devices therefore rely on an unwritten contract, which states that large sequential accesses to contiguous LBNs are much more efficient than random accesses and small I/O sizes. Both entities work hard to abide by this implicit contract: applications construct access patterns that favor large I/Os, and LVMs map contiguous LBNs to media locations that ensure efficient execution of sequential I/Os. Unfortunately, an application decides on I/O sizes without any more specific information about the LBN mappings chosen by an LVM, because current storage interfaces hide it.

In the absence of clearly defined mechanisms, applications rely on knobs that must be manually set by a system administrator.
For example, the IBM DB2 relational database system uses the PREFETCHSIZE and EXTENTSIZE parameters to determine the maximal size of a prefetch I/O for sequential access and the number of pages to put into a single extent of contiguous LBNs [6]. Another parameter, called DB2_STRIPED_CONTAINERS, instructs the DBMS to align I/Os on stripe unit boundaries. Relying on proper knob settings is fragile and prone to human error: it may be unclear how to relate the knobs to LVM configuration parameters.

Because of these difficulties, and the information gap introduced by inexpressive storage interfaces, applications cannot easily take advantage of significant performance characteristics of modern disk arrays. Atropos exposes explicit information about stripe unit sizes and semi-sequential access. This information allows applications to directly match their access patterns to the disk array’s characteristics.

Figure 3: Atropos quadrangle layout (quadrangles 0 through 11 laid out across disks 0 through 3). The numbers to the left of disk 0 are the VLBNs mapped to the gray disk locations connected by the arrow (not the first block of each quadrangle row). The arrow illustrates efficient access in the other-major.

3 Atropos logical volume manager

The Atropos disk array LVM addresses the aforementioned shortcomings of many current disk array LVM designs. It exploits disk-specific characteristics to construct a new data organization. It also exposes high-level features of this organization to higher levels of the storage stack, allowing them to directly take advantage of key device-specific characteristics. This section details the new data organization and the information Atropos exposes to applications.
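As a rough illustration of how an application might use an explicitly exposed stripe unit size, the following Python sketch splits a request into pieces that never cross a stripe unit boundary. The function name `split_on_stripe_units` and its parameters are hypothetical and are not part of the Atropos interface; the sketch also assumes that stripe units start at LBN 0 and that the stripe unit size is expressed in blocks.

```python
def split_on_stripe_units(start_lbn, length, stripe_unit):
    """Split the request [start_lbn, start_lbn + length) into
    (lbn, block_count) pieces that do not cross a stripe unit boundary.

    Assumption: stripe units begin at LBN 0 and are `stripe_unit`
    blocks long. This is an illustrative helper, not an Atropos API.
    """
    if start_lbn < 0 or length < 0 or stripe_unit <= 0:
        raise ValueError("start_lbn and length must be >= 0, stripe_unit > 0")

    pieces = []
    lbn = start_lbn
    end = start_lbn + length
    while lbn < end:
        # First LBN of the next stripe unit after the one containing `lbn`.
        next_boundary = (lbn // stripe_unit + 1) * stripe_unit
        piece_end = min(next_boundary, end)
        pieces.append((lbn, piece_end - lbn))
        lbn = piece_end
    return pieces


if __name__ == "__main__":
    # A 10-block request starting at LBN 6 with 4-block stripe units.
    print(split_on_stripe_units(6, 10, 4))
```

With a 4-block stripe unit, a 10-block request starting at LBN 6 is split into a 2-block piece that finishes the current stripe unit, followed by two full 4-block pieces. Without the stripe unit size, the application would have to depend on an administrator-set knob to achieve the same alignment.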
3.1 Atropos data organization

As illustrated in Figure 3, Atropos lays data across p disks in basic allocation units called quadrangles. A quadrangle is a collection of logical volume LBNs, here referred to as VLBNs, mapped to a single disk. Each successive quadrangle is mapped to a different disk. A quadrangle consists of d consecutive disk tracks, with d referred to as the quadrangle’s depth. Hence, a single quadrangle is mapped to a contiguous range of a single disk’s logical blocks, here referred to as DLBNs.

The VLBN and DLBN sizes may differ; a single VLBN consists of b DLBNs, with b being the block size of a single logical volume block. For example, an application may choose a VLBN size to match its allocation units (e.g., an 8 KB database block size), while a DLBN is typically 512 bytes. Each quadrangle’s dimensions are w × d logical blocks (VLBNs), where w is the quadrangle width and equals the number of VLBNs mapped to a single track. In Figure 3, both d and w are four. The relationship between the dimensions of a quadrangle and the mappings to individual logical blocks of a single disk is described in Section 3.2.2.

The goal of the Atropos data organization is to allow efficient access in two dimensions. Efficient access of