To alleviate the inefficiency of column-major access
with NSM, a decomposition storage model (DSM) vertically
partitions a table into individual columns. Each
DSM page thus contains a single attribute for a fixed
number of records. However, fetching full records requires
n accesses to single-attribute pages and n − 1 joins
on the record ID to reconstruct the entire record.
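The reconstruction cost described above can be illustrated with a minimal sketch. It assumes each DSM column is held in memory as a list of (record ID, value) pairs; the function names `to_dsm` and `reconstruct` are hypothetical and are not taken from any particular DBMS.

```python
from typing import Any

Column = list[tuple[int, Any]]


def to_dsm(records: list[dict[str, Any]]) -> dict[str, Column]:
    """Vertically partition row-oriented records into one column per attribute.

    Each column stores (record_id, value) pairs; the record ID is simply the
    position of the record in the input list (an assumption of this sketch).
    """
    columns: dict[str, Column] = {}
    for rid, rec in enumerate(records):
        for attr, value in rec.items():
            columns.setdefault(attr, []).append((rid, value))
    return columns


def reconstruct(columns: dict[str, Column], rid: int) -> tuple[dict[str, Any], int, int]:
    """Rebuild one full record and count the work involved.

    Returns (record, column_accesses, joins). For n attributes this touches
    n columns and performs n - 1 joins on the record ID.
    """
    record: dict[str, Any] = {}
    accesses = 0
    joins = 0
    for attr, col in columns.items():
        accesses += 1                 # one access to a single-attribute column
        value = dict(col)[rid]        # look up this attribute by record ID
        if record:                    # joining onto the partial record built so far
            joins += 1
        record[attr] = value
    return record, accesses, joins


rows = [
    {"id": 1, "name": "a", "age": 30},
    {"id": 2, "name": "b", "age": 40},
]
cols = to_dsm(rows)
rec, n_accesses, n_joins = reconstruct(cols, 1)
```

With three attributes, rebuilding a record takes three column accesses and two joins, while a scan of a single attribute touches only one column.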
The stark difference between row-major and column-major
efficiencies for the two layouts described above is
so detrimental to database performance that some have
even proposed maintaining two copies of each table to
avoid it. This solution requires twice the capacity
and must propagate updates to each copy to maintain
consistency. With Atropos’s data layout, which offers
efficient access in both dimensions, database systems do
not have to compromise.
2.5 A more explicit storage interface
Virtually all of today’s disk arrays use an interface (e.g.,
SCSI or ATA) that presents the storage device as a linear
space of equally-sized blocks. Each block is uniquely
addressed by an integer, called a logical block number
(LBN). This linear abstraction hides non-linearities in
storage device access times. Therefore, applications and
storage devices use an unwritten contract, which states
that large sequential accesses to contiguous LBNs are
much more efficient than random accesses and small I/O
sizes. Both entities work hard to abide by this implicit
contract; applications construct access patterns that favor
large I/O and LVMs map contiguous LBNs to media
locations that ensure efficient execution of sequential
I/Os. Unfortunately, an application decides on I/O
sizes without any more specific information about the
LBN mappings chosen by an LVM because current storage
interfaces hide it.
In the absence of clearly defined mechanisms, applications
rely on knobs that must be manually set by a
system administrator. For example, the IBM DB2 relational
database system uses the PREFETCHSIZE and
EXTENTSIZE parameters to determine the maximal size
of a prefetch I/O for sequential access and the number of
pages to put into a single extent of contiguous LBNs.
Another parameter, called DB2_STRIPED_CONTAINERS,
instructs the DBMS to align I/Os on stripe unit boundaries.
Relying on proper knob settings is fragile and prone to
human errors: it may be unclear how to relate them to
LVM configuration parameters. Because of these difficulties,
and the information gap introduced by inexpressive
storage interfaces, applications cannot easily take
advantage of significant performance characteristics of
modern disk arrays. Atropos exposes explicit information
about stripe unit sizes and semi-sequential access.
This information allows applications to directly match
their access patterns to the disk array’s characteristics.
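As an illustration of what matching access patterns to stripe units can mean in practice, the following sketch splits a request so that no piece crosses a stripe unit boundary. It is only a sketch under stated assumptions: the function name `split_on_stripe_units` is hypothetical, the stripe unit size is assumed to be known to the caller (the very information that current interfaces hide), and this is neither DB2's implementation nor the Atropos interface.

```python
def split_on_stripe_units(start_lbn: int, length: int, stripe_unit: int) -> list[tuple[int, int]]:
    """Split an I/O request into pieces that never cross a stripe unit boundary.

    start_lbn   -- first LBN of the request
    length      -- number of LBNs requested
    stripe_unit -- stripe unit size in LBNs (assumed known to the caller)

    Returns a list of (start_lbn, length) pieces covering the request in order.
    """
    if start_lbn < 0 or length < 0 or stripe_unit <= 0:
        raise ValueError("start_lbn and length must be >= 0, stripe_unit must be > 0")
    pieces = []
    lbn = start_lbn
    end = start_lbn + length
    while lbn < end:
        # First LBN of the next stripe unit after the one containing lbn.
        boundary = (lbn // stripe_unit + 1) * stripe_unit
        piece_end = min(boundary, end)
        pieces.append((lbn, piece_end - lbn))
        lbn = piece_end
    return pieces


unaligned = split_on_stripe_units(10, 20, 16)
aligned = split_on_stripe_units(0, 32, 16)
```

A request that starts on a boundary and spans whole stripe units yields pieces that each map to exactly one stripe unit; an unaligned request yields a short leading piece first.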
[Figure 3 graphic: quadrangles 0 through 11 laid out across disks 0 through 3; VLBN labels to the left of disk 0: 0 4 8 12, 76 64 68 72, 136 140 128 132.]
Figure 3: Atropos quadrangle layout. The numbers to the left
of disk 0 are the VLBNs mapped to the gray disk locations connected
by the arrow (not the first block of each quadrangle row). The arrow
illustrates efficient access in the other-major.
3 Atropos logical volume manager
The Atropos disk array LVM addresses the aforementioned
shortcomings of many current disk array LVM
designs. It exploits disk-specific characteristics to construct
a new data organization. It also exposes high-level
features of this organization to higher levels of the storage
stack, allowing them to directly take advantage of
key device-specific characteristics. This section details
the new data organization and the information Atropos
exposes to applications.
3.1 Atropos data organization
As illustrated in Figure 3, Atropos lays data across p
disks in basic allocation units called quadrangles. A
quadrangle is a collection of logical volume LBNs, here
referred to as VLBNs, mapped to a single disk. Each
successive quadrangle is mapped to a different disk.
A quadrangle consists of d consecutive disk tracks,
with d referred to as the quadrangle’s depth. Hence, a
single quadrangle is mapped to a contiguous range of a
single disk’s logical blocks, here referred to as DLBNs.
The VLBN and DLBN sizes may differ; a single VLBN
consists of b DLBNs, with b being the block size of a
single logical volume block. For example, an application
may choose a VLBN size to match its allocation units
(e.g., an 8 KB database block size), while a DLBN is
typically 512 bytes.
Each quadrangle’s dimensions are w × d logical
blocks (VLBNs), where w is the quadrangle width and
equals the number of VLBNs mapped to a single track.
In Figure 3, both d and w are four. The relationship between
the dimensions of a quadrangle and the mappings
to individual logical blocks of a single disk are described
in Section 3.2.2.
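The quadrangle parameters introduced so far (p, w, d, and b) can be tied together in a small sketch. Several points are assumptions rather than statements from the text: VLBNs are taken to be numbered row by row within a quadrangle (inferred from the VLBN labels in Figure 3), quadrangles are assigned to disks round-robin, parity is ignored, and the mapping from a VLBN to a physical DLBN position (Section 3.2.2) is not modeled. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuadrangleLayout:
    p: int  # number of disks
    w: int  # quadrangle width: VLBNs per track
    d: int  # quadrangle depth: consecutive tracks
    b: int  # DLBNs per VLBN

    @property
    def vlbns_per_quadrangle(self) -> int:
        return self.w * self.d

    @property
    def dlbns_per_quadrangle(self) -> int:
        return self.w * self.d * self.b

    def locate(self, vlbn: int) -> tuple[int, int, int, int]:
        """Return (quadrangle, disk, row, column) for a VLBN.

        Row and column are logical positions inside the quadrangle, assuming
        row-by-row numbering; they are not physical sector positions.
        """
        quad, offset = divmod(vlbn, self.vlbns_per_quadrangle)
        row, col = divmod(offset, self.w)
        return quad, quad % self.p, row, col

    def column_vlbns(self, quad: int, col: int) -> list[int]:
        """VLBNs at the same logical column on each of the d tracks."""
        base = quad * self.vlbns_per_quadrangle
        return [base + row * self.w + col for row in range(self.d)]


# Figure 3 uses w = d = 4 on four disks; b = 16 corresponds to the
# 8 KB VLBN / 512-byte DLBN example in the text.
layout = QuadrangleLayout(p=4, w=4, d=4, b=16)
first_column = layout.column_vlbns(0, 0)
where_76 = layout.locate(76)
```

Under these assumptions, the first logical column of quadrangle 0 holds VLBNs 0, 4, 8, and 12, and VLBN 76 falls in quadrangle 4 on disk 0, which agrees with the labels shown in Figure 3.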
The goal of the Atropos data organization is to allow
efficient access in two dimensions. Efficient access of