Appears in Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST’04). San Francisco, CA. March 2004.

Atropos: A Disk Array Volume Manager for Orchestrated Use of Disks

Jiri Schindler, Steven W. Schlosser, Minglong Shao, Anastassia Ailamaki, Gregory R. Ganger
Carnegie Mellon University

Abstract

The Atropos logical volume manager allows applications to exploit characteristics of its underlying collection of disks. It stripes data in track-sized units and explicitly exposes the boundaries, allowing applications to maximize efficiency for sequential access patterns even when they share the array. Further, it supports efficient diagonal access to blocks on adjacent tracks, allowing applications to orchestrate the layout and access to two-dimensional data structures, such as relational database tables, to maximize performance for both row-based and column-based accesses.

1 Introduction

Many storage-intensive applications, most notably database systems and scientific computations, have some control over their access patterns. Wanting the best performance possible, they choose the data layout and access patterns they believe will maximize I/O efficiency. Currently, however, their decisions are based on manual tuning knobs and crude rules of thumb. Application writers know that large I/Os and sequential patterns are best, but are otherwise disconnected from the underlying reality. The result is often unnecessary complexity and inefficiency on both sides of the interface.

Today’s storage interfaces (e.g., SCSI and ATA) hide almost everything about underlying components, forcing applications that want top performance to guess and assume [7, 8]. Of course, arguing to expose more information highlights a tension between the amount of information exposed and the added complexity in the interface and implementations.
The current storage interface, however, has remained relatively unchanged for 15 years, despite the shift from (relatively) simple disk drives to large disk array systems with logical volume managers (LVMs). The same information gap exists inside disk array systems: although their LVMs sit below a host’s storage interface, most do not exploit device-specific features of their component disks.

Figure 1: Atropos logical volume manager architecture. Atropos exploits disk characteristics (arrow 1), automatically extracted from disk drives, to construct a new data organization. It exposes high-level parameters that allow applications to directly take advantage of this data organization for efficient access to one- or two-dimensional data structures (arrow 2). (Diagram labels: application, host, I/O requests, Atropos LVM, disk array; arrow 1: disk drive parameters; arrow 2: LVM parameters, explicit hints to applications; layout with efficient data access.)

(Author footnote: Now with EMC Corporation.)

This paper describes a logical volume manager, called Atropos (see Figure 1), that exploits information about its component disks and exposes high-level information about its data organization. With a new data organization and minor extensions to today’s storage interface, it accomplishes two significant ends. First, Atropos exploits automatically-extracted knowledge of disk track boundaries, using them as its stripe unit boundaries. By also exposing these boundaries explicitly, it allows applications to use previously proposed “track-aligned extents” (traxtents), which provide substantial benefits for mid-sized segments of blocks and for streaming patterns interleaved with other I/O activity. Second, Atropos uses and exposes a data organization that lets applications go beyond the “only one dimension can be efficient” assumption associated with today’s linear storage address space.
In particular, two-dimensional data structures (e.g., database tables) can be laid out for almost maximally efficient access in both row- and column-orders, eliminating a trade-off [15] currently faced by database storage managers. Atropos enables this by exploiting automatically-extracted knowledge of track/head switch delays to support semi-sequential access: diagonal access to ranges of blocks (one range per track) across a sequence of tracks. In this manner, a relational database table can be laid out such that scanning a single column occurs at streaming bandwidth (for the full array of disks), and reading a single row costs only 16%–38% more than if row order had been the optimized order. We have implemented Atropos as a host-based LVM, and we evaluate it with both database workload experiments (TPC-H) and analytic models. Because Atropos exposes its key parameters explicitly, these performance benefits can be realized with no manual tuning of storage-related application knobs.
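
The two ideas introduced above, track-aligned extents and diagonal access across adjacent tracks, can be illustrated with a small sketch. This is a simplified model for intuition only, not the Atropos implementation: the names `Geometry`, `track_extents`, `semi_sequential_path`, and `semi_sequential_cost` are hypothetical, and the model assumes that every track holds the same number of sectors and that a track/head switch takes a fixed time, expressed as the number of sectors that rotate past the head during the switch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    """Hypothetical per-disk parameters (assumed known, e.g. extracted from the drive)."""
    sectors_per_track: int  # blocks on one track (assumed constant across tracks)
    switch_sectors: int     # sectors that rotate past during a track/head switch


def track_extents(first_block: int, count: int, geo: Geometry):
    """Split a linear block range into (start, length) pieces that never
    cross a track boundary."""
    extents = []
    block, end = first_block, first_block + count
    while block < end:
        # First block of the next track after `block`.
        track_end = (block // geo.sectors_per_track + 1) * geo.sectors_per_track
        length = min(end, track_end) - block
        extents.append((block, length))
        block += length
    return extents


def semi_sequential_path(track: int, sector: int, run: int, tracks: int,
                         geo: Geometry):
    """Return (track, start_sector) for one run of `run` blocks on each of
    `tracks` adjacent tracks, placed so that each run begins just as the
    head arrives on the new track."""
    path = []
    for _ in range(tracks):
        path.append((track, sector))
        # While `run` blocks are read and the head switches tracks, the
        # platter keeps turning, so the next run starts that much later.
        sector = (sector + run + geo.switch_sectors) % geo.sectors_per_track
        track += 1
    return path


def semi_sequential_cost(run: int, tracks: int, geo: Geometry) -> int:
    """Cost in sector-times under this model: transfer time plus switch
    time, with no rotational wait between tracks."""
    return tracks * run + (tracks - 1) * geo.switch_sectors
```

With `Geometry(sectors_per_track=100, switch_sectors=20)`, a 30-block request starting at block 90 splits into a 10-block extent and a 20-block extent, one per track. Reading one block on each of three adjacent tracks along the diagonal costs 43 sector-times in this model, whereas fetching blocks at arbitrary rotational positions would add an expected half rotation (50 sector-times here) of waiting per block.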