The learner is the most complicated portion of the system. It is the learner’s job to decide which constraints
to apply and which to ignore. This section discusses the basic problem the learner seeks to solve, and an
overview of the methods we have used to attempt to solve it.
4.2.1 Problem Overview
The learner considers the problem of optimizing the response time for a given set of trace data. We use
a time period of one day's worth of trace data, although the period could be modified. We are seeking to
optimize the sum of the response times for the trace segment, $R = \sum_{i \in T} r_i$, where $r_i$ is the response
time of request $i$, and $T$ is the set of all requests in the trace. This can be stated as a block-centric problem
by formulating $R$ in terms of blocks: $R = \sum_{b \in B} r_b$, where $r_b$ is the sum of the response times of all
accesses to block $b$, and $B$ is the set of all blocks accessed by the trace.
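As a concrete illustration (with made-up response times, not data from the paper), the equivalence of the request-centric and block-centric formulations can be checked on a toy trace:

```python
from collections import defaultdict

# Hypothetical trace: (block id, response time in ms) per request.
trace = [("b1", 4.0), ("b2", 2.5), ("b1", 1.5), ("b3", 3.0)]

# Request-centric: sum response times over all requests in the trace.
R_requests = sum(rt for _, rt in trace)

# Block-centric: accumulate per-block totals r_b, then sum over blocks.
r = defaultdict(float)
for block, rt in trace:
    r[block] += rt
R_blocks = sum(r.values())

assert R_requests == R_blocks  # the two formulations are equivalent
```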
However, because the learner is deciding whether to apply constraints, we convert the problem into a
constraint-centric problem. To do this conversion, we assign weights to the constraints to approximate the
block-centric weights. The weight of a constraint $c$ is the sum of the response times of each of the blocks
that the constraint places, or $w_c = \sum_{b \in B_c} r_b$, where $B_c$ is the set of blocks contained in the constraint. In
other words, the weight of a constraint is the sum of the response times for the requests to all of the blocks it
affects. The learner then uses a variant of this constraint weight, which we will call the constraint potential,
to determine whether to apply each constraint. This constraint potential is a measure of the actual usefulness
of the constraint. Section 8.3 discusses several methods of computing constraint potential in detail.
An alternate way of formulating the problem, which we have not explored, is to keep the problem in a
block-centric form. In this formulation, the response time of each block can be written as
$r_b = f_b(x_1, x_2, \ldots, x_k)$, where $x_j$ is a boolean variable indicating whether the $j$th constraint affecting this block is
applied. This approach is an interesting alternative that warrants exploration.
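One way this unexplored formulation might be written out, under our assumptions about the functional form (the original text does not specify it), is as a 0/1 optimization problem:

```latex
\min_{x \in \{0,1\}^m} \; R(x) \;=\; \sum_{b \in B} r_b\!\left(x_{j_1(b)}, \ldots, x_{j_{k_b}(b)}\right)
```

where $x_j$ indicates whether constraint $j$ is applied, and $j_1(b), \ldots, j_{k_b}(b)$ index the constraints affecting block $b$.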
One difficulty inherent in the problem is constraint conflicts. Because of their independence, different
heuristics may generate conflicting constraints (e.g., the front loading heuristic places blocks on the outer
tracks, while disk shuffling places them near the center of the disk). The learner must resolve these conflicts,
choosing which constraints to apply and which to ignore. Figure 3 shows some example conflicts. The figure
shows how conflicts can involve just two constraints, or many constraints.
Figure 3: Conflict example. In this example, the sequential constraints A and B conflict; only one of them can be applied.
This is a conflict of length one. In addition, if either of the sequential constraints is applied, only one of the two placement
constraints C and D can be applied. This second conflict is a conflict of length two.
These conflicts are dependent on both the workload and the heuristics being used. Section 7.1 discusses
conflicts in more detail, and quantifies their frequency.
We approach the optimization problem using a variety of heuristic methods. We expected that the
interdependencies in the problem would be loose enough, and the gains high enough, to allow these ad-hoc
approaches to have some success. The details of each individual learning approach are found in Section 8.3.
Each approach computes the constraint potential in a slightly different way, but all of the approaches
share a similar method.
Each method uses the constraint potential to determine which constraints to apply. Once it has chosen
a set of constraints to apply, it sorts them by constraint potential and applies them greedily. This ensures that
if two constraints conflict, the constraint with the higher potential will be applied.
Each approach will also try to apply seemingly sub-optimal constraints occasionally to test new options
and avoid local minima. While this method is certainly not an optimal solution, it is a reasonable first try
at the problem. Though the problem is too large to allow exhaustive search (the number of constraints is
usually in the thousands), there are other more sophisticated approaches that would be interesting to apply.
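The sort-and-apply-greedily step might look like the following sketch. The `Constraint` class and the block-overlap conflict test are our simplifications, and the occasional exploration of sub-optimal constraints is omitted:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    name: str
    potential: float           # estimated usefulness (variants in Section 8.3)
    blocks: frozenset = frozenset()

def conflict(a, b):
    # Simplified rule: constraints conflict if they place a common block.
    return bool(a.blocks & b.blocks)

def apply_greedily(constraints):
    """Sort by potential, highest first, and apply each constraint unless
    it conflicts with one already applied, so higher potential wins."""
    applied = []
    for c in sorted(constraints, key=lambda c: c.potential, reverse=True):
        if not any(conflict(c, prev) for prev in applied):
            applied.append(c)
    return applied
```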
4.2.4 Test Window
The learner is looking for constraints that perform well not only on the given day, but also for at least several
days into the future. Section 7.2 discusses this in detail. To accomplish this goal, the learner will not actually
apply a constraint until after it is tested on several days. This is called the test window. The test window
used in this paper is 3 days.
For each constraint, the learner also keeps a cumulative weight, which is the sum of the constraint
potentials for the last several periods. After the test window elapses, if the constraint has a positive cumulative
weight, meaning it would have been a net gain over the last few periods, it will be applied. Otherwise, it
will be discarded.
4.3 Layout Manager
Once the learner has decided which constraints it would like to apply, the layout manager must convert these
constraints into an actual disk layout.
The current implementation assumes that the layout manager knows which blocks are in use and which
are free. In our experiments, this is accomplished by simulating a larger disk than the one captured in the trace. All of
the experiments use a disk that is 25% unallocated. This free block knowledge allows for more intelligent
reorganization, because the system can move blocks into free space, instead of having to disrupt the location
of live blocks.
The layout manager uses several steps to convert the constraints into a layout. First, it sorts the constraints
into the areas in which they must be placed. Constraints not assigned to any particular area are placed
in the first open area, starting at the beginning of the disk.
Once it has sorted the constraints into disk areas, it sorts the constraints in each disk area by the first
logical block number they affect, to keep the order as similar as possible to the previous layout. It then
attempts to place all of the newly added blocks into empty spaces in the disk area. If that is
not possible, it de-fragments the area and places the new blocks at the end of the area.
If there is not enough empty space in the area, the layout manager will move a chunk of contiguous
blocks from the area to another location of the disk in order to free up enough space for the new blocks. It
will try to move this data to an area as close to the original area as possible.
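The per-area placement fallback chain described above might be sketched like this (the extent representation and names are our assumptions, and the "move data nearby" step is reduced to a signal):

```python
def place_new_blocks(free_extents, n_new):
    """Decide how to place n_new blocks in a disk area.
    free_extents: list of (start, length) free runs within the area.
    Returns one of:
      ("placed", start)    - an existing free extent fits the new blocks
      ("defragment", None) - enough total free space, but fragmented:
                             compact the area, append blocks at its end
      ("evict", None)      - not enough free space: move a contiguous
                             chunk out of the area to a nearby location
    """
    for start, length in free_extents:
        if length >= n_new:
            return ("placed", start)
    if sum(length for _, length in free_extents) >= n_new:
        return ("defragment", None)
    return ("evict", None)
```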
This method helps avoid the problem of incremental change. If the layout manager is not careful, it
may undo work that has been done previously by the learner. For example, a naive approach would be to
place blocks at the beginning of each area, but this means that the blocks placed in the last iteration would
be disturbed each day.
A completely accurate solution would have to maintain the constraints which have been applied previously
in order to avoid breaking them by the application of new constraints. This is infeasible because of
the amount of storage necessary. The current method attempts to avoid breaking many constraints by only
moving large chunks of the disk at a time and trying to move the data nearby.
5 Experimental Set Up
This section describes much of the common set up for the experiments found in this paper. All of the
experiments are done in simulation using Disksim with parameters for a Seagate Cheetah 36ES 18GB
disk drive extracted using the DiXTrac tool. All of th