The learner is the most complicated portion of the system. It is the learner’s job to decide which constraints
to apply and which to ignore. This section discusses the basic problem the learner seeks to solve, and an
overview of the methods we have used to attempt to solve it.
4.2.1 Problem Overview
The learner considers the problem of optimizing the response time for a given set of trace data. We use
a time period of one day's worth of trace data, although the period could be modified. We are seeking to
optimize the sum of the response times for the trace segment, $R = \sum_{i \in T} r_i$, where $r_i$ is the response
time of request $i$, and $T$ is the set of all requests in the trace. This can be stated as a block-centric problem
by formulating $R$ in terms of blocks: $R = \sum_{b \in B} r_b$, where $r_b$ is the sum of the response times of all
accesses to block $b$, and $B$ is the set of all blocks accessed by the trace.
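As a concrete illustration (with made-up response times, not data from the paper), the equivalence of the request-centric and block-centric formulations can be checked on a toy trace:

```python
from collections import defaultdict

# Hypothetical trace: (block id, response time in ms) per request.
trace = [("b1", 4.0), ("b2", 2.5), ("b1", 1.5), ("b3", 3.0)]

# Request-centric: sum response times over all requests in the trace.
R_requests = sum(rt for _, rt in trace)

# Block-centric: accumulate per-block totals r_b, then sum over blocks.
r = defaultdict(float)
for block, rt in trace:
    r[block] += rt
R_blocks = sum(r.values())

assert R_requests == R_blocks  # the two formulations are equivalent
```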
However, because the learner is deciding whether to apply constraints, we convert the problem into a
constraint-centric problem. To do this conversion, we assign weights to the constraints to approximate the
block-centric weights. The weight of a constraint $c$ is the sum of the response times of each of the blocks
that the constraint places, or $w_c = \sum_{b \in B_c} r_b$, where $B_c$ is the set of blocks contained in the constraint. In
other words, the weight of a constraint is the sum of the response times for the requests to all of the blocks it
affects. The learner then uses a variant of this constraint weight, which we will call the constraint potential,
to determine whether to apply each constraint. This constraint potential is a measure of the actual usefulness
of the constraint. Section 8.3 discusses several methods of computing constraint potential in detail.
An alternate way of formulating the problem, which we have not explored, is to keep the problem in a
block-centric form. In this formulation, the response time of each block can be written as
$r_b = f_b(x_1, x_2, \ldots, x_k)$, where $x_j$ is a boolean variable indicating whether the $j$th constraint affecting this block is
applied. This approach is an interesting alternative that warrants exploration.
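One way this unexplored formulation might be written out, under our assumptions about the functional form (the original text does not specify it), is as a 0/1 optimization problem:

```latex
\min_{x \in \{0,1\}^m} \; R(x) \;=\; \sum_{b \in B} r_b\!\left(x_{j_1(b)}, \ldots, x_{j_{k_b}(b)}\right)
```

where $x_j$ indicates whether constraint $j$ is applied, and $j_1(b), \ldots, j_{k_b}(b)$ index the constraints affecting block $b$.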
One difficulty inherent in the problem is constraint conflicts. Because of their independence, different
heuristics may generate conflicting constraints (e.g., the front loading heuristic places blocks on the outer
tracks, while disk shuffling places them near the center of the disk). The learner must resolve these conflicts,
choosing which constraints to apply and which to ignore. Figure 3 shows some example conflicts. The figure
shows how conflicts can involve just two constraints, or many constraints.
Figure 3: Conflict example. In this example, the sequential constraints A and B conflict; only one of them can be applied.
This is a conflict of length one. In addition, if either of the sequential constraints is applied, only one of the two placement
constraints C and D can be applied. This second conflict is a conflict of length two.
These conflicts are dependent on both the workload and the heuristics being used. Section 7.1 discusses
conflicts in more detail, and quantifies their frequency.
We approach the optimization problem using a variety of heuristic methods. We expected that the
interdependencies in the problem would be loose enough, and the gains high enough, to allow these ad-hoc
approaches to have some success. The details of each individual learning approach are found in Section 8.3.
Each approach computes the constraint potential in a slightly different way, but all of the approaches
share a similar method.
Each method uses the constraint potential to determine which constraints to apply. Once it has chosen
a set of constraints to apply, it sorts them by constraint potential and applies them greedily. This ensures that
if two constraints conflict, the constraint with the higher potential will be applied.
Each approach will also try to apply seemingly sub-optimal constraints occasionally to test new options
and avoid local minima. While this method is certainly not an optimal solution, it is a reasonable first try
at the problem. Though the problem is too large to allow exhaustive search (the number of constraints is
usually in the thousands), there are other more sophisticated approaches that would be interesting to apply.
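The sort-and-apply-greedily step might look like the following sketch. The `Constraint` class and the block-overlap conflict test are our simplifications, and the occasional exploration of sub-optimal constraints is omitted:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    name: str
    potential: float           # estimated usefulness (variants in Section 8.3)
    blocks: frozenset = frozenset()

def conflict(a, b):
    # Simplified rule: constraints conflict if they place a common block.
    return bool(a.blocks & b.blocks)

def apply_greedily(constraints):
    """Sort by potential, highest first, and apply each constraint unless
    it conflicts with one already applied, so higher potential wins."""
    applied = []
    for c in sorted(constraints, key=lambda c: c.potential, reverse=True):
        if not any(conflict(c, prev) for prev in applied):
            applied.append(c)
    return applied
```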
4.2.4 Test Window
The learner is looking for constraints that perform well not only on the given day, but also for at least several
days into the future. Section 7.2 discusses this in detail. To accomplish this goal, the learner will not actually
apply a constraint until after it is tested on several days. This is called the test window. The test window
used in this paper is 3 days.
For each constraint, the learner also keeps a cumulative weight, which is the sum of the constraint
potentials for the last several periods. After the test window elapses, if the constraint has a positive cumulative
weight, meaning it would have been a net gain over the last few periods, it will be applied. Otherwise, it
will be discarded.
4.3 Layout Manager
Once the learner has decided which constraints it would like to apply, the layout manager must convert these
constraints into an actual disk layout.
The current implementation assumes that the layout manager knows which blocks are in use and which
are free. In our experiments, this is accomplished by simulating a larger disk than the one captured in the trace. All of
the experiments use a disk that is 25% unallocated. This free block knowledge allows for more intelligent
reorganization, because the system can move blocks into free space, instead of having to disrupt the location
of live blocks.
The layout manager uses several steps to convert the constraints into a layout. First, it sorts the constraints
into the areas in which they must be placed. Constraints not assigned to any particular area are placed
in the first open area, starting at the beginning of the disk.
Once it has sorted the constraints into disk areas, it sorts the constraints in each disk area by the first
logical block number they affect, to keep the order as similar as possible to the previous layout. It then
attempts to place all of the newly added blocks into empty spaces in the disk area. If that is
not possible, it de-fragments the area and places the new blocks at the end of the area.
If there is not enough empty space in the area, the layout manager will move a chunk of contiguous
blocks from the area to another location of the disk in order to free up enough space for the new blocks. It
will try to move this data to an area as close to the original area as possible.
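The per-area placement fallback chain described above might be sketched like this (the extent representation and names are our assumptions, and the "move data nearby" step is reduced to a signal):

```python
def place_new_blocks(free_extents, n_new):
    """Decide how to place n_new blocks in a disk area.
    free_extents: list of (start, length) free runs within the area.
    Returns one of:
      ("placed", start)    - an existing free extent fits the new blocks
      ("defragment", None) - enough total free space, but fragmented:
                             compact the area, append blocks at its end
      ("evict", None)      - not enough free space: move a contiguous
                             chunk out of the area to a nearby location
    """
    for start, length in free_extents:
        if length >= n_new:
            return ("placed", start)
    if sum(length for _, length in free_extents) >= n_new:
        return ("defragment", None)
    return ("evict", None)
```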
This method helps avoid the problem of incremental change. If the layout manager is not careful, it
may undo work that has been done previously by the learner. For example, a naive approach would be to
place blocks at the beginning of each area, but this means that the blocks placed in the last iteration would
be disturbed each day.
A completely accurate solution would have to maintain the constraints which have been applied previously
in order to avoid breaking them by the application of new constraints. This is infeasible because of
the amount of storage necessary. The current method attempts to avoid breaking many constraints by only
moving large chunks of the disk at a time and trying to move the data nearby.
5 Experimental Set Up
This section describes much of the common set up for the experiments found in this paper. All of the
experiments are done in simulation using Disksim with parameters for a Seagate Cheetah 36ES 18GB
disk drive extracted using the DiXTrac tool. All of th