5. DISK SCANNING ALGORITHMS
The disk scanning algorithms discussed in this thesis are based on the same basic principle as the memory scrubbing algorithm [Saleh 1990]. The memory scrubbing algorithm scans primary memory during the processor's idle time in order to find memory defects. Similarly, the disk scanning algorithms use the idle time of the disks to detect latent sector faults, as reported in [Kari 1994, Kari 1993, Kari 1993a, Kari 1993b].
The disk scanning algorithms nevertheless differ significantly from the memory scrubbing algorithm. For memory scrubbing, the time to access any memory location is almost constant, while the cost of a disk access depends on the previous disk location. Especially with high data locality (where the probability of accessing locations near the previous request is high), the scanning requests tend to move the heads away from their previous location, causing increased seek times.
In this chapter, four different scanning algorithms are presented. The first two algorithms were published in [Kari 1994, Kari 1993, Kari 1993a, Kari 1993b], while the latter two are presented here for the first time.
The original scanning algorithm was presented in [Kari 1993]. It utilizes the idle time of the system. As the disk driver is typically interrupt driven and no polling is allowed, the scanning algorithm is a timer-based process that checks whether a disk is idle. If the disk is already in use, no action is taken; if the disk is idle, the algorithm issues a read request to the disk.
This method significantly reduces the longest delays that any user disk request may experience. The maximum time that a user disk request needs to wait (besides the other user requests) is the duration of one scanning request. Because the scanning algorithm waits for an idle slot, each user request will in the worst case have one scanning request ahead of it. There can never be two scanning requests pending on a disk, as the second one would have to be issued while the first was still in progress, which would violate the idle test of the algorithm.
The scanning algorithm utilizes the advanced statistics of a modern SCSI disk as described in the previous chapter [ANSI 1994, Seagate 1992a]. The parameters of the scanning algorithms are fixed and they depend on the estimated maximum load of the system.
The original (non-adaptive) scanning algorithm is as follows:
1. Reset initial parameters (waiting time wt, request size rs, start address sa, and size of check region cr).
2. Reset the start address offset, sao=0.
3. Wait for a fixed amount of time wt.
4. If there is at least one user request waiting for the disk (in process or waiting in a queue), go to step 3.
5. Issue a disk read request at location sa+sao with size rs.
6. When the read request has been completed, increment sao by rs.
7. If sao < cr, go to step 3.
8. Read the extended disk statistical information.
9. If no faults were encountered, go to step 13.
10. Test the potentially faulty address (pfa) that was reported by the statistics.
11. If the sector in pfa is faulty (or the test reads cause more retries), start the REASSIGN BLOCK command to repair that sector.
12. If pfa<sa or pfa>(sa+sao), go to step 2.
13. Increment sa by sao.
14. If the entire disk is not yet scanned, go to step 2.
15. Reset the start address sa=0 and go to step 2.
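The steps above can be sketched in Python. This is only an illustrative sketch: the disk interface (is_busy, read, read_error_statistics, and so on) is a hypothetical stand-in for the real driver and SCSI calls, and a single full pass is shown instead of the endless loop of step 15.

```python
import time

class FakeDisk:
    """Minimal stand-in for a disk; all names here are illustrative assumptions."""
    def __init__(self, size):
        self.size = size
        self.scanned = set()
    def is_busy(self):
        return False                        # always idle in this sketch
    def read(self, addr, length):
        self.scanned.add(addr)              # record which addresses were scanned
    def read_error_statistics(self):
        return None                         # no potentially faulty address reported
    def test_sector_faulty(self, pfa):
        return False
    def reassign_block(self, pfa):
        pass

def scan_disk_once(disk, wt=10.0, rs=64 * 1024, cr=256 * 1024):
    """One full pass of the non-adaptive scanning algorithm (Algorithm 1)."""
    sa = 0                                  # step 1: start address
    while sa < disk.size:
        sao = 0                             # step 2: start address offset
        while sao < cr:
            time.sleep(wt)                  # step 3: fixed wait
            if disk.is_busy():              # step 4: defer to user requests
                continue
            disk.read(sa + sao, rs)         # step 5: issue a scanning read
            sao += rs                       # step 6
        pfa = disk.read_error_statistics()  # step 8: extended statistics
        if pfa is not None:                 # steps 10-11
            if disk.test_sector_faulty(pfa):
                disk.reassign_block(pfa)    # repair via REASSIGN BLOCK
            if pfa < sa or pfa > sa + sao:
                continue                    # step 12: rescan the current region
        sa += sao                           # step 13
```

With wt=0 and a fake disk, one pass touches every rs-sized block exactly once, which is the property the algorithm relies on for detecting latent faults.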
The main benefit of Algorithm 1 is that it needs to access a deteriorated sector only once to locate it, while scanning a large number of sectors. However, there are three minor disadvantages. First, the advanced statistics are available only in SCSI-2 (and SCSI-3) disks, and they are partly vendor specific [ANSI 1994]. Second, the masking effect may hide some errors, as described in the previous chapter. Third, the early warning signs of media deterioration can cause extensive usage of spare sectors unless a thorough test is performed to check each potentially faulty sector. Even so, it is a better alternative to replace an entire disk (without data loss) than to lose critical data due to latent sector faults.
The time to scan the entire disk depends strongly on the wait time and the system load. For example, if the constant load in the array is 80%, the wait time is 10 seconds, and the scanning request size is 64 kB, then the time to scan a 1 GB hard disk is

    T_scan = (S / rs) * wt / (1 - r) = (1 GB / 64 kB) * 10 s / (1 - 0.8) = 819 200 s ≈ 9.5 days,

where S is the disk size and r is the disk load. If the load in the array is not constant (e.g., the array is used 12 hours per day with 80% load and is totally idle for the rest of the day), the scanning time would be only about 3.2 days.
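The two scanning-time figures can be checked with a few lines of arithmetic (assuming 1 GB = 2^30 bytes and 64 kB = 2^16 bytes, so that one pass takes 16384 scanning requests):

```python
S = 2**30          # disk size: 1 GB
rs = 64 * 1024     # scanning request size: 64 kB
wt = 10.0          # wait time between scanning attempts [s]
r = 0.8            # disk load

n = S // rs                         # number of scanning requests per pass
t_const = n * wt / (1 - r)          # only a (1 - r) fraction of slots is idle
print(t_const / 86400)              # constant 80% load: about 9.5 days

# Uneven load: 12 h/day at 80% load, 12 h/day completely idle.
per_day = (12 * 3600 / wt) * (1 - r) + (12 * 3600 / wt)
print(n / per_day)                  # about 3.2 days
```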
The main problem of Algorithm 1 is that it cannot utilize uneven load patterns very efficiently. In particular, low load periods are underutilized. The situation can be improved by an adaptive algorithm that adjusts its activity based on the current disk load. When the disk load caused by normal user requests goes down, the scanning algorithm can increase its activity. Correspondingly, when the disk load increases, the scanning algorithm should reduce its activity to ensure that the delays caused by the scanning process remain limited. The size of the request block is usually kept constant, as it is optimized with respect to the rotation delays.
Besides the scanning process, this algorithm also utilizes a timer-based process that checks the disk subsystem activity at regular intervals. Based on the current instantaneous disk activity and the activity history, an estimate of the system activity is calculated. This estimate is then used for determining proper parameters for the scanning algorithm.
The proposed enhanced scanning algorithm is as follows [Kari 1994]:
1. Reset the initial parameters (waiting time wt, request size rs, start address sa, and size of check region cr).
2. Reset the start address offset, sao=0.
3. Wait for an amount of time given by wt.
4. If there are no user requests in the disk (in process or waiting in a queue), go to step 7.
5. Call function adjust(wt).
6. Go to step 3.
7. Call function adjust(wt).
8. Issue a disk read request at location sa+sao with size rs.
9. When the read request has been completed, increment sao by rs.
10. If sao < cr, go to step 3.
11. Read the extended disk statistical information.
12. If no faults were encountered, go to step 16.
13. Test the potentially faulty address (pfa) that was reported by the statistics.
14. If the sector in pfa is faulty (or the test reads cause more retries), start the REASSIGN BLOCK command to repair that sector.
15. If pfa<sa or pfa>(sa+sao), go to step 2.
16. Increment sa by sao.
17. If the entire disk is not yet scanned, go to step 2.
18. Reset the start address sa=0 and go to step 2.
The scanning parameters are adjusted using the function adjust(wt) and an estimate of the system activity. The system activity estimate is given by

    r_t = (1 - α) * r_{t-1} + α * a_t,    (5)

where r_{t-1} is the previous estimate of the system activity, a_t is the instantaneous activity (either 0 or 1) at time t, and α is the history factor. The history factor specifies how strongly the most recent measurements influence the estimate; a larger value allows faster adjustment to changes.
Based on the activity estimate, the adjust function is given by

    wt = wt_min + (wt_max - wt_min) * r_t^2,    (6)

where wt_min is the minimum wait time for the scanning algorithm (to limit the maximum system load increase caused by the scanning process) and wt_max is the maximum wait time for the scanning algorithm (to limit the maximum scanning time).

In Equation (6), the quadratic term of the estimated activity compensates for the increase of the queue length at higher system loads. If a linear function (wt = wt_min + (wt_max - wt_min) * r_t) were used, the wait time would increase too much even with a moderate disk load (0.25 < r < 0.75), so that a significant part of the scanning capacity would be wasted.
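The activity estimation and the adjust function can be sketched as follows. The exact functional forms and symbol names are assumptions consistent with the surrounding description (an exponentially weighted activity estimate and a quadratic wait-time adjustment), not a verbatim transcription of the original source:

```python
def update_activity(r_prev, a_t, alpha=0.1):
    """Exponentially weighted estimate of system activity.
    a_t is the instantaneous activity (0 or 1); alpha is the history
    factor: a larger alpha weights recent samples more strongly."""
    return (1 - alpha) * r_prev + alpha * a_t

def adjust(r_t, wt_min=2.5, wt_max=10.0):
    """Quadratic wait-time adjustment: the quadratic term keeps the wait
    time short at moderate loads (0.25 < r < 0.75), so less scanning
    capacity is wasted than with a linear function."""
    return wt_min + (wt_max - wt_min) * r_t ** 2

# When the disk goes idle, the estimate decays toward 0 and the
# wait time approaches wt_min, letting scanning speed up.
r = 1.0
for _ in range(50):
    r = update_activity(r, a_t=0)
print(round(adjust(r), 2))   # close to wt_min = 2.5
```

At full load (r_t = 1) the wait time saturates at wt_max, bounding the extra load the scanner can impose; at a moderate load of r_t = 0.5 the quadratic form yields wt_min plus only a quarter of the range, whereas the linear form would already add half of it.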
Using the parameters of the above example, this scanning algorithm would scan the entire 1 GB hard disk in 12 hours during an idle night-time period, if the minimum wait time wt_min is set to 2.5 seconds. In fact, if the minimum wait time were set to zero, the scanning time would be less than an hour.
The proposed scanning algorithms can be further improved when modifications to the disk device drivers are allowed. Instead of a timer-based operation, the intelligence can be built into the disk device driver. First, the device driver has two priority queues: one for user requests and one for scanning requests. When there is at least one user disk request in the queue (or being processed by the disk), the scanning requests are suspended. When the last user request is completed, a timer is started (e.g., with a length of one second). If no new user request arrives during that time, the scanning queue is resumed and the next scanning request can be issued. When a scanning request completes and there are still no user disk requests, the next scanning request can be issued immediately after the previous one.
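The two-queue scheme can be sketched as follows. The class and its methods are illustrative assumptions; a real device driver would hook the idle timer into its I/O completion path rather than check it on each dispatch:

```python
import collections
import time

class ScanningScheduler:
    """Sketch of a driver-level scheduler with two priority queues:
    user requests always go first, and scanning requests run only
    after the disk has been quiet for idle_delay seconds."""
    def __init__(self, idle_delay=1.0):
        self.user_q = collections.deque()
        self.scan_q = collections.deque()
        self.idle_delay = idle_delay
        self.last_user_done = time.monotonic()

    def submit_user(self, req):
        self.user_q.append(req)

    def submit_scan(self, req):
        self.scan_q.append(req)

    def next_request(self):
        """Pick the next request to issue, or None to stay idle."""
        if self.user_q:
            req = self.user_q.popleft()     # user requests take priority
            self.last_user_done = time.monotonic()
            return req
        # Scanning resumes only after a quiet period without user requests;
        # once idle, scanning requests may run back to back.
        idle_for = time.monotonic() - self.last_user_done
        if self.scan_q and idle_for >= self.idle_delay:
            return self.scan_q.popleft()
        return None
```

The quiet-period check is what guarantees that a user request arriving right after another is never delayed by scanning, while an idle disk is eventually given over entirely to the scanner.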
Waiting a short time after the latest user disk request ensures that if the user issues another disk command right after the previous one was completed, the scanning algorithm does not delay that request. On the other hand, if no user requests have arrived for a certain period, it is quite unlikely that a new one will arrive in the near future. In particular, the probability of a new user disk request drops significantly if no user requests have been received for several seconds [Kamunen 1996, Räsänen 1996]. Thus, the scanning algorithm only needs to monitor the time since the completion of the latest user request. This can be a self-learning procedure where the scanning algorithm learns the user behavior and adjusts its timer accordingly.
The main concern with this new algorithm is keeping the total load on the disk moderate. As the algorithm would load the disk to practically 100% soon after the user disk requests have stopped, there must be a mechanism that slows the scanning down. The simplest way is to let the scanning algorithm run only once (or a few times) per day: as soon as the disk has been scanned in a given period of time, the scanning algorithm is suspended, and it is resumed again when a new period starts.
It is also possible to simplify the scanning algorithm by setting the SCSI disk parameters properly. Instead of reading the disk statistics periodically, the disk can be configured to report all faults immediately [Räsänen 1996, Antony 1992]. This provides an automatic indication of all read errors in the disk, so both user and scanning disk requests can be used for detecting latent faults.
The disk scanning can be expedited even more by using the VERIFY command [Räsänen 1996, Scritsmier 1996, ANSI 1994, Conner 1992, Seagate 1992a]. Instead of transferring the data from the disk to the disk controller with a normal READ command, a VERIFY command is used: the data is read from the disk surface into the internal buffers, the consistency of the data is checked, and any errors are reported. This further reduces the extra load caused by the scanning algorithm.
With the above-described enhancements, it is possible to scan an entire hard disk efficiently (in a matter of hours or even minutes) with minimal effect on user requests [Scritsmier 1996]. If the load in the disk array is uneven during the day, the disks can be scanned once (or a few times) per day during the low load periods. Otherwise, scanning can be done in the background with minimal impact on performance. Hence, it can be safely assumed that a disk scanning algorithm efficiently detects latent sector faults with no, or only a marginal, effect on performance.