9. PERFORMABILITY

In this chapter, the performability of a disk array subsystem is studied. The focus is on performability alone, as the definition of "cost" in cost-performability is somewhat ambiguous. A simple performance model of the disk array is used in this chapter, because more accurate models are considered to be out of the scope of this thesis.

9.1 Performability models

Figure 37. Simple Markov model for the performability of TMM (D + 1 is the number of disks, \lambda is the disk failure rate, \mu is the repair rate, P_i(t) is the probability of the system being in state i at time t, where i is the number of faulty disks in the disk array, and r_i(t) is the reward function in state i at time t)

Performability (i.e., combined performance and reliability) of a system can be expressed using Markov reward models [Trivedi 1994, Catania 1993, Pattipati 1993, Smith 1988, Furchgott 1984, Meyer 1980, Beaudry 1978]. Figure 37 illustrates a performability model for the TMM presented in Chapter 6. This is a typical RAID-5 array with a D + 1 redundancy scheme. For each state i (i = 0, 1, 2), two parameters are defined: probability and reward. The first parameter, the probability P_i(t), defines the probability of the system being in state i at a given time t. This is the same probability as used for the reliability analysis in Chapters 6, 7, and 8. The second parameter, the reward r_i(t), defines the reward that the system earns while being in state i at a given time t. The reward function can be, for example, the performance of the system. In a disk array, the performance can be expressed as the number of I/O operations per second. For example, in state 0 of Figure 37, the reward function specifies the number of user I/O operations that the disk array can perform in the fault-free state. In state 1, the reward function specifies the number of user I/O operations that the crippled disk array can perform while it is either waiting for the repair process to start or while the repair process is ongoing.
In state 2, the reward function is zero because the data is lost and the disk array has failed. For simplicity, it is assumed in this thesis that the reward function is constant (i.e., r_i(t) = r_i) and depends only on the state i, not on time. The performability (or computational availability) in state i at a given time t can then be expressed as the product of the two parameters mentioned above (i.e., the reward and the probability of state i):

PA_i(t) = r_i P_i(t) . (106)

The total performability at a given time t is the sum of the performabilities over all states i:

PA(t) = \sum_i r_i P_i(t) . (107)

Steady state performability

If the Markov model describes a steady state system, the performability can be expressed as

PA = \sum_i r_i P_i . (108)

Non-steady state performability

In a non-steady state system, the probability of state i changes over time. Eventually, the system will fail (in Figure 37, \lim_{t \to \infty} P_2(t) = 1). The cumulative performability of a system with non-repairable faults can be expressed as

PA = \sum_i r_i R_i (109)

where R_i = \int_0^\infty P_i(t) \, dt is the cumulative reliability of state i.

9.1.1 Performability of TMM

The performability of a RAID-5 disk array that is modeled with TMM can be expressed using equation (109) and the probabilities of states 0, 1, and 2 as given in Chapter 6 in equations (12) - (14). The cumulative reliabilities of TMM are:

R_0 = \int_0^\infty P_0(t) \, dt , (110)

R_1 = \int_0^\infty P_1(t) \, dt , (111)

and the value of R_2 has no effect since r_2 = 0. Then, the performability of TMM is

PA_{TMM} = r_0 R_0 + r_1 R_1 (112)

where r_0 and r_1 are the reward functions of states 0 and 1, respectively. The reward functions depend on the type of the operation (read or write). It should be noticed that PA_{TMM} equals the MTTDL of TMM if r_0 and r_1 equal one. As r_0 is typically greater than or equal to r_1, an upper limit estimate of the performability can be obtained by multiplying the MTTDL of the array with the reward function of the fault-free state. Hence, the approximation of the performability can be expressed as

PA_{TMM} \approx r_0 \, MTTDL (113)

where MTTDL is the MTTDL of the array.
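As a sketch of equations (110) - (113), the closed-form cumulative reliabilities of the three-state model can be evaluated directly. The function name and the sample rates below are illustrative assumptions, not values from the thesis; the rates assumed are the usual TMM transitions: (D+1)λ out of the fault-free state, μ back from the one-failure state, and Dλ from there into the data-loss state.

```python
def tmm_performability(D, lam, mu, r0, r1):
    """Cumulative performability of a (D+1)-disk RAID-5 array (eq. 112)
    and its MTTDL-based approximation (eq. 113), assuming the TMM rates:
    (D+1)*lam from state 0 to 1, mu from 1 to 0, D*lam from 1 to 2."""
    # Closed-form expected times spent in states 0 and 1 (eqs. 110, 111)
    R0 = (mu + D * lam) / ((D + 1) * D * lam ** 2)
    R1 = 1.0 / (D * lam)
    exact = r0 * R0 + r1 * R1        # eq. (112)
    approx = r0 * (R0 + R1)          # eq. (113): r0 * MTTDL
    return exact, approx
```

Because the repair rate μ is typically orders of magnitude larger than the failure rate λ, R_0 dominates R_1 and the approximation is very close to the exact value, which is what makes the r_0-times-MTTDL estimate usable later in the chapter.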
9.1.2 Performability of EMM1

The performability of a RAID-5 disk array that is modeled with EMM1 can be expressed using equation (109) and the probabilities of states 00, 01, 10, and f as given in Chapter 7 in equations (61) - (64). The cumulative reliabilities of EMM1 are:

R_{00} = \int_0^\infty P_{00}(t) \, dt , (114)

R_{01} = \int_0^\infty P_{01}(t) \, dt , (115)

R_{10} = \int_0^\infty P_{10}(t) \, dt , (116)

and the value of R_f has no effect since r_f = 0. Then, the performability of EMM1 is

PA_{EMM1} = r_{00} R_{00} + r_{01} R_{01} + r_{10} R_{10} (117)

where r_{00}, r_{01}, and r_{10} are the reward functions of states 00, 01, and 10, respectively. The reward functions depend on the type of the operation (read or write).

9.1.3 Reward functions of disk array subsystems

The performance of disk arrays can be modeled using a simple performance model, as in [Hillo 1993, Gibson 1991, Kemppainen 1991]. Here, the reward functions are modeled using the performance of the disk array as estimated for either read or write operations, but not for mixed read and write workloads. A more accurate performance model of disk arrays is considered to be out of the scope of this thesis.

RAID-5 performance

In a RAID-5 array, a total of D + 1 disks is used for building an array of D data disks. There are D disks in the crippled array. If I/O requests are not assumed to span several disks, each request requires the following number of disk accesses:
• 1 disk operation to read from a fault-free disk array;
• D disk operations to read from a crippled disk array (in the worst case);
• 4 disk operations to write to a fault-free disk array;
• D + 1 disk operations to write to a crippled disk array (in the worst case); and
• D + 1 disk operations to reconstruct a faulty disk block.

RAID-1 performance

The above equations (112) and (117) for performability are dedicated to the analysis of RAID-5 arrays. However, later in this chapter it is shown that a good estimate of the performability can be made using the MTTDL of the array and the reward function of the fault-free state. Hence, reward functions for the RAID-1 array are also included here.
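The cumulative reliabilities R_i used in equations (109) - (117) can also be obtained for an arbitrary absorbing Markov model, such as EMM1, without closed forms: the expected times spent in the transient states solve a small linear system built from the generator matrix. The helper below is a generic sketch with hypothetical names; the generator entries and rewards must come from the model at hand.

```python
def expected_sojourn_times(Q, start=0):
    """Expected total time spent in each transient state of an absorbing
    Markov chain started in `start`.  Q is the transient-to-transient
    generator: Q[i][j] is the rate i -> j for i != j, and Q[i][i] is minus
    the total outflow of state i (including flow into absorbing states).
    Solves (-Q)^T R = e_start by Gaussian elimination."""
    n = len(Q)
    A = [[-Q[j][i] for j in range(n)] + [1.0 if i == start else 0.0]
         for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col] / A[col][col]
                for c in range(col, n + 1):
                    A[r][c] -= f * A[col][c]
    return [A[i][n] / A[i][i] for i in range(n)]

def performability(Q, rewards, start=0):
    """Cumulative performability, eq. (109): sum of r_i * R_i."""
    return sum(r * t for r, t in zip(rewards, expected_sojourn_times(Q, start)))
```

With the two transient TMM states as Q, this reproduces the closed-form R_0 and R_1; for EMM1 the same call with its four-state generator yields the R_{00}, R_{01}, and R_{10} of equations (114) - (116).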
In a RAID-1 array, a total of 2D disks is used for building an array of D data disks. There are 2D - 1 disks in the crippled array. If I/O requests are not assumed to span several disks, each request requires the following number of disk accesses:
• 1 disk operation to read from a fault-free disk array;
• 1 disk operation to read from a crippled disk array;
• 2 disk operations to write to a fault-free disk array;
• 2 disk operations to write to a crippled disk array (in the worst case); and
• 2 disk operations to reconstruct a faulty disk block.

Relative performance

In a disk array, the maximum number of I/O operations depends on the array configuration, the type of the operation, and the properties of the disks. The array configuration specifies how many parallel read and write operations can be performed, as illustrated in the introduction in Chapter 1. In this thesis, the performance is expressed relative to a single disk. A relative performance of one corresponds to one fully working disk serving user requests. For example, a fault-free RAID-1 array with two disks has a relative performance of two for read operations and one for write operations.

Effect of the scanning algorithm

The effect of the scanning algorithm is studied by reserving a certain capacity for the scanning algorithm. For every disk, a certain fraction of the capacity (the scanning activity) is reserved for scanning, and the remaining capacity is available for other requests (user requests or repair).

Effect of the repair process

The repair process decreases the maximum number of user operations in the crippled array. The degree of degradation depends on the activity of the repair process. When, for example, a disk array of a total of ten disks is being repaired using 20% of the capacity for repair (i.e., a repair activity of 20%), the theoretical remaining capacity is 8 units. This is further reduced if the read or write request needs several disk operations.
For example, a write to a crippled RAID-5 array of ten disks needs 10 disk operations. Hence, the relative performance is only 8/10 = 0.8. For comparison, the relative write performance of a fault-free array of the same size would be 10/4 = 2.5.

Reward functions of RAID-5 and RAID-1

The relative reward functions of RAID-5 and RAID-1 arrays are illustrated in Table 15. Three different states are distinguished from the point of view of performance:
• all disks working (state 0 in TMM and states 00 and 01 in EMM1);
• one disk unit failed (state 1 in TMM and state 10 in EMM1); and
• data lost (state 2 in TMM and state f in EMM1).
Sector faults are considered not to degrade the performance.

Table 15. Relative reward functions of RAID-5 and RAID-1
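The worked example above can be sketched as a one-line computation: relative performance is the capacity left for user requests divided by the disk accesses one request costs. The function name is an illustrative assumption; following the text's ten-disk example, the repair activity is charged against the capacity of all disks in the array.

```python
def relative_performance(disks, ops_per_request, repair_activity=0.0):
    """Relative performance in single-disk units: the capacity remaining
    for user requests divided by the disk accesses one request costs."""
    return disks * (1.0 - repair_activity) / ops_per_request

# Ten-disk RAID-5 (D = 9): a crippled write costs 10 disk operations,
# a fault-free write costs 4 (read-modify-write).
crippled_write = relative_performance(10, 10, repair_activity=0.2)   # 0.8
fault_free_write = relative_performance(10, 4)                       # 2.5
```

This reproduces the figures in the text: 8 units of remaining capacity over 10 operations gives 0.8, against 2.5 for the fault-free array of the same size.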
In a RAID-5 array, all disks are involved in the repair process. As the worst case scenario is used here, a read operation to a crippled array requires access to all remaining disks. Hence, the relative performance is one, from which the repair activity is deducted. Similarly, in the worst case a write operation requires reading all remaining disks once and writing to one disk. From this relative performance, the repair activity is likewise deducted.

In a RAID-1 array, only two disks are involved in the repair process. When the array has D data disks (2D disks in total), D - 1 data disks are not affected by a disk unit fault. For a read operation, there are 2D - 1 disks available, and the performance is further reduced by the repair process in one disk. For a write operation, there are D - 1 data disks that are not affected by the disk unit fault and one data disk that is affected by the repair process.

9.1.4 Performability comparisons

The performability of a RAID-5 array modeled using the TMM and EMM1 models is illustrated in Figure 38. Here, the same default parameters are used as in Chapter 8. The figure shows that the approximation (performability equals MTTDL multiplied by the reward function of the fault-free state) provides accurate results. Hence, the same approximation principle is used with the RAID-1 array. It is also concluded that both performability models provide similar results that correspond to the reliability results.

Figure 38. Performability of a RAID-5 array as a function of the number of disks in the array

Effect of the repair activity

The effect of the repair activity is studied in a configuration where the repair time depends on the number of disks and the repair activity. In Figure 39, the performability of a RAID-5 array is illustrated as a function of the repair activity. Here, a RAID-5 array of 50 data disks is studied in two configurations: hot swap (repair starts 8 hours after the disk failure) and hot spare (repair starts immediately after the disk failure).
The read operation provides four times better performability than the write operation, as its reward function in state 00 is four times higher. The repair time is assumed to be two hours with 100% repair activity, and proportionally longer if the repair activity is less than 100%. The hot spare configuration provides significantly better performability than the hot swap configuration, as the total repair time in the hot swap case is longer. The performability of state 10 of EMM1 has only a marginal effect (less than 1%) on the total performability of the RAID-5 array. This is because the failure rates in EMM1 are much smaller than the repair rates, and therefore the system is mainly in state 00. The only factor that may limit the disk repair activity is the performance requirement during the repair time. If no minimum performance requirement during the repair time is set, then the repair can and should be done at full speed; otherwise, the repair activity should be limited to guarantee the minimum performance. When the total performability is considered, the reliability increase due to faster repair is much more significant than the minor performance degradation during the repair time.

Figure 39. Performability of a RAID-5 array modeled with EMM1 as a function of the repair activity

Effect of the scanning algorithm

The effect of the scanning algorithm on the performability is studied by varying the scanning activity. It is assumed that it takes 24 hours to scan all disks in the array with 5% scanning activity. The performability of a RAID-5 array is presented in Figure 40 as a function of the scanning activity. When the hot swap (hot spare) configuration is used, the optimum performability in this sample configuration is achieved when the scanning algorithm uses 20% (30%) of the capacity for scanning the disk surface.
The hot swap configuration reaches its peak performability earlier than the hot spare configuration, as its reliability is dominated more by the longer disk repair time; in the hot spare configuration, the reliability can be increased further with increased scanning activity and its detection of sector faults. Eventually, in both cases, the performability starts decreasing as the scanning activity approaches 100%. This is obvious, since less and less capacity of the array remains for user disk requests, while the reliability no longer increases because it is limited by the repair time of disk unit failures.

Figure 40. Performability of a RAID-5 array modeled with EMM1 as a function of the scanning activity

RAID-5 vs. RAID-1

The performability of RAID-1 and RAID-5 arrays is compared in Figure 41. The performability of the RAID-5 array is obtained using the above equations (112) and (117), while the performability of the RAID-1 array is approximated by multiplying the MTTDL of RAID-1 with the appropriate reward function. As the MTTDL of a RAID-1 array is approximated by dividing the MTTDL of a two-disk RAID-1 by the number of data disks in the array, while the reward function is proportional to the number of disks in the array, the performability of the RAID-1 array is constant (i.e., while the performance of the disk array increases with the number of disks, the reliability decreases at the same time, keeping the performability constant). The same effect can also be found with RAID-0 arrays, where the performability remains constant but at a much lower level because the RAID-0 array has no redundancy. On the other hand, the performance of the RAID-5 array increases almost linearly with the number of disks in the array, but the reliability decreases more rapidly as more and more disks are protected by just a single disk.
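The constant RAID-1 performability can be checked with a short sketch. The function name and the sample rates are illustrative assumptions; the MTTDL of a two-disk mirror is taken from the three-state closed form with D = 1, and the array MTTDL is divided by D exactly as the approximation above describes.

```python
def raid1_performability(D, lam, mu, read=True):
    """Approximate RAID-1 performability: array MTTDL times the
    fault-free reward.  The array MTTDL is the two-disk mirror MTTDL
    divided by D, as in the text's approximation."""
    mttdl_pair = (mu + 3 * lam) / (2 * lam ** 2)  # two-disk mirror, TMM closed form
    reward = 2 * D if read else D                 # reads use both halves; writes one
    return (mttdl_pair / D) * reward
```

Since the MTTDL scales as 1/D while the reward scales as D, the product is independent of D: larger RAID-1 arrays trade reliability for performance one-for-one, which is exactly the flat curve in Figure 41.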
9.1.5 Conclusions of the performability analysis

The conclusions of the performability analysis are gathered in the following list:
• The performability of a disk array can be well approximated by multiplying the MTTDL of the array with the reward function of the fault-free state when the repair rates are much higher than the failure rates.
• The performability of RAID-0 and RAID-1 arrays is constant regardless of the number of disks in the array. Higher performance is achieved with a larger number of disks, but at the expense of reduced reliability.

Figure 41. Performability of RAID-1 and RAID-5 arrays

• The performability of a RAID-5 array decreases as the number of disks increases. This is because the reliability drops more than the performance increases.
• A RAID-1 array provides better performability than a RAID-5 array with the same number of data disks. The penalty for the higher performability of the RAID-1 array is the larger number of disks in the array and the higher number of failed disks.
• A scanning algorithm can improve performability. At first, the scanning algorithm increases the performability, as the disk array reliability increases while the performance degradation still remains moderate. When the scanning activity increases further, the reliability no longer increases, because the reliability bottleneck becomes the disk unit faults, while at the same time the performance of the array drops. Thus, the performability also sinks.
• The increased speed of the repair process affects the performability by improving the reliability, while the effect on the average performance is marginal. The only reason to limit the speed of the repair process is to guarantee a certain performance even with a crippled array.