10. DISK ARRAY AS PART OF A COMPUTER SYSTEM

When a highly reliable disk array system is designed, it should be remembered that the disk array is only one part of a larger system and that the reliability of the whole system is dominated by its weakest link. The average reliability of the other components of a computer system is far lower than the reliability of the disk arrays discussed in this thesis [PCMagazine 1996, Hillo 1993, Gibson 1991].
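The weakest-link remark can be made concrete with the standard series model, in which a system works only if every component works. The sketch below is illustrative and not taken from the thesis: it assumes independent component failures, and the component names and reliability values are hypothetical numbers chosen only to show the effect.

```python
from math import prod


def series_reliability(component_reliabilities):
    """Reliability of a system that works only if every component works.

    Assumes component failures are independent (a simplifying assumption).
    """
    return prod(component_reliabilities)


# Hypothetical reliabilities over some fixed mission time (not measured data).
components = {
    "disk array": 0.9999,
    "disk controller": 0.98,
    "power supply": 0.99,
    "host and operating system": 0.95,
}

system = series_reliability(components.values())
weakest = min(components, key=components.get)

print(f"system reliability: {system:.4f}")
print(f"weakest component:  {weakest}")
```

Under this model the system reliability can never exceed that of its least reliable component, so improving the disk array alone has little effect while the other components remain much weaker.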

Disk subsystem

Besides hard disks, a disk subsystem has components such as fans, power supplies, power cables, data cables, and a disk controller [Hillo 1993, Gibson 1991]. The importance of the fans, for example, was noted earlier: the temperature of the disks rises rapidly if the fans do not operate or the ventilation is inadequate. Similarly, the importance of a reliable power supply is obvious. Besides meeting the normal reliability requirements, the power supply should provide a stable voltage for the disks regardless of their activity, as a disk can shut itself down if the voltage is not stable enough [Räsänen 1994, Seagate 1992]. The power and data cables are typically very reliable (at least when compared with other components) [Gibson 1991]. However, as a fault in cabling can disable several disks at the same time, special care must be taken to arrange the disk array so that the risk of related faults is minimized.

One of the most unreliable parts of the disk subsystem is the disk controller [Hillo 1993, Gibson 1991]. In particular, a large amount of RAM (e.g., used for cache buffers) significantly reduces the reliability of the controller unless non-volatile, ECC-based memory is used [Hillo 1993].

The major difference between faults in the surrounding components of a disk subsystem and faults in the disk units themselves is that the former typically cause data unavailability rather than permanent data loss. A failed surrounding component makes the data temporarily unavailable, but the data is not actually lost (i.e., it can be made available again by repairing the faulty unit). However, some faults in the surrounding components may also cause data loss. For example, data stored temporarily in a disk controller (but not yet written to a disk) is lost during a power failure if the memory has no battery backup.
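The power failure example can be illustrated with a toy model of a write-back controller cache. This is a minimal sketch under stated assumptions, not a description of any real controller: the class `CachingController` and its methods are hypothetical names introduced only for illustration.

```python
class CachingController:
    """Toy write-back controller: writes land in a cache and reach the disk on flush."""

    def __init__(self, battery_backed):
        self.battery_backed = battery_backed
        self.cache = {}  # block number -> data not yet written to disk
        self.disk = {}   # block number -> data safely stored on disk

    def write(self, block, data):
        """Accept a write into the cache only."""
        self.cache[block] = data

    def flush(self):
        """Write all cached blocks to disk and empty the cache."""
        self.disk.update(self.cache)
        self.cache.clear()

    def power_failure(self):
        """Return the block numbers whose cached contents are lost."""
        if self.battery_backed:
            return []  # cache contents survive and can be flushed later
        lost = sorted(self.cache)
        self.cache.clear()
        return lost


volatile = CachingController(battery_backed=False)
volatile.write(1, "a")
volatile.flush()
volatile.write(2, "b")
print(volatile.power_failure())  # block 2 was never flushed, so it is lost
```

Data already flushed to disk is merely unavailable until power returns, whereas the unflushed block is permanently lost unless the cache memory is protected.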

Computer system

The other parts of the computer system (such as the host CPU, main memory, network interface, other I/O devices, and operating system) also have a significant impact on the total reliability. Typically, the reliability of the system is reduced further by these components. Only in highly reliable or highly available computer systems is the reliability of these other parts high enough (e.g., due to redundant components) that the impact of the disk subsystem reliability becomes significant.

Here, only hardware-related components have been discussed, but in practical systems a significant portion of faults is caused by software errors, for example in the operating system, the device drivers, or the disk array firmware.

Human errors

One of the main causes of data loss in a modern computer system is neither physical failure of the equipment nor software errors but human error. A disk array or any other reliable hardware configuration does not prevent a user from accidentally deleting the wrong files from the system.

Some human errors can be prevented by advanced hardware design. For example, if the disk array supports the hot swap concept, the disks that are currently in use should be protected against accidental removal. A typical example of data loss in such a system is a serviceman accidentally pulling the wrong disk out of a crippled array. Because no redundancy was left after the disk failure, pulling out the wrong disk destroys the consistency of the array. This can be prevented by software-controlled physical locks that allow the serviceman to pull out only the failed disk.
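The lock policy described above can be sketched as a simple rule: release the lock only on slots whose disk has been marked as failed. The function names and the state labels `"online"` and `"failed"` are hypothetical and chosen for illustration; a real array would obtain the disk states from its own firmware.

```python
def removable_slots(slot_states):
    """Return the set of slots whose physical lock may be released.

    slot_states maps a slot identifier to a state string; only disks
    marked "failed" may be pulled out (states are illustrative labels).
    """
    return {slot for slot, state in slot_states.items() if state == "failed"}


def may_pull(slot, slot_states):
    """True if the serviceman is allowed to pull the disk in this slot."""
    return slot in removable_slots(slot_states)


# A crippled array: one disk has failed, so no redundancy is left.
array = {0: "online", 1: "online", 2: "failed", 3: "online"}

print(sorted(removable_slots(array)))
print(may_pull(0, array), may_pull(2, array))
```

With such a rule, the remaining disks of a crippled array stay locked, so the only disk that can be removed is the one whose removal cannot make matters worse.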

Importance of backups

Improving the reliability of a computer system does not make backups obsolete. On the contrary, backups are still needed, as they protect against human errors and major accidents that could destroy an entire computer system. A good example of such an approach is a distributed computing and backup system in which distant computers are mirrored to ensure survival even after a major catastrophe [Varhol 1991].
