RAID (redundant array of inexpensive disks) is a method of storing data across multiple physical disk drives in order to minimise data loss in the event of a hardware failure and to reduce the cost of building large-scale disk environments. RAID was first described in the paper ‘A Case for Redundant Arrays of Inexpensive Disks’ by David Patterson, Garth Gibson and Randy Katz. A copy of the original document can be found at http://sunsite.berkeley.edu/TechRepPages/CSD-87-391 for those who are interested.

At the time, their argument contrasted the expensive disk technology of the mainframe world with cheaper, and no less reliable, PC disks, and showed how arrays of the cheaper disks could match or exceed the reliability and performance of their expensive mainframe counterparts.

The remainder of this article discusses the RAID options that are now standard within the storage world and then reviews practical implementations of those concepts in software and hardware products.

RAID Levels
Patterson, Gibson and Katz proposed the division of RAID options into RAID levels. Each level describes an enhancement or modification on the previous level. We will present and discuss levels 0 to 5 here.

RAID LEVEL 0
RAID 0 simply stripes files across all disks in an array. Each disk is divided into stripes of fixed width, and files are laid out across the disks a stripe at a time.
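
The layout described above can be sketched as a mapping from a logical block number to a disk and an offset on that disk. This is only an illustrative sketch: the function name locate_block, the round-robin ordering and the parameter names are assumptions made for this example, not the behaviour of any particular controller.

```python
from typing import Tuple


def locate_block(logical_block: int, num_disks: int, stripe_blocks: int) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Assumes a simple round-robin layout: stripe units of `stripe_blocks`
    blocks are dealt out to the disks in turn.
    """
    stripe_unit = logical_block // stripe_blocks   # which stripe unit overall
    within_unit = logical_block % stripe_blocks    # position inside that unit
    disk = stripe_unit % num_disks                 # units rotate across the disks
    row = stripe_unit // num_disks                 # full rows that precede this unit
    return disk, row * stripe_blocks + within_unit


# Four disks, two blocks per stripe unit: consecutive units land on different disks.
for block in range(0, 10, 2):
    print(block, locate_block(block, num_disks=4, stripe_blocks=2))
```

Because neighbouring stripe units fall on different disks, a large sequential transfer keeps several drives busy at once, which is where the performance gain of striping comes from.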

As data is written to the filesystem, the RAID array can write to multiple disks concurrently, which improves performance. Multiple I/O operations to different files can be serviced concurrently in the same manner.

The disadvantage of the striped array lies in the loss of a single disk unit: losing any one drive causes the loss of all data within the striped array. In practice this needs to be weighed against the chance of device failure. For example, the latest Fujitsu drives quote an MTBF of 1,200,000 hours, a failure rate that is unlikely to cause an issue in most arrays. It is more likely that the drive will suffer external damage (for example, shock damage).

Benefits: improved read/write performance, no space overhead.
Disadvantages: loss of any device causes loss of the entire array contents.

RAID LEVEL 1
RAID 1 specifies disk mirroring: each drive has an entire mirrored copy. Effectively this means that only 50% of the raw disk space is available for data. The disk controller writes to both mirror copies and must wait for completion of both I/O requests before signalling to the host that the I/O is complete, so there is no improvement in write response times. In terms of reliability, failure of a single mirror disk does not cause data loss. The failing device can be replaced and the mirroring re-established. The likelihood of a second disk failure whilst the first failed disk is being replaced is extremely small.
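
To put a rough number on "extremely small", the sketch below estimates the chance that the surviving mirror disk fails during the replacement window. It assumes a constant failure rate (an exponential model), takes the 1,200,000 hour MTBF figure quoted earlier in this article, and assumes a 24 hour replacement window; the window and the model are illustrative assumptions, not measured values.

```python
import math


def failure_probability(mtbf_hours: float, window_hours: float) -> float:
    """Chance that one drive fails within the window, assuming a constant failure rate."""
    return 1.0 - math.exp(-window_hours / mtbf_hours)


MTBF_HOURS = 1_200_000   # MTBF figure quoted earlier in the article
WINDOW_HOURS = 24        # assumed time to replace the disk and re-mirror (hypothetical)

p = failure_probability(MTBF_HOURS, WINDOW_HOURS)
print(f"Chance of losing the surviving mirror: {p:.6%}")
```

Under these assumptions the result is about 0.002%. Real drives do not fail independently or at a constant rate, so the figure is best read as an order-of-magnitude guide.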

RAID LEVEL 2
RAID 2 relates to the use of error correction for disk drives that have no inherent error correction technology. As all modern drives have in-built error correction, this format is not discussed any further.

RAID LEVEL 3
RAID 3 introduces the idea of a parity disk, which is used to re-create the data on a disk that has failed due to a hardware problem. The example in figure 3 shows an array of three data disks and a fourth parity disk. Data is striped across the three data disks, and the contents of the fourth disk are computed as a combination of the three data disks. This configuration gives good read and write performance, as data need only be read from the data disks and is spread across three (or more) devices for writing. RAID 3 stripes at the byte level, so every operation involves all of the disks and I/O operations cannot overlap (there are no concurrent reads or writes). As a consequence, RAID 3 is better used in single user systems.
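
The "combination" held on the parity disk is conventionally a byte-wise exclusive OR (XOR) of the data disks. The sketch below assumes XOR parity and uses made-up four byte "disks" to show how the contents of a lost disk can be rebuilt from the survivors; xor_blocks is a helper name invented for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks (illustrative contents) and the parity disk derived from them.
data = [b"RAID", b"disk", b"data"]
parity = xor_blocks(data)

# Suppose the second data disk is lost: XOR the survivors with the parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

XOR works here because applying it twice with the same value cancels out, so XORing the parity with every surviving disk leaves exactly the missing disk's bytes.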

RAID LEVEL 4
RAID 4 is similar to RAID 3, except data is written in blocks rather than at the byte level. This creates the benefit of being able to read records from multiple drives concurrently, thus improving read performance. Write performance is not improved as all writes must queue for the parity disk, creating the risk of a bottleneck when performing concurrent write operations.

RAID LEVEL 5
RAID 5 retains the redundancy of RAID 3/4 but spreads the parity and data across all disks within the array of devices. Parity and matching data are never stored on the same physical device. Figure 4 shows the parity blocks spread across the 4 disks in the array. Read performance is good as the data is read from multiple devices in the array. The cost of an individual write is comparable with RAID 4, requiring 2 reads (to obtain the old data block and parity block) and 2 writes (to rewrite both blocks), but because parity is no longer confined to a single disk, concurrent writes do not all queue for the same device. RAID 5 is one of the most popular RAID implementations.
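
The two ideas in this section, rotating parity and the two-read, two-write update, can be sketched as follows. The rotation shown is just one simple scheme chosen for illustration (real products use a variety of layouts), XOR parity is assumed, and parity_disk and small_write are names invented for this example.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Disk holding the parity block for a stripe (one simple rotation scheme)."""
    return (num_disks - 1 - stripe) % num_disks


def small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity after overwriting one data block.

    The two reads fetch old_data and old_parity; the two writes store
    new_data and the parity returned here.
    """
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


# Parity moves to a different disk on each successive stripe of a 4 disk array.
print([parity_disk(stripe, 4) for stripe in range(5)])

# One stripe with three data blocks (illustrative contents) and its parity.
blocks = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
old_parity = bytes(a ^ b ^ c for a, b, c in zip(*blocks))

# Overwrite the middle block using only the old block and the old parity.
new_block = b"\xaa\x55"
new_parity = small_write(blocks[1], old_parity, new_block)
blocks[1] = new_block

# The shortcut agrees with recomputing parity from every data block.
print(new_parity == bytes(a ^ b ^ c for a, b, c in zip(*blocks)))
```

The update never touches the other data disks in the stripe, which is why a small write costs two reads and two writes regardless of how many disks the array contains.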

Implementation
So, now that we know what the RAID formats are, how do we go about implementing them? There are two options: we can implement RAID at the operating system level, or we can implement it in dedicated RAID hardware. Initially, RAID hardware implementations were expensive and justified only on those systems where RAID was essential in order to maintain high availability. Now, from the enterprise down, RAID is a standard (and expected) offering from disk manufacturers, right down to PC motherboards that include RAID directly on the main board. Adaptec, for example, offers a RAID PCI card (2400A) that operates with ATA disks, making it a cost-effective way of implementing a high-availability RAID solution.

Various operating systems provide RAID functionality in software. For example, Windows NT 4.0 and Windows 2000 provide RAID implementations under O/S control. However, the RAID configuration is stored in the registry, and a loss or corruption of the O/S can lead to complications when attempting to perform a restore. Veritas offers Volume Manager, which can be used on many operating systems including Unix and Windows. This enables disks to be subdivided and then recombined in RAID configurations.

SUMMARY
RAID enables physical disks to be grouped together for performance and reliability benefits. Both software and hardware solutions are available for many O/S platforms, with the current preference being hardware implementations due to stability, reliability and cost.