When you’re building a server of any decent size, you’ll want to do more than simply plonking a pile of disks in the box and configuring it as a big RAID5 partition. If you want to ensure that the kit you’ve bought isn’t being held back by the way you’ve arranged the disks, you need to follow a few simple steps.
1. Access speed
Regardless of what type of interface the disk has (SCSI, IDE/ATA, SSA, etc), the physical properties of the disk itself are a factor in its performance. There are a few rules of thumb here. First, the faster it spins, the faster the data transfer: data passes under the heads more quickly, and there's less waiting for the right sector to come round. So a 15,000rpm disk will be quicker than a 7,200rpm one.
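To put a number on the spindle-speed rule, here is a rough Python sketch of average rotational latency. Once the heads are in position, the drive waits on average half a revolution for the wanted sector to come round. The sketch deliberately ignores seek time, caching and transfer rate, so treat it as an illustration rather than a benchmark.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds.

    On average the platter has to turn half a revolution before the
    wanted sector arrives under the head.
    """
    ms_per_revolution = 60_000 / rpm  # 60,000 ms in a minute
    return ms_per_revolution / 2


for rpm in (7_200, 15_000):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```

On these assumptions a 7,200rpm disk waits about 4.17ms per random access, against 2ms for a 15,000rpm disk.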
Secondly, the more platters (layers of magnetic media), the faster the data transfer. Each side of each platter has its own read/write head(s), and all the heads are mounted on the same arm, so they move at once. Physical movement takes milliseconds; read/write activity takes microseconds. So if you can do fewer head movements and more reading and writing for each one, you’re on a winner.
You may find, then, that a larger disk is faster than a smaller disk even if it spins at the same speed, because it has more platters and can therefore make fewer head movements for a given amount of data.
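The platter argument can be sketched the same way. The Python below counts the head movements needed to read a contiguous file, on the simplifying assumptions that every track holds the same amount of data (real drives use zoned recording, with more data on the outer tracks), that the file is laid out cylinder by cylinder, and that switching between heads costs nothing. The 512KiB track size and 100MiB file are hypothetical figures chosen for round numbers.

```python
import math


def head_moves_needed(data_bytes: int, heads: int, bytes_per_track: int) -> int:
    """Cylinders visited when reading data laid out contiguously.

    All heads sit on one actuator, so a single head move positions every
    head over a whole cylinder (heads x bytes_per_track bytes) before the
    next move is needed.
    """
    cylinder_bytes = heads * bytes_per_track
    return math.ceil(data_bytes / cylinder_bytes)


TRACK = 512 * 1024          # hypothetical 512 KiB per track
DATA = 100 * 1024 * 1024    # hypothetical 100 MiB file

two_platters = head_moves_needed(DATA, heads=4, bytes_per_track=TRACK)   # 2 platters, 2 sides each
four_platters = head_moves_needed(DATA, heads=8, bytes_per_track=TRACK)  # 4 platters, 2 sides each
print(two_platters, four_platters)
```

Doubling the platters (from four heads to eight) halves the number of head movements, from 50 to 25.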
2. Concurrent access
If your server is running a number of applications at once, you may want to run them on different disks. The reason is simple: if your mail software and your Web server are both reading and writing vast amounts of data and competing for time on the same physical disk, you have a bottleneck. If you can give each application its own disk, the reads and writes for each can happen independently and concurrently. Note here that the term “disk” could equally apply to a group of disks in a RAID partition – see later.
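A crude Python model shows why separating the applications helps. It assumes each application needs a fixed amount of disk busy time (the figures below are hypothetical) and that the CPU and the bus are not the bottleneck. On one disk the work queues up and adds together; on separate disks it overlaps, so the slowest application sets the pace. If anything the model flatters the shared disk, because interleaving two applications' requests also adds extra head movements.

```python
def elapsed_shared(busy_seconds: list[float]) -> float:
    """One disk: every request queues for the same heads, so the work serialises."""
    return sum(busy_seconds)


def elapsed_separate(busy_seconds: list[float]) -> float:
    """One disk per application: the workloads run concurrently,
    so the slowest one determines the elapsed time."""
    return max(busy_seconds)


# Hypothetical seconds of disk work needed by each application
mail, web = 1200.0, 1500.0
print(elapsed_shared([mail, web]), elapsed_separate([mail, web]))
```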
Although the physical disk, being a mechanical item, is the main potential bottleneck, if you have an I/O-intensive server there’s the potential for the disk interface bus or controller to become congested. If this is the case, the solution is simple – add more of them and make them independent, generally by putting the disks in one or more external enclosures and connecting them to the server via separate SCSI or Fibre Channel adaptors. If the operating system knows that disk A is on card X and disk B is on card Y, it will address them separately, and the only bottleneck you’ll have left is the one you’re stuck with – the computer’s CPU and the internal hardware between it and the disk subsystem.
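A quick sanity check for bus congestion is to add up what the attached disks can sustain and compare it with what the shared bus or controller can carry. In the Python sketch below, the 40MB/s per disk and 160MB/s per bus figures are hypothetical; substitute the numbers for your own hardware.

```python
def bus_utilisation(disk_mb_s: list[float], bus_mb_s: float) -> float:
    """Fraction of a shared bus's bandwidth the attached disks could demand
    if they all streamed at once. Above 1.0, the bus, not the disks,
    limits throughput."""
    return sum(disk_mb_s) / bus_mb_s


# Hypothetical: six disks sustaining 40 MB/s each on one 160 MB/s bus
one_bus = bus_utilisation([40.0] * 6, bus_mb_s=160.0)
# The same six disks split three and three across two independent adaptors
two_buses = [bus_utilisation([40.0] * 3, bus_mb_s=160.0) for _ in range(2)]
print(f"{one_bus:.2f}", [f"{u:.2f}" for u in two_buses])
```

Six such disks on one bus could ask for 1.5 times what it can deliver; split across two independent adaptors, each bus runs at 75% of capacity, so the disks rather than the bus become the limiting factor.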
It’s also a good idea to consider having the operating system itself split its work over multiple disks. Most modern operating systems use the concept of a “swapfile” – a chunk of disk used as “virtual memory” – and for best results you should put this on a disk of its own, separate from the main OS boot disk. Consider also giving core system functions, such as the server-wide event log, disks of their own: where significant amounts of data are flowing, a little thought about where the bottlenecks might be often brings performance improvements.
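One way to give that thought some structure is to estimate the I/O load of each function and spread the loads across the available disks. The Python sketch below uses a simple greedy heuristic: take the heaviest workload first and put each one on whichever disk currently carries the least load (sometimes called longest-processing-time-first scheduling). The heuristic, the function names and the load figures are all illustrative assumptions, not something the operating system does for you.

```python
import heapq


def place_workloads(loads: dict[str, float], disks: list[str]) -> dict[str, list[str]]:
    """Greedy placement: heaviest workload first, each onto the disk
    that currently has the least I/O load assigned to it."""
    heap = [(0.0, disk) for disk in disks]  # (assigned load, disk name)
    heapq.heapify(heap)
    placement: dict[str, list[str]] = {disk: [] for disk in disks}
    for name, load in sorted(loads.items(), key=lambda item: item[1], reverse=True):
        current, disk = heapq.heappop(heap)        # least-loaded disk so far
        placement[disk].append(name)
        heapq.heappush(heap, (current + load, disk))
    return placement


# Hypothetical relative I/O loads (e.g. MB/s) measured or estimated per function
loads = {"os": 5.0, "swap": 30.0, "event_log": 10.0, "mail": 25.0, "web": 20.0}
print(place_workloads(loads, ["disk0", "disk1", "disk2"]))
```

With these figures the swapfile ends up on a disk of its own, away from the OS, and the other functions are paired so that each of the three disks carries the same total load.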
3. RAID choices
An extension of section 1’s discussion of “more platters equals more performance” comes with the introduction of RAID – a system that allows you to have multiple disks appearing as a single “partition”. If you have n identical disks and spread the data evenly across them, then by moving the heads on all n disks at once you can multiply the amount of data read or written for a single head move by n. This is known as “striping” (RAID 0). Because this is a discussion of performance optimisation we’re not going to go into the compromises of striping versus the resilience benefits of, say, RAID 5. Suffice it to say that if you want performance, use striping, but think hard about what happens if a disk breaks.
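Striping comes down to a simple address calculation. In the Python sketch below, the volume is cut into fixed-size chunks that are dealt round-robin across the disks, so a long sequential read or write keeps every spindle busy at once. The 16-block chunk size is a hypothetical example; real RAID cards let you choose the stripe size, and their on-disk layouts are their own.

```python
def stripe_location(logical_block: int, n_disks: int, stripe_blocks: int) -> tuple[int, int]:
    """Map a logical block on a striped (RAID 0) volume to
    (disk index, block number on that disk).

    Chunks of `stripe_blocks` blocks are dealt round-robin across the
    disks, so consecutive chunks land on different spindles.
    """
    chunk, offset_in_chunk = divmod(logical_block, stripe_blocks)
    disk = chunk % n_disks
    row = chunk // n_disks  # full passes across all disks before this chunk
    return disk, row * stripe_blocks + offset_in_chunk


# Four disks, hypothetical chunk size of 16 blocks
for lb in (0, 15, 16, 63, 64, 70):
    print(lb, stripe_location(lb, n_disks=4, stripe_blocks=16))
```

Blocks 0 to 63 are spread 16 apiece across all four disks, which is where the n-fold gain comes from; block 64 wraps back round to the first disk.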
Incidentally, if you want performance, don’t ever consider using the software RAID facilities offered by Linux, Windows 2000 Server and the like. They’re excellent implementations, but if you want speed, buy a hardware RAID card.
Finally, use all the tools at your disposal to monitor the loading on each of the elements of the disk subsystem. If you know the loading of the various interface cards and individual disks, you can make adjustments to the configuration (or split off functions into yet more separate partitions) in order to keep things optimal.
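On Linux, one readily available measure is the kernel's /proc/diskstats file, whose tenth counter for each device is the total number of milliseconds the device has spent doing I/O. Sampling it twice and dividing the difference by the interval gives the fraction of time each disk was busy, which is the basis of the %util column in iostat. The Python sketch below assumes a Linux system and ignores counter wrap-around; other operating systems have their own tools, such as Performance Monitor on Windows.

```python
import time


def read_io_ms(text: str) -> dict[str, int]:
    """Return {device: milliseconds spent doing I/O} from /proc/diskstats text.

    Each line starts with major, minor and device name, followed by the
    counters; the 10th counter (token index 12) is the device's busy time.
    """
    busy = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 14:
            busy[fields[2]] = int(fields[12])
    return busy


def utilisation(before: dict[str, int], after: dict[str, int],
                interval_ms: float) -> dict[str, float]:
    """Fraction of the interval each device was busy (0.0 to 1.0)."""
    return {dev: (after[dev] - before[dev]) / interval_ms
            for dev in after if dev in before}


def sample(interval_s: float = 5.0) -> dict[str, float]:
    """Linux only: read /proc/diskstats twice, interval_s seconds apart."""
    with open("/proc/diskstats") as f:
        first = read_io_ms(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        second = read_io_ms(f.read())
    return utilisation(first, second, interval_s * 1000)
```

Calling `sample()` on a Linux server returns a dictionary mapping each device name to its busy fraction over the interval; a disk sitting near 1.0 while its neighbours idle is a candidate for having some of its work moved elsewhere.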
It’s not hard to get optimal performance from your storage subsystem. It just takes a little thought as to what’s reading and writing what, and when. Bear in mind, though, that you don’t have to spend zillions to get more from your server: even a basic PC with a CD-ROM drive can take up to three disks on its two IDE channels (each channel takes two devices, and the CD-ROM drive uses one of the four slots), and you can decide sensibly what goes where in order to optimise performance.