Solid-state disks (SSDs) are hardly new, but their growing usage represents a significant shift in the primary storage landscape. SSDs have been increasing in capacity and decreasing in cost at an accelerating rate, so the chances that you're going to bump into them in the wild are climbing as a result. However, SSDs are not perfect. A solid understanding of their history and differentiating factors will help you debunk some of the hype and leverage them more effectively in your environment.
The idea of a solid-state disk has been around for a very long time. Essentially, an SSD is persistent storage media constructed using transistors rather than an electromechanical disk or tape. SSDs have been holding the firmware for our switches, routers, cell phones, calculators, and just about any other kind of non-disk persistent memory for easily 30 or more years.
What is different today is that we're well down the path of using these SSDs in our enterprise primary storage environments -- either augmenting traditional disks or replacing them completely. This type of application for SSD hasn't been possible until recently due to the tremendous difficulty involved in constructing very large SSD memory modules that are cheap, reliable, and fast and that have a long lifetime. We're still working to overcome some of these challenges, and being aware of them is key to implementing SSDs successfully.
Volatile versus non-volatile SSDs
The biggest distinction to make right off the bat when talking about SSDs is whether they are volatile DRAM-based devices (RAM storage) or non-volatile NAND memory (flash storage) devices. They both often fall under the SSD moniker, so it's easy to get them confused.
DRAM-based devices essentially use the same type of memory that makes up the primary system memory of your server; they are extremely fast, but they are also susceptible to total data loss if power is interrupted for any reason. To combat this, most DRAM-based SSD devices require a battery backup to power the memory and ensure data integrity until power is restored.
In some cases, this battery backup is a supercapacitor that can power the device for a few days; this is common in very high-performance DRAM SSDs that ship in a PCIe card form factor. However, in the event that power isn't ever restored, your data probably won't be, either. In other cases, the DRAM is paired with an equal-capacity array of hard disks or slower NAND flash memory in a rack-mount chassis that is used to stage and de-stage the DRAM memory during power up and power down (with an internal bank of batteries or capacitors providing enough power to perform the de-stage operation in the event of an unexpected power outage).
NAND-based (flash) devices use the same general breed of memory found in cell phones and USB sticks. These memory devices do not require power to hold their state. Thus, they don't require a battery backup of any kind to ensure data integrity. On the other hand, they're several times slower than DRAM-based devices, though their speed is improving as the devices and their controllers mature.
MLC versus SLC SSDs
NAND devices come in two major flavours: MLC (multilevel cell) and SLC (single-level cell). MLC devices are so named because they can store multiple bits of data in the same cell, whereas SLC devices can store only a single bit of data per cell. SLC devices are much more expensive to make because they require more transistors to store the same quantity of data, but they're significantly faster and have a longer lifespan than MLC devices.
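The density trade-off can be made concrete with a little arithmetic. The sketch below assumes that a cell storing n bits has to distinguish 2^n charge levels; the function names and the 2-bit MLC figure in the usage note are illustrative assumptions, not a vendor specification.

```python
def levels_per_cell(bits_per_cell: int) -> int:
    """Number of distinct charge levels a cell must be able to tell apart."""
    if bits_per_cell < 1:
        raise ValueError("a cell stores at least one bit")
    return 2 ** bits_per_cell


def cells_needed(capacity_bits: int, bits_per_cell: int) -> int:
    """Cells required to hold capacity_bits, rounded up to a whole cell."""
    if bits_per_cell < 1:
        raise ValueError("a cell stores at least one bit")
    return -(-capacity_bits // bits_per_cell)  # ceiling division
```

With these definitions, an SLC cell (1 bit) has 2 levels to distinguish and a hypothetical 2-bit MLC cell has 4, while the SLC device needs twice as many cells for the same capacity. That is the cost side of the trade; the tighter spacing between levels is one reason MLC is slower and wears out sooner.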
Most consumer-grade SSDs, such as the one in your fancy new laptop, are likely MLC devices. In those applications, low cost, lower power usage, and higher reliability when dragged off your coffee table by your dog are key concerns, while raw performance is not. Any enterprise-grade NAND-based storage device is likely to be SLC-based -- and much more expensive as a result.
It's all about the controller
As with traditional primary storage devices, NAND flash-based SSDs live or die by the functionality delivered by their controllers. The strength of the on-device controller represents a significant cost in delivering an enterprise-grade SSD, but the controller is also responsible for providing exceptional (or less than exceptional) reliability and performance. In addition, it's where significant technology growth and innovation is still taking place.
Unlike DRAM-based SSDs, flash-based SSDs suffer from long-term reliability issues because each cell tolerates only a limited number of writes. Individual single-bit SLC cells are commonly rated for around 100,000 write cycles. Multibit MLC cells become unreliable much sooner, typically after roughly 3,000 to 10,000 write cycles.
To combat this issue, the SSD controller performs a technique called write-levelling (more commonly known as wear levelling) that tries to spread writes across the cells that make up the SSD, to ensure relatively equal write-load distribution. Additionally, some SSD controllers reserve slack (unprovisioned) space that can take over from cells that are near the end of their expected lifetime.
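The two ideas in this paragraph, spreading writes evenly and swapping in slack space, can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular controller works: the class name WearLeveler, the block granularity, and the max_cycles parameter are all hypothetical, and real controllers track far more state.

```python
import heapq


class WearLeveler:
    """Toy allocator: every write goes to the least-worn block in service."""

    def __init__(self, active_blocks: int, spare_blocks: int, max_cycles: int):
        self.max_cycles = max_cycles  # assumed endurance limit per block
        # Min-heap of (write_count, block_id) for blocks currently in service.
        self.heap = [(0, b) for b in range(active_blocks)]
        heapq.heapify(self.heap)
        # Slack (unprovisioned) blocks held back to replace worn-out ones.
        self.spares = list(range(active_blocks, active_blocks + spare_blocks))
        self.retired = []

    def write(self) -> int:
        """Record one write and return the block that received it."""
        if not self.heap:
            raise RuntimeError("no healthy blocks left")
        count, block = heapq.heappop(self.heap)
        count += 1
        if count >= self.max_cycles:
            # Block has reached its limit: retire it and bring in a spare.
            self.retired.append(block)
            if self.spares:
                heapq.heappush(self.heap, (0, self.spares.pop(0)))
        else:
            heapq.heappush(self.heap, (count, block))
        return block

    def wear(self) -> dict:
        """Map of in-service block id to its write count."""
        return {block: count for count, block in self.heap}
```

Because the least-worn block is always chosen, no block gets far ahead of the others, and the spares only come into play once a block hits its limit. The heap is a design convenience for the sketch; it keeps each write at logarithmic cost rather than scanning every block.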
Some cheaper controllers perform this write-levelling without regard to how much load the device is under, while others wait until the device is under lower load before running. This is one of many reasons that benchmarking performance on SSDs is difficult: Performance often looks stellar for the first few hours of a test, but once the write-levelling algorithm starts running, it crashes and burns.
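One practical consequence is that a single average over a short run says little about sustained behaviour. A minimal sketch, assuming you have already collected throughput samples at regular intervals over a long test (the function name and the windowing approach are illustrative assumptions, not a standard benchmark method):

```python
from statistics import mean
from typing import Sequence


def sustained_ratio(samples: Sequence[float], window: int) -> float:
    """Mean throughput of the last `window` samples divided by the first `window`.

    A value well below 1.0 suggests performance fell off during the run.
    """
    if window < 1 or len(samples) < 2 * window:
        raise ValueError("need at least two non-overlapping windows of samples")
    early = mean(samples[:window])
    late = mean(samples[-window:])
    if early <= 0:
        raise ValueError("early throughput must be positive")
    return late / early
```

Comparing the start of a run with its end is crude, but it makes the drop visible, which a headline number from the first hour would hide.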
Write-levelling can also have some unintended security side effects. Let's say you have an unencrypted file and then decide to encrypt it. As you do this, your server reads the unencrypted file from disk, encrypts it, and writes the encrypted file over the unencrypted one -- usually deleting the original in the process. Because write-levelling may redirect that write to different physical cells, it's possible for your server to believe that the unencrypted file has been overwritten when in fact the old data is still sitting on the device. Some controllers honour these overwrite requests and erase the original blocks, while others do not.
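The effect is easier to see in a toy model. The sketch below assumes a simplified mapping layer in which every write to a logical address lands on a fresh physical page and the old page is merely unmapped, not erased; the class and method names are hypothetical and no real controller is this simple.

```python
class ToyFlashTranslationLayer:
    """Toy logical-to-physical mapping with no erase and no garbage collection."""

    def __init__(self, physical_pages: int):
        self.pages = [None] * physical_pages  # raw contents of each physical page
        self.mapping = {}                     # logical address -> physical page
        self.next_free = 0

    def write(self, logical: int, data: bytes) -> None:
        if self.next_free >= len(self.pages):
            raise RuntimeError("out of free pages in this toy model")
        self.pages[self.next_free] = data
        # Remap the logical address; the previous page is orphaned, not erased.
        self.mapping[logical] = self.next_free
        self.next_free += 1

    def read(self, logical: int) -> bytes:
        return self.pages[self.mapping[logical]]

    def stale_pages(self) -> list:
        """Contents still physically present but no longer mapped."""
        live = set(self.mapping.values())
        return [d for i, d in enumerate(self.pages)
                if d is not None and i not in live]
```

Writing b"plaintext" and then b"ciphertext" to the same logical address makes read() return the ciphertext, yet stale_pages() still holds the plaintext. From the server's point of view the overwrite succeeded; on the physical media, the original data remains until something erases that page.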
Putting it all together
As you start digging into SSDs and deciding whether they're right for your primary storage environment, remember that they are a completely different animal than traditional spinning disks and are still undergoing growing pains. To be sure, the enormous performance potential of SSDs will ensure that they will be an option for IT for many years. Just don't charge into using SSDs without understanding how they work, or you may be in for nasty surprises in your production environment.