Guest post by Fusion-io senior director Mat Young

Today, data storage has two masters: performance and persistence. Serving both leads to compromise, with performance often finding itself on the short end of the stick. Compromised storage performance leaves CPUs sitting idle, applications throttled, resources wasted and business held back.

CPUs work in the nanosecond range, and a number of memory tiers are used to manage the efficient entry and exit of data. These tiers are like a high-speed motorway: speeds are expected to increase as you join the motorway and move into the DRAM fast lane on the motherboard.
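
To put rough numbers on those lanes, the short Python sketch below uses commonly quoted, order-of-magnitude latencies (illustrative figures of my own, not measurements from any specific product) to show how many cycles a 3GHz core could stall while waiting on each tier:

    # Rough, order-of-magnitude latencies per tier. Illustrative figures,
    # not measurements of any specific product.
    TIER_LATENCY_NS = {
        "L1 cache": 1,
        "DRAM": 100,
        "PCIe NAND flash": 50_000,    # ~50 microseconds
        "Disk array": 5_000_000,      # ~5 milliseconds
    }

    CPU_HZ = 3_000_000_000  # a 3 GHz core

    for tier, ns in TIER_LATENCY_NS.items():
        stalled_cycles = CPU_HZ * ns // 1_000_000_000
        print(f"{tier:>16}: {ns:>9,} ns  ~{stalled_cycles:,} cycles stalled")

Every step down the tiers costs the CPU orders of magnitude more wasted cycles, which is why the position of NAND in that hierarchy matters so much.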

Several years ago a key change occurred: NAND overtook DRAM in terms of density, and the possibility of a new memory tier was created. Essentially, this was like adding a whole new fast lane to the motorway. NAND-based devices, if engineered correctly, then had the potential to offer 100x the capacity density of DRAM and 1000x that of disk arrays. To get the greatest benefit from NAND flash, it should be used as a new memory tier as close to the CPU as possible to help accelerate data into and out of the cores; the fact that it is persistent is simply a happy coincidence.

Placing NAND outside the server introduces latency through physical distance, transport protocols, translation, connectors and a host of other architectural burdens, all of which limit how fast data can be supplied to the CPU and therefore how much work can be done. Removing all this overhead eliminates the CPU's wait to read or write data, unleashing a level of application performance unmatched by other NAND flash implementations and products.
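
A rough latency budget makes the point. Every figure below is an assumption chosen purely for the comparison, but the shape of the result holds: the media itself is only one part of a networked read.

    # Illustrative latency budget, in microseconds, for a single read.
    # Every figure here is an assumption made for the comparison.
    networked_nand_us = {
        "NAND media": 50,
        "array controller / protocol translation": 25,
        "transport round trip (e.g. iSCSI or FC)": 100,
    }
    server_side_nand_us = {
        "NAND media": 50,
        "PCIe bus and driver": 10,
    }

    print("NAND behind the network:", sum(networked_nand_us.values()), "us")
    print("NAND in the server:     ", sum(server_side_nand_us.values()), "us")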

So how much performance is possible when it is uncoupled from storage? Double? Triple? In truth it depends on the application, but the gain can be 10x and greater. The reason is that idle CPUs waiting on storage to read or write data 'hide' exactly how much more work they could do if optimised.

In simple terms: the more you reduce the time it takes to respond to a thread's IO request, the fewer threads you need to deliver a given workload. This in turn frees up clock cycles for potentially more threads. If you are responding 10x as fast, you can fit roughly 10x more work through the CPU. An additional side effect is that you don't need as much DRAM in the system, which significantly reduces cost, power and cooling requirements.
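
The arithmetic behind that claim is Little's Law: concurrency equals throughput times latency, so the work one blocked thread can drive is simply the inverse of the IO response time. A back-of-the-envelope sketch (the latencies are assumptions, not benchmarks; the speed-up scales directly with whatever latency cut you actually achieve):

    # Little's Law: concurrency = throughput x latency, so the IO rate one
    # blocked thread can sustain is the inverse of the IO response time.
    def iops_per_thread(io_latency_s):
        return 1.0 / io_latency_s

    disk_latency = 5e-3     # ~5 ms random read from a disk array (assumed)
    flash_latency = 50e-6   # ~50 us from server-side NAND flash (assumed)

    print("Disk array:", round(iops_per_thread(disk_latency)), "IOPS per thread")
    print("NAND flash:", round(iops_per_thread(flash_latency)), "IOPS per thread")
    print("Per-thread speed-up:", round(disk_latency / flash_latency), "x")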

If you’re looking to get the maximum amount of performance to your application, then hosting all or as much of the data as possible on the NAND is the best route for you. However, if this approach is not right for your infrastructure, there is another way. By adding intelligent caching software it is possible to gain most of the original benefits whilst reducing the amount of change in the architecture.

When looking at combined NAND and caching solutions, the following criteria should be met:

  • The NAND needs to be large enough to hold the working set and maximise the chance of a cache hit.
  • The software should support the entire environment, especially virtualisation.
  • The software should allow for different importance levels to be dynamically assigned even down to the guest VM, file or block level.
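
To make the caching idea concrete, here is a minimal sketch of the read path such software follows. This is a toy LRU cache, not any vendor's implementation; the BlockCache and backing_store names are purely illustrative, and real products also handle writes, coherency and the priority controls listed above.

    from collections import OrderedDict

    class BlockCache:
        """A toy LRU read cache in front of a slower block store."""

        def __init__(self, capacity_blocks, backing_store):
            self.capacity = capacity_blocks
            self.backing_store = backing_store  # anything with read(block_id)
            self.blocks = OrderedDict()         # block_id -> data, LRU order

        def read(self, block_id):
            if block_id in self.blocks:
                self.blocks.move_to_end(block_id)  # mark most recently used
                return self.blocks[block_id]       # hit: served from NAND
            data = self.backing_store.read(block_id)  # miss: go to the array
            self.blocks[block_id] = data
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)    # evict least recently used
            return data
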
A side effect of a good server-side caching solution is that it dramatically reduces the requirement for large disk arrays. Over time, less expensive "good enough" arrays dedicated to persisting data over time, rather than delivering performance, would be all that is needed. This is because random read operations put the most load on the array; by eliminating nearly all of these IOs, the disk array can cope with far more than it could before. Often a modest modular array will be more than sufficient where the most powerful array was needed before a performance memory tier was installed.
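
A quick back-of-the-envelope check of that claim, using assumed figures throughout: if a server-side cache absorbs 95% of the random reads, the spindle count the array needs for performance collapses.

    # All figures are assumptions for illustration.
    workload_random_read_iops = 50_000
    iops_per_disk = 180       # rough rule of thumb for one 15K RPM spindle

    for hit_rate in (0.0, 0.95):
        array_iops = workload_random_read_iops * (1 - hit_rate)
        spindles = max(1, round(array_iops / iops_per_disk))
        print(f"cache hit rate {hit_rate:.0%}: array serves "
              f"{array_iops:,.0f} IOPS (~{spindles} spindles)")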

With storage's two masters, performance and persistence, each served where it is best placed, correctly engineered NAND products in the server, often with specialised caching software, deliver performance close to the CPU while the disk array takes the strain of persistence, letting your applications fly.

Posted by Mat Young, senior director at Fusion-io. Follow Mat on Twitter @ispider and Fusion-io via @fusionioUK.