Following on from the interview with flash-in-the-datacentre evangelist Joseph Reger, I was contacted by Doug Dumitru, the chief technology officer (CTO) of EasyCo, a supplier of flash management software. Dumitru said about the Reger interview: "I agree with his assessment, but think that the future is likely to arrive a bit sooner than he envisions."
"Our company does Flash SSD integration for database servers and has also developed a software solution (that can also live at the controller level) that mitigates the random write performance issue inherent in all Flash solid state disks."
Here is a performance example of his: "If you take four Mtron (flash SSD) serial ATA (SATA) drives, running RAID-5, you can expect about 50,000 4K random reads/sec and 250 4K random writes/sec. This read/write performance asymmetry is the Achilles heel for Flash SSDs. In read-only applications, they are fabulous, but if you start to do random writes, they quickly start acting like floppies. The Mtron drives at 125 write IOPS are actually very good. We have tested MLC Flash SSDs with write IOPS as low as 3.3 writes/sec."
Then he adds in EasyCo's secret sauce: "This is where our "Managed Flash Technology" management layer steps in. By dynamically re-mapping the drive, we operate in a mode where the drive is always written linearly. Our basic premise is that if the drive writes randomly slowly, then you should never write randomly."
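To make the premise concrete, here is a minimal sketch of a remapping layer that turns random logical writes into a linear physical write stream. It is a toy illustration only, not EasyCo's implementation: the class name LinearRemapper and its methods are hypothetical, and it ignores reclaiming stale blocks, persistence and crash recovery.

```python
class LinearRemapper:
    """Toy remapping layer (hypothetical): every write lands on the next physical block."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.next_free = 0     # head of the linear write stream
        self.mapping = {}      # logical block number -> physical block number
        self.write_log = []    # physical blocks in the order they were written

    def write(self, logical_block):
        if self.next_free >= self.physical_blocks:
            # A real layer would reclaim stale blocks here; this sketch does not.
            raise RuntimeError("out of free blocks")
        physical = self.next_free
        self.next_free += 1
        self.mapping[logical_block] = physical  # any older copy becomes stale
        self.write_log.append(physical)
        return physical

    def read(self, logical_block):
        return self.mapping[logical_block]


remapper = LinearRemapper(physical_blocks=8)
for logical in [907, 12, 455, 12]:  # random logical addresses, one of them rewritten
    remapper.write(logical)

print(remapper.write_log)  # [0, 1, 2, 3]
print(remapper.read(12))   # 3
```

However scattered the logical addresses are, the drive only ever sees sequential writes. The price is a lookup table held in memory and spare space for the stale copies left behind.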
That seems astute. You can find out more here.
Dumitru said: "With the MFT layer in place, this same four-drive RAID-5 array benchmarks at just over 29,000 4K random writes/sec when running on a 'host raid' dumb controller. By my calculations, this is the random performance equal of about 200 15K SAS drives, using four 2.5-inch devices that draw less than 10 watts of power."
He seems to have realised the potential of flash SSD arrays and made it usable today. Is this real?
This prompted an interview
Techworld asked Dumitru a set of questions to find out more.
Techworld: Do you have any more details about customers using your MFT that you could share, please?
Doug Dumitru: Our company is historically involved in the "MultiValue Database" market. This started with the old Pick Systems products dating back to the 1980s and continues today with products like Universe from IBM and D3 from Raining Data. As such, most of our early customers are in our industry. We have about 5 production systems running MFT. The oldest ones use CF cards. Some are running Samsung laptop drives. The newest are using Mtron SATA drives. These databases tend to be small (5-20 GB).
We also have a couple of "demo boxes" for larger datasets (50-150 GB), but none of these are running production applications yet. Now that the Mtron drives are easily available, we hope to expand into this mid-market segment over the next couple of months.
Techworld: I'd be interested to learn about the performance they have obtained, compared with the pre-MFT performance, both flash pre-MFT and disk.
Doug Dumitru: A lot of the applications that MultiValue databases host tend to be random write limited. Thus the usefulness of MFT. In terms of performance improvement, we have people who came off 10K SCSI systems, moved to single-drive Samsungs, and report 15x improvements in the run-time of single-threaded batch jobs.
Techworld: What price did they pay for the flash SSD?
Doug Dumitru: Our existing customers are running servers that we built, with the exception of one dealer in Switzerland who builds their own white boxes. We price the flash drives as a part of a disk subsystem to cover the cost of the drives and our software. It usually works out to marking up the drives an additional 50 percent to cover the MFT license. Thus a four-drive RAID-5 set with 32G Mtron drives will cost around $6000 including the controller and SATA bays. This gives you about 85GB of mountable space. Performance-wise, doing 4K random IO with 10 threads, you get about 43,000 read IOPS and 25,000 write IOPS. This puts this array in the "160 15K SAS Drives" class, but the amount of space is obviously a lot smaller.
Techworld: How much capacity is involved?
Doug Dumitru: We use normal Flash disks. Right now, that means 32G and 64G drives from the mainstream vendors. While very large disks from people like bitMicro would work, they are far too expensive to make sense for us. We are very much positioned as a "middle market" solution.
MFT itself requires some space as "overcommit". For database applications we recommend 10 percent, but this is tunable from 5 percent upward. Dedicating more space can help out very busy database patterns, but most of the time this is hard to measure.
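As a rough check on the capacity figures, the sketch below combines the four-drive RAID-5 set of 32G drives quoted earlier with the recommended 10 percent overcommit. The function name mountable_gb is hypothetical, and the calculation assumes RAID-5 gives up exactly one drive's worth of space to parity and that nothing else is reserved.

```python
def mountable_gb(drives, drive_gb, overcommit):
    """Estimate mountable space for a RAID-5 set after MFT overcommit (illustrative)."""
    raid5_gb = (drives - 1) * drive_gb  # assumption: one drive's worth goes to parity
    return raid5_gb * (1 - overcommit)


print(round(mountable_gb(4, 32, 0.10), 1))  # 86.4
```

That lands in the same range as the "about 85GB" of mountable space Dumitru quotes for this configuration.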
We can also use stock hardware to build RAID arrays. Because of the nature of MFT, RAID-5 works very well and RAID-10 is unnecessary. The biggest array we have tested is eight 32G drives. The biggest array that we think is practical is 32 64G drives, or 2TB. This limit mostly has to do with direct attach and SATA issues. Because of how the current series of Flash SSDs work, we don't think they will scale well when used with SAS port expanders.
Techworld: Could they get the same performance from disk?
Doug Dumitru: Multi-threaded, this is possible, but it takes a lot of drives. When you are talking about measured 50,000 4K random read/write IOPS, this is a 200-drive RAID-10 array of 15K SAS drives. We know there are arrays that are this big, but I have not actually seen one in person. I did some testing on a 24-drive 15K SAS array (HP) with 4K operations and got the expected 5,400 read IOPS and 3,000 write IOPS.
And this was "short seeking" a small partition on the array. We also had a VAR do some testing comparing three Samsung drives configured RAID-0 with an IBM 16-drive storage array. This test involved Oracle. The MFT drives were over iSCSI and the IBM array was direct connect over FC. The three Samsung drives were about 4x faster than the 16 15K FC drives. This included the added overhead of iSCSI (which is a lot).
If you are looking for single-threaded performance, then hard disks don't even come close. At 225 IOPS for a single drive, the Flash drives are easily 50x faster.
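The drive-count comparison can be reproduced with simple arithmetic from the HP test figures quoted above. This is a back-of-envelope sketch, assuming IOPS scale linearly with the number of spindles and ignoring RAID overheads; the function name drives_needed is hypothetical.

```python
import math

# Figures quoted in the interview for the 24-drive 15K SAS array (HP).
HP_ARRAY_DRIVES = 24
HP_READ_IOPS = 5400

read_per_drive = HP_READ_IOPS / HP_ARRAY_DRIVES  # 225.0 IOPS per drive


def drives_needed(target_iops, per_drive_iops):
    """Drives required to reach target_iops, assuming linear scaling (a simplification)."""
    return math.ceil(target_iops / per_drive_iops)


print(read_per_drive)                          # 225.0
print(drives_needed(50_000, read_per_drive))   # 223
```

On these assumptions, 50,000 IOPS needs a little over 220 drives, which is the same order as the 200-drive array Dumitru describes.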
Techworld: Are they pleased with their MFT+flash SSD solution robustness?
Doug Dumitru: We had one demo server running RAID-0 glitch with a drive failure that forced us to reformat the partition. We don't sell RAID-0, but this was a performance demo built from "parts on hand". Otherwise, all of our production servers have had 100 percent uptime so far.
Our "calculations" for things like drive endurance lead us to believe that the drives will, on average, be long obsolete before they wear out. One side effect of MFT is that we avoid random writes to the drives. This not only makes the drives perform a lot better, it also wears them out a lot slower.
For example, if you take an Mtron 32G drive and write at 80 MB/sec to it continuously 24x7, it takes 1.25 years to reach 100,000 erase cycles with wear leveling. Some vendors quote 1,000,000 or even 5,000,000 erase cycles. In a server, the drive will tend to have some quiet times and do at least some reads, so a 5 year life is pretty close to "worst case" for anything other than a data logging application.
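The endurance estimate is easy to reproduce. The sketch below assumes decimal units (1GB = 1,000MB), perfect wear leveling so that every block is erased equally often, and no write amplification; the function name years_to_wear_out is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_wear_out(capacity_gb, erase_cycles, write_mb_per_s):
    """Years of continuous writing before every block reaches erase_cycles (idealised)."""
    total_mb = capacity_gb * 1000 * erase_cycles  # total data the drive can absorb
    return total_mb / write_mb_per_s / SECONDS_PER_YEAR


print(round(years_to_wear_out(32, 100_000, 80), 2))     # 1.27
print(round(years_to_wear_out(32, 1_000_000, 80), 2))   # 12.68
```

The first figure is close to the 1.25 years Dumitru quotes. At the 1,000,000 erase cycles some vendors claim, the same continuous write load would take more than a decade to wear the drive out.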
Techworld: Could you discuss the idea of using MFT at the controller level in more detail please?
Doug Dumitru: This is possible, but it adds real cost to the controller implementation. MFT requires about 0.12 percent of the drive's capacity in DRAM. Thus a 32GB drive needs about 40 megabytes of RAM to manage our mapping tables. When implemented on the host, 40MB is pretty much in the noise for current servers. Putting that amount of intelligence on the drive is costly. And right now, flash drives need to get cheaper, not more expensive. Anything that drives the costs up is moving in the wrong direction.
There is also the issue of price/performance for where you implement MFT. Putting MFT in the drive does not really make it run any faster (or any slower for that matter). There are advantages for MFT managing an entire array instead of single drives, so that is a plus for a host or RAID controller implementation. We started out on the host side because it got us to market quickest.
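The memory requirement follows directly from the 0.12 percent figure. A quick sketch, assuming decimal units (1GB = 1,000MB) and using a hypothetical function name, mapping_table_mb:

```python
def mapping_table_mb(drive_gb, dram_fraction=0.0012):
    """DRAM needed for mapping tables at 0.12 percent of drive capacity (illustrative)."""
    return drive_gb * 1000 * dram_fraction


print(round(mapping_table_mb(32), 1))  # 38.4
print(round(mapping_table_mb(64), 1))  # 76.8
```

A 32GB drive works out to roughly 38MB, consistent with the "about 40 megabytes" quoted, and a 64GB drive to roughly double that.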
- SSD disruptive technology in the datacentre - the Joseph Reger interview.