There are now two important and surprising disk drive failure papers on Techworld. MTTF, SMART diagnostics, temperature and drive type: none of them is well correlated with drive failure.

On 21 February we ran a news story reporting that Google researchers had found SMART diagnostics were not good predictors of disk drive failure. The story was based on a web version of the source paper, a ghastly-looking thing with split-up reference sections, missing graphs, and interpolated heading text. Trying to save it and convert it to text or PDF produced a disastrous-looking file. So we ran the news story and provided a URL for the source web pages.

Now Carnegie Mellon researchers have also released a report on real-world disk drive failures. It was, in fact, referenced in the Google paper. The C-M report converted to PDF easily and can be downloaded from Techworld here.

The C-M report found that Mean Time To Failure (MTTF) figures are also poor predictors of disk drive failure rates, and that whether the drives were SATA, SCSI or Fibre Channel made no difference to failure rates either.
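To see what an MTTF figure nominally promises, it helps to turn it into an annual failure rate. The Python sketch below does that under two assumptions of mine, not the report's: the drive runs round the clock, and failures arrive at a constant rate. The function names and the 1,000,000-hour input are hypothetical illustrations, not figures taken from either paper.

```python
import math

HOURS_PER_YEAR = 24 * 365  # assumes the drive is powered on all year


def nominal_afr(mttf_hours: float) -> float:
    """Annual failure rate implied by an MTTF figure.

    Assumes a constant failure rate (exponentially distributed lifetimes),
    which is the simplification a single MTTF number implicitly makes.
    """
    if mttf_hours <= 0:
        raise ValueError("mttf_hours must be positive")
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)


def expected_failures_per_year(drive_count: int, mttf_hours: float) -> float:
    """Expected number of drive failures per year in a population."""
    return drive_count * nominal_afr(mttf_hours)


if __name__ == "__main__":
    mttf = 1_000_000  # illustrative datasheet-style value, not from the papers
    print(f"Nominal AFR: {nominal_afr(mttf):.2%}")
    print(f"Expected failures per 1,000 drives: "
          f"{expected_failures_per_year(1000, mttf):.1f}")
```

Set the nominal rate against the replacement rate you actually see in your own drive population. If the two differ widely, the datasheet MTTF is not telling you much about your real failure exposure.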

I had another go at the Google paper and, after a morning of work in Word, turned it into a readable, reasonably well laid-out PDF file. You can download it from here.

So SMART diagnostics, MTTF figures, temperature, and whether the disk is a SATA, SCSI or FC drive: none of these is a good predictor of failure rates. These two white papers are the only high-standard studies by users or academics (meaning supplier-independent) into disk drive failures in very large populations of drives. They are, in my opinion, absolutely essential reading for anyone concerned with disk drive maintenance. Download them both, read them thoroughly, and reconsider your disk failure protection arrangements in light of them.

RAID 6 might seem like a much better idea after reading them.