CBL data recovery service

Any system manager with any sense protects important data with RAID. RAID, in its many forms, provides an excellent degree of protection against hardware failure, and anyone who has recovered from a failed disk in a mirrored pair or a RAID 5 array will cheerfully admit how much grief this now-commoditised technology saved them. The thing is, though: what happens when something goes wrong with the RAID itself?

RAID 1 and RAID 5 are designed to let you lose one disk from an array, and the controller will generally rebuild the data when you replace the faulty drive. If, however, something goes dramatically wrong and you lose more than one drive – perhaps through a controller fault, or because something overheated and the card decided to flag the drives as faulty – you’re in trouble.
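Why one lost disk is survivable and two are not comes down to how RAID 5 parity works: the parity block in each stripe is the byte-wise XOR of the data blocks, so any single missing block is simply the XOR of everything that survives. The Python sketch below illustrates that arithmetic on one toy stripe. The function names are our own, and real controllers do this in firmware across whole disks rather than like this.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(stripe):
    """Fill in the missing block (marked None) of one RAID 5 stripe.

    `stripe` holds the data blocks plus the parity block, one per disk.
    XOR parity carries enough information to rebuild exactly one block.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("more than one block lost: parity alone cannot rebuild them")
    if not missing:
        return list(stripe)
    survivors = [block for block in stripe if block is not None]
    rebuilt = list(stripe)
    rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


# A four-disk stripe: three data blocks plus their parity block.
data = [b"ABCD", b"EFGH", b"IJKL"]
stripe = data + [xor_blocks(data)]
print(rebuild_stripe([stripe[0], None, stripe[2], stripe[3]])[1])  # b'EFGH'
```

Knock out a second block and `rebuild_stripe` gives up, which is the arithmetic behind the “you’re in trouble” above.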

Of course, if you’re happy to drop back to last night’s backup, that’s no problem. But if your system is transactional and you absolutely can’t lose anything, you’re into the realms of phoning a data recovery company and begging them to save your universe. CBL is one such company, and we tried out their service. Although they can work directly with physical disks, they prefer not to, because you never know when a faulty one is going to give up completely.

So, step one is to make images of all the disks in the array that are still running. You can do this yourself with one of the many inexpensive imaging tools on the market, or ship the box of disks to CBL and they’ll do it for you. The clever bit of CBL’s service is the software package they’ve written to handle the recovery itself. By their own admission, a recovery job used to take anything up to a week, because when presented with a pile of disks they had to work out how the data was structured.
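Imaging tools differ in detail, but the core idea is simple: copy the disk block by block into a file and, when a block can’t be read, write zeros in its place and note the offset, so that every later block stays at its true position in the image. Here’s a minimal Python sketch of that idea, assuming the failing disk is exposed as a seekable file-like object that raises `OSError` on unreadable blocks. `image_device` is a hypothetical name; dedicated tools such as GNU ddrescue add retries and much smarter handling of bad areas.

```python
def image_device(src, dst, size, block_size=4096):
    """Copy `size` bytes from `src` to `dst`, block by block.

    Blocks that raise OSError on read are written as zeros, so every
    later block keeps its original offset in the image. Returns the
    offsets of the unreadable blocks so they can be logged.
    """
    bad_offsets = []
    offset = 0
    while offset < size:
        length = min(block_size, size - offset)
        src.seek(offset)
        try:
            data = src.read(length)
        except OSError:
            data = b""
            bad_offsets.append(offset)
        # Pad failed or short reads so the image stays aligned with the disk.
        dst.seek(offset)
        dst.write(data.ljust(length, b"\0"))
        offset += length
    return bad_offsets
```

The returned list matters later: a recovery tool can treat zero-filled blocks as unknown rather than trusting them as real data.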

Although the RAID levels themselves are well defined, implementations vary in the details – the order in which data and parity blocks are written, the order of the disks, the block size in use, the parity algorithm and so on. Much of this can be gleaned from the customer, but it’s common to find someone who hasn’t a clue which model of RAID card was used or what the block size was.
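To see why those details matter, the sketch below maps a logical chunk number to a disk and stripe under the four RAID 5 layouts that Linux’s md driver calls left-asymmetric, left-symmetric, right-asymmetric and right-symmetric. These are examples only: hardware RAID cards may use other layouts and names, and `locate` and `stripe_map` are hypothetical helpers written for this illustration. The “left” layouts put parity on the last disk first and move it one disk leftwards per stripe, the “right” layouts start on disk 0 and move rightwards, and the “symmetric” and “asymmetric” variants differ in where each stripe’s data starts.

```python
LAYOUTS = ("left-asymmetric", "left-symmetric", "right-asymmetric", "right-symmetric")


def locate(chunk, n_disks, layout):
    """Map a logical chunk number to (stripe, data_disk, parity_disk) in RAID 5."""
    if layout not in LAYOUTS:
        raise ValueError(f"unknown layout: {layout}")
    data_disks = n_disks - 1
    stripe, k = divmod(chunk, data_disks)
    # "left": parity starts on the last disk and moves leftwards each stripe;
    # "right": parity starts on disk 0 and moves rightwards.
    if layout.startswith("left"):
        parity = data_disks - stripe % n_disks
    else:
        parity = stripe % n_disks
    if layout.endswith("asymmetric"):
        # Data fills the disks from disk 0 upwards, skipping the parity disk.
        disk = k if k < parity else k + 1
    else:
        # Data starts on the disk after the parity disk and wraps round.
        disk = (parity + 1 + k) % n_disks
    return stripe, disk, parity


def stripe_map(n_disks, layout, n_stripes):
    """Grid of labels per stripe: 'D<n>' for logical chunk n, 'P' for parity."""
    rows = [[""] * n_disks for _ in range(n_stripes)]
    for chunk in range(n_stripes * (n_disks - 1)):
        stripe, disk, parity = locate(chunk, n_disks, layout)
        rows[stripe][disk] = f"D{chunk}"
        rows[stripe][parity] = "P"
    return rows


if __name__ == "__main__":
    for name in LAYOUTS:
        print(name)
        for row in stripe_map(4, name, 4):
            print("   " + " ".join(f"{cell:>4}" for cell in row))
```

Even on a four-disk array the differences show up immediately: in the second stripe, chunk 3 sits on disk 3 under left-symmetric but on disk 0 under left-asymmetric. Multiply that by unknown disk order and block size and it’s easy to see why a week of manual work was plausible.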

This is where their neat software package comes in. It understands most of the common RAID cards, their algorithms and their default settings, so if you know the card type you’re giving them a head start. If not, there’s still light at the end of the tunnel. We set up a trio of deliberately corrupted disk images representing three of the four disks in a RAID 5 array (i.e. pretending the fourth had failed). The array held three partitions – one FAT16, one FAT32 and one NTFS – spread across the disks. We gave the images to the package in the wrong order (i.e. although we knew disk 3 was actually disk 3, we presented it as disk 1) and told the application to work out what was going on. Within 15–20 seconds it had run through all the combinations of block size, disk order, parity algorithm and the like, and plumped for one answer – the correct one, in fact.
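The search itself can be pictured as brute force over a fairly small parameter space. The Python sketch below is our own illustration, not CBL’s algorithm: it tries every placement of the surviving images in the array (rebuilding the missing disk from parity), every candidate chunk size and the four RAID 5 layouts named as in Linux md, reassembles the start of the logical volume for each combination, and keeps whichever scores best under a caller-supplied `score` function. All the names are hypothetical, and it assumes exactly one disk is missing. One practical point: different layouts and disk orders can agree on the first few chunks, so the probed region needs to span several stripes before the right answer stands out.

```python
import itertools
from functools import reduce

# RAID 5 layout names as used by Linux md (an assumption for this sketch).
LAYOUTS = ("left-asymmetric", "left-symmetric", "right-asymmetric", "right-symmetric")


def xor_rows(rows):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*rows))


def locate(chunk, n_disks, layout):
    """Return (stripe, data_disk, parity_disk) for a logical chunk number."""
    stripe, k = divmod(chunk, n_disks - 1)
    if layout.startswith("left"):
        parity = n_disks - 1 - stripe % n_disks
    else:
        parity = stripe % n_disks
    if layout.endswith("asymmetric"):
        disk = k if k < parity else k + 1
    else:
        disk = (parity + 1 + k) % n_disks
    return stripe, disk, parity


def build_array(volume, n_disks, chunk_size, layout):
    """Stripe `volume` across n_disks with rotating XOR parity (test data only)."""
    n_stripes = len(volume) // (chunk_size * (n_disks - 1))
    disks = [bytearray(n_stripes * chunk_size) for _ in range(n_disks)]
    for c in range(n_stripes * (n_disks - 1)):
        stripe, disk, _ = locate(c, n_disks, layout)
        lo = stripe * chunk_size
        disks[disk][lo:lo + chunk_size] = volume[c * chunk_size:(c + 1) * chunk_size]
    for stripe in range(n_stripes):
        _, _, parity = locate(stripe * (n_disks - 1), n_disks, layout)
        lo = stripe * chunk_size
        # The parity row is still zero here, so XOR of all other disks = XOR of the data.
        disks[parity][lo:lo + chunk_size] = xor_rows(
            d[lo:lo + chunk_size] for i, d in enumerate(disks) if i != parity)
    return [bytes(d) for d in disks]


def read_chunk(disks, chunk, chunk_size, layout):
    """Read one logical chunk; the single missing disk (None) is rebuilt from parity."""
    stripe, disk, _ = locate(chunk, len(disks), layout)
    lo = stripe * chunk_size
    if disks[disk] is not None:
        return disks[disk][lo:lo + chunk_size]
    return xor_rows(d[lo:lo + chunk_size] for d in disks if d is not None)


def search(survivors, n_disks, chunk_sizes, score, probe=4096):
    """Try every disk order, chunk size and layout; return the best-scoring guess."""
    best = None
    for slots in itertools.permutations(range(n_disks), len(survivors)):
        disks = [None] * n_disks
        for image, slot in zip(survivors, slots):
            disks[slot] = image
        for chunk_size, layout in itertools.product(chunk_sizes, LAYOUTS):
            n_chunks = -(-probe // chunk_size)  # ceiling division
            volume = b"".join(read_chunk(disks, c, chunk_size, layout)
                              for c in range(n_chunks))[:probe]
            result = score(volume)
            if best is None or result > best[0]:
                best = (result, slots, chunk_size, layout)
    return best
```

To try it, stripe some random bytes with `build_array`, drop one disk, shuffle the other three and pass them to `search` with a score that counts expected byte sequences (in practice, the clues the engineers found when scanning the images). The returned tuple holds the best score, the slot each surviving image belongs in, the chunk size and the layout. Even this naive version only has a few hundred combinations to try for three surviving disks, which makes a result in seconds unsurprising.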

There is a caveat here, though. This isn’t an application for the casual user. Before we set the system number-crunching, we had to give it clues about what to look for. Data recovery chappies know a number of tricks, such as which byte sequences appear in master boot records, FAT headers, NTFS headers, duplicate directory tables and the like. So, before kicking off the search, we gave it half a dozen clues about what we expected to find. Typically, the recovery engineers would have spent some time scanning the images for such clues before starting the rebuild.

When I asked how they’d work with an organisation that (for example) couldn’t let its data off the premises for legal reasons, they said they generally send the customer the software and then lead them through the recovery process over a remote-control link such as pcAnywhere or VNC.
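Those clues are typically byte sequences at fixed offsets within a 512-byte sector. The sketch below scans an image for a few well-known ones: the 0x55 0xAA boot signature at offset 510 of a master boot record or boot sector, the `NTFS    ` OEM ID at offset 3 of an NTFS boot sector, and the `FAT16   ` and `FAT32   ` type strings at offsets 54 and 82 of FAT boot sectors. The function names and this short list are our own illustration; a real tool would check far more structures and feed the hits into its scoring of candidate layouts.

```python
SECTOR = 512

# (label, offset within the sector, expected bytes)
SIGNATURES = [
    ("boot-signature", 510, b"\x55\xaa"),
    ("NTFS", 3, b"NTFS    "),
    ("FAT16", 54, b"FAT16   "),
    ("FAT32", 82, b"FAT32   "),
]


def sector_clues(sector):
    """Return the labels of every signature present in one sector."""
    return [label for label, offset, magic in SIGNATURES
            if sector[offset:offset + len(magic)] == magic]


def scan_image(image):
    """Yield (byte_offset, labels) for each sector carrying at least one clue."""
    for offset in range(0, len(image) - SECTOR + 1, SECTOR):
        labels = sector_clues(image[offset:offset + SECTOR])
        if labels:
            yield offset, labels
```

A two-byte signature like 0x55 0xAA will turn up by chance in random data roughly once every 65,536 sectors, which is why several independent clues are worth gathering before trusting a candidate layout.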

Anyhow, once the system had worked out how everything was laid out, it took only a few minutes to recover the 600MB of data that had been “lost”. The data was recovered into three “virtual drives” that appeared just like normal hard disks in the “My Computer” folder, so we could browse them to confirm that the data really was there, and we could have copied it off to CD, DVD or tape for shipping back to the customer.
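One way to split a reassembled volume into separate drives is to read its partition table. If the volume starts with a classic MBR, the four 16-byte primary partition entries begin at offset 446; within each entry the partition type byte is at offset 4, and the starting LBA and sector count are little-endian 32-bit values at offsets 8 and 12. The sketch below assumes an MBR (not GPT) volume with 512-byte sectors. `list_partitions`, `extract` and the type table are illustrative, and this is not necessarily how CBL’s package does it.

```python
import struct

SECTOR = 512

# A few common MBR partition type IDs (not an exhaustive list).
PARTITION_TYPES = {
    0x04: "FAT16 (<32 MB)",
    0x06: "FAT16",
    0x07: "NTFS",  # 0x07 is also used by exFAT and HPFS
    0x0B: "FAT32 (CHS)",
    0x0C: "FAT32 (LBA)",
    0x0E: "FAT16 (LBA)",
}


def list_partitions(volume):
    """Return (type_name, start_byte, size_bytes) for each used primary MBR entry."""
    if volume[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature at offset 510")
    partitions = []
    for i in range(4):
        entry = volume[446 + 16 * i:446 + 16 * (i + 1)]
        ptype = entry[4]
        start_lba, n_sectors = struct.unpack_from("<II", entry, 8)
        if ptype == 0 or n_sectors == 0:
            continue  # unused slot
        name = PARTITION_TYPES.get(ptype, f"type 0x{ptype:02X}")
        partitions.append((name, start_lba * SECTOR, n_sectors * SECTOR))
    return partitions


def extract(volume, start, size):
    """Slice one partition out of the volume, e.g. to save as its own image file."""
    return volume[start:start + size]
```

Each slice returned by `extract` is a self-contained file system image, which is roughly what a “virtual drive” amounts to once it’s mounted.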

In short, we were very impressed with this home-grown piece of kit and the way CBL go about their recovery process. As it happened, one of the disks we were “pretending” was faulty actually did develop a fault, which made the test more realistic than we had expected. It also demonstrated the value of working from images: if the disk you’re trying to recover dies completely, you still have a copy. Although the software package is not intended for customer use, it made us realise just how convoluted the layout of data on RAID disks can be, and what a job it must be to work it out by hand. CBL claims that its lead time has come down from days to hours as a result, which can only be a good thing for the customer.