Jai Menon is IBM Research's authority on the design and architecture of data storage systems. He joined IBM Research in San Jose, California, where he became a pioneering researcher and designer of data storage systems and RAID architectures. He was named functional manager, CS Storage Systems, in April 2000 and, in May 2001, was named IBM Fellow, the company's most prestigious and highest technical honour.
He was in town recently on a flying visit, so we took the opportunity to throw a few questions at him. These were his responses:
Q: What are the major problems facing storage right now?
A: If Dickens had written his articles on a computer, we wouldn't be reading them now. There are two problems:
- The archival properties of the media that you store stuff on. That problem is relatively easy to deal with, as you can go in and read the data and move it.
- Even if you can read the bits, can you make any sense of them? Say you create something with MS Word and it's 200 years later, and there is no MS - this is possible!
That's the problem we've been tackling. There's a joint project with us and the Dutch National Library and they're very interested in talking with us on this problem. They've funded some of our efforts.
Q: How do you solve that problem?
A: The answer is that you store the document and a program as part of it, which can run on a very simple computer which we call the universal virtual computer. You can describe this computer in English. So the only thing that has to last hundreds of years is English, which we think is pretty safe.
There's a lot of interest from libraries and from healthcare - you want to store your own health records but you also might want to know what happened to your parents. It's also true of issues such as pensions.
We started working on it years ago but the people who were interested didn't have much money - such as libraries. The project has been going for about 6-7 years in small stages.
The Dutch have funded some of our efforts and have actually written the decoding programs in the UVC language.
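To make the idea concrete, here is a minimal sketch of storing a document together with its own decoder. It is not the real UVC or its instruction set: the seven-instruction machine, the register names and the run-length "file format" below are all invented for illustration. The point is only that the machine is simple enough to describe in a few plain sentences, so a future reader could rebuild it and run the archived decoder.

```python
# Toy machine (hypothetical, NOT IBM's UVC). In plain English:
#   LOAD r a    - put the data byte at position reg[a] into register r
#   ADD r k     - add the constant k to register r
#   OUT r       - append the value of register r to the output
#   JGE a b t   - if reg[a] >= reg[b], continue at instruction t
#   JMP t       - continue at instruction t
#   HALT        - stop and return the output

def run(program, data):
    """Interpret `program` over `data` and return the decoded bytes."""
    reg = {"p": 0, "n": len(data), "c": 0, "ch": 0, "zero": 0}
    out = bytearray()
    pc = 0
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "LOAD":
            reg[args[0]] = data[reg[args[1]]]
        elif op == "ADD":
            reg[args[0]] += args[1]
        elif op == "OUT":
            out.append(reg[args[0]])
        elif op == "JGE":
            if reg[args[0]] >= reg[args[1]]:
                pc = args[2]
        elif op == "JMP":
            pc = args[0]
        elif op == "HALT":
            return bytes(out)
        else:
            raise ValueError(f"unknown instruction: {op}")

# The archived decoder: expands (count, byte) pairs, an invented format.
DECODER = [
    ("JGE", "p", "n", 9),     # 0: no data left? then halt
    ("LOAD", "c", "p"),       # 1: read the repeat count
    ("ADD", "p", 1),          # 2
    ("LOAD", "ch", "p"),      # 3: read the character
    ("ADD", "p", 1),          # 4
    ("JGE", "zero", "c", 0),  # 5: count used up? next pair
    ("OUT", "ch"),            # 6: emit one copy
    ("ADD", "c", -1),         # 7
    ("JMP", 5),               # 8
    ("HALT",),                # 9
]

# The archived document, stored in the invented format.
ARCHIVED_DATA = [3, 97, 1, 98, 2, 99]

print(run(DECODER, ARCHIVED_DATA))  # b'aaabcc'
```

In this sketch the archive would hold `ARCHIVED_DATA` and `DECODER` side by side; only the short English description of the machine has to survive outside it.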
Q: Would it still use boring old von Neumann architectures?
A: I suppose it might be quantum by then.
Q: We've heard of the modular server and storage package that IBM is researching called IceCube. What's the latest?
A: The idea behind this collection of intelligent bricks is that if I look at a SAN or storage system there are a lot of different things I need to know about: ports and storage controllers and switches and host bus adapters…and if a disk or cable has broken you have to change that.
But you could replace all these with the brick. You don't get to replace the individual parts. The only thing you have to worry about, as the sysadmin, is the brick itself. They're like cells in the body and we put them together in a 3D structure which is why we call it the IceCube.
A brick talks to its neighbour brick using capacitive coupling - so no cables - over very short distances. We can get very high data rates of 10Gbit/s to all six of its neighbours. The idea is that you throw in 10 per cent extra space to allow for failure and for four years no-one has to come and touch the system. The only time you have to worry about it is when you want to add more bricks.
Q: How much storage are we talking about?
A: A 3x3x3 assembly can hold 32TB, enough to store all the documents in the US Library of Congress - it's about 1TB per brick.
Q: How do you get power to them?
A: With a three per cent failure rate we estimate it can keep running for four years. Cooling involves using clean cooling water in a radiator effect. We're packing everything in three dimensions but you don't need space for access into the middle of things.
'Fail in place' is a key concept; it's more than hot-pluggable. Right now, with hot-plug systems, you have to replace items pretty quickly. If you've got a component out, the extra load falls on the rest and if that fails, you're gone. Those systems aren't designed to keep going for four years.
But with the Cube, if a brick in the middle of the pile fails, it'll still be OK for the next four years. If you take bad ones out, the rest keep working if there are enough bricks.
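The 'fail in place' claim can be illustrated with a toy model of the 3x3x3 assembly. This is a sketch under our own assumptions, not IBM's design: bricks are points on a grid, each talks only to its (up to six) face neighbours, and a failed brick simply stops relaying. The function names are hypothetical.

```python
from collections import deque
from itertools import product

SIZE = 3  # a 3x3x3 assembly, as in the interview

def neighbours(brick, size=SIZE):
    """Yield the face-adjacent bricks (at most six) inside the assembly."""
    x, y, z = brick
    for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                       (0, -1, 0), (0, 0, 1), (0, 0, -1)):
        n = (x + dx, y + dy, z + dz)
        if all(0 <= c < size for c in n):
            yield n

def reachable(start, failed, size=SIZE):
    """Breadth-first search over working bricks, starting from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        for n in neighbours(queue.popleft(), size):
            if n not in failed and n not in seen:
                seen.add(n)
                queue.append(n)
    return seen

all_bricks = set(product(range(SIZE), repeat=3))
failed = {(1, 1, 1)}  # the brick in the middle of the pile
working = all_bricks - failed

# Can every working brick still talk to every other one?
still_connected = reachable((0, 0, 0), failed) == working
print(len(working), still_connected)  # 26 True
```

With the centre brick dead, the remaining 26 bricks still form one connected mesh in this model, so nobody needs to reach into the middle of the pile to replace it.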
Q: Is it going to become a product?
A: Let me outline the ultimate vision and you tell me who wouldn't buy it. It's a storage project that talks to servers but it can also become the server, so it's all in one place. Server talks to server, server talks to storage and storage talks to storage, all over capacitive coupling and that's your data centre right there. No wires, no mess.
Q: Where's the logic hardware?
A: You might need routing and computers to talk to others - like a communications server, yes.
Q: When will it be a product?
A: It will take a long time, in the order of four to five years - maybe two or three generations of storage density. It's always hard to predict when these things will transfer to product. There's a lot of interest and it's very exciting.
Q: What's new at Almaden?
A: More mundane things - disks have not gotten more reliable but they have got more capacious. Seven or eight years ago I wrote a paper that said that if a customer had a petabyte of data, RAID5 may not be good enough because there would be too many double failures too often. Then it seemed crazy to suggest that a customer might have that much data, right? Well here we are 10 years later and it's not so far-fetched. Some already do.
If you do the math, you'll find that with a 1TB disk, there's a one per cent chance that you'll not be able to read all the sectors. Example - in a RAID system with eight 1TB disks, a disk fails. You have your first failure, and you've got to be able to read eight times 1TB, so there's an eight per cent chance you're not going to be able to do that. That's pretty high - it means that if a disk fails there's an eight per cent chance you're not going to be able to reconstruct the data.
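That arithmetic is easy to check. The sketch below uses the interview's own figures (a one per cent chance per disk, eight disks to read) and adds one assumption of ours: that read failures on different disks are independent. The function name and parameters are illustrative only.

```python
def rebuild_failure_probability(disks_to_read, p_unreadable_per_disk):
    """Return (linear estimate, estimate assuming independent disks)."""
    # The back-of-envelope figure from the interview: n times p.
    linear = disks_to_read * p_unreadable_per_disk
    # Chance that at least one of n independent disks has an unreadable sector.
    independent = 1.0 - (1.0 - p_unreadable_per_disk) ** disks_to_read
    return linear, independent

linear, independent = rebuild_failure_probability(8, 0.01)
print(f"linear estimate:     {linear:.1%}")       # 8.0%
print(f"independent failures: {independent:.1%}")  # 7.7%
```

The two figures are close at these sizes, so 'eight per cent' is a fair shorthand; the gap widens as the per-disk probability or the number of disks grows.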
You're going to lose data if you get a failure. It means you're going to have to go to schemes that protect against two or three disk failures. We're working on techniques that can do that but only have the overhead of mirroring.
Q: Is it software or hardware?
A: It's a software algorithm - you might choose to have a hardware assist but it's not required. In the near term it's one of those things we're going to have to deal with.
Q: Solid state disk has been a Holy Grail for at least 20 years. What's IBM doing in this area?
A: IBM has research in alternatives to disk drives. One such technology is Millipede. IBM demonstrated 1Tbit/sq in using Millipede technology in 2002, allowing for storing 25 DVDs in the size of a postage stamp. Millipede is a MEMS technology, and makes atomic-sized indentations on the surface of a polymer.
We also have research in what we call storage class memory - that's non-volatile memory using DRAM-like techniques - but at a much lower cost point. There's also other research in this area in the industry.
We should see technologies that are 10x cheaper than DRAM by 2010, which will still be 10x more expensive than disk but much cheaper than DRAM. In another five years, by 2015, we may see technologies that could potentially replace disk drives, which are much faster and more reliable than disk, because they are solid-state, but close to disk in price.
Q: What's your view on iSCSI and where it is going?
A: IBM just announced a new iSCSI product, the DS300, which is meant for low-end xSeries and BladeCenter environments. We believe that iSCSI has a place in these kinds of environments due to its low cost and built-in Ethernet on the processor motherboard. It's complementary to Fibre Channel, which is appropriate in data centre SANs.
Q: Tape format futures - is there anything going on here of interest or is disk the future?
A: IBM demonstrated a 1TB tape cartridge over a year ago and continues to work on more capacity per cartridge. Tape has a lot of headroom to grow: its areal density is 100 times lower than that of disk.
While disk will run into ever more challenging issues, tape, being a factor of 100 less dense, has a lot more room to improve capacity and lower cost without running into major technical issues. We see nothing replacing tape in archive-type applications for a long time.