Originally intended as a replacement for the PCI bus within PCs and servers, Infiniband was not terribly successful in that role. However, it has since begun to win acceptance elsewhere, this time as a cluster interconnect for high-performance computing.

So far, it has mainly been adopted by technical and academic users, says Joan Wrabetz, VP of marketing at InfiniCon, which develops hardware and software for Infiniband networking and has just released its third generation of software. Now she wants to reach businesses, more and more of which are finding uses for clusters in areas such as automotive design, financial analysis and data visualisation.

"Last year, it finally started to take off for high performance clusters," Wrabetz says. "People were using Gig Ethernet or proprietary networks, such as Myricom and Quadrix - Infiniband provides the same or better performance at lower cost."

She cites the example of the Institut Français du Pétrole (IFP), which has an Infiniband-interconnected Linux cluster comprising some 200 machines. These use a mixture of AMD Opteron, Intel Xeon and Intel Itanium processors, and the cluster has around 300 research and development users.

To DMA or not to DMA?
A key difference is that Infiniband and the proprietary networks support RDMA (remote direct memory access), while Ethernet does not (although there are projects to enable RDMA over TCP/IP). RDMA means data can move directly from one system's memory to another's, without intermediate buffering or a trip through the operating system.
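
For a concrete sense of what RDMA looks like to a programmer, the sketch below uses the open libibverbs API to register a buffer and post an RDMA write. It is an illustration rather than InfiniCon's code, and it assumes the queue pair has already been connected and that the remote buffer's address and access key (rkey) were exchanged out of band.

/* Minimal RDMA-write sketch using libibverbs (illustrative only).
 * Assumes the queue pair (qp) is already connected and that the peer's
 * buffer address and rkey were exchanged out of band, e.g. over TCP. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <infiniband/verbs.h>

#define BUF_SIZE 4096

int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                       uint64_t remote_addr, uint32_t rkey)
{
    /* Register a local buffer so the adapter can read it directly. */
    char *buf = malloc(BUF_SIZE);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, BUF_SIZE,
                                   IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;
    strcpy(buf, "hello over RDMA");

    /* Describe the local data to send. */
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = BUF_SIZE,
        .lkey   = mr->lkey,
    };

    /* Post an RDMA WRITE: the adapter places the data directly into
     * the remote node's registered memory, bypassing its OS and CPU. */
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,
        .send_flags = IBV_SEND_SIGNALED,
    };
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    struct ibv_send_wr *bad_wr = NULL;
    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* A real application would poll the completion queue (ibv_poll_cq)
     * to learn when the transfer has finished, then deregister. */
    return 0;
}

The point of the pattern is that once the memory is registered, the data path avoids per-message copies through the kernel's network stack, which is where the latency advantage over ordinary Ethernet comes from.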

Selling clustering to commercial users instead of academics means focusing on slightly different things. "It means we put a high emphasis on scaling, with 24- to 288-port switches and support for 10,000-node clusters," Wrabetz says.

"Second, we focus on reliability - labs might put up with stability issues but businesses won't. Then it's ease of management, especially for large networks, so you need tools to propagate software or find failed cables, say."

Wrabetz highlights two other important developments, namely virtual I/O and server virtualisation. InfiniCon and others are adding support for high-level I/O protocols such as Ethernet to run over Infiniband, so each cluster node needs only one adapter and cable - you then gateway to Ethernet at the hub or back-end of the cluster.

Add a virtual SAN
She says virtual I/O may be even more important when it comes to storage: "The problem is that as clusters grow, having a Fibre Channel adapter and switch port for each node gets prohibitive. You can designate I/O nodes and only give them Fibre Channel, but they become bottlenecks.

"So you give them virtual Fibre Channel and the traffic goes through Infiniband. We're seeing more people pushing Fibre Channel away from the cluster and either going to a Fibre Channel gateway or direct to Infiniband storage."

The next step for InfiniCon will be server virtualisation software, according to Wrabetz; in particular, this means the ability to boot a node over the network, tools to load the operating system remotely, and so on.

All of this is finally making it worthwhile for system builders to include Infiniband as standard, she says, at least in products aimed at the high-performance computing market. That should make clusters cheaper, but it also brings new challenges for companies such as InfiniCon.

"We're seeing vendors eliminate the host adapter and put Infiniband on the motherboard," she says. "It saves a big percentage of the cost - Appro and IBM have done it so far, others are on the way. So our host software needs to run in different host environments." This was one of the additions InfiniCon made in its version 3.0 software, she adds.

She acknowledges that, at five years old, it's still early days for Infiniband - interoperability is limited, for example - but claims this is not yet a major issue.

"The reality is nobody builds heterogeneous Infiniband fabrics today - they're complicated enough already," she says. "People do test to the standard, the reality is the products are not quite interoperable, but we've been able to do it when needed. It's a lot better than Fibre Channel was at the same stage of its development. The protocol is very comprehensive, but not everyone has implemented the optional parts."