Server processors are failing to deliver the promised gains in performance, according to the head of technical operations at Facebook.
The social networking company is constantly upgrading its infrastructure to keep up with growth in users and data, while trying to minimise power consumption to save money, said Jonathan Heiliger, vice president of technical operations.
"The biggest thing (that) surprised us is ... less-than-anticipated performance gains from new microarchitectures - so, new CPUs from guys like Intel and AMD. The performance gains they're touting in the press, we're not seeing in our applications," Heiliger said. "And we're, literally in real time right now, trying to figure out why that is."
The hardware industry has also fallen short when it comes to delivering very power-efficient servers that carry out a limited set of functions for companies such as Facebook and Amazon, Heiliger said. He had some harsh words for server OEMs.
"You guys don't get it," Heiliger said. "To build servers for companies like Facebook, and Amazon, and other people who are operating fairly homogeneous applications, the servers have to be cheap, and they have to be super power-efficient." That means not just an efficient power supply but an efficient system as a whole, down to the processor, he said. Google has done a great job designing and building its own servers for this kind of use, Heiliger added.
Facebook is still working with server makers on this and doesn't know why they continue to fail, Heiliger said. He hopes to see co-operation among organizations deploying large computing clusters to develop a set of common standards that vendors can design for.
Heiliger had one piece of advice for anyone building an infrastructure to handle large-scale Internet-based services.
"There's a pretty simple answer for scaling infrastructure. It's, 'Don't be cheap,'" Heiliger said. He added that Facebook nonetheless drives hard bargains with its hardware and software infrastructure suppliers, and is careful not to overbuy.
The best way to scale up a system, he said, is to look at its application, software and hardware infrastructure, pick one of the three to focus on, and add to that first. Facebook focuses on application infrastructure and upgrades the other two to keep up with it.
Testing is another key to the operational success of Facebook, which has more than 200 million users and frequently introduces new features. Heiliger said the launch of Facebook's personalised usernames earlier this month went smoothly, despite an explosive response when it first went live, because of extensive testing of the new feature.
It took about two months to roll out the new feature, from concept to availability, he said. When the personalised usernames became available, users claimed 1 million names in the first hour, and the rush did not slow down the service as a whole, he said.