Bottlenecks. The arch-enemy of the database admin, devouring CPUs everywhere. For me, there isn’t anything more frustrating than a system bottleneck. That speaks volumes about my rather geeky passion. Sure, poor overall performance can be a pain, but bottlenecks are extra frustrating, as they mean you could be getting much more done if one piece of your system wasn’t slowing everything else down.

Diagnosing bottlenecks is in itself a hard science. To add insult to injury, even when you have located the problem, it is often difficult, if not impossible, to remove it. Many times, overcoming one bottleneck only leads to a new bottleneck somewhere down the line.
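
To make the "new bottleneck down the line" effect concrete, here is a minimal sketch. It assumes a simplified model in which end-to-end throughput is capped by the slowest stage; the stage names and numbers are invented purely for illustration.

```python
# Hypothetical stage capacities in operations per second (illustrative numbers only).
stages = {"cpu": 900_000, "memory": 700_000, "storage": 50_000, "network": 400_000}


def bottleneck(capacities):
    """Return (name, capacity) of the slowest stage, which caps overall throughput."""
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


before = bottleneck(stages)

# Replace the slow storage with something much faster and look again.
upgraded = dict(stages, storage=1_200_000)
after = bottleneck(upgraded)

print("before upgrade:", before)
print("after upgrade:", after)
```

In this toy model, fixing storage does not remove the bottleneck; it simply moves it to the next-slowest stage.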

The only true way to eliminate bottlenecks is to create a completely balanced system - one where all pieces work in perfect harmony. This is why many IT administrators spend countless hours trying to fine-tune their infrastructure. While this process is indeed time-consuming, it is a necessary step towards getting the most out of your data centre. But nowadays, removing bottlenecks means looking at much more than just hardware and infrastructure.

Coded Bottlenecks

While hardware was once the only bottleneck systems faced, this is no longer the case. Today’s servers are ultra fast, so many admins and developers are beginning to encounter bottlenecks that are unrelated to hardware: bottlenecks which show up in software, inside the code of the application. With servers now powered by terabytes of flash memory, lightning-fast multi-core processors and DRAM in the hundreds of GB, old disk-era code left over in applications is presenting itself as a bottleneck everywhere. While fast systems can make crappy code run faster, we’re encountering a reality where the application itself becomes the bottleneck, increasing latency and making scale non-linear.

Overcoming code-level bottlenecks requires a completely different approach to system architecture - in software, hardware, and how the two integrate. Rethinking design on this level can be somewhat daunting, as you may be venturing out of your comfort zone. Fortunately, it’s surprisingly easy, as diagnosis of software bottlenecks is a highly tooled and well-understood subject area; the knowledge required is readily available, yet often ignored. The speed gained by optimising software, along with the lower latency and higher throughput your system can achieve, makes it worth the effort - especially if you’re in need of serious performance.
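
As one small illustration of how well tooled this area is, the sketch below uses Python's built-in cProfile and pstats modules to find where time goes in a toy workload. The functions handle_request and count_known_slow are hypothetical stand-ins for application code, not taken from any real system.

```python
import cProfile
import io
import pstats


def count_known_slow(items, known):
    """Count items present in 'known'. 'known' is a list, so each test is a linear scan."""
    return sum(1 for x in items if x in known)


def handle_request():
    """Hypothetical unit of application work that we want to profile."""
    known = list(range(2000))
    items = list(range(0, 4000, 2))
    return count_known_slow(items, known)


def profile(func):
    """Run func under cProfile and return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()


result, report = profile(handle_request)
print(report)
```

The report lists the functions that account for the most time, which is usually enough to point at the offending code path. Here, converting 'known' to a set would be the obvious first fix.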

Putting Applications into Top Gear

This quandary is what ultimately created one of my favorite cars. When most people think of Porsche, they envision the rear-engined 911. This iconic beauty is also one of the oldest sports coupe models ever produced - with more than 800,000 911s sold in the last 50 years. While the 911 is the fastest factory Porsche available, it isn’t the easiest to drive because of its rear engine.

However, in the racing world, speed is often what matters - so the complexity is acceptable. It also means there is a steep learning curve and significant investment needed to properly drive a 911. Similarly, in our data centres, significant investment and application tuning are usually needed to achieve maximum performance and the lowest latency. Unlike in racing, however, complexity is not optimal in a data centre.

The mid-engined Boxster is slower than the 911, but what it lacks in speed, it makes up for with pure driving enjoyment and ease of use. In data centres, this is like purchasing an easy-to-implement solution that might not be the fastest, but is simple to run.

In our world it’s often difficult, and many times impossible, to combine top performance with balance and ease. For the Porsche, however, it is possible. Ruf Automobile is a manufacturer of high-performance, tuned Porsches. My favorite vehicle in their line-up is the Ruf 3800 S - which is essentially a Boxster with a 911 engine.

The 3800 S takes a 911 engine and fits it into the mid-engined chassis of the Boxster. The result is an easy-to-drive sports car that can go from 0 to 100 km/h in 4.6 seconds (now that’s low latency!) and achieve a maximum speed of 295 km/h - faster than all but the top-of-the-line 911 models.

I see the future of data centres leaning towards solutions like the Ruf 3800 S: among the fastest on the market, with the right components in the right places, yet easy to implement with little tweaking and training required. The combination of powerful engines and easy handling will make us all look like good drivers.

Taking a New Look at Old Ways

To combine powerful vehicle acceleration with balance requires a new approach to manufacturing and customisation. Similarly, for servers, combining powerful application acceleration with balance requires rethinking server and software design. Through simplicity in software and hardware, applications can process data faster than ever before and decrease latency. To achieve this balance, we have to put the experience of the driver - pardon the pun - in the front seat.

In our data centres, simplicity is becoming a requirement for both hardware and software. It’s the only way to manage data at scale. In hardware, our devices are connecting more directly to our system bus - built to perfectly balance the hardware, yet modular in design. This is why most manufacturers of flash devices are now connecting through PCIe, which is the most direct path to deliver data to the CPU.

Speed Through Superior Software

Perhaps more impressive, but not yet as common, is optimising these flash memory devices with custom-built APIs to make them more convenient to use. For years, companies have been devoting massive amounts of resources to optimising applications for certain pieces of hardware. Today, leading vendors are doing all that hard work for you by releasing software development kits (SDKs) with APIs that provide pre-tuned modifications to your existing apps. This helps bridge the gap between performance and simplicity, like the Ruf 3800 S.

A common performance improvement unleashed by open source APIs allows flash devices to be treated as a new memory tier - fast and flexible like DRAM, but with high capacities like disk. By doing this, data centres can double the life of their flash by cutting out old disk protocols and greatly accelerating applications. In addition, this dramatically lowers latency, delivering real world performance improvements.
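
The general idea of addressing storage like memory, rather than through explicit seek-and-read calls, can be sketched with Python's standard mmap module. This is only an analogy under stated assumptions: it uses an ordinary temporary file and the operating system's memory mapping, not any vendor SDK or flash-specific API, and the record layout and helper names are hypothetical.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<Q")  # hypothetical layout: one unsigned 64-bit integer per record


def write_records(path, values):
    """Write each value as a fixed-size binary record."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def read_with_seek(path, index):
    """Disk-era style: an explicit seek and read call for each lookup."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))[0]


def read_with_mmap(path, index):
    """Memory style: map the file and address it by offset, like a byte array."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return RECORD.unpack_from(view, index * RECORD.size)[0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.bin")
    write_records(path, range(1000))
    a = read_with_seek(path, 123)
    b = read_with_mmap(path, 123)

print(a, b)
```

Both functions return the same record; the difference is that the mapped version lets application code treat the data as addressable memory instead of issuing file operations for every access.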

With these SDKs, taking advantage of performance improvements has never been easier. The tuning has been done and is prepackaged. By simply replacing portions of code with calls into these open source SDKs, programmers at companies of all sizes can achieve massive performance improvements.

No One Left to Blame But the Programmer

For decades, it was far easier for companies to throw more servers, more CPU, more DRAM and more disk at bottlenecks than to ask programmers to develop better code. Buying hardware has been seen as the path of least resistance, while recruiting and retaining expert human talent is often much more difficult and risky. This paradigm is rapidly changing, as many developers are now realising that new application architectures can help them achieve new performance breakthroughs.

Once we overcome these software bottlenecks by rethinking how we program, our applications will be faster than ever. Not only faster - but easier to run. Through these new APIs, companies can improve speeds by more than 10 times. When combined with equally powerful hardware upgrades, one can easily see how applications will soon deliver both high performance and an enjoyable driving experience, unmatched by anything we have seen before.

Posted by Thomas Kejser, EMEA CTO at Fusion-io