In the build-up to the release of a series of benchmarking tests for network monitoring / management applications based on Open Source technologies, Tom Callway speaks to Tarus Balog, the CEO of the OpenNMS Group and current maintainer of the OpenNMS open source network management project.

1. What is OpenNMS?

"OpenNMS is the world's first enterprise-grade network management application platform developed under the open source model."

What the heck does that mean?

It was registered on Sourceforge in March of 2000, although it was started in 1999, so it has been around for a while - long enough to qualify as "world's first" when you add that ...

It was designed from the beginning to be "enterprise-grade", which means it can manage tens of thousands if not hundreds of thousands of network elements. We have commercial support clients who are using one instance of OpenNMS to collect on over 50,000 discrete devices, another collecting data from over 120,000 interfaces (this results in 1.2 million data points every five minutes) and still another with 200 devices, each with 32,000 interfaces per device.

OpenNMS was also built to be a network management application platform. Users can use OpenNMS as a framework on which to build a unique management solution. While OpenNMS does a lot out of the box, it really shines when it is customised for a given network.

2. Why was it so important to release OpenNMS as a free, open source project? What advantages does it bring as opposed to using an opencore business model?

Many people involved with OpenNMS have been doing some form of network management for decades. Those of us with a lot of experience are actually a pretty small group, and we were all frustrated with the fact that commercial solutions weren't powerful or flexible enough for us to easily deploy them for clients (not to mention the licensing costs). We thought that if we had some common platform on which to build our solutions, we could more easily meet our clients' needs, and thus OpenNMS was born. The only way to make it grow quickly and be fair to all involved was to make it a pure open source project.

As for "open core" - that combination of open and commercial software - it goes directly against what we were trying to achieve. Open core vendors immediately draw a line in the sand, saying "these features are 'enterprise' and you shall not have them", thus the community is forced either to reinvent the wheel or to fork the project, or both. We hope that they'll just use OpenNMS.

3. How is the project organised and how might somebody interested in contributing to the project get involved?

The two main places for OpenNMS involvement are the Wiki and the discussion lists.

While the project is backed by a commercial entity called The OpenNMS Group, the project is maintained by a group of about 20 people called The Order of the Green Polo. Membership is solely based on merit, and new members are voted in by existing members.

4. There are lots of competitive open source and proprietary network monitoring tools out there - what makes OpenNMS different?

The scalability and flexibility of OpenNMS make it attractive for carriers and large enterprises. If an organisation has a dedicated IT department and at least one person who focuses on management, they are probably a good candidate for using OpenNMS. In many of these companies we have replaced OpenView and Tivoli because those products simply could not handle the load.

In the open source marketplace, it seems that most vendors are focusing on the smaller organisation. Their products are great for users with a modest number of devices to monitor, but they break at scale. Some of the open core options in this space would like to say they compete with the large management vendors, but in fact they are more often competing with products like Solarwinds' Orion. Think about it: some open core vendors charge US$100+ per device per year to license their software. A modest OpenNMS client has 2000 devices. At those prices they would have to pay US$200,000 per year for the open core option, which over, say, a five-year lifetime makes it more expensive than HP or IBM, thus they can't really play in our space.
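The licensing arithmetic in that answer can be checked with a short sketch. The US$100 per-device fee, the 2,000 devices and the five-year lifetime are the figures quoted above; the function name is purely illustrative and not part of any product.

```python
def open_core_license_cost(devices: int, fee_per_device_per_year: int, years: int = 1) -> int:
    """Total licence fees in US$ under a per-device, per-year pricing model."""
    return devices * fee_per_device_per_year * years


# Figures quoted in the interview: 2,000 devices at US$100 per device per year.
annual = open_core_license_cost(2000, 100)
five_year = open_core_license_cost(2000, 100, years=5)

print(f"Per year: US${annual:,}")
print(f"Over five years: US${five_year:,}")
```

Since the quoted fee is "US$100+", these totals are a lower bound.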

5. Can you tell us more about OpenNMS' provisioning capabilities?

It's one thing to be able to manage a large number of devices, but if one has to spend hours and hours configuring the software to perform the management, then it isn't enterprise-grade.

OpenNMS has always had a powerful automatic discovery feature. New devices can be easily added to the system and management "just works". However, we had one client, Swisscom Hospitality Services, with over 50,000 devices and a very dynamic network. There was no way automated discovery could have worked well with all of the adds, moves and changes they were doing. Since they have an internal database of their IT infrastructure, we had them export that into a specially formatted XML file, which OpenNMS could then import nightly. There was a key to relate the equipment in the Swisscom database to the device in the OpenNMS database and thus very complex changes (node label, IP addresses, etc.) could be easily handled.

The only downside was that it was all or nothing. One either used the automatic discovery or the model importer. In version 1.8 we will release a totally new provisioning system that combines the best of both worlds.

6. Can you expand more on the integration of RANCID and OpenNMS and why this is important?

A good portion of The OpenNMS Group revenue comes from custom development (100% of which is rolled back into the project). We had a client in Italy approach us to help them get rid of both OpenView and CiscoWorks. RANCID provides much of the functionality of CiscoWorks, and being open source it allowed us to easily integrate it with OpenNMS. These efforts are expected to save the client over €1M per year.

7. Has the user experience improved with OpenJDK?

Not really. OpenJDK is nice in many ways, not the least of which is that those of us who are in open source and develop in Java are not seen as black sheep anymore (well, at least not as much). The problem is that OpenJDK is not a perfect port of the Sun JDK and since OpenNMS utilizes the Java VM to the hilt we have uncovered some bugs in OpenJDK. We reported them and they have been addressed, but I think it will be a year or two before the users will actually start to see the benefit of an improved user experience.

8. Can you elucidate the auto-discovery and other features that can make a user's life easier?

In many products one has to add each device manually and specify the services to be monitored, i.e. this is a Cisco router, that is a Windows server, etc.

With OpenNMS, any new type of device must be put into the configuration files, but then every instance of that device is automatically detected. For example, I am working with a client who uses 29West software. The 29West application has a really nice SNMP MIB, so I analyzed it and set up OpenNMS to collect on important metrics regarding the performance of the software. Now, anytime the 29West software is discovered on a device, that data collection automatically occurs.

The best part is that since OpenNMS is open source, the community contributes quite a few configurations and thus with each release it can manage more and more devices.

9. How does OpenNMS fare performance-wise?

As I mentioned above, at Swisscom we are monitoring 50,000+ devices. At New Edge Networks they have over 14,000 devices but a whopping total of 120,000+ interfaces. We are doing data collection on about 10 data points per interface every five minutes, which is something that few products can do. At a telecom company in Italy we are managing devices with 32,000 interfaces per device (many virtual, of course), which breaks OpenView.
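To put the New Edge Networks numbers in perspective, here is a minimal sketch of the collection load they imply. It uses the round figures quoted above (120,000 interfaces, 10 data points each, a five-minute interval); the constant and function names are illustrative only.

```python
INTERFACES = 120_000               # "120,000+" in the interview, so a lower bound
DATA_POINTS_PER_INTERFACE = 10     # "about 10" in the interview
POLL_INTERVAL_SECONDS = 5 * 60     # one collection cycle every five minutes


def collection_load(interfaces: int, points_per_interface: int, interval_seconds: int):
    """Return (data points per cycle, average data points per second)."""
    per_cycle = interfaces * points_per_interface
    per_second = per_cycle / interval_seconds
    return per_cycle, per_second


per_cycle, per_second = collection_load(
    INTERFACES, DATA_POINTS_PER_INTERFACE, POLL_INTERVAL_SECONDS
)
print(f"{per_cycle:,} data points per cycle")
print(f"{per_second:,.0f} data points per second on average")
```

This matches the 1.2 million data points every five minutes mentioned earlier, which works out to a sustained average of roughly 4,000 values per second.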

Now this doesn't necessarily run on a Pentium III laptop with 512MB of RAM - each of these systems has hardware finely tuned to support OpenNMS - but hardware is cheap compared to the cost of comparable enterprise management software. Part of the services offered by the OpenNMS Group is advice on sizing the OpenNMS server.

10. Can you tell us about Device / Service and Device / Device dependencies?

The basic data model is that a physical device should correspond to a "node" within OpenNMS. This isn't always the case, as virtual machines can show up like they were "real" machines, but that is the basic idea. The node supports at least one IP interface, and on that interface will be services.

The service monitoring piece of OpenNMS can range from a simple ping or port check up through complex website navigation and mail transport round trip performance. For a telecom company in Honduras we are testing voice quality and SMS performance using distributed cell phones. If a service availability test fails to meet the configured criteria for proper operation, events are generated and outage records are created.

OpenNMS is smart enough to be able to correlate all services down on an interface to an interface outage, and all interfaces down on a node to a device outage. Using the Path Outage feature of OpenNMS, the topology of devices can be taken into account (so that a router going down does not result in 200 node down notices being sent).
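The roll-up described above can be illustrated with a small sketch. This is not OpenNMS code and does not use its event names or APIs; the data layout (interface address mapped to service states) and the message strings are assumptions made for the example, and Path Outage handling is left out.

```python
from typing import Dict, List

# Hypothetical layout: interface IP -> service name -> True if the service is up.
NodeStatus = Dict[str, Dict[str, bool]]


def interface_down(services: Dict[str, bool]) -> bool:
    """An interface counts as down when it has services and none of them is up."""
    return bool(services) and not any(services.values())


def summarize_outages(node: NodeStatus) -> List[str]:
    """Collapse service failures into the coarsest outage that explains them."""
    down_interfaces = [ip for ip, services in node.items() if interface_down(services)]

    # Every interface down: report a single node-level outage.
    if node and len(down_interfaces) == len(node):
        return ["node down"]

    outages = []
    for ip, services in node.items():
        if ip in down_interfaces:
            # Every service on this interface failed: one interface-level outage.
            outages.append(f"interface down: {ip}")
        else:
            # Otherwise report only the individual failed services.
            outages.extend(
                f"service down: {ip}/{name}" for name, up in services.items() if not up
            )
    return outages


partial = {
    "10.0.0.1": {"ICMP": False, "HTTP": False},
    "10.0.0.2": {"ICMP": True, "SSH": False},
}
total = {
    "10.0.0.1": {"ICMP": False},
    "10.0.0.2": {"ICMP": False, "SSH": False},
}
print(summarize_outages(partial))
print(summarize_outages(total))
```

The point of the roll-up is that the operator sees one outage at the highest level that explains the failures, rather than one notice per failed service.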

In addition, OpenNMS has an alarm subsystem similar to IBM Tivoli's Netcool product. It allows for ad hoc correlations to be configured so that, say, "up" alarms are matched with "down" alarms and then cleared.

11. What's coming up in version 1.8? What are you most excited about?

There are lots of things in 1.8 that I'm excited about, but the two main features are the new provisioner and access control changes.

The provisioning system combines the best of the current automated discovery and the model importer. It gives the user the ability to finely control discovery, from what gets discovered, down to which non-IP interfaces are scheduled for data collection. It can range from very permissive (discovering lots of services automatically) to very restrictive (only monitoring particular, explicit services on these devices) or some combination of the two. In addition, provisioning changes can cause actions to be taken on external systems; for example, changing the node label on a device can cause the DNS entry for that device to be dynamically updated. This complements the OpenNMS Trouble Ticketing API as another area where two-way interaction between the platform and other applications is being implemented.

There is also the addition of access control. Currently, any privileged user of OpenNMS can see all the devices on a particular instance. This doesn't work well in service provider environments where the client wants to give access to multiple clients but only to their particular resources. For our larger clients this has always been easy since there is a separate customer portal that just mines data from OpenNMS, but for others this will make their lives easier, especially in multi-tenancy environments.

My favorite feature in 1.8 is WMI data collection. Not because I like Windows (I had to go and buy a Windows license just to test the feature) but because it was contributed by a community member, Matt Raykowski, who became a member of the Order of the Green Polo because of it. Contribution to OpenNMS ranges from simple typo correction all the way up to powerful features like WMI collection, and I'd bet few projects can claim as much on the higher end as we do. That's why I like it so much.

Many people give lip service to open source, but OpenNMS proves that a) open source can compete at the highest levels in the enterprise and b) it can be done without resorting to commercial software business models.
