Monitoring the pulse of IT

Opsview Development Team

Next generation distributed monitoring, the Opsview way

One of Opsview’s great features is distributed monitoring, which we’ve had for over 5 years now. From the web user interface, you can assign hosts to a slave system and Opsview will take care of all the configuration work for you: from the slave configuration files, to the slave results sent to the master, to the master configuration with freshness checking.

We do all the system integration work, so you don’t have to.

However, there are some limitations in our chosen technologies. We use NSCA, which is the most common method in the Nagios world, and while we’ve made improvements to it that have gone back upstream, there are some baked-in limitations:

  • Only the first 511 bytes of plugin output were returned to the master, limiting the usefulness of the information you could display
  • Only the first line of data was returned, meaning you had to cram output together
  • NSCA communication used fixed size packets which were inefficient
  • While results were sent, Nagios would wait for completion, introducing a bottleneck
  • If there was a communication problem with the master, results were dropped
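
To make the first two limitations concrete, here is a minimal Python sketch of what a master would end up seeing for a multi-line plugin result. It is an illustration only, not NSCA's actual code: the function name `nsca_style_truncate` is hypothetical, and the only figure taken from the list above is the 511-byte limit.

```python
# Illustrative only: mimics the two output limits described in the list above.
NSCA_MAX_OUTPUT = 511  # bytes; the limit quoted in this article


def nsca_style_truncate(plugin_output: str) -> str:
    """Return what survives: the first line only, cut to 511 bytes."""
    # Everything after the first newline is lost.
    first_line = plugin_output.split("\n", 1)[0]
    # Cut on a byte boundary; drop any character split by the cut.
    raw = first_line.encode("utf-8")[:NSCA_MAX_OUTPUT]
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    full = "DISK WARNING - /var at 91%\n/home at 40%\n/tmp at 12%"
    print(nsca_style_truncate(full))
```

Any detail a plugin prints after its first line, such as per-filesystem breakdowns, never reaches the master under these limits.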

Sometimes to move forward, you have to leave the past behind.

So we did that - we ripped out NSCA from Opsview’s slave communications and we’ve addressed every one of these limitations - and added a few nice extras too!

We’ve chosen NRD (Nagios Result Distributor) as our core technology. This is a library, written by one of our partners, CAPSiDE. There are many reasons we chose this, but the top four are:

  • It is based on Perl, which is our language of choice
  • It has taken the test suite we developed for NSCA and enhanced it, demonstrating a mature approach to code development
  • The client and server code is a thin shim over the libraries, which means you can easily create your own clients
  • We have a good relationship with CAPSiDE and they have given us access to their code repository

We’ve spent some time understanding the core NRD code, enhancing it, fixing some issues and adding in some great new features. CAPSiDE have also released it on CPAN for wider consumption.

So Opsview’s new process for sending results from a slave is:

Some other features we’ve squeezed in:

  • A known Nagios limitation is the named pipe to submit results. We’ve overcome this by writing directly to the checkresults spool directory - this reduces a Nagios processing cycle on the Opsview master
  • We’ve implemented transactions in the results, so if the client fails to communicate with the server, the client will back off and retry after 5 seconds. This guarantees you do not get duplicated results
  • The nrd daemon on the master will dynamically add more servers as workload increases, thanks to the features of Net::Server
  • As all communication between master and slaves is over a tunnelled SSH session, we’ve updated our Opsview check scripts to restart these tunnels if the slave is exhibiting communication errors
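
The transaction behaviour in the list above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not NRD's actual protocol or API: the classes `InMemoryServer` and `FlakyServer`, the function `send_with_retry`, and the `max_attempts` parameter are all hypothetical. The only detail taken from the text is the 5 second back-off. The idea is that the server applies a batch only on commit, so a batch that fails halfway leaves nothing behind and can safely be resent in full.

```python
import time


class InMemoryServer:
    """Hypothetical stand-in for the master: results count only once committed."""

    def __init__(self):
        self.committed = []
        self._pending = []

    def begin(self):
        # Starting a transaction discards any half-sent batch.
        self._pending = []

    def add(self, result):
        self._pending.append(result)

    def commit(self):
        self.committed.extend(self._pending)
        self._pending = []


class FlakyServer(InMemoryServer):
    """Test double that drops the connection on the Nth add() call."""

    def __init__(self, fail_on_add):
        super().__init__()
        self._adds = 0
        self._fail_on_add = fail_on_add

    def add(self, result):
        self._adds += 1
        if self._adds == self._fail_on_add:
            raise ConnectionError("simulated tunnel drop")
        super().add(result)


def send_with_retry(server, results, retry_delay=5.0, max_attempts=3,
                    sleep=time.sleep):
    """Send all results as one transaction; on failure, wait and resend all.

    Returns the number of attempts used. Re-raises after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            server.begin()
            for result in results:
                server.add(result)
            server.commit()
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            sleep(retry_delay)  # back off before retrying the whole batch
```

With `FlakyServer(fail_on_add=2)` and three results, the first attempt fails partway through, the client waits, and the second attempt commits each result exactly once.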

With all these extra capabilities, you would think there is a cost in performance. But in fact, our testing shows that performance has got better!

(Based on sending 2016 results in a single transaction over an SSH tunnel from a slave to a master. Times measured on the client.)

This shows that we are getting an average 62% improvement in all aspects of slave communication back to the master!

We are thrilled we’ve added this major new functionality into Opsview and have taken distributed monitoring another huge step further over any of our competitors.

But the best thing is: this is available immediately with our Opsview Community 3.11 release. Install the VM, add a slave and you will get this new architecture set up as part of the process. And if you are an existing Opsview user, you get a silky smooth switch-over. We’ve done a lot of testing to ensure that as part of the upgrade, Opsview will automatically switch any slaves to this architecture and start sending results in the new NRD way.

