The great thing about open source software is the unexpected contributions you may suddenly receive.
Jonathan Kamens of Advent Software sent a patch to the nagios-devel mailing list that speeds up the status CGIs. Using gprof, he found that status.cgi spent most of its time in the routine that sorts downtimes and comments.
His patch was simple and effective: rather than re-sorting the objects every time one is added, it adds them unsorted and sorts them once at the end.
We've been reviewing the patch and measured the old behaviour against the new. The timings below come from running the t/660status-downtimes-comments.t Perl test script in the Nagios source tree.
| Comments and downtimes | Nagios 3.2.0 (seconds) | With patch (seconds) |
|---|---|---|
| 100,000 | 1680 (28 minutes, and it didn't finish!) | 53 |
With 100,000 comments, status.cgi had spent 28 minutes just reading the current status data, and in the test it still hadn't finished! With Jonathan's patch, this dropped to 53 seconds.
There's probably still more work to do, though: a tenfold increase, from 10,000 to 100,000 comments, increases the execution time by a factor of 53. But this is a good start!
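As a rough sanity check on that factor of 53, we can compare it with what simple growth models would predict for a tenfold increase in n. This assumes running time follows a single power law over this range, which is a simplification based on only two data points:

$$
\frac{T(10^5)}{T(10^4)} \approx
\begin{cases}
10 & \text{if } T(n) \propto n \\
10 \cdot \dfrac{\log 10^5}{\log 10^4} = 12.5 & \text{if } T(n) \propto n \log n \\
100 & \text{if } T(n) \propto n^2
\end{cases}
$$

$$
T(n) \propto n^k,\quad 10^k = 53 \;\Rightarrow\; k = \log_{10} 53 \approx 1.72
$$

The observed factor sits well above the n log n prediction, which suggests (but doesn't prove) that something in the code path still grows faster than n log n.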
There also seems to be some work on alternative ways of getting status data, such as the NDOUtils database backend, or MK Livestatus, which gets the information directly from the Nagios daemon.
For Opsview, we're betting on the database backend because it brings other advantages: arbitrarily complex search criteria, historical information, and moving work off the Nagios instance. However, we still like to provide the old-style Nagios CGIs as well, so we'll be adding this patch to Opsview. Funnily enough, Nagios users will get the patch before Opsview users do! But that's all part of working with open source software and improving the tools for everyone.