Monitor your network with parenting in Opsview
Parenting is one of the many powerful network monitoring tools that can be deployed in Opsview. However, parenting is often misunderstood or deployed incorrectly, so the intended goal (fewer notifications when a network failure is detected) is missed and mailboxes are flooded with unwanted e-mail alerts.

This blog post outlines the steps involved in setting up Basic Parenting in Opsview.


Before successfully deploying parenting it's important to understand how it works:

  1. Parenting is based on host states
  2. It determines if a host is DOWN or UNREACHABLE
  3. When assigning multiple parents, all parents need to be DOWN for the host to be UNREACHABLE
  4. Parenting cannot be circular
Please note that UNREACHABLE notifications are on by default so check the settings in your notification profile.
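
The rules above can be sketched in a few lines of Python. This is only an illustration of the logic as described in this post, not Opsview's actual implementation; the function name host_state and the plain string states are assumptions made for the example. In this sketch a parent counts as failed when it is in any state other than UP.

```python
def host_state(check_ok, parent_states):
    """Return 'UP', 'DOWN' or 'UNREACHABLE' for a single host.

    check_ok      -- True if the host's own check succeeded
    parent_states -- list with the state of each parent (may be empty)
    """
    if check_ok:
        return "UP"
    # A host without parents can never be UNREACHABLE.
    if not parent_states:
        return "DOWN"
    # Only when every parent has failed is the host UNREACHABLE.
    if all(state != "UP" for state in parent_states):
        return "UNREACHABLE"
    # At least one parent still answers, so the host itself is the problem.
    return "DOWN"


print(host_state(False, ["UP"]))            # DOWN
print(host_state(False, ["DOWN", "UP"]))    # DOWN
print(host_state(False, ["DOWN", "DOWN"]))  # UNREACHABLE
```

The second call shows why multiple parents matter: as long as one parent is still UP, a failing host is reported as DOWN rather than UNREACHABLE.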

Determining a host state

One additional point: host states are determined using layer 3 (as described by the OSI model) IP information, for instance by running check_icmp -H 1.2.3.4.

This is important because modern network technologies like VLANs, VPNs, HSRP and VRRP are used to make a network more robust and fault-tolerant. Unfortunately, they also “hide” information needed to set up parenting.

Later on we will discuss these technologies and their impact on parenting.

Circular parent-child relationships

A circular parent-child relationship is formed when we define hostA to have deviceX as its parent and define deviceX to have hostA as its parent. The parenting logic will detect this and generate an error, as a circular parent-child relationship should not exist (in practice they do exist, and we will discuss them in another blog post).
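
To make the idea concrete, here is a minimal sketch of how such a loop could be found in a set of parent definitions. It is not the check Opsview itself runs; the function name find_cycle and the dictionary layout (host name mapped to a list of parent names) are assumptions for this example.

```python
def find_cycle(parents):
    """Return a list of hosts that form a parent loop, or None.

    parents -- dict mapping each host to a list of its parents
    """
    visiting, done = set(), set()

    def visit(host, path):
        if host in done:
            return None
        if host in visiting:
            # We came back to a host on the current path: a loop.
            return path[path.index(host):]
        visiting.add(host)
        for parent in parents.get(host, []):
            cycle = visit(parent, path + [host])
            if cycle:
                return cycle
        visiting.discard(host)
        done.add(host)
        return None

    for host in parents:
        cycle = visit(host, [])
        if cycle:
            return cycle
    return None


# hostA and deviceX name each other as parent, as in the text above.
print(find_cycle({"hostA": ["deviceX"], "deviceX": ["hostA"]}))
```

A configuration without loops makes find_cycle return None.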

Types of parenting

In this article we will be looking at Basic Parenting, focusing on layer 3 hops and Single Points of Failure (SPOFs) and how we can use them for parenting.

Note that while Basic Parenting is relatively easy to implement and will greatly help in detecting network failures, it will not be as precise as when you implement Advanced Parenting (which will be covered in a later blog post).

Network technologies

To be able to set up parenting we need to understand some of the network technologies mentioned earlier.

Below is a simple explanation of these technologies and the part each plays in parenting; more detailed information on them can be found on the internet.

We also look at one of the tools at our disposal to help determine the hops we cross to reach any given host in our network.

Traceroute

Traceroute can be used to determine the (layer 3) hops between any two hosts on a network.

When using traceroute it’s best to run it from the Opsview slave/master (this will give a consistent result). When you use slave clusters you should consider how you cluster them (globally or locally) so all the slaves in the cluster have the same path to a given host.

Later on we will discuss how to use the results from a traceroute to create a host-parent table.


VLANs

A VLAN (or Virtual LAN) is used to create logical ip-subnets spanning various network devices. So if Host A and Host B are in the same VLAN, any physical hop between them becomes transparent and a traceroute between them will show only one hop (the destination).

Again, this is a simplistic representation of VLANs and should hold true in most situations (there are exceptions, for instance extensions to VLANs such as private VLANs).

VPNs

VPNs are commonly used to connect remote servers to a local network (over the Internet for example). However, from a network perspective the host is no longer at a distant location but directly connected to the network (so any hops on the Internet become transparent).

VPNs will be discussed in more detail in the Advanced Parenting blog post.

HSRP/VRRP

HSRP and VRRP are redundancy technologies used to make a network more robust on layer 3. Their most common use is to make a network gateway redundant.

In those cases the IP address you configure on your host as the gateway is shared between two multi-layer switches (or routers), so when one fails the other takes over and traffic keeps flowing in your network.

(Note that a gateway is generally only used when traffic is destined for another network.)

For example traffic from 192.168.1.200/24 to 192.168.1.100/24 will stay on the same network (with a VLAN it might cross multiple devices but these are transparent).

Traffic from 192.168.1.200/24 to 192.168.21.100/24 will cross the gateway as the source and destination are on two separate networks.
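
The two examples above can be checked with Python's standard ipaddress module. This is just a sketch to illustrate the subnet test; the function name crosses_gateway is made up for this example.

```python
import ipaddress


def crosses_gateway(source_cidr, destination_ip):
    """Return True if traffic to destination_ip leaves the source's subnet."""
    # ip_interface keeps both the host address and its network.
    network = ipaddress.ip_interface(source_cidr).network
    return ipaddress.ip_address(destination_ip) not in network


print(crosses_gateway("192.168.1.200/24", "192.168.1.100"))   # False
print(crosses_gateway("192.168.1.200/24", "192.168.21.100"))  # True
```

Only in the second case does the traffic pass the gateway, so only there does the gateway show up as a hop.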

Below is an example of a modern network.
[Figure: complex-network1.PNG]
In this example we have host A and host C which are connected only once to our network and host B which is multi-homed (for example running bonding or nic-teaming).

All the switches (except switchE) and the core switches are cross-connected to provide redundancy and fault-tolerance in our network (fully meshed using VLANs and HSRP/VRRP).

If all our hosts reside on the same ip-subnet (or VLAN) our network will be completely transparent and our traceroute will show only one hop (the destination host).

opsview-slave nagios $ traceroute hostA
traceroute to hostA (192.168.1.2), 30 hops max, 40 byte packets
 1  hostA (192.168.1.2)  1.594 ms  0.590 ms  0.362 ms

Assuming our Opsview host is on a different ip-subnet our network will have one (or more) additional hops in our traceroute.

opsview-slave nagios $ traceroute hostA
traceroute to hostA (192.168.1.2), 30 hops max, 40 byte packets
 1  coreA (192.168.1.1)  0.747 ms  0.582 ms  0.518 ms
 2  hostA (192.168.1.2)  1.594 ms  0.590 ms  0.362 ms
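
If you have many hosts to trace, the hop lines can be pulled out of the traceroute output with a small script. The sketch below assumes output in the format shown above (hop number, name, address in parentheses); the names HOP_LINE and parse_hops are made up for this example, and hops that do not answer (shown as * * *) are simply skipped.

```python
import re

# Matches lines such as " 1  coreA (192.168.1.1)  0.747 ms ..."
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)")


def parse_hops(traceroute_output):
    """Return [(hop_number, name, ip), ...] from traceroute text."""
    hops = []
    for line in traceroute_output.splitlines():
        match = HOP_LINE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2), match.group(3)))
    return hops


sample = """traceroute to hostA (192.168.1.2), 30 hops max, 40 byte packets
 1  coreA (192.168.1.1)  0.747 ms  0.582 ms  0.518 ms
 2  hostA (192.168.1.2)  1.594 ms  0.590 ms  0.362 ms"""

for number, name, ip in parse_hops(sample):
    print(number, name, ip)
```

The last entry is the destination itself; the entries before it are the layer 3 hops that are candidates for parents.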


Setting up Basic Parenting

When setting up Basic Parenting we first need to determine our SPOFs and hops. A SPOF is a single point of failure, so any host or device that has a single connection to our network has to be considered to have a SPOF. Also note that Host B and Opsview are redundantly connected to our network and don’t have a SPOF (in Advanced Parenting we will be covering multi-homed hosts).

Step 1:

Review your network.

First off, review your network (if needed, ask your network admins to help you out) and run a traceroute to each host so we can create a host-parent table.

After the review we have found that Core (consisting of an HSRP ip-gateway running on either Core A or Core B) is a hop and switchC, switchD and switchE are SPOFs.

Note that Core is considered a hop and not a SPOF, although it looks like a single point of failure. This is because it uses HSRP or VRRP for redundancy over two nodes (Core A and Core B).

Using this information we can create our host-parent table. In this table we use various pieces of information to determine a given host's parent:
  1. Looking at Host A we know it is connected to SPOF switchC, which makes switchC the parent of Host A (Id. 1)
  2. From our traceroute we know we only traverse Core when going from Opsview to Host A, making Opsview the parent of Core (Id. 4)
  3. From our network admin we know switchC is connected to Core (Id. 5)
This gives us an entire path from Opsview through Core through switchC to Host A.

Host-parent table example for our network:

Id  Host     Parent   Note
1   Host A   switchC  See id 5
2   Host B   Core     See id 4
3   Host C   switchE  See id 7
4   Core     Opsview
5   switchC  Core     See id 4
6   switchD  Core     See id 4
7   switchE  Core     See id 6
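
A host-parent table like this is easy to keep as a simple mapping and walk programmatically. The sketch below copies the Host and Parent columns from the table; the names PARENT and path_to_opsview are made up for this example, and it assumes each host has exactly one parent, as in Basic Parenting.

```python
# Host -> parent, taken from the host-parent table.
PARENT = {
    "Host A": "switchC",
    "Host B": "Core",
    "Host C": "switchE",
    "Core": "Opsview",
    "switchC": "Core",
    "switchD": "Core",
    "switchE": "Core",
}


def path_to_opsview(host, parent=PARENT):
    """Follow the parent of each host until there is no parent left."""
    path = [host]
    while path[-1] in parent:
        next_hop = parent[path[-1]]
        if next_hop in path:
            raise ValueError("circular parent chain at " + next_hop)
        path.append(next_hop)
    return path


print(" -> ".join(path_to_opsview("Host A")))
# Host A -> switchC -> Core -> Opsview
```

Walking every host this way is a quick sanity check that each one ends up at Opsview.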

Please note that Host B is connected to switchC and switchD (which we will be monitoring as they are SPOFs), so it is possible to configure Host B with two parents (switchC and switchD) instead of using Core. In Advanced Parenting we will be looking at multiple parents; for now we configure Host B with just Core as its parent.

Here is our network, now showing only our SPOFs and hops with all transparent devices removed.
[Figure: parenting SPOF HOP.PNG]

Step 2:

Create your SPOFs and Hops in Opsview

(ask your network admin for the host-addresses).

Start by adding the devices which have Opsview as their parent (Core in our case), then add the devices which have Core as their parent. This way you can immediately configure each device's parent based on our host-parent table. Make sure you assign at least one service-check to the devices.
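
The "parents first" order can also be derived from the host-parent table. The sketch below is only an illustration; the function name creation_order and the devices mapping are assumptions for this example, and nothing here talks to Opsview itself.

```python
from collections import deque


def creation_order(parent, root="Opsview"):
    """Return the devices ordered so every parent comes before its children."""
    children = {}
    for host, host_parent in parent.items():
        children.setdefault(host_parent, []).append(host)
    order = []
    # Start with the devices directly below Opsview, then work outwards.
    queue = deque(children.get(root, []))
    while queue:
        host = queue.popleft()
        order.append(host)
        queue.extend(children.get(host, []))
    return order


# The SPOFs and hops from our host-parent table.
devices = {"Core": "Opsview", "switchC": "Core", "switchD": "Core", "switchE": "Core"}
print(creation_order(devices))
# ['Core', 'switchC', 'switchD', 'switchE']
```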

Step 3:

Edit your hosts and add the parent based on the host-parent table.

[Figure: edit host parent.PNG]

Step 4:

Check your notification settings for DOWN and UNREACHABLE notifications.
[Figure: notification profile DOWN UNREACHABLE.PNG]

Step 5:

Reload Opsview. If you accidentally created a circular parent-child relationship, the reload will fail with the following error:

Checking for circular paths between hosts...

Error: The host 'CoreA' is part of a circular parent/child chain!
Error: The host 'HostA' is part of a circular parent/child chain!
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

If this happens, review your configuration and verify it against your host-parent table.

Look out for the post on Advanced Parenting, where we will build further on our basic parenting setup, including:

  1. Slave clusters and parenting
  2. Multi-homed hosts and multiple parents
  3. Circular parent-child relationships
  4. Common pitfalls for parenting
  5. VPNs

About the Author

Alan Wijntje is responsible for maintaining and improving all forms of monitoring at Ziggo, one of the leading Managed Service Providers in the Netherlands. An Opsview expert and open source enthusiast, Alan enjoys finding, designing and implementing new and innovative ways of monitoring complex systems and applications.

Legal Disclaimer

This blog post is contributed by a member of the Opsview community. The Opsview project and Opsera Ltd accept no responsibility for the accuracy of its content and are not liable for any direct or indirect damages caused by its use.
