This blog post outlines the steps involved in setting up Basic Parenting in Opsview.
Before deploying parenting, it's important to understand how it works:
- Parenting is based on host states
- It determines whether a host is DOWN or UNREACHABLE
- When a host has multiple parents, all of its parents need to be DOWN for the host to be UNREACHABLE
- Parenting cannot be circular
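The DOWN versus UNREACHABLE decision can be sketched in a few lines of Python. This is a minimal model, not Opsview's code: `host_state`, `State` and `EXAMPLE_PARENTS` are hypothetical names, and the sketch assumes Nagios-style reachability logic, where a parent that is itself UNREACHABLE counts the same as a DOWN one. A failing host with no parents is DOWN, and a failing host is UNREACHABLE only when none of its parents is UP.

```python
from enum import Enum
from typing import Dict, List, Set


class State(Enum):
    UP = "UP"
    DOWN = "DOWN"
    UNREACHABLE = "UNREACHABLE"


def host_state(host: str, parents: Dict[str, List[str]], failing: Set[str]) -> State:
    """Classify a host from its own layer 3 check and the states of its parents.

    parents: hypothetical map of host -> list of parents (no entry = no parents).
    failing: hosts whose own check (e.g. check_icmp) failed.
    Assumes the parent graph contains no cycles.
    """
    if host not in failing:
        return State.UP
    host_parents = parents.get(host, [])
    if not host_parents:
        # Nothing between Opsview and this host can explain the failure.
        return State.DOWN
    # UNREACHABLE only when no parent is UP (each is DOWN or itself UNREACHABLE).
    if all(host_state(p, parents, failing) is not State.UP for p in host_parents):
        return State.UNREACHABLE
    return State.DOWN


# Hypothetical example: Host B is given two parents (switchC and switchD).
EXAMPLE_PARENTS = {
    "Host A": ["switchC"],
    "Host B": ["switchC", "switchD"],
    "switchC": ["Core"],
    "switchD": ["Core"],
}

print(host_state("Host A", EXAMPLE_PARENTS, {"switchC", "Host A"}))  # State.UNREACHABLE
print(host_state("Host B", EXAMPLE_PARENTS, {"switchC", "Host B"}))  # State.DOWN
```

With only switchC failing, Host B still has a working path through switchD, so its own failure is reported as DOWN rather than UNREACHABLE.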
Determining a host state
One additional point to make is that host states are determined using layer 3 (IP) information as described by the OSI model, for instance by running check_icmp -H 18.104.22.168.
This matters because modern network technologies such as VLANs, VPNs, HSRP and VRRP, which are used to make a network more robust and fault-tolerant, unfortunately also “hide” information needed to set up parenting.
Later on we will discuss these technologies and their impact on parenting.
Circular parent-child relationships
A circular parent-child relationship is formed when we define hostA to have deviceX as its parent and define deviceX to have hostA as its parent. The parenting logic will detect this and generate an error, because a circular parent-child relationship should not exist (in practice they do exist, and we will discuss them in another blog post).
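Detecting such a loop is a standard cycle search on the parent graph. Below is a minimal Python sketch using a depth-first search; `find_cycle` is a hypothetical helper, not the check Opsview itself runs, and it assumes the parents are given as a map of host to list of parents. It returns one offending chain, or `None` if the configuration is loop-free.

```python
from typing import Dict, List, Optional


def find_cycle(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular parent/child chain as a list of hosts, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(host: str) -> Optional[List[str]]:
        colour[host] = GREY
        path.append(host)
        for parent in parents.get(host, []):
            state = colour.get(parent, WHITE)
            if state == GREY:
                # The parent is already on the current path: we have looped back.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        colour[host] = BLACK
        return None

    for host in parents:
        if colour.get(host, WHITE) == WHITE:
            found = visit(host)
            if found:
                return found
    return None


print(find_cycle({"hostA": ["deviceX"], "deviceX": ["hostA"]}))  # ['hostA', 'deviceX', 'hostA']
print(find_cycle({"Host A": ["switchC"], "switchC": ["Core"]}))  # None
```

Marking hosts as fully explored (BLACK) means each host is searched only once, so the check stays linear in the number of parent links.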
Types of parenting
In this article we will be looking at Basic Parenting, focusing on layer 3 hops and Single Points of Failure (SPOFs) and how we can use them for parenting.
Note that while Basic Parenting is relatively easy to implement and will greatly help in detecting network failures, it will not be as precise as Advanced Parenting (which will be covered in a later blog post).
Network technologies
To set up parenting we need to understand some of the network technologies mentioned earlier.
Below is a simple explanation of each technology and the part it plays in parenting; much more detailed information on them is available online.
We also look at one of the tools at our disposal for determining the hops we cross to reach any given host in our network.
Traceroute
Traceroute can be used to determine the (layer 3) hops between any two hosts on a network.
When using traceroute, it's best to run it from the Opsview master or slave that performs the checks, as this gives a consistent result. When you use slave clusters, consider how you cluster them (globally or locally) so that all the slaves in a cluster have the same path to a given host.
Later on we will discuss how to use the results from a traceroute to create a host-parent table.
VLANs
A VLAN (Virtual LAN) is used to create logical IP subnets spanning various network devices. So if Host A and Host B are in the same VLAN, any physical hop between them becomes transparent and a traceroute between them will show only one hop (the destination).
Again, this is a simplified description of VLANs that holds true in most situations (there are extensions to VLANs, such as private VLANs).
VPNs
VPNs are commonly used to connect remote servers to a local network (over the Internet, for example). From a network perspective, however, the host is no longer at a distant location but directly connected to the network, so any hops across the Internet become transparent.
VPNs will be discussed in more detail in the Advanced Parenting blog post.
HSRP/VRRP
HSRP and VRRP are redundancy technologies used to make a network more robust at layer 3. The most common deployment makes a network gateway redundant.
In those cases the IP address you configure on your hosts as the gateway is shared between two multi-layer switches (or routers), so when one fails the other takes over and traffic keeps flowing in your network.
(Note that a gateway is generally only used when traffic is destined for another network.)
For example, traffic from 192.168.1.200/24 to 192.168.1.100/24 will stay on the same network (with a VLAN it might cross multiple devices, but these are transparent).
Traffic from 192.168.1.200/24 to 192.168.21.100/24 will cross the gateway, as the source and destination are on two separate networks.
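The gateway decision in the two examples above comes down to whether the destination address falls inside the sender's own subnet. A short Python sketch using the standard `ipaddress` module shows this; `needs_gateway` is a hypothetical helper name, and it only considers the sender's prefix because that is what the sending host uses to decide.

```python
import ipaddress


def needs_gateway(source: str, destination: str) -> bool:
    """True if traffic from source (address/prefix) must leave the source's subnet."""
    src = ipaddress.ip_interface(source)
    dst = ipaddress.ip_interface(destination)
    return dst.ip not in src.network


print(needs_gateway("192.168.1.200/24", "192.168.1.100/24"))   # False: same subnet
print(needs_gateway("192.168.1.200/24", "192.168.21.100/24"))  # True: crosses the gateway
```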
Below is an example of a modern network. In this example Host A and Host C each have a single connection to our network, while Host B is multi-homed (for example, using bonding or NIC teaming).
All the switches (except switchE) and the core switches are cross-connected to provide redundancy and fault tolerance in our network (fully meshed using VLANs and HSRP/VRRP).
If all our hosts reside on the same IP subnet (or VLAN), our network is completely transparent and our traceroute will show only one hop (the destination host).
    opsview-slave nagios $ traceroute hostA
    traceroute to hostA (192.168.1.2), 30 hops max, 40 byte packets
     1  hostA (192.168.1.2)  1.594 ms  0.590 ms  0.362 ms
If our Opsview host is on a different IP subnet, the traceroute will show one (or more) additional hops.
    opsview-slave nagios $ traceroute hostA
    traceroute to hostA (192.168.1.2), 30 hops max, 40 byte packets
     1  coreA (192.168.1.1)  0.747 ms  0.582 ms  0.518 ms
     2  hostA (192.168.1.2)  1.594 ms  0.590 ms  0.362 ms
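A traceroute like the one above can be turned into candidate parent pairs: each hop's parent is the hop before it, and the first hop's parent is the Opsview server. Below is a minimal Python sketch; `parse_hops` and `parent_pairs` are hypothetical helpers, and the parser assumes the default name-resolving output format shown above (timed-out `* * *` hops and `-n` output are skipped). Traceroute only reveals layer 3 hops, so layer 2 SPOFs such as switchC still have to come from the network review in Step 1, and an HSRP peer name like coreA maps to the shared Core gateway.

```python
import re
from typing import List, Tuple

# Matches a hop line such as " 1  coreA (192.168.1.1)  0.747 ms ..."
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)")


def parse_hops(output: str) -> List[Tuple[str, str]]:
    """Return (name, address) for each hop, in order, from traceroute output."""
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if match:
            hops.append((match.group(2), match.group(3)))
    return hops


def parent_pairs(hops: List[Tuple[str, str]], root: str = "Opsview") -> List[Tuple[str, str]]:
    """Pair each hop with its parent: the previous hop, or root for the first hop."""
    names = [name for name, _ in hops]
    return list(zip(names, [root] + names[:-1]))


SAMPLE = """traceroute to hostA (192.168.1.2), 30 hops max, 40 byte packets
 1  coreA (192.168.1.1)  0.747 ms  0.582 ms  0.518 ms
 2  hostA (192.168.1.2)  1.594 ms  0.590 ms  0.362 ms"""

print(parent_pairs(parse_hops(SAMPLE)))  # [('coreA', 'Opsview'), ('hostA', 'coreA')]
```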
When setting up Basic Parenting we first need to determine our SPOFs and hops.
A SPOF is a single point of failure, so any host or device with only a single connection to our network has to be considered to have a SPOF.
Also note that Host B and Opsview are redundantly connected to our network and so don't have a SPOF (we will cover multi-homed hosts in Advanced Parenting).
Setting up Basic Parenting
Step 1: Review your network.
First, review your network (ask your network admins for help if needed) and run a traceroute to each host so that we can create a host-parent table.
After the review we find that Core (an HSRP IP gateway running on either Core A or Core B) is a hop, and that switchC, switchD and switchE are SPOFs.
Note that Core is considered a hop and not a SPOF, even though it looks like a single point of failure, because it uses HSRP or VRRP for redundancy over two nodes (Core A and Core B).
Using this information we can create our host-parent table. In this table we use various pieces of information to determine a given host's parent:
- Looking at Host A, we know it is connected to SPOF switchC, which makes switchC the parent of Host A (Id. 1)
- From our traceroute we know that we only traverse Core when going from Opsview to Host A, which makes Opsview the parent of Core (Id. 4)
- From our network admin we know switchC is connected to Core (Id. 5)
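| Id | Host | Parent | Notes |
|----|------|--------|-------|
| 1 | Host A | switchC | See id 5 |
| 2 | Host B | Core | See id 4 |
| 3 | Host C | switchE | See id 7 |
| 4 | Core | Opsview | |
| 5 | switchC | Core | See id 4 |
| 6 | switchD | Core | See id 4 |
| 7 | switchE | Core | See id 6 |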
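One way to sanity-check the host-parent table before entering it into Opsview is to follow each host's parent chain until it reaches Opsview. Below is a minimal Python sketch; `HOST_PARENT` transcribes the Host and Parent columns of the table (Core's parent, Opsview, comes from the traceroute, Id. 4), and `chain_to_root` is a hypothetical helper that raises an error on a missing entry or a loop. It covers single-parent Basic Parenting only.

```python
from typing import Dict, List

# Host -> parent, transcribed from the host-parent table; "Opsview" is the root.
HOST_PARENT = {
    "Host A": "switchC",
    "Host B": "Core",
    "Host C": "switchE",
    "Core": "Opsview",
    "switchC": "Core",
    "switchD": "Core",
    "switchE": "Core",
}


def chain_to_root(host: str, table: Dict[str, str], root: str = "Opsview") -> List[str]:
    """Follow parents from host up to root; fail on missing entries or loops."""
    chain = [host]
    while chain[-1] != root:
        parent = table.get(chain[-1])
        if parent is None:
            raise KeyError(f"{chain[-1]!r} has no entry in the host-parent table")
        if parent in chain:
            raise ValueError(f"circular chain: {' -> '.join(chain + [parent])}")
        chain.append(parent)
    return chain


for host in ("Host A", "Host B", "Host C"):
    print(" -> ".join(chain_to_root(host, HOST_PARENT)))
# Host A -> switchC -> Core -> Opsview
# Host B -> Core -> Opsview
# Host C -> switchE -> Core -> Opsview
```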
Please note that Host B is connected to switchC and switchD (which we will be monitoring, as they are SPOFs), so it is possible to configure Host B with two parents (switchC and switchD) instead of Core. We will look at multiple parents in Advanced Parenting; for now we configure Host B with just Core as its parent. Here is our network again, now showing only our SPOFs and hops with all transparent devices removed.
Step 2: Create your SPOFs and hops in Opsview
(Ask your network admin for the host addresses.)
Start by adding the devices that have Opsview as their parent (Core in our case), then add the devices that have Core as their parent. This way you can configure each device's parent immediately, based on the host-parent table. Make sure you assign at least one service check to each device.
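The "parents first" order described above is a breadth-first walk down from Opsview through the host-parent table. Below is a minimal Python sketch; `creation_order` is a hypothetical helper and the table is the same transcription of the Host and Parent columns used for illustration. Hosts whose chain never reaches Opsview (missing parents or loops) are simply left out, which is itself a useful warning sign.

```python
from collections import deque
from typing import Dict, List

# Host -> parent, transcribed from the host-parent table; "Opsview" is the root.
TABLE = {
    "Host A": "switchC",
    "Host B": "Core",
    "Host C": "switchE",
    "Core": "Opsview",
    "switchC": "Core",
    "switchD": "Core",
    "switchE": "Core",
}


def creation_order(table: Dict[str, str], root: str = "Opsview") -> List[str]:
    """Order hosts so every parent comes before its children (breadth-first from root)."""
    children: Dict[str, List[str]] = {}
    for host, parent in table.items():
        children.setdefault(parent, []).append(host)
    order: List[str] = []
    queue = deque([root])
    while queue:
        for child in children.get(queue.popleft(), []):
            order.append(child)
            queue.append(child)
    return order


print(creation_order(TABLE))
# ['Core', 'Host B', 'switchC', 'switchD', 'switchE', 'Host A', 'Host C']
```

If you follow the steps in this post, you would take only the SPOFs and hops from this list in Step 2 and edit the hosts themselves in Step 3.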
Step 3: Edit your hosts and add their parents based on the host-parent table.
Step 4: Check your notification settings for DOWN and UNREACHABLE notifications.
Step 5: Reload Opsview. If you accidentally created a circular parent-child relationship, the reload will fail with errors like the following:
    Checking for circular paths between hosts...
    Error: The host 'CoreA' is part of a circular parent/child chain!
    Error: The host 'HostA' is part of a circular parent/child chain!
    Checking for circular host and service dependencies...
    Checking global event handlers...
    Checking obsessive compulsive processor commands...
    Checking misc settings...
If this happens, review your configuration and verify it against your host-parent table.
Look out for the post on Advanced Parenting, where we will build further on our basic parenting setup, including:
- Slave clusters and parenting
- Multi-homed hosts and multiple parents
- Circular parent-child relationships
- Common pitfalls for parenting