The field of network management encompasses a plethora of concepts, since it deals with all aspects of monitoring and fault-finding on the networks that underlie corporate systems. An interesting issue for any network manager is that it's very hard to draw the line between network management and the areas that surround it – server management, dealing with problems on the cabling infrastructure, application performance management and so on.
For the moment, though, we'll look at what one might call "pure" network management – proactively monitoring the corporate network in order to identify issues as soon as they occur, or preferably to predict possible problems before they become reality by observing trends.
There are two standard protocols for network management and monitoring. The most important is the Simple Network Management Protocol (SNMP), which defines a set of instructions that can be used to view and change the settings on any SNMP-compliant device. The other is RMON (Remote Network Monitoring), which bolts on to SNMP and allows remote collection of network information.
We'll do RMON first, as it's the quicker of the two to describe. If you have a distributed network whose traffic levels you wish to monitor, you probably don't want to have a single central workstation interrogating all of the switches at all of your sites over WAN links – not only will it be hideously slow unless you have fast links, but it'll also have an impact on the loading of the WAN.
The answer is to place an RMON "probe" (or more than one, if the network is extensive) at each of your sites, and allow it to collate the performance information of all of the SNMP devices on the LAN. Rather than interrogating potentially dozens of switches and routers from the central office, you interrogate just your handful of RMON probes, thus reducing both the impact on the network and the time you spend waiting for data to download from remote sites.
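To make the idea concrete, here is a minimal sketch of the probe arrangement in Python. It is an illustration only: the Probe class, its methods and the site and device names are all invented for this example, and a real RMON probe gathers far richer statistics than a running total of octets.

```python
class Probe:
    """Hypothetical stand-in for an RMON probe at one site."""

    def __init__(self, site):
        self.site = site
        self.samples = {}  # device name -> list of octet counts seen locally

    def record(self, device, octets):
        # Local collection: this traffic stays on the site's LAN.
        self.samples.setdefault(device, []).append(octets)

    def summary(self):
        # Collate everything gathered so far into one total per device.
        return {device: sum(values) for device, values in self.samples.items()}


def poll_sites(probes):
    """Central console: one request per probe, however many devices it covers."""
    return {probe.site: probe.summary() for probe in probes}
```

With one probe per site, the number of requests crossing the WAN grows with the number of sites rather than with the number of switches and routers.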
SNMP is a prehistoric protocol family, but that is precisely its usefulness: Noah ran it on his IBM abacus, so it's supported even by older hardware. Its drawbacks are that its security mechanisms are weak (the early versions rely on "community strings" sent in clear text) and that the data transfers you need to do to get information in and out are complex and long-winded, but the latter isn't a huge issue in these days of Gigabit networks and high-speed processors.
SNMP is a standard that is continually added to as technologies develop. The accepted "standard" features are contained within the Management Information Base (MIB) definition, but part of this definition allows for manufacturers to define (and tag on) their own proprietary extensions to the standard MIB – which they do when, for instance, they invent a new technology which, by definition, isn't covered by the SNMP standard. As technologies become common, they are gradually absorbed into the standard MIB, and the evolution continues.
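The split between standard and proprietary parts of the MIB shows up in the numeric object identifiers (OIDs) that name each item. Standard objects live under the mib-2 branch (1.3.6.1.2.1) and vendor extensions under the enterprises branch (1.3.6.1.4.1). The sketch below simply classifies an OID by its prefix; the function names are our own, and it is not part of any SNMP library.

```python
# Well-known OID prefixes from the Internet management framework.
MIB2_PREFIX = (1, 3, 6, 1, 2, 1)         # iso.org.dod.internet.mgmt.mib-2
ENTERPRISES_PREFIX = (1, 3, 6, 1, 4, 1)  # iso.org.dod.internet.private.enterprises


def parse_oid(text):
    """Turn a dotted string such as '1.3.6.1.2.1.1.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def classify_oid(text):
    """Return 'standard', 'vendor' or 'other' according to the OID's prefix."""
    oid = parse_oid(text)
    if oid[:len(MIB2_PREFIX)] == MIB2_PREFIX:
        return "standard"
    if oid[:len(ENTERPRISES_PREFIX)] == ENTERPRISES_PREFIX:
        return "vendor"
    return "other"
```

For example, classify_oid("1.3.6.1.2.1.1.1.0") (the standard system description object) falls in the standard branch, whereas an OID under a made-up enterprise number such as "1.3.6.1.4.1.99999.1.2.0" is classed as a vendor extension.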
There are three main facets to SNMP: viewing settings, changing settings and sending "traps".
SNMP "get"
The "get" commands of SNMP allow an application to pull information from a device. This could be the general configuration information (the link speed for each Token Ring port, for example, or the IP address of a router port) or it could be statistical information that accumulates during normal operation – packet counts, error counts and such like.
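Because the statistical values are running totals, a management application normally reads a counter twice and works from the difference. The sketch below turns two octet-counter readings into a percentage link utilisation. It assumes a 32-bit counter that wraps back to zero (as the basic SNMP counter type does) and at most one wrap between readings; the function names are our own.

```python
COUNTER32_MODULUS = 2 ** 32  # assumed: a 32-bit counter that wraps to zero


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two readings, allowing for a single wrap-around."""
    return (current - previous) % modulus


def utilisation_percent(prev_octets, curr_octets, interval_seconds, link_bps):
    """Average load over the polling interval as a percentage of link speed."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_seconds * link_bps)
```

As a worked case, 7,500,000 octets counted over 60 seconds on a 10Mbit/s link is 60,000,000 bits out of a possible 600,000,000, or 10% utilisation. On fast links a 32-bit counter can wrap more than once between polls, so the polling interval needs to be short enough for the single-wrap assumption to hold.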
SNMP "set"
The "set" commands allow the network manager to write settings to a device – port speeds, IP addresses, VLAN definitions and so on.
SNMP "trap"
The majority of SNMP connections go from the management console (usually a Windows or X-Windows application of some kind) into a network device. The "trap" goes the opposite way, since it is SNMP's way of allowing a network device to alert a management station that there is a problem. So one might use a "set" command to tell a switch that you want to be alerted when it sees more than 70% loading on an Ethernet segment for more than 10 seconds, and it'll use a "trap" to tell your management console in the event that this happens.
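The "70% for more than 10 seconds" rule can be sketched as a small piece of state that the device updates each time it measures the load. This is an illustration of the logic only, not of how any particular switch implements it; the class and field names are invented, and the 70% and 10-second defaults are simply the figures from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedThreshold:
    limit_percent: float = 70.0
    hold_seconds: float = 10.0
    _above_since: Optional[float] = None  # when the load first exceeded the limit
    _fired: bool = False                  # whether a trap was already sent

    def observe(self, timestamp, load_percent):
        """Return True once, when load has stayed over the limit for longer
        than hold_seconds; this is the moment a trap would be sent."""
        if load_percent <= self.limit_percent:
            # Load is back to normal: reset so a later episode can fire again.
            self._above_since = None
            self._fired = False
            return False
        if self._above_since is None:
            self._above_since = timestamp
        sustained = timestamp - self._above_since > self.hold_seconds
        if sustained and not self._fired:
            self._fired = True
            return True
        return False
```

Firing only once per episode matters in practice: a device that sent a trap on every measurement while the link stayed busy would add to the very load it is complaining about.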
SNMP in its own right has limited value – although you can use it to configure a device and receive performance and alerting information from it, the value is not in the raw data but what you can do with it. So to make the best use of SNMP, you'll need a network management program that collates all the information and presents you with a usable user interface that lets you do things like selecting multiple interfaces and applying a single operation to all items in the selection.
Unfortunately, there are limitations to SNMP that restrict what you can do with a generic management program. For instance, although you could manage two different makes of 96-port switch through the same general application, the application probably couldn't figure out which ports are located on which card in the switch chassis – this is vendor-specific information that isn't available in SNMP.
There are two options, then: either use the vendor's own management package (which will be inherently aware of the structure of the chassis) or use a third-party package like HP OpenView which can accept vendor-specific plug-ins that fill the gap between the generic aspects of SNMP and the proprietary architecture of the devices you're managing. If you're fortunate enough to have the budget required to standardise on a single vendor, the vendor-specific package may well be the best choice, as it is likely to be tailored to the proprietary management interfaces that the manufacturer has devised to sit alongside the SNMP ones; if you have a heterogeneous network, the generic-with-plugins approach is the one to take.
The extent to which you can monitor a network is defined entirely by the capabilities of the devices that drive that network. Although most routers include some kind of SNMP support, it's common to find that companies have purchased the non-SNMP version of their switches, which makes it impossible to keep an eye on what volumes of traffic are traversing the LAN.
It makes sense, then, to consider what management facilities you're likely to need when you buy your devices. As switches are a commodity these days, the cost is largely insignificant, and for medium- to high-end switches, the majority come with SNMP thrown in anyway as the extra cost to the vendor is minimal.
The physical layer
SNMP is mostly concerned with what's going on at OSI layers 2 and 3, but the cabling infrastructure of the organisation, although it goes largely unchanged, can develop faults. Manufacturers of high-end switches and routers such as Cisco have started to build cable-level diagnostics into their switches' port connectors, which can alert the network manager to wiring faults as they develop; if you're not a high-end installation, though, it makes sense to have some stand-alone cable diagnostic facilities to hand.
Extending above the network layers
We've looked into the management of the network itself, but this is only part of the story. Because the network exists solely to run the business's applications, there is an increasing trend to include an understanding of how the actual applications relate to the network. Application traffic analysers, although traditionally regarded as a tool for developers of network applications, are becoming increasingly significant to the network manager because more and more business applications are becoming either Web Services or client-server applications – with an increased reliance on the network over the desktop applications of the past.
By mapping the services an application uses onto the network, it is becoming increasingly possible to map from the network to the application and back again. For instance, if an application is experiencing a problem and that application is known to utilise a specific collection of servers, switches, routers and links, the network manager's fault-finding task becomes simpler because the search is narrowed to those components. The reverse is also true: if the network manager observes an issue with a network device or segment, they know which business applications it will affect.
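The two-way mapping amounts to a lookup table and its inverse. The sketch below uses an entirely hypothetical inventory (the application and device names are made up) to show how a list of what each application depends on can be turned round to answer "what does this device affect?".

```python
from collections import defaultdict

# Hypothetical inventory: the network elements each application relies on.
APP_DEPENDENCIES = {
    "payroll": {"core-switch-1", "router-hq", "db-server-2"},
    "intranet": {"core-switch-1", "web-server-1"},
    "email": {"router-hq", "mail-server-1"},
}


def build_reverse_map(app_dependencies):
    """Invert the application -> elements map into element -> applications."""
    reverse = defaultdict(set)
    for app, elements in app_dependencies.items():
        for element in elements:
            reverse[element].add(app)
    return dict(reverse)


def affected_applications(element, reverse_map):
    """Applications that rely on the given element, in alphabetical order."""
    return sorted(reverse_map.get(element, set()))
```

With this data, a fault on core-switch-1 points at intranet and payroll, while a complaint about payroll narrows the search to the three elements listed against it.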
The final thing we should touch on in an introduction to network management is the concept of an "agent" – a software application that resides on a device on the network and which is interrogated by a central management console from time to time. The SNMP modules in network devices are effectively "agents", since they reside in the devices and are contacted by the central console; for systems that don't support SNMP, you have the option of installing an "agent" on those systems. A typical example of this is server agents, which collate information such as CPU usage and network interface traffic – just like SNMP agents but generally using proprietary protocols to communicate with their central management devices.
Network management can be as simple or as complex as you wish to make it. Many small installations get by simply by watching the lights on the front of the switches – after all, you can spot basic problems such as heavily-loaded links or broadcast storms just by watching the green flashes. As the network's complexity increases, though, so the usefulness of proper monitoring and management tools increases, and the justification for spending money on such tools becomes simpler.