Network Testing


My first attempts at monitoring the network consisted of a script with a block of code for each device. The result of each ping test was written into a variable and, when all the tests had finished, a web page was squirted out, hopefully. It used to crash frequently, and must have consumed huge chunks of memory with all the variables it generated. Also, because it only produced the web page when all sites had been tested, if a few were slow to respond or timed out, the browser could time out before any HTML had been sent to it. Because the IP addresses were hard-coded, it was difficult to maintain.
All these shortcomings have been addressed, and the script is now quite reliable in service. Because data is written back to the config file, the name of the config file has been changed to match the version of the .pl file using it. The config file uses commas as delimiters; the trailing comma is important, so do not leave it out. Why use a writable config file and not a database? Well, I have thought about the pros and cons. A database would undoubtedly be better for storing lots more data, either historical data or more hosts, but it would require another package on the monitoring server (at present it just uses net-snmp). Also, if the database gets corrupted, it is a lot harder to restore its integrity, whereas a flat file can be repaired with a text editor. And I have very limited experience of databases. So a future version may use a database, probably for no better reason than learning how to program one.
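The record layout isn't spelled out here, so the sketch below (Python rather than the script's Perl, purely for illustration) uses a hypothetical name,ip,sent,received, layout. It shows one plausible reason, an assumption rather than something stated above, why the trailing comma matters: with it, the line's newline ends up in an empty final field instead of being glued onto the last real value.

```python
# Hypothetical config record layout: name,ip,sent,received,
# (field names are illustrative assumptions, not the script's real layout)

def parse_record(line):
    """Split one comma-delimited config line into its data fields."""
    fields = line.split(",")
    # With a trailing comma, whatever follows the last comma (here just the
    # newline) lands in an extra final element, which is discarded.
    return fields[:-1]

def format_record(fields):
    """Rebuild a config line for writing back, keeping the trailing comma."""
    return ",".join(fields) + ",\n"

with_comma = "router1,192.0.2.1,120,118,\n"
without_comma = "router1,192.0.2.1,120,118\n"

print(parse_record(with_comma))     # ['router1', '192.0.2.1', '120', '118']
print(without_comma.split(","))     # ['router1', '192.0.2.1', '120', '118\n']
```

Writing a record back through format_record keeps the same layout, so the file survives repeated read/write cycles unchanged.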

How it works:- (screenshot in new window)

  • Timetest checks whether we are in an SLA period (ours is Monday to Friday, 09:00 to 17:30; bank holidays etc. are ignored for now). The subroutine uses "localtime time" to get the current time and splits it into its parts. Each part is tested to check that it is a valid day and falls between the SLA hours. The result is used later to control whether or not the results are written out for SLA monitoring.
  • The Ping sub just calls the system's ping; you may have to alter the path to it for your OS. If the IP address requested contains a "#", it is assumed to be commented out and "not tested" is returned. The command tells ping to send one ping with a length of 900 bytes, and -w2 sets a 2-second timeout before it fails. Not all versions of ping support -w. The result is then taken apart to extract the round-trip time. This is a messy bit of code and may need hacking depending on the format of the reply your ping gives. Depending on the outcome, a colour is assigned: red for no reply, green for a reply under a threshold time, and yellow for a reply slower than that threshold. If the ping is successful, the host is then poked by the SNMP routine to extract its uptime (not to be confused with the next sub...).
  • sub Uptime gets the monitoring server's own uptime, not the uptime of the host being tested. This is chopped up into $days and $hours.
  • sub SNMPget uses the Perl module Net::SNMP to get the uptime of the host being tested; you may have to alter the community string to match your network. This is one of the messiest bits of code, and the error trapping does not really work. Because SNMPget is only called after a successful ping, the error trapping only comes into play if a ping succeeds and the SNMP probe then fails. Unlikely, but it has happened. This really needs sorting, but I haven't worked out Perl objects well enough to fix it yet.
  • In the MAIN: section, my $columncount = "4"; controls the number of HTML columns output in the results table; 4 seems to be about right.
  • The result of the SLA sub is used to show the colour green for in hours or blue for out of hours. $validsla is also used later to control the logging output.
  • In the head section of the HTML, a <META HTTP-EQUIV="refresh" CONTENT="60"> tag causes the browser to refresh automatically every 60 seconds.
  • Next, some JavaScript is output to give a pretty scrolling line at the bottom of the web browser.
  • Just to check whether the browser or PC has crashed (a special feature for M$ Windows users), the time of the last refresh is displayed next to the SLA hours field. Your IP address is also shown; it is just an environment variable from Apache.
  • Failed results are recorded, and the next bit opens a new browser window to display them (see later).
  • If I can take the trouble to write W3C-compliant HTML in a Perl CGI script, I would like you to view it in a standards-compliant browser. I was tempted to put <form><input type crash></form> in to try and destroy IE, but decided against it. :-(
  • A new addition, and maybe only of limited use depending on your network hardware, is a temperature display. This is based on the environmental monitor built into Cisco's 7507 router. SNMPget is used to retrieve the six values corresponding to the OIDs for the temperatures and their descriptions. Two threshold levels are set: the cell background goes yellow when the first is reached, and red when the second is breached.
  • Next, the config file is read to get the IP addresses to test, after checking that it exists. The file is read into an array and the IP address split out. No attempt is made to check for commented-out lines here; this is handled by the ping subroutine. The result of the previous ping is also split out and incremented if appropriate. If the ping fails or is slow, the result is datestamped and written out to an error log. Currently this log is not rolled over, so it has the capacity to get HUGE!
  • After the ping subroutine has established that the target is up, the SNMPget subroutine is called to find out its uptime.
  • For the logging to the config file, the number of received pings is calculated as a percentage of the number sent, to two decimal places. The array to write back is then assembled, and finally an HTML row is piped by the web server back to the browser. A link is also included; when followed, it shows the contents of a text file with information about the equipment at the site and details such as the leased line circuit and contact phone numbers.
  • After the config file has been completely processed, the HTML table and page are closed.
  • If we are in SLA time, the result array is written back to the config file; outside SLA time the script just quits without recording results.
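The SLA test described in the Timetest bullet can be sketched in Python (not the script's own Perl), using datetime instead of splitting the output of localtime. The constant and function names are hypothetical, and treating 17:30 itself as out of hours is an assumption.

```python
from datetime import datetime, time

# SLA window from the text: Monday to Friday, 09:00 to 17:30.
# Treating 17:30 itself as outside the window is an assumption.
SLA_START = time(9, 0)
SLA_END = time(17, 30)

def in_sla(now):
    """Return True if `now` falls inside the SLA period (bank holidays ignored)."""
    is_weekday = now.weekday() < 5          # Monday == 0 ... Friday == 4
    return is_weekday and SLA_START <= now.time() < SLA_END

print(in_sla(datetime.now()))
```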
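The Ping sub's logic (skip commented-out entries, run ping, pull out the round-trip time, pick a colour) can be sketched in Python as below. The command line assumes Linux iputils ping, where -c sets the count, -s the payload size in bytes and -w a deadline in seconds; the regular expression, the 200 ms threshold and the function names are illustrative assumptions, and as the bullet warns, your ping's output may need a different pattern.

```python
import re
import subprocess

# Threshold (ms) separating a green from a yellow result: a hypothetical value.
SLOW_MS = 200.0

def build_ping_command(ip):
    # One 900-byte echo request with a 2-second deadline (Linux iputils flags).
    return ["ping", "-c", "1", "-s", "900", "-w", "2", ip]

def parse_rtt(output):
    """Extract the round-trip time in ms (e.g. 'time=12.3 ms' or 'time<1ms'), or None."""
    match = re.search(r"time[=<]\s*([\d.]+)\s*ms", output)
    return float(match.group(1)) if match else None

def classify(rtt, slow_ms=SLOW_MS):
    """Map a round-trip time to the colour used in the results table."""
    if rtt is None:
        return "red"                  # no reply
    return "green" if rtt < slow_ms else "yellow"

def ping_host(ip):
    if "#" in ip:
        return "not tested"           # commented-out entry in the config file
    result = subprocess.run(build_ping_command(ip), capture_output=True, text=True)
    return classify(parse_rtt(result.stdout))
```

Keeping the parsing and the colour choice in separate functions means the messy, platform-dependent part (parse_rtt) can be adjusted without touching the rest.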
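Chopping an uptime into $days and $hours is just integer division. The Python sketch below (hypothetical function names) also converts SNMP TimeTicks, the hundredths-of-a-second unit used by the standard sysUpTime object, on the assumption that this is the object SNMPget queries; reading /proc/uptime assumes a Linux monitoring server.

```python
# sysUpTime is reported in TimeTicks (hundredths of a second); a Linux
# server's own uptime is the first field of /proc/uptime, in seconds.

def split_uptime(seconds):
    """Split an uptime in seconds into whole days and remaining whole hours."""
    days, remainder = divmod(int(seconds), 86400)
    hours = remainder // 3600
    return days, hours

def timeticks_to_seconds(ticks):
    """Convert SNMP TimeTicks (1/100 s) to seconds."""
    return ticks / 100

def local_uptime_seconds(path="/proc/uptime"):
    # Linux-specific assumption; the original sub may parse `uptime` instead.
    with open(path) as f:
        return float(f.read().split()[0])

print(split_uptime(timeticks_to_seconds(123456789)))   # (14, 6)
```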
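How $columncount turns a flat run of results into table rows can be sketched like this; the function name and the cell markup are illustrative assumptions.

```python
# Lay result cells out in rows of `columns` cells, mirroring $columncount = "4".

def html_rows(cells, columns=4):
    """Group pre-rendered <td> cells into <tr> rows of at most `columns` cells."""
    rows = []
    for start in range(0, len(cells), columns):
        rows.append("<tr>" + "".join(cells[start:start + columns]) + "</tr>")
    return rows

cells = ['<td bgcolor="green">host%d</td>' % n for n in range(1, 7)]
for row in html_rows(cells):
    print(row)
```

With six hosts and four columns, the second row simply carries two cells.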
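The two-threshold colouring of the temperature cells reduces to a pair of comparisons. The threshold values, the function name and returning None for an unhighlighted cell are assumptions; the real script's limits are not given.

```python
# Yellow once the first threshold is reached, red at the second.
# Both values (degrees C) are hypothetical.
WARN_C = 40
CRIT_C = 50

def temp_colour(celsius, warn=WARN_C, crit=CRIT_C):
    """Return the cell background for a temperature reading, or None if normal."""
    if celsius >= crit:
        return "red"
    if celsius >= warn:
        return "yellow"
    return None
```

Testing the higher threshold first means a reading above both limits is reported as red rather than yellow.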
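The counter update and the percentage written back to the config file can be sketched as follows. The function names are hypothetical, and returning "0.00" when nothing has been sent yet is an assumption to avoid dividing by zero.

```python
def update_counts(sent, received, replied):
    """Increment the sent/received counters after one ping attempt."""
    sent += 1
    if replied:
        received += 1
    return sent, received

def success_rate(sent, received):
    """Received pings as a percentage of those sent, to two decimal places."""
    if sent == 0:
        return "0.00"   # assumption: a host never tested reports 0.00
    return "%.2f" % (100.0 * received / sent)

sent, received = update_counts(119, 118, replied=False)
print(success_rate(sent, received))   # 98.33
```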

  • The log viewer is a simple piece of Perl which uses the tail command to output n lines. n is passed in as $ENV{'QUERY_STRING'} and is first stripped of letters and a few punctuation characters with $lines =~ s/[a-z,A-Z,\/,\|,\,,\@]//g; This input validation prevents tail from barfing if it is fed alphabetic input. The /g tells s/// to replace every match; without it, s/// would strip out only the first match. Note that the character class lists only the characters to remove, so anything else (a space or a semicolon, for example) still gets through; $lines =~ s/\D//g would strip every non-numeric character. The log files are already loaded with HTML tags, so no extra formatting is required and $result is just squirted out. The HTML page it creates refreshes every 60s.
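A Python sketch of the same input check, using \D to drop every non-digit, plus a tail done in-process with collections.deque rather than by shelling out to the tail command (a substitute technique, not what the script does). The function names and the default of 20 lines are hypothetical.

```python
import re
from collections import deque

DEFAULT_LINES = 20  # hypothetical fallback when QUERY_STRING has no digits

def lines_from_query(query_string, default=DEFAULT_LINES):
    """Keep only the digits of QUERY_STRING, e.g. "2a5;rm" becomes 25."""
    digits = re.sub(r"\D", "", query_string)
    return int(digits) if digits else default

def tail(path, n):
    """Return the last n lines of a text file, like `tail -n`, without a shell."""
    with open(path) as f:
        return list(deque(f, maxlen=n))
```

Because nothing is passed to a shell, stray characters in the query string cannot be interpreted as extra commands.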