Welcome to docs.opsview.com
Frequently Asked Questions
Installation and Initial Configuration
I'm having trouble with the Opsview installation - can I get any help?
If you have issues installing, sign up to the mailing lists and send an email to opsview-users. Try to put in as much information as you can, especially:
- Which platform?
- How are you installing?
- What errors are you getting? Please include any relevant logs and output.
This list is frequently viewed by our engineers, and others in the community may help.
We want everyone to be able to install Opsview, so we'll do our best to fix any issues and get your system up and running.
Why do I keep getting asked to authenticate?
Please check that the time is correct on the Opsview server, as an incorrect system time can cause session cookies to expire.
It is strongly recommended to use NTP as time synchronisation is critical to accurate monitoring. If you are using a virtual machine, check the virtual machine host information for time synchronisation as it may conflict with NTP.
See also the next question.
Why is the time in the UI different to that on the system?
The date/time on the server is correct, but parts of the UI show an incorrect time. This can happen because the user that started opsview-web had the TZ environment variable set to a different value. To fix this, run the following as the nagios user:
unset TZ
/etc/init.d/opsview-web restart
and then investigate where the TZ variable may have been incorrectly set.
Why do I get "Error retrieving update from Opsview. Will continue to retry"?
If there is an error getting an AJAX refresh, the main content area will show this message. This could be due to a network problem or the Opsview Web application not responding.
Your browser will continue to poll for status updates and will refresh accordingly when it gets a successful response.
Why can't I login using the Atom feed?
Are you an LDAP user? If so, this is a limitation in using LDAP.
How do I change the top level of the Hostgroup Hierarchy?
By default, the name of the top level hostgroup is “Opsview”. This can be changed in the web UI. On the Hostgroup Hierarchy configuration page, click on the = sign and a box will appear at the bottom where you can change the name.
Why do I get error "The Opsview Web Server is not running or is not responding to requests!"?
When connecting to Opsview, the web browser gives the following error:
Opsview Error
The Opsview Web Server is not running or is not responding to requests!
This can be resolved by starting the Opsview Web server:
/etc/init.d/opsview-web start
How do I change MySQL passwords post installation?
There will be a short outage to Opsview while the passwords are changed.
1. Stop the opsview and opsview-web daemons
2. Edit the passwords in opsview.conf
3. Change the passwords in MySQL
Note: Passwords should avoid characters such as @, ! and $ due to handling by perl or by the shell. You can see how the variables would be calculated by running /usr/local/nagios/bin/opsview.sh.
mysql -p -u <root user>
USE mysql;
UPDATE user SET password=PASSWORD("newpass") WHERE user="opsview";
UPDATE user SET password=PASSWORD("newpass") WHERE user="odw";
UPDATE user SET password=PASSWORD("newpass") WHERE user="nagios";
UPDATE user SET password=PASSWORD("newpass") WHERE user="reports";
FLUSH PRIVILEGES;
4. Run /etc/init.d/rc.opsview gen_config (which will regenerate the Opsview configuration files and then restart the opsview daemons automatically)
5. Restart the opsview-web daemon
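The note about awkward password characters can be turned into a quick pre-check before working through the steps above. This is only a minimal sketch: password_is_safe is a hypothetical helper (it is not part of Opsview), and it rejects just the three characters named in the note, which the note gives as examples rather than a complete list.

```shell
# Hypothetical helper, not shipped with Opsview.
# Succeeds (exit 0) only if the candidate password contains none of
# the characters @, ! or $ that the note warns about.
password_is_safe() {
    case "$1" in
        *[@!\$]*) return 1 ;;   # contains a character to avoid
        *)        return 0 ;;
    esac
}
```

Example usage: password_is_safe 'newpass' && echo "looks OK"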
How do I change "admin" UI password post installation?
To change the admin user password via the Opsview GUI:
1. Navigate to SideBar => Configuration => Contacts => Admin
2. Enter the new passwords and submit the page
3. Log out of the admin account (via SideBar => User => Logout) and log back in
How do I recover the "admin" UI password?
In the event the admin password has been lost, the following commands will reset it to the installation default.
# mysql -u root -p opsview
Password: <not shown>
mysql> update contacts set password='$apr1$SUR3Kcd8$CkJfpqvqy3r.6rzawNwCS.' where username='admin';
You should now be able to access the UI again using the username admin and the password initial. Remember to change it!
Why am I getting lots of errors in syslog?
Connections and Disconnections
If you get lots of these messages reported by syslog with Opsview 2.10.1 or newer
Dec 5 11:04:42 dev10 ndo2db: Successfully connected to MySQL database
Dec 5 11:04:42 dev10 ndo2db: Successfully disconnected from MySQL database
Dec 5 11:04:47 dev10 ndo2db: Successfully connected to MySQL database
Dec 5 11:04:47 dev10 ndo2db: Successfully disconnected from MySQL database
Dec 5 11:04:52 dev10 ndo2db: Successfully connected to MySQL database
Dec 5 11:04:52 dev10 ndo2db: Successfully disconnected from MySQL database
you need to amend your syslog.conf file to lower the user reporting level from DEBUG to INFO, i.e.
user.info -/var/log/user.log
rather than
user.* -/var/log/user.log
If using rsyslog you can also set '$RepeatedMsgReduction on' which will take repeated messages and condense them to 'Last message repeated n times' - see http://www.rsyslog.com/doc-rsconf1_repeatedmsgreduction.html for more information.
Why does Safari crash if I leave it refreshing on the statusmap.cgi screen?
This has been reported on Safari 2.0.4 (delivered with Mac OS X 10.4.10). It appears that the latest Webkit version of Safari (available at http://webkit.org/) does not have this problem.
The statusmap.cgi screen works fine on Firefox and IE6 and IE7.
Why does Nagios die after a reload signal is sent?
This has been seen on slave systems which are 64bit where the master is 32bit. Running nagios in the foreground and sending a -HUP signal gives this error:
libgcc_s.so.1 must be installed for pthread_cancel to work
You have to install the 32bit version of libgcc.
This has only been seen on Red Hat systems.
I've manually changed the state of a passive service to OK, but it keeps changing back. Why?
This happens in a distributed environment where the state of a service is inconsistent between the master and the slave.
If the slave receives a passive check, it gets propagated to the master. If you then change the state of that service on the master, the slave thinks it is still in the old state, but the master thinks it is in the new manually chosen state.
If this service has renotification enabled, then the slave will send its state back to the master, causing the state to revert back.
The workaround is to either:
- manually change the state on the slave (which then propagates back to the master)
- or disable renotifications for this service
How do I change the TCP port that Opsview expects my slave to connect on?
This only applies when the slave is configured to initiate connections to the master.
To change the port from 25807 to 25802, run the following SQL commands using the MySQL admin tool (judging by this example, the port appears to be 25800 plus the node id):
use opsview;
update monitoringclusternodes set id=2 where id=7;
Why does the status bar at the bottom flicker when scrolling in IE6?
Unfortunately, this is a limitation in IE6. The only way to get a fixed position for the status bar in IE6 is using javascript. This realigns the status bar whenever the page is scrolled or resized. This is because IE6 does not support the CSS attribute
position:fixed
Newer browsers, such as IE7, Firefox and Safari, all support this attribute and should not flicker.
Why do some select boxes overwrite on top of the status bar in IE6?
This is a limitation in IE6.
Why isn't MRTG generating graphs for some devices?
This assumes that the MRTG option is enabled for the hosts in question.
As 'nagios' user, run the following commands:
/usr/local/nagios/bin/mrtgconfgen.pl full
/usr/local/nagios/bin/mrtg_genstats.sh
These regenerate the configuration and run a test. If these are successful, then check the nagios user crontab contains a line similar to:
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/nagios/bin/mrtg_genstats.sh > /dev/null 2>&1
and that cron is running OK.
You can also check the log file for errors:
/usr/local/nagios/var/log/mrtg_genstats.log
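One way to confirm the crontab entry described above is to filter the output of crontab -l. The following is a minimal sketch, assuming it is run as the nagios user; has_mrtg_cron_entry is a hypothetical helper name, not an Opsview command.

```shell
# Hypothetical helper, not shipped with Opsview.
# Reads a crontab on standard input and succeeds if a non-commented
# line references mrtg_genstats.sh.
has_mrtg_cron_entry() {
    grep -v '^[[:space:]]*#' | grep -q '/usr/local/nagios/bin/mrtg_genstats\.sh'
}
```

Example usage: crontab -l | has_mrtg_cron_entry && echo "cron entry present"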
How do I change Opsview to use seconds instead of minutes?
Amend the opsview.conf file to include the following configuration:
$nagios_interval_length_in_seconds = 1;
You then need to restart Opsview Web: /etc/init.d/opsview-web restart.
NOTE: all service check and host configuration pages will have to be reconfigured to change the units from minutes to seconds (i.e. take all the values and multiply by 60) for check interval, retry check interval, and notification interval.
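As a small worked example of the conversion in the note above, the arithmetic can be sketched in shell. minutes_to_seconds is a hypothetical helper used only for illustration; it does not change any Opsview configuration.

```shell
# Hypothetical helper: convert an interval expressed in minutes into
# the equivalent number of seconds (minutes * 60).
minutes_to_seconds() {
    echo $(( $1 * 60 ))
}
```

For instance, a check interval of 5 minutes would be re-entered as the value printed by minutes_to_seconds 5.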
SMTP configuration
Changing the sender address for email alerts
To change the SMTP sender address for your Opsview alerts:
For everything to the right of the @ symbol
Update '/etc/mailname' with your preferred host or domain name. Restart your SMTP MTA.
For the user name to the left of the @ symbol
- In Postfix you can adjust this using the 'sender_canonical_maps' configuration parameter
Send all messages to your company email system
Update the 'relayhost' parameter with the hostname of your SMTP MTA, eg:
relayhost = mail.opsview.org
MySQL Database
Use MySQL client to execute SQL statements
How do I back up Opsview databases and configuration?
- Edit etc/opsview.conf to set correct backup destination
su - nagios
/usr/local/nagios/bin/rc.opsview backup
How do I restore from database backup?
- Identify the required image to restore from (location is held in the $backup_dir variable within the opsview.conf file if using a full backup rather than database only).
su - nagios
gunzip -c {/path/to/opsview-db-{date}.sql.gz} | /usr/local/nagios/bin/db_opsview db_restore
I get a database upgrade error during an upgrade
If you get an error which says:
Upgrading Opsview part of Runtime database
DB at version 2.7.0
DBD::mysql::db do failed: Table 'opsview_contact_services' already exists [for Statement "CREATE TABLE opsview_contact_services (
There was an error in the runtime nightly backup which missed out the opsview_database_version table. The upgrade script then tries to apply changes from Opsview 2.7 and 2.8 into the database. To fix, run this in mysql:
mysql -u{user} -p{password} runtime
mysql> update opsview_database_version set version='2.8.6';
And then run:
su - nagios
/usr/local/nagios/installer/upgradedb_runtime.pl
From Opsview 3.3, the upgrade tasks associated with Opsview 2.7 and 2.8 will be removed.
How do I fix damaged database tables?
If the database is damaged, check its tables by running:
mysqlcheck -p -u <user> <database>
To repair a table (from the MySQL client):
use <database name>;
REPAIR TABLE <tablename>;
To check and repair the tables in all databases, you can run the following as the MySQL root user:
mysqlcheck -A -r -u root -p
A common cause of corrupted database files is that the system ran out of space on the /var partition.
Using Opsview
How do I change my password?
In the side navigation, select User → Preferences. From here, you can reset your password.
Depending on your access levels, some options may be dimmed.
Note: Versions prior to 3.0.2 did not allow changing of passwords for non-admin users due to a bug.
Why do I have "Host assumed UP - no results received" as the host output?
Hosts are checked on demand - only when a service comes in with a failure state will the host be checked.
If a service has failed, then the host will be checked and the host output will be changed to reflect this.
If you submit a result in the user interface for a service with a failed state, then the host will not be checked. This is due to a change we made to the core Nagios 2 engine - hosts were being checked for every failed passive result coming in which was causing problems on systems with snmp traps.
Opsview says that a host is down, but I can ping it - what is going on?
There are several possibilities:
- Is the check command for the host a ping? For example, if it is an ssh check (which you may want to set for firewall reasons) and ssh hasn't started, then as far as Opsview is concerned, the host is down
- As Opsview can only check from its perspective, perhaps a check from a different network location could give different results
- Because hosts are checked “on demand” (see the important concepts page), if all services fail to recover from a host failure, then the host is not checked again
For the last possibility, the recommended approach is to always have a check which is similar to the host check (usually TCP/IP), so that this regular service check will reveal when the host has recovered. This also means you can get performance data about this service (performance data is not available for host checks due to the irregularity of them).
Why do I have a stale result straight after I have submitted a result?
This could happen in a distributed environment where the time is not synchronised between master and slaves.
When you submit a result, a timestamp is associated with it. If the slave's clock is ahead of the master's, the result arrives at the slave with a timestamp that is already in the past, so the slave immediately marks it as stale.
How can I use the Nagios Checker Firefox plugin?
More information can be found here:
NMIS pages don't look right, as though they are missing a stylesheet
This is due to a line missing from the Apache configuration file - add the line
Alias /static/nmis/ "/usr/local/nagios/nmis/htdocs/"
to the Opsview configuration section and restart Apache.
Icons are missing from my Solaris master server GUI
This is due to a bug in the SunFreeware GD package used when creating the Opsview packages (which meant gd2 images were not created correctly). This can be fixed by ensuring SFWgd is installed (from the companion DVD) and then running the following as the nagios user:
cd /usr/local/nagios/share/images/logos/
for arg in $(echo *.png | sed 's/\.png//g')
do
    /opt/sfw/bin/pngtogd2 $arg.png $arg.gd2 0 1
done
The mass acknowledgements sometimes shows me more items than I had in the prior view
Mass acknowledgements filters the current view to only show you unhandled services and unhandled hosts. Due to the way the URLs are constructed, it was better to force showing all the unhandled hosts, otherwise from some views, you wouldn't get any items to acknowledge.
The Parent Tree link doesn't work
In Internet Explorer 7, the Parent Tree link just gives me a broken image and an error saying:
The page you are viewing uses Java. More information on Java support is available from the Microsoft website.
This means you need to install Java on your machine. Go to http://java.com and install Java.
The Parent Tree diagram overwrites the contents of the web page underneath
The Parent Tree functionality uses the Hypergraph Java applet, but Java applets do not respect the z-index and thus overwrite all other HTML content. This is a limitation in Java applets. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4858528 for the bug report sent to Sun.
How long after a failure can I expect to receive a notification?
Nagios has a concept of SOFT and HARD states and notifications are only sent on hard states. See the important concepts guide for more information.
How can I work out what plugin arguments were actually used?
Unfortunately, there is no UI method of finding this, but you can query the database to get the information:
- Connect to the runtime database (use the connection information in opsview.conf)
- Get the service object id:
select * from nagios_objects where name1='hostname' and name2='servicename'
- The service object id is the value in the object_id column
- Get the last 5 results:
select state,output,command_line from nagios_servicechecks where service_object_id={object_id} order by start_time desc limit 5
- The command line column has the full command executed by Nagios
This is a rendering bug in IE6, but IE6 is no longer supported from Opsview 3.5. We recommend upgrading to IE 7 or IE 8, or using Firefox.
On Safari 4, I get layout errors after I've selected some options on some pages, such as the contacts edit page
This is a bug in Webkit, which is used by Safari 4, Google Chrome and other browsers.
We have raised this with the Webkit team:
We also are tracking this internally in our jira system.
Administering Opsview
I get access denied and I cannot see any of the /admin pages
Since Opsview 3.1 introduced granular access controls, your role probably has had the ADMINACCESS access rights removed.
As access to the user interface is not possible, the issue will need to be addressed via the database. Connect to MySQL as the opsview user (configured in opsview.conf) and run:
mysql> select
contacts.name as contact_name,
roles.id as roleid,
access.id as accessid,
access.name
from contacts,roles,roles_access,access
where accessid=access.id
and roleid=roles.id
and contacts.role=roles.id
and contacts.name='{contactname}';
Change the {contactname} as appropriate.
This should return output like:
+--------------+--------+----------+----------------+
| contact_name | roleid | accessid | name           |
+--------------+--------+----------+----------------+
| admin        |     10 |        1 | VIEWALL        |
| admin        |     10 |        3 | ACTIONALL      |
| admin        |     10 |        6 | NOTIFYSOME     |
| admin        |     10 |        7 | CONFIGUREHOSTS |
| admin        |     10 |        8 | RELOADACCESS   |
| admin        |     10 |        9 | ADMINACCESS    |
+--------------+--------+----------+----------------+
6 rows in set (0.00 sec)
If ADMINACCESS is missing, then it will need to be added to the role (this will take effect for all contacts using this role).
mysql> insert into roles_access values (10,9);
Query OK, 1 row affected (0.03 sec)
Rerunning the above select statement should show the ADMINACCESS in the list now.
You can then log in and check the audit logs to work out who made the change.
Another possibility that has been reported is that you have an incorrect cookie. Try deleting all cookies related to Opsview and re-login. Please let us know about this on the mailing lists if you get this problem as we are trying to track down the root cause.
Reloading gives me an error about nagvis.ini.php
When reloading, I get an error which says:
Can't close to nagvis.ini.php: Bad file descriptor at /usr/local/nagios/bin/nagconfgen.pl
Check the permissions of /usr/local/nagios/nagvis/etc/nagvis.ini.php. This should be owned by the nagios user and the nagcmd group.
Also ensure that your apache user is a member of the group nagcmd.
Nagios Plugins
What does this output message mean?
See our list of common plugin outputs.
Distributed Monitoring
On the master, it says "Next check: N/A" but I know this is a regularly scheduled check
This is because, on the master, that service is configured as a passive check, not an active check. The Nagios CGIs just display the results of the last check from the slave and have no idea when the next scheduled check will be. This is a limitation in the Nagios CGIs.
Nagios is not running on one of my slaves, how do I fix it?
The simple way of starting the Nagios daemon on a slave is via the Master server.
When logged in as the 'nagios' user on the Master server:
/usr/local/nagios/bin/rc.opsview gen_config
This will re-generate the configuration and (re)start the Nagios daemon on each slave server.
If this command fails, go to /admin/reload to see any error messages that may have been captured.
Why are there large numbers of service checks in 'UNKNOWN' state?
This occurs in a distributed environment if a slave has not sent results back to the master for longer than 30 minutes.
A slave server problem should be alerted via the check_opsview_slave plugin. But if it is not resolved, then services monitored by this slave will start to go into UNKNOWN states after 30 minutes (note, the host will not be set into an UNKNOWN state). You need to fix the connectivity issues.
Other things it could be:
- NSCA not running on the master
- Nagios command pipe not setup correctly
- The slave Nagios has very high latency - check nagios.log and nagiostats output on the slave for the Active Service Latency values
I get a "HostKeyVerification Failed." error - how do I fix it?
This is usually only seen in environments with reverse SSH tunnels either on the initial setup or when a slave has changed IP address.
In Opsview version 3.0 the fix is to run the following as nagios user
dosh -i -s <slave name>
In previous versions of Opsview use the following steps:
send2slaves -t
For each slave with an error, a line similar to the following will be returned:
ssh -o ConnectTimeout=10 -p <port> -o BatchMode=yes -o HostKeyAlias=<slave>-ssh localhost hostname
Rerun the command as nagios but change the BatchMode=yes to BatchMode=no.
SNMP
Why does my SNMPv3 host not have MRTG graphs generated?
This is a known limitation. Only hosts with SNMPv1 or v2 will have MRTG graphs generated.
How can I load extra MIBs into Opsview?
There are two directories on the Opsview master used for MIBs:
- /usr/local/nagios/snmp/all to hold all MIBs
- /usr/local/nagios/snmp/load to hold MIBs that are actually loaded by Net-SNMP
Put the MIB files in there and ensure that the files are readable by the nagios user.
As per the SNMP setup documentation, the /etc/snmp/snmp.conf file (or /etc/sma/snmp/snmp.conf on Solaris) should exist and at least contain the line:
mibdirs +/usr/local/nagios/snmp/load
Please check your configuration against our configuration notes if this file does not exist.
You will need to restart the Net-SNMP daemon for the change to take effect.
Run /usr/local/nagios/bin/send2slaves -s to send the snmp/load directory to all slaves and restart SNMP.
My SNMP 5.3 system isn't accepting traps correctly
By default, SNMP version 5.3 and newer requires authentication to be set up - this configuration can be disabled by setting
disableAuthorization yes
in the /etc/snmp/snmptrapd.conf file.
Monitoring Agents
I can't find an agent package for my Unix OS / Architecture, what now?
If an installation package isn't already available it will be necessary to create one:
1) Download source code and pre-requisites
Download NRPE and Nagios Plugins packages from the Opsview source repository (opsview/trunk/opsview-base)
Details here: http://opsview.org/subversion
You can also obtain these directly from the Nagios website.
2) Compile NRPE and Nagios Plugins
You will need at least GCC and OpenSSL software installed.
Steps for compiling software:
- Start console session (bash, ksh, csh, etc)
- Create 'nagios' user and group
- Unpack tarball
- Change into software directory (cd nrpe-2.2.5)
- Run command './configure' and check for any missing dependencies
- Run command 'make all' to build software
- Optionally run command 'make install' to install software in relevant locations
3) Assemble additional functionality
Add additional monitoring plugins from Opsview source repository and Nagios Exchange web site.
4) Update configuration
Update NRPE configuration files to suit your requirements (/usr/local/nagios/etc/nrpe.cfg).
5) Create packages (optional)
To deploy software to multiple systems we recommend you create a software package for your OS platform. For Linux this may be done using RPM or Debian packaging tools. Solaris and AIX include their own tools.
FAQ: Log messages
There are various locations for log messages, depending on the subsystem
Audit logs
Why do I keep getting "Username 'user' logged in via auth tkt" in my audit logs?
By default, the authentication ticket is set to expire after 1 day. If the time on your Opsview server is too slow, then the browser could expire the authentication ticket, though the session would still exist at the Opsview server.
Make sure the Opsview server has its time synchronised correctly.
/usr/local/nagios/var/nagios.log
These are the Nagios log files. These are rotated every week and the archives are put into /usr/local/nagios/var/archives.
This is medium volume.
/var/log/opsviewd.log
This is the main log destination for Opsview programs. It is possible to split the locations if you wish by changing the Log4perl.conf file in /usr/local/nagios/etc.
By default, this acts as a bucket for all log messages.
Invalid offset
[2008/04/04 03:11:45] [run_scheduled_reports] [FATAL] Died: Invalid offset: New_York
This appears to be a bug in DateTime: http://search.cpan.org/src/DROLSKY/DateTime-TimeZone-0.6902/Changes
It is harmless and will be fixed when we next update DateTime.
Import took > 5 seconds
[2008/11/10 18:16:54] [import_ndologsd] [WARN] Import of 1225937390.604976, size =768003, took 10.0321450233459 seconds > 5 seconds
This means the import of a file into the Runtime database took more than 5 seconds. As the imports mostly run every 5 seconds, this means the database will fall behind the status as reported by Nagios. This is not a problem straight after a reload, as the initial load can take some time. But if you are getting this message frequently, or you are getting it when the size value is small, then the cause of the import delay needs to be investigated.
See the prerequisite information about MySQL for some recommendations.
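To judge how often the warning appears and whether it coincides with small size values, the relevant lines can be pulled out of the log. This is a rough sketch that assumes the log lines look exactly like the example above; slow_imports is a hypothetical helper name, not an Opsview tool.

```shell
# Hypothetical helper, not shipped with Opsview.
# Reads opsviewd.log lines on standard input and, for each
# import_ndologsd warning, prints the timestamp, the size and the
# number of seconds the import took.
slow_imports() {
    sed -n 's/^\[\([^]]*\)\] \[import_ndologsd\] \[WARN\] Import of [^,]*, size *= *\([0-9][0-9]*\), took \([0-9.][0-9.]*\) seconds.*/\1 size=\2 seconds=\3/p'
}
```

Example usage: slow_imports < /var/log/opsviewd.log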
/var/log/opsview-web.log
This holds logging information for the Opsview Web application.
You can change the configuration via /usr/local/opsview-web/etc/Log4perl.conf.
Low volume (by default).
For example, if you wanted logging for the Root.pm controller, you would add this line:
log4perl.logger.Opsview.Web.Controller.Root=DEBUG, LOGFILE
You do not have to restart opsview-web. This takes effect within a few seconds.
syslog
Some of the Nagios components, NRPE, ndo2db and NSCA, will log to syslog. The destination depends on your syslog configuration.
Performance Data and Graphing
Where's the graphing icon? I'm sure there's performance data
You need to reload Opsview. When a new service is created, Opsview doesn't know if there is performance data until data is returned (the first reload activates the new check - wait for the first status update and then reload again). The 2nd reload will see the graph data files exist and then display the graph icon in the UI.
If you still don't get a performance graph icon, check whether the Nagios status screen for that service shows the graphing icon - the url is something like: http://opsviewserver/cgi-bin/extinfo.cgi?type=2&service=SSH&host=opsview
If it does, then the macro has been set correctly in the configuration. Check that the runtime.opsview_host_services table has the perfdata_available flag set for this service:
select hostname, servicename, perfdata_available
from opsview_host_services
where hostname='{host}' and servicename = '{servicename}'
This flag controls whether to draw the graphing icon or not. This table is updated by ndoutils_configdumpend, which is run when the nagios configuration has been updated following a reload.
Another possibility is that the RRDs are being migrated from Opsview 2 style to Opsview 3 style. The graphing icon will only appear when the RRDs have migrated for a particular service. See RRD migration notes for more information about the migration process.
It has been seen in the migration process that some RRD files can be left behind because the migration process cannot work out what the host name or service name is. If you don't mind losing performance history, you can delete the existing RRD file and Opsview will automatically create a new one:
cd /usr/local/nagios/var/rrd
ls *.rrd
rm *.rrd
Graphing is not currently working in Opsview - how does it work?
There are 2 separate and distinct areas of graphing within Opsview:
Performance data returned by service checks
Any service check that returns valid and consistent performance data as per the guidelines will have graphs automatically created. The process is
- A new service check is added in and a reload performed to activate it
- As soon as the check is run against a server and performance data is returned the graphs are created
- The graph icons and links in the UI will be created on the next reload
The service check performance data is continually logged into nagios/var/perfdata.log and processed regularly to create the rrd files, which are stored in nagios/var/rrd. The log file for this process is in nagios/var/log/nagiosgraph.log.
Performance data isn't being generated or it isn't interpreted correctly
Some service checks do not provide performance data. Other checks do provide it, but the data isn't interpreted correctly. In both cases a change to the provided map file, /usr/local/nagios/etc/map, is probably required. This file is used to read the service check output and, if a match is found, generate performance data for use in the service graphs. If no match is found, the performance data in the service check output is used, if present.
This file is a series of Perl regular expressions which match service check output. From Opsview 2.12.7, if you need to change one of the entries you can create a map.local file (this will override the entries in the map file and will not be overwritten on an upgrade).
Any changes to the map.local file should be followed by a
perl -c map.local
to ensure there are no syntax errors in the file.
Changes to map.local will be used when processing the next set of performance data, which is every 30 seconds.
There is some more information about the map file and its format on the nagiosgraph web site.
Performance graph combines several pieces of data, how do I graph individually?
It is possible to generate graphs for specific performance data by appending an argument to the URL.
For a graph based on the check_http plugin, we get four pieces of data:
- Response time (time)
- Page size (size)
- Warning threshold (time_warn)
- Critical threshold (time_crit)
The standard URL will be similar to:
/cgi-bin/show.cgi?fixedscale=1&service=HTTP&host=www.opsview.org
Appending '&db=,time' will generate a graph based on the 'time' value, e.g.:
/cgi-bin/show.cgi?fixedscale=1&service=HTTP&host=www.opsview.org&db=,time
Parameters can be combined, so to display the response time and the critical threshold on the same graph, simply append '&db=,time,time_crit'.
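The way the db argument is built up can be sketched as a small shell function. This is only an illustration: graph_url is a hypothetical helper, it assumes the URL layout shown in the examples above, and it does not URL-encode the host or service name.

```shell
# Hypothetical helper, not shipped with Opsview.
# Usage: graph_url HOST SERVICE [METRIC...]
# Prints the show.cgi URL path; each metric given is appended to the
# db argument with a leading comma, as in the examples above.
graph_url() {
    host=$1
    service=$2
    shift 2
    url="/cgi-bin/show.cgi?fixedscale=1&service=${service}&host=${host}"
    if [ "$#" -gt 0 ]; then
        url="${url}&db="
        for metric in "$@"; do
            url="${url},${metric}"
        done
    fi
    printf '%s\n' "$url"
}
```

Example usage: graph_url www.opsview.org HTTP time time_crit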
Performance graphs show an arbitrary range
Some performance checks do not show minimum and maximum ranges and an arbitrary range is used.
To force the use of a specific range the following can be added to the URL
&rrdopts=%2Dl%200%20%2Du%2016384%20%2Dr
This adds the following rrd options to the command generating the graph:
-l 0 -u 16384 -r
where -l <num> provides the minimum range, -u <num> the maximum range and -r specifies the ranges are rigid (prevents autoscaling). See man rrdgraph for further options that are available.
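The rrdopts value is simply the option string with every character other than letters and digits percent-encoded. A minimal sketch of producing it for other ranges follows; encode_rrdopts is a hypothetical helper, and it assumes plain ASCII input.

```shell
# Hypothetical helper, not shipped with Opsview.
# Percent-encodes every character that is not an ASCII letter or
# digit, so that an rrdgraph option string can be used as the value
# of the rrdopts URL argument. Assumes ASCII input.
encode_rrdopts() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}            # everything after the first character
        c=${s%"$rest"}         # the first character itself
        case "$c" in
            [A-Za-z0-9]) out="${out}${c}" ;;
            *) out="${out}$(printf '%%%02X' "'$c")" ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}
```

Example usage: echo "&rrdopts=$(encode_rrdopts '-l 0 -u 16384 -r')"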
MRTG graphs
Any host that has the 'MRTG Graphs' check-box ticked on the host configuration page will be scanned by MRTG for data every 5 mins - any data gathered is converted into a graph (stored in nagios/var/mrtg) and access to them provided by the Reporting → Network Traffic link in the sidenav. See also the Installation FAQ entry and the SNMP FAQ entry about MRTG graphing.
The MRTG configuration for all hosts will be rescanned at Opsview reload time if any host with MRTG has had some configuration change or if a host template with MRTG has been changed. You can force an MRTG rescan by running as the nagios user on the master server:
/usr/local/nagios/bin/mrtgconfgen.pl full
Note: If a user has view access for the host, then they can view the MRTG graphs for that host.
Note: MRTG uses the host address to identify the host, not the unique Nagios host identifier. If you have multiple Nagios hosts with the same host address, Opsview checks that a user has access to all the hosts with this host address.
Note: In a slave cluster, only the first node in the cluster will poll the device.
Note: It is not possible to disable specific interfaces for MRTG to monitor. You could use the Opsview host interfaces functionality to select specific interfaces.
Why doesn't graphing work on Solaris 10 sparc?
This seems to be due to a problem in the SunFreeware rrdtool package (the Intel version doesn't have this issue). Downloading, compiling and installing your own version should fix the issue.
How can I delete all the graphing history for a service?
Just remove the RRD file on the Opsview master server. In /usr/local/nagios/var/rrd, locate the appropriate RRD files. They take the format:
{hostname}_{servicename}_*.rrd
The names are URL-encoded, so they may contain, for example, %20 to represent a space.
After you have removed the file, the next performance data that arrives will automatically create a new RRD.
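The naming scheme above can be sketched in Python to find the files for one service before removing them. This is an illustration under assumptions: the function names are hypothetical, and it assumes the names are percent-encoded in the same way as Python's quote() does it, which may differ from Opsview's own encoding for some characters.

```python
import fnmatch
import os
from urllib.parse import quote


def rrd_pattern(hostname, servicename):
    """Build the {hostname}_{servicename}_*.rrd pattern with encoded names."""
    # Assumption: Opsview's encoding matches quote() with no safe characters.
    return "{}_{}_*.rrd".format(quote(hostname, safe=""), quote(servicename, safe=""))


def find_rrd_files(rrd_dir, hostname, servicename):
    """List the RRD file names in rrd_dir that belong to one service."""
    pattern = rrd_pattern(hostname, servicename)
    return sorted(n for n in os.listdir(rrd_dir) if fnmatch.fnmatchcase(n, pattern))


# A service name containing a space is encoded with %20.
print(rrd_pattern("web01", "Disk usage"))
```

Calling find_rrd_files("/usr/local/nagios/var/rrd", "web01", "Disk usage") on the master would then list the candidate files. Review the list before deleting anything.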
Why do some of my RRD files appear to be losing history / getting deleted?
There is a housekeeping job, /usr/local/nagios/bin/opsview_cronjobs, which deletes any RRD files that have not been modified within the number of days set for audit log retention. As RRD files are updated whenever new results arrive, this should only affect RRD files that belong to old hosts or services.
On a Debian Lenny system that was not completely upgraded (the kernel remained at the Etch version), RRDtool was seen to update the RRD files without changing their modification time. The housekeeping job therefore deleted the RRD files, and the graphs were then recreated. The fix in this scenario was to upgrade the kernel appropriately.
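To check whether files are at risk from this housekeeping, a sketch along the following lines lists RRD files whose modification time is older than a given number of days. It is not the logic of opsview_cronjobs itself. The function name and the retention_days parameter are assumptions for illustration, and you would pass in your own audit log retention value.

```python
import os
import time


def stale_rrd_files(rrd_dir, retention_days, now=None):
    """Return names of .rrd files not modified within retention_days."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # seconds in a day
    stale = []
    for name in sorted(os.listdir(rrd_dir)):
        path = os.path.join(rrd_dir, name)
        # Only the modification time is compared, as described above.
        if name.endswith(".rrd") and os.path.getmtime(path) < cutoff:
            stale.append(name)
    return stale
```

If files that are still receiving performance data show up in the result of stale_rrd_files("/usr/local/nagios/var/rrd", retention_days), the modification times are not being updated, as in the Debian scenario described above.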
Opsview Development FAQs
How do I troubleshoot Nagios CGIs?
To simulate a Nagios CGI, switch to the nagios user on the Opsview master:
su - nagios
cd /usr/local/nagios/sbin
REQUEST_METHOD=GET REMOTE_USER=admin QUERY_STRING="host=all&createimage" ./statusmap.cgi > /tmp/cgi.out
This will run the CGI as if the web server had served the request.
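The same idea can be scripted. The sketch below sets the three CGI environment variables used above and runs the program with its output captured to a file. The function names are hypothetical, and a real web server may set more variables than these three.

```python
import os
import subprocess


def cgi_environment(query_string, remote_user="admin", method="GET"):
    """Build the environment variables used in the command above."""
    env = dict(os.environ)
    env.update(
        REQUEST_METHOD=method,
        REMOTE_USER=remote_user,
        QUERY_STRING=query_string,
    )
    return env


def run_cgi(cgi_path, query_string, output_path, cwd=None):
    """Run a CGI program directly and save its stdout to output_path."""
    with open(output_path, "wb") as out:
        result = subprocess.run(
            [cgi_path], env=cgi_environment(query_string), stdout=out, cwd=cwd
        )
    return result.returncode
```

For example, as the nagios user, run_cgi("/usr/local/nagios/sbin/statusmap.cgi", "host=all&createimage", "/tmp/cgi.out", cwd="/usr/local/nagios/sbin") would mirror the shell command.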
Management URLs
Handling ssh requests with putty in Windows
Setting up a handler for the ssh: protocol in Windows is not straightforward. The usual way for Windows to pass the protocol to the application is to call the putty command with a parameter of ssh://10.11.12.13, but putty does not recognise this and rejects it with the error:
Unable to open connection to ssh
Host does not exist
This is because putty is trying to resolve the name ssh:10.11.12.13, rather than trying to access 10.11.12.13.
Strangely, this works okay for telnet, because putty has code to strip off the telnet: part of the URL.
There are three options available (that we are aware of):
- UrlConf will make the necessary registry setting changes. Requires .Net 1.1
- CustomURL - similar to UrlConf. Requires .Net 2.0
- Putup - Uses a putty: handler instead of an ssh: one. You will need to change the management URL to putty://machine=$HOSTADDRESS$. Be aware that:
  - Putup installs its own version of putty. You will need to edit the registry settings if you want it to use a different copy of putty
  - This creates a dependency on the browser user running Windows, so you probably want to set up an ssh://$HOSTADDRESS$ management URL too for non-Windows users
It seems to us that the most obvious fix is for putty to handle a parameter of ssh://address by stripping off the ssh: part, as it already does for the telnet: prefix.
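The prefix stripping described above can be sketched as a small wrapper that turns an ssh:// URL into a putty command line. This is an illustration only and is not how UrlConf, CustomURL or Putup are implemented. The function name is hypothetical, and it assumes a putty executable that accepts the -ssh, -l and -P options.

```python
from urllib.parse import urlsplit


def putty_args(url, putty="putty"):
    """Translate ssh://[user@]host[:port] into a putty argument list."""
    parts = urlsplit(url)
    if parts.scheme != "ssh" or not parts.hostname:
        raise ValueError("not an ssh:// URL: %r" % url)
    args = [putty, "-ssh"]
    if parts.username:
        args += ["-l", parts.username]
    if parts.port:
        args += ["-P", str(parts.port)]
    # Pass only the bare address, without the ssh: prefix.
    args.append(parts.hostname)
    return args


print(putty_args("ssh://10.11.12.13"))
```

A protocol handler registered for ssh: could pass the resulting list to subprocess.run to launch putty against the bare address.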
Upgrading issues
I have duplicated services for hosts
If you have upgraded to Opsview 3.3.2 and hosts are showing duplicate services with different check times, this is due to a bug in the upgrade script.
The problem is due to an attempt to fix an NDOutils bug where nagios instance ids could be created multiple times, which affects Nagvis amongst other things. However, the upgrade script in Opsview 3.3.2 fails to take into account the case where the nagios_instance id is set to a number other than 1.
This only affects a small number of users.
To fix, you have to disable information from the Runtime database about the other instances. The immediate fix is:
mysql> use runtime
mysql> select * from nagios_instances;
This should return only 1 row. If you have more than 1 row here, this is an unsupported configuration (you have more than just Opsview pointing to this runtime database).
The instance_id will be > 1. The instructions below assume this value is set to 3 - please change appropriately.
mysql> update nagios_objects set is_active=0 where instance_id != 3;
mysql> delete from nagios_programstatus where instance_id != 3;
This disables all the objects for the other instance_ids and removes some metadata about the instance connections. Now reload Opsview and the additional services will disappear.
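To see the effect of these two statements without touching the real Runtime database, the sketch below replays them against an in-memory SQLite database. Everything in it is a stand-in. The tables are reduced to the columns used above, the rows are made up, SQLite is used instead of MySQL, and the instance id of 3 is the same assumed value as in the instructions.

```python
import sqlite3

KEEP_INSTANCE_ID = 3  # assumed value; use the instance_id of your own system

db = sqlite3.connect(":memory:")
# Simplified stand-in tables with made-up rows for two instances.
db.executescript("""
    CREATE TABLE nagios_objects (object_id INTEGER, instance_id INTEGER, is_active INTEGER);
    CREATE TABLE nagios_programstatus (instance_id INTEGER);
    INSERT INTO nagios_objects VALUES (1, 1, 1), (2, 3, 1);
    INSERT INTO nagios_programstatus VALUES (1), (3);
""")

# The same two statements as above, with the instance id as a parameter.
db.execute("UPDATE nagios_objects SET is_active = 0 WHERE instance_id != ?", (KEEP_INSTANCE_ID,))
db.execute("DELETE FROM nagios_programstatus WHERE instance_id != ?", (KEEP_INSTANCE_ID,))

active = db.execute("SELECT object_id FROM nagios_objects WHERE is_active = 1").fetchall()
remaining = db.execute("SELECT instance_id FROM nagios_programstatus").fetchall()
print(active, remaining)
```

Only the object and the program status row belonging to the kept instance remain active afterwards.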
Differences to Nagios
Service Groups
In Opsview, a service group is a group of service checks that is not associated with any hosts.
In Nagios, a service group is a list of services (each associated with a host). The closest concept in Opsview is a keyword, which consists of a group of services through the use of tagging.
Opsview does not generate any configuration for a Nagios service group.