INTRODUCTION

There are three pieces to installing this package, once you have all of
the prerequisites out of the way: authentication, per-host configuration,
and central configuration. All of this will make much more sense
if you look at the architecture diagram at
http://madstop.com/naginator/nagios.png.

PREREQUISITES

You need a Ruby (http://www.ruby-lang.org) interpreter and Nagios
(http://www.nagios.org) installed on all of your hosts, along with
all of the plugins that you want to execute. Cfengine
(http://www.cfengine.org) is not a requirement, but you should be
using it anyway, and it will certainly make this job easier.

I also use my own packages 'enhost' and 'facter', currently only available via
subversion (at http://madstop.com/svn/<package>). They should be documented and
have a page in a week or so. Check this space. I only use them for key
management, though, so you can easily survive without them.

If you find any more prereqs, please let me know.

NOTES

Many things are currently hardcoded, and I'm not even aware of all of those
things. Please notify me if you find something hardcoded that should not be.

DEFAULTS

Probably the only default that is not easily changed right now is the nagios
user name: 'nagios'. Hopefully this will be fixed some day.

The default configuration directory for Nagios is /etc/nagios; configurations
are generated into that directory, and scripts that require configs default to
looking there. Most scripts accept some kind of flag to change that, however.

I use '/var/nagios' as ~nagios, but I almost always expand '~nagios' instead
of using the full path, so as long as you have that set to the directory where
you install the scripts and such, it should all work out okay.

Also, because some of these scripts use libraries, you will need to either
copy those libraries to a system default location (e.g., /usr/local/lib/ruby),
or add their location to RUBYLIB. Or, you can modify the '$: <<' line I have
in them, but that line will be removed when this project goes production.
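
As a minimal sketch of the load-path options above (the directory name is
hypothetical and is not naginator's actual layout), a script can extend Ruby's
search path itself, or rely on RUBYLIB being set in the environment:

```ruby
# Hypothetical directory holding the project's libraries; adjust to your install.
lib_dir = File.expand_path("lib", Dir.pwd)

# $: is an alias for $LOAD_PATH; appending a directory makes `require`
# search it after the locations already on the path.
$LOAD_PATH << lib_dir unless $LOAD_PATH.include?(lib_dir)

# Directories listed in the RUBYLIB environment variable are added to the
# load path by the interpreter at startup, so no script change is needed.
rubylib_dirs = ENV.fetch("RUBYLIB", "").split(File::PATH_SEPARATOR)

puts "load path includes #{lib_dir}"
puts "RUBYLIB entries: #{rubylib_dirs.inspect}"
```

Setting RUBYLIB keeps the scripts untouched, which matters if the '$: <<' line
really does go away later.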

REASON FOR EXISTENCE

A central Nagios process is great, because you can do host-alive and
remote service checks on all of your hosts. Unfortunately, it is difficult
to use this process to do checks that have to be done on each local host,
such as disk usage and usage statistics. Sure, you can use NRPE, but there
are reasons not to use this iterative method (we could argue those reasons,
but suffice it to say I don't like the method).

I prefer to have each host use NSCA to send the necessary information to
the central server. Unfortunately, that means that 1) each machine needs
to have its own configuration, and 2) each machine needs to run Nagios.

Fortunately, I use cfengine, and I use it in such a way that it already
knows quite a bit about my machines. One of the main reasons I want to
use NSCA instead of NRPE is because NSCA and a local Nagios configuration
allow me to reuse the information that my machines know about themselves
from their cfengine configurations. For instance, in cfengine I might
configure a host to be a mail server; if I use NRPE, I have to store the
fact that that host is a mail server in both cfengine and in Nagios,
but if I can somehow generate a configuration out of cfengine, basing
my configuration on only what cfengine already knows, then I am only
configuring each piece of information once. This is a very good thing.

BASIC ARCHITECTURE

Again, look at the architecture diagram at http://madstop.com/naginator/nagios.png.

This project creates its configuration in a three-step process:

Initial Generation:

First, cfengine runs its 'module:nagios' module; it passes to this module
all classes configured for the host it is running on, along with
a string containing a list of services to monitor. These classes correspond
directly to hostgroups configured in Nagios (via the 'hostgroups.cfg' file);
in fact, the module parses the hostgroups file and just tries to match classes
and hostgroups. The host is added to any hostgroups that are matched.
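
The matching step can be pictured with a small sketch. This is an illustration
only, not the module's actual code: the function names and the sample
hostgroups are made up, and it assumes hostgroups are declared with the
standard 'hostgroup_name' directive.

```ruby
# Extract hostgroup names from the text of a Nagios object file.
def hostgroup_names(cfg_text)
  cfg_text.scan(/^\s*hostgroup_name\s+(\S+)/).flatten
end

# Return the hostgroups whose names equal one of the host's cfengine classes.
def matching_hostgroups(classes, cfg_text)
  hostgroup_names(cfg_text) & classes
end

# Hypothetical hostgroups.cfg contents.
sample_cfg = <<~CFG
  define hostgroup{
      hostgroup_name  mailservers
      alias           Mail Servers
  }
  define hostgroup{
      hostgroup_name  webservers
      alias           Web Servers
  }
CFG

# Hypothetical classes that cfengine might pass for one host.
groups = matching_hostgroups(%w[linux mailservers], sample_cfg)
puts groups.inspect
```

Classes with no corresponding hostgroup ('linux' here) are simply ignored.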

The string that gets passed to the module is a list of services. These services
coincidentally exactly correspond to services configured in the checkcommands.cfg
file, and in fact the module parses that file and then searches through it for
each of these services. The one wrinkle is that you can pass a service with
the '!argument' style, so you could say 'disk!/var'.
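
A sketch of how such a service string could be taken apart follows. The
whitespace separator between services is an assumption (the format of the list
is not spelled out here), and the function names are hypothetical.

```ruby
# Split a service spec such as "disk!/var" into a command name and arguments.
def parse_service(spec)
  command, *args = spec.split("!")
  { command: command, args: args }
end

# Parse a service list; assumes the services are separated by whitespace.
def parse_services(string)
  string.split.map { |spec| parse_service(spec) }
end

services = parse_services("ssh disk!/var disk!/home")
services.each { |s| puts "#{s[:command]} #{s[:args].inspect}" }
```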

The module now has a) a list of all the host groups our host is in, and b) a
list of all of the services to monitor. Currently, the module is hard-coded
to use 'generic-host' and 'generic-service' for the parent, respectively, of
the host and service configurations. This could relatively easily be added as
an option, but might complicate other things.

So, the module just generates a configuration containing this information.
However, there actually need to be two configurations: One to run locally
(for things like checking disks) and one for running on the central server
(for checking services remotely, and also for receiving updates from NSCA and
displaying stats in the web pages). So, only services that match /local/ are
put into the local configuration, and for the remote configuration file,
local services are converted to passive checks (because, remember, the server
will be receiving its info from NSCA).
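
The local/remote split might look roughly like the sketch below. It is not the
module's real output: the command names are hypothetical, and the use of the
standard Nagios 'active_checks_enabled' and 'passive_checks_enabled'
directives to mark a passive check is an assumption about how the conversion
is done.

```ruby
# Render one Nagios service definition; `passive` turns off active checks
# so that results are expected to arrive from outside (e.g., via NSCA).
def service_definition(host, command, passive: false)
  lines = [
    "define service{",
    "    use                     generic-service",
    "    host_name               #{host}",
    "    service_description     #{command}",
    "    check_command           #{command}"
  ]
  if passive
    lines << "    active_checks_enabled   0"
    lines << "    passive_checks_enabled  1"
  end
  (lines << "}").join("\n")
end

# Build [local_cfg, remote_cfg] text from a host's list of check commands.
def split_configs(host, commands)
  local = commands.grep(/local/)
  local_cfg = local.map { |c| service_definition(host, c) }
  remote_cfg = commands.map do |c|
    service_definition(host, c, passive: local.include?(c))
  end
  [local_cfg.join("\n\n"), remote_cfg.join("\n\n")]
end

# Hypothetical command names; only the first matches /local/.
local_cfg, remote_cfg = split_configs("kirby", %w[check_local_disk check_ssh])
puts remote_cfg
```

Every service appears in the remote configuration, but only the local ones
lose their active checks there.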

Okay, now we've got our configuration.

Sending the configuration:

For the local configuration, all we need to do is start the nagios process.
For this, we have a special 'localnagios.cfg' file configured to run against
the generated file (defaults to /etc/nagios/local.cfg), so we're basically done.
Note that the local.cfg file contains the modified definitions for the
hostgroups, so even though you need a hostgroups.cfg file to generate the
configs, you should not import it in your nagios configuration file.

For the remote configuration, we somehow have to get the generated
'remote.cfg' file to our central host. You can do this however you want, but
I have written scripts for each end of this operation (sending and receiving)
along with the necessary support structure to run these scripts over ssh.

The sending script is 'nagsend', and it does not require any arguments (the
default file to send is /etc/nagios/remote.cfg, and can be changed with
--config <file>). The receiving script is 'nagaccept', and it must be run
with the host name of the host that is sending the configuration. These
scripts are written to be run on either end of a pipe, kind of like this:

    nagsend | ssh nagios@nagios nagaccept `hostname`

Except that nagsend actually launches the ssh session; it looks a lot like the
above line, though. The hostname is required because it helps secure the
system when using SSH (described below).
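
For readers who want to see the shape of a sender that launches ssh itself,
here is a minimal sketch. It is not nagsend's source; the function names are
hypothetical, and the defaults simply mirror the example pipeline.

```ruby
require "socket"

# Build the ssh argument vector; user and server default to the values
# used in the example pipeline.
def ssh_command(hostname, user: "nagios", server: "nagios")
  ["ssh", "#{user}@#{server}", "nagaccept", hostname]
end

# Feed the contents of `config_path` to `command` on its standard input
# and report whether the command exited successfully.
def send_config(config_path, command)
  IO.popen(command, "w") { |io| io.write(File.read(config_path)) }
  $?.success?
end

cmd = ssh_command(Socket.gethostname)
puts cmd.join(" ")
# send_config("/etc/nagios/remote.cfg", cmd)  # would open the ssh session
```

Passing the command as an array avoids going through a shell, so the host name
is never interpreted as shell syntax on the sending side.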

Once the configuration is received, it defaults to being stored in
~nagios/collection/<hostname>.cfg. From there, the script 'nagconfig' is used
to collate those configs into a single generated configuration (defaults to
/etc/nagios/generated.cfg), which is imported by the central nagios
configuration file. Note that this generated file includes the modified
hostgroups, so you should not import the hostgroups.cfg file in your nagios
configuration.
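
The collation step, reduced to its simplest form, is a concatenation of the
per-host files. The sketch below shows only that much: it is not nagconfig
itself, it does nothing about merging hostgroups, and it runs against a
temporary directory with made-up host files rather than ~nagios/collection.

```ruby
require "tmpdir"

# Concatenate every per-host *.cfg file in `collection_dir` into one file,
# in sorted order so the output is stable between runs.
def collate_configs(collection_dir, output_path)
  paths = Dir.glob(File.join(collection_dir, "*.cfg")).sort
  parts = paths.map do |path|
    "# from #{File.basename(path)}\n#{File.read(path).chomp}\n"
  end
  File.write(output_path, parts.join("\n"))
  paths.size
end

# Demonstration with two hypothetical received configurations.
dir = Dir.mktmpdir
File.write(File.join(dir, "kirby.cfg"), "define host{\n    host_name kirby\n}\n")
File.write(File.join(dir, "web1.cfg"), "define host{\n    host_name web1\n}\n")
out = File.join(dir, "generated.out")
count = collate_configs(dir, out)
puts "collated #{count} host files into #{out}"
```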

Okay, we're now done. We just start our centralized server. Note that the
central server will have two configurations: One running against the local
nagios configuration, and one running against the central configuration. I
figured this was easier than treating that host specially in all of the
scripts. Feel free to complain, but complaints should definitely come with
patches.

AUTHENTICATION

Okay, like I said, I use ssh to send these files. I use keys to do so, which
means I don't have to type passwords, which is good, because these scripts are
all meant to run automatically (out of cfengine, for instance).

What I have is each host's root user authenticating to my central host as the
'nagios' user (this is mostly configurable -- at least 'nagsend' accepts a
'--user' flag), and it uses the host private key for the authentication. This
requires the following pieces:

1) The local host must have something like the following ~root/.ssh/config:

    Host nagios.domain.com nagios
        User nagios
        IdentityFile /etc/ssh/ssh_host_dsa_key
        ForwardAgent no
        ForwardX11 no
        Compression yes

Obviously, change the paths as appropriate. The other flags aren't required,
but they help secure things just a bit.

2) The local host must already have the central server's public key stored,
either in root's known_hosts file or the central file, and the public key
must have whatever host name you are using stored with it. For instance, I
have my host 'kirby' running my central nagios copy, but I have a cname to
'nagios', and that's the host I ssh to. This means I need to have kirby's
public key on each system, but I need to have 'nagios' stored with that public
key (probably in addition to 'kirby').

3) The central server's nagios account must be configured to accept these
logins by having each host's public key in ~nagios/.ssh/authorized_keys. It
is highly recommended that your configuration for each public key look
something like this (all on a single line):

    no-port-forwarding,no-X11-forwarding,no-agent-forwarding,command="nagaccept hostname" ssh-dss [...key...]

Notice the inclusion of the hostname in the command; this makes it so that
the only command that this host can run as the nagios user is 'nagaccept',
with its own hostname as an argument. The absolute worst compromise you could
get with this configuration is some other host being able to replace this
host's nagios configuration, which is pretty darn secure, if you ask me.
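
If you generate these entries with a script, the line can be assembled as in
the sketch below. This is only an illustration, not how 'nagkeys' necessarily
works: the function name, the host name, and the key data are placeholders,
and the hostname check is an added precaution.

```ruby
# Restrictions applied to every key, matching the example entry.
OPTIONS = %w[no-port-forwarding no-X11-forwarding no-agent-forwarding].freeze

# Build one authorized_keys entry that forces `nagaccept <hostname>`.
def authorized_keys_line(hostname, key_type, key_data)
  # Refuse anything that could break out of the quoted command string.
  unless hostname.match?(/\A[A-Za-z0-9.-]+\z/)
    raise ArgumentError, "unsafe hostname: #{hostname.inspect}"
  end
  options = OPTIONS + [%(command="nagaccept #{hostname}")]
  "#{options.join(',')} #{key_type} #{key_data}"
end

# Placeholder host name and key data, for illustration only.
line = authorized_keys_line("web1.domain.com", "ssh-dss", "[...key...]")
puts line
```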

4) The central server's nagios account must be set up to find that 'nagaccept'
command. This means it needs to either be in a standard bin directory, or you
need to add the appropriate path to ~nagios/.ssh/environment (and for current
versions of SSH, you need to have 'PermitUserEnvironment yes' set in
sshd_config).

That's all. (Yes, that's a bit sarcastic.)

I use a couple of scripts and LDAP to simplify the key management for me. I use
'enhost' (http://madstop.com/svn/enhost; probably not really documented) to
collect the keys and store them in LDAP (it requires 'facter';
http://madstop.com/svn/facter, also not well documented right now). Feel free
to try to get them running, but it might be easier for you to just wait a week
or two if you want to use these, as I'll be documenting them ASAP. If you
really want to get started, though, feel free to use the enhost.schema file
from the enhost repository.

Once the keys are in LDAP, I use the 'nagkeys' script in the naginator package
to pull them from LDAP and store them (using the above configuration) in
~nagios/.ssh/authorized_keys.

FINITO

Yes, that's finally it. Done. It should all work. It's a lot of work to
set up, but with cfengine it should be much easier, and once it's set up,
it just maintains itself indefinitely, even as you add more services.

Oh, I also use the 'cfengine/bin/localdisks' script to pull a list of local disks
and configure them to be monitored. Not required, but a bit of help in
making the configuration.

Please email luke at madstop.com with problems.