root/INSTALL

Revision 648e1b45c1caab30dcbf99ba643c0e34796269ae, 11.0 kB (checked in by Luke Kanies <luke@madstop.com>, 3 years ago)

INTRODUCTION

There are three pieces to installing this package, once you have all of
the prerequisites out of the way:  Authentication, per-host configuration,
and central configuration.  All of this will make much more sense
if you look at the architecture diagram at
http://madstop.com/naginator/nagios.png.

PREREQUISITES

You need a Ruby (http://www.ruby-lang.org) interpreter and Nagios
(http://www.nagios.org) installed on all of your hosts, along with
all of the plugins that you want to execute.  Cfengine
(http://www.cfengine.org) is not a requirement, but you should be
using it anyway, and it will certainly make this job easier.

I also use my own packages 'enhost' and 'facter', currently only available via
subversion (at http://madstop.com/svn/<package>).  They should be documented and
have a page in a week or so.  Check this space.  I only use them for key
management, though, so you can easily survive without them.

If you find any more prereqs, please let me know.

NOTES

Many things are currently hardcoded, and I'm not even aware of all of those
things.  Please notify me if you find something hardcoded that should not be.

DEFAULTS

Probably the only default that is not easily changed right now is the nagios
user name: 'nagios'.  Hopefully this will be fixed some day.

The default configuration directory for Nagios is /etc/nagios; configurations
are generated into that directory, and scripts that require configs default to
looking there.  Most scripts accept some kind of flag to change that, however.

I use '/var/nagios' as ~nagios, but I almost always expand '~nagios' instead
of using the full path, so as long as you have that set to the directory where
you install the scripts and such, it should all work out okay.

Also, because some of these scripts use libraries, you will need to either
copy those libraries to a system default location (e.g., /usr/local/lib/ruby),
or add their location to the RUBYLIB environment variable.  Or, you can modify
the $: << command I have in them, but that line will be removed when this
project goes into production.
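
As a minimal sketch of the load-path option, assuming a hypothetical library
directory of /var/nagios/lib (this package does not mandate that path), a
script can prepend the directory itself:

```ruby
# Hypothetical library directory; substitute wherever you copied the libraries.
NAGINATOR_LIB = "/var/nagios/lib"

# Prepend a directory to Ruby's load path unless it is already present.
# ($LOAD_PATH is the readable alias for $:.)
def add_lib_dir(dir, load_path = $LOAD_PATH)
  load_path.unshift(dir) unless load_path.include?(dir)
  load_path
end

add_lib_dir(NAGINATOR_LIB)
```

Setting RUBYLIB to the same directory in the environment has the same effect
without editing any script.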

REASON FOR EXISTENCE

A central Nagios process is great, because you can do host-alive and
remote service checks on all of your hosts.  Unfortunately, it is difficult
to use this process to do checks that have to be done on each local host,
such as disk usage and usage statistics.  Sure, you can use NRPE, but there
are reasons not to use this iterative method (we could argue those reasons,
but suffice it to say I don't like the method).

I prefer to have each host use NSCA to send the necessary information to
the central server.  Unfortunately, that means that 1) each machine needs
to have its own configuration, and 2) each machine needs to run Nagios.

Fortunately, I use cfengine, and I use it in such a way that it already
knows quite a bit about my machines.  One of the main reasons I want to
use NSCA instead of NRPE is that NSCA and a local Nagios configuration
allow me to reuse the information that my machines know about themselves
from their cfengine configurations.  For instance, in cfengine I might
configure a host to be a mail server; if I use NRPE, I have to store the
fact that that host is a mail server in both cfengine and in Nagios,
but if I can somehow generate a configuration out of cfengine, basing
my configuration on only what cfengine already knows, then I am only
configuring each piece of information once.  This is a very good thing.

BASIC ARCHITECTURE

Again, look at the architecture diagram at http://madstop.com/naginator/nagios.png.

This project creates its configuration in a three-step process:

Initial Generation:

First, cfengine runs its 'module:nagios' module; it passes to this module
all classes configured for the host it is running on, along with
a string containing a list of services to monitor.  These classes correspond
directly to hostgroups configured in Nagios (via the 'hostgroups.cfg' file);
in fact, the module parses the hostgroups file and just tries to match classes
and hostgroups.  The host is added to any hostgroups that are matched.
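
The matching step can be sketched roughly as follows.  This is not the
module's actual code: it assumes that a class matches a hostgroup when the
names are identical, and that hostgroups.cfg uses the usual
'define hostgroup { ... }' layout.  The method names and sample data are
made up for illustration.

```ruby
# Extract hostgroup names from Nagios object-definition text.
# Assumes the usual "define hostgroup { hostgroup_name NAME ... }" layout.
def hostgroup_names(cfg_text)
  bodies = cfg_text.scan(/define\s+hostgroup\s*\{(.*?)\}/m).flatten
  bodies.map { |body| body[/^\s*hostgroup_name\s+(\S+)/, 1] }.compact
end

# Return the hostgroups whose names equal one of the host's cfengine classes.
# Exact name equality is an assumption of this sketch.
def matching_hostgroups(classes, cfg_text)
  hostgroup_names(cfg_text) & classes
end

# Illustrative data only.
sample = <<~CFG
  define hostgroup {
      hostgroup_name  mailservers
      alias           Mail servers
  }
  define hostgroup {
      hostgroup_name  webservers
      alias           Web servers
  }
CFG

p matching_hostgroups(%w[linux mailservers], sample)
```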

The string that gets passed to the module is a list of services.  These services
coincidentally exactly correspond to services configured in the checkcommands.cfg
file, and in fact the module parses that file and then searches through it for
each of these services.  The one wrinkle is that you can pass a service with
the '!argument' style, so you could say 'disk!/var'.
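
A minimal sketch of splitting such a service string, assuming the list is
whitespace-separated (the real separator is not documented here) and using
made-up method names:

```ruby
# Split a service spec such as "disk!/var" into the check command name and
# its arguments, following the Nagios "command!arg1!arg2" convention.
def parse_service(spec)
  name, *args = spec.split("!")
  { name: name, args: args }
end

# Parse a whole service list.  Whitespace separation is an assumption.
def parse_service_list(string)
  string.split.map { |spec| parse_service(spec) }
end

p parse_service_list("ssh disk!/var")
```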

The module now has a) a list of all the host groups our host is in, and b) a
list of all of the services to monitor.  Currently, the module is hard-coded
to use 'generic-host' and 'generic-service' for the parent, respectively, of
the host and service configurations.  This could relatively easily be added as
an option, but might complicate other things.

So, the module just generates a configuration containing this information.
However, there actually need to be two configurations:  One to run locally
(for things like checking disks) and one for running on the central server
(for checking services remotely, and also for receiving updates from NSCA and
displaying stats in the web pages).  So, only services that match /local/ are
put into the local configuration, and for the remote configuration file,
local services are converted to passive checks (because, remember, the server
will be receiving its info from NSCA).
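
The split can be sketched like this.  It assumes the /local/ pattern is
tested against the service name, and it uses the standard Nagios service
directives active_checks_enabled and passive_checks_enabled to express a
passive check; the module's real data structures may differ, and the service
names are made up.

```ruby
# Services whose name matches /local/ run on the host itself.
def local_service?(name)
  !!(name =~ /local/)
end

# Build the local service list and the remote service entries.
def split_services(names)
  local = names.select { |n| local_service?(n) }
  remote = names.map do |n|
    entry = { "service_description" => n }
    if local_service?(n)
      # Results arrive via NSCA, so the central server must not run the check.
      entry["active_checks_enabled"] = 0
      entry["passive_checks_enabled"] = 1
    end
    entry
  end
  { local: local, remote: remote }
end

p split_services(%w[local_disk ssh])
```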

Okay, now we've got our configuration.

Sending the configuration:

For the local configuration, all we need to do is start the nagios process.
For this, we have a special 'localnagios.cfg' file configured to run against
the generated file (defaults to /etc/nagios/local.cfg), so we're basically done.
Note that the local.cfg file contains the modified definitions for the
hostgroups, so even though you need a hostgroups.cfg file to generate the
configs, you should not import it in your nagios configuration file.

For the remote configuration, we somehow have to get the generated
'remote.cfg' file to our central host.  You can do this however you want, but
I have written scripts for each end of this operation (sending and receiving)
along with the necessary support structure to run these scripts over ssh.

The sending script is 'nagsend', and it does not require any arguments (the
default file to send is /etc/nagios/remote.cfg, and can be changed with
--config <file>).  The receiving script is 'nagaccept', and it must be run
with the host name of the host that is sending the configuration.  These
scripts are written to be run on either end of a pipe, kind of like this:

    nagsend | ssh nagios@nagios nagaccept `hostname`

Except that nagsend actually launches the ssh session; it looks a lot like the
above line, though.  The hostname is required because it helps secure the
system when using SSH (described below).
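
For readers sending the file some other way, here is a rough sketch of the
same idea in Ruby.  It is not nagsend's implementation; the method names are
invented, and the user and server defaults simply mirror the example line
above.

```ruby
# Build the ssh command line that runs nagaccept on the central server.
def ssh_command(hostname, user: "nagios", server: "nagios")
  ["ssh", "#{user}@#{server}", "nagaccept", hostname]
end

# Stream the config file to a command's standard input.
# Returns true if the command exited successfully.
def send_config(path, command)
  IO.popen(command, "w") { |pipe| pipe.write(File.read(path)) }
  $?.success?
end

# Usage (commented out because it needs a reachable central server):
# require "socket"
# send_config("/etc/nagios/remote.cfg", ssh_command(Socket.gethostname))
```

Passing the command as an array avoids shell quoting problems with the
hostname argument.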

Once the configuration is received, it defaults to being stored in
~nagios/collection/<hostname>.cfg.  From there, the script 'nagconfig' is used
to collate those configs into a single generated configuration (defaults to
/etc/nagios/generated.cfg), which is imported by the central nagios
configuration file.  Note that this generated file includes the modified
hostgroups, so you should not import the hostgroups.cfg file in your nagios
configuration.
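
The collation step amounts to something like the following sketch.  It only
concatenates the per-host files; the hostgroup handling that nagconfig does
is left out, and the method name is made up.

```ruby
# Concatenate every per-host config in collection_dir into one output file.
# Files are sorted so the generated file is stable from run to run.
# Returns the number of per-host files that were collated.
def collate_configs(collection_dir, output_path)
  parts = Dir.glob(File.join(collection_dir, "*.cfg")).sort.map do |path|
    "# from #{File.basename(path)}\n" + File.read(path)
  end
  File.write(output_path, parts.join("\n"))
  parts.length
end

# Usage with the defaults described above (needs a 'nagios' user to exist):
# collate_configs(File.expand_path("~nagios/collection"),
#                 "/etc/nagios/generated.cfg")
```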

Okay, we're now done.  We just start our centralized server.  Note that the
central server will have two configurations:  One running against the local
nagios configuration, and one running against the central configuration.  I
figured this was easier than treating that host specially in all of the
scripts.  Feel free to complain, but complaints should definitely come with
patches.

AUTHENTICATION

Okay, like I said, I use ssh to send these files.  I use keys to do so, which
means I don't have to type passwords, which is good, because these scripts are
all meant to run automatically (out of cfengine, for instance).

What I have is each host's root user authenticating to my central host as the
'nagios' user (this is mostly configurable -- at least 'nagsend' accepts a
'--user' flag), and it uses the host private key for the authentication.  This
requires the following pieces:

1) The local host must have something like the following ~root/.ssh/config:

    Host nagios.domain.com nagios
    User nagios
    IdentityFile /etc/ssh/ssh_host_dsa_key
    ForwardAgent no
    ForwardX11 no
    Compression yes

Obviously, change the paths as appropriate.  The other flags aren't required,
but they help secure things just a bit.

2) The local host must already have the central server's public key stored,
either in root's known_hosts file or the central file, and the public key
must have whatever host name you are using stored with it.  For instance, I
have my host 'kirby' running my central nagios copy, but I have a cname to
'nagios', and that's the host I ssh to.  This means I need to have kirby's
public key on each system, but I need to have 'nagios' stored with that public
key (probably in addition to 'kirby').

3) The central server's nagios account must be configured to accept these
logins by having each host's public key in ~nagios/.ssh/authorized_keys.  It
is highly recommended that your configuration for each public key look
something like this (all on one line in the file):

    no-port-forwarding,no-X11-forwarding,no-agent-forwarding,command="nagaccept hostname" ssh-dss [...key...]

Notice the inclusion of the hostname in the command; this makes it so that
the only command that this host can run as the nagios user is 'nagaccept',
with its own hostname as an argument.  The absolute worst compromise you could
get with this configuration is some other host being able to replace this
host's nagios configuration, which is pretty darn secure, if you ask me.

4) The central server's nagios account must be set up to find that 'nagaccept'
command.  This means it needs to either be in a standard bin directory, or you
need to add the appropriate path to ~nagios/.ssh/environment (and for current
versions of SSH, you need to have 'PermitUserEnvironment yes' set in
sshd_config).

That's all.  (Yes, that's a bit sarcastic.)

I use a couple of scripts and LDAP to simplify the key management for me.  I use
'enhost' (http://madstop.com/svn/enhost; probably not really documented) to
collect the keys and store them in LDAP (it requires 'facter';
http://madstop.com/svn/facter, also not well documented right now).  Feel free
to try to get them running, but it might be easier for you to just wait a week
or two if you want to use these, as I'll be documenting them ASAP.  If you
really want to get started, though, feel free to use the enhost.schema file
from the enhost repository.

Once the keys are in LDAP, I use the 'nagkeys' script in the naginator package
to pull them from LDAP and store them (using the above configuration) in
~nagios/.ssh/authorized_keys.
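
If you are not using LDAP, the authorized_keys entries are easy to generate
from any hostname-to-key mapping.  The following is a sketch of that, not the
nagkeys script itself; the method names, the hostname check, and the input
format are all assumptions of the example.

```ruby
# Options that restrict what a key may do, as recommended above.
RESTRICTIONS = %w[no-port-forwarding no-X11-forwarding no-agent-forwarding].freeze

# Build one authorized_keys entry that forces "nagaccept <hostname>".
# public_key is the full key text, e.g. "ssh-dss AAAA...".
def authorized_keys_line(hostname, public_key)
  # Reject anything that could break out of the quoted command (an added
  # safety check, not something the original package is known to do).
  unless hostname =~ /\A[A-Za-z0-9.-]+\z/
    raise ArgumentError, "unsafe hostname: #{hostname.inspect}"
  end
  options = RESTRICTIONS + [%(command="nagaccept #{hostname}")]
  "#{options.join(',')} #{public_key.strip}"
end

# keys maps hostname => public key; returns the full file contents.
def authorized_keys_file(keys)
  keys.sort.map { |host, key| authorized_keys_line(host, key) }.join("\n") + "\n"
end
```

Each entry comes out as a single line, which is what sshd requires.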

FINITO

Yes, that's finally it.  Done.  It should all work.  It's a lot of work to
set up, but with cfengine it should be much easier, and once it's set up,
it just maintains itself indefinitely, even as you add more services.

Oh, I also use the 'cfengine/bin/localdisks' script to pull a list of local disks
and configure them to be monitored.  Not required, but a bit of help in
making the configuration.

Please email luke at madstop.com with problems.