Update: As of 0.17.0, this functionality is available.
Most of this concept was hashed out in a thread on puppet-dev, although I called the concept aggregation there instead of collection.
Introduction
One of the primary characteristics of the Puppet language is that a given manifest is interpreted in the context of a specific host. That host's Facter facts are set as variables in the top-level scope, and then the scope tree is thrown away for the next host. Overall this works relatively well -- the host evaluates the classes it needs and gets all of the instances it needs.
However, there are a surprisingly large number of cases where one host does need to configure aspects of another host. (This is disturbingly insecure if you don't have control over the whole configuration, but given that the whole config is centrally maintained you're generally safe.) For instance:
- A web server needs a port open in the firewall
- A client needs to configure the monitoring server to monitor all of its configured services (e.g., if it's an LDAP or web server)
- The SMTP server needs to configure all SMTP clients so they can correctly contact it
Parser Modifications
After much discussion, it was realized that this is entirely a syntactical feature, which means that two new syntactical constructs are required: The first to specify that an object is available for other nodes to collect, and the second to specify the objects to collect. Because this is an asynchronous process -- nodes are getting configured constantly so the list of objects available for collection changes constantly -- it requires a Puppetmaster Database to cache the objects (actually, it will probably be more of a permanent storage than a simple cache).
Collectable Instance
First we'll address what needs to be done to specify that an object is available to other nodes (and thus is not instantiated on this node). It seems to make sense to do something like use a character sigil:
@port { http: destination => $hostname } # open a firewall port
@nameserver { $hostname: ip => $ipaddress } # Set up a nameserver
I don't know if it makes sense to use the '@' symbol particularly (maybe '&' would imply the behaviour better, but I expect that'll be used for 'and' at some point), but something like that, anyway. The specific syntax has definitely not been decided on.
Storing Objects
Using this syntax would result in the object being stored in a central database so that any client could retrieve it (at some point this will probably be a security concern, but, um, we haven't addressed that yet). It's critical that this central database be accessible by more than one puppetmaster server so that horizontal scaling is easy to achieve. Scaling databases, both horizontally and vertically, is well characterized now (if not exactly cheap or easy), and it makes sense to rely on that existing knowledge to provide this shared cache of objects.
I expect that this cache will get heavy usage over time; see Puppetmaster Database for more information.
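As a rough illustration of the storage step, here is a minimal sketch in Python using SQLite. It is not Puppet's implementation: the table layout and the names `open_store`, `export_object`, and `stored_objects` are all hypothetical, and a real deployment would use a database server shared by every puppetmaster rather than an in-memory database.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (and initialise if needed) the hypothetical object store."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS collectable (
            host   TEXT NOT NULL,   -- node that exported the object
            type   TEXT NOT NULL,   -- e.g. 'port', 'nameserver'
            name   TEXT NOT NULL,   -- object name within that type
            params TEXT NOT NULL,   -- attributes, serialised as JSON
            PRIMARY KEY (host, type, name)
        )
        """
    )
    return conn


def export_object(conn, host, type_, name, params):
    """Store one collectable object, replacing any earlier copy from the same host."""
    conn.execute(
        "INSERT OR REPLACE INTO collectable (host, type, name, params) "
        "VALUES (?, ?, ?, ?)",
        (host, type_, name, json.dumps(params, sort_keys=True)),
    )
    conn.commit()


def stored_objects(conn, type_):
    """Return every stored object of the given type, whichever host exported it."""
    rows = conn.execute(
        "SELECT host, name, params FROM collectable "
        "WHERE type = ? ORDER BY host, name",
        (type_,),
    )
    return [(host, name, json.loads(params)) for host, name, params in rows]
```

Keying each row on the exporting host as well as the type and name is an assumption made here so that a node's next configuration run overwrites its own earlier objects without touching anyone else's.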
Collecting Objects
This is just a question of performing something like a query. The only syntax that has been proposed so far is similar to a normal type instance but using angle brackets instead of curly braces:
monitor < service =~ /./ > # collect all of the services to monitor
sshkey < type == dsa > # retrieve everyone's SSH host keys
You can see that this syntax drastically complicates the Puppet language -- there are currently no operators of any kind, much less regexes (the syntax could be provided without regexes, but the addition of operators is a bigger change than regexes).
I think that this syntax will not be possible without the Parser Redesign; with it done, statements will have values, so it will be much easier to support operators on those values.
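To make the query idea concrete, here is a small Python sketch of how a single-comparison collection query might be evaluated against stored objects. It is only an illustration under assumptions: the functions `matches` and `collect` are hypothetical, objects are modelled as plain dictionaries of attributes, and only the two operators used in the proposed syntax (`==` and `=~`) are handled.

```python
import re


def matches(obj, attribute, operator, operand):
    """Return True if one stored object satisfies a single-comparison query."""
    value = obj.get(attribute)
    if value is None:
        # An object that lacks the attribute never matches.
        return False
    if operator == "==":
        return str(value) == operand
    if operator == "=~":
        # Regex match anywhere in the value, like /./ in the proposed syntax.
        return re.search(operand, str(value)) is not None
    raise ValueError(f"unsupported operator: {operator!r}")


def collect(objects, attribute, operator, operand):
    """Return the objects selected by the query, in their original order."""
    return [obj for obj in objects if matches(obj, attribute, operator, operand)]
```

Under these assumptions, `collect(keys, "type", "==", "dsa")` would play the role of `sshkey < type == dsa >`, and `collect(services, "service", "=~", ".")` the role of `monitor < service =~ /./ >`.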
FAQ
1. Why don't you just use an attribute to mark the object as collectable, instead of a separate syntax?
Because the parser does not currently handle any of the parameters; they're passed on exactly as is to the library. I want to maintain this strict separation, where only the library messes with the attributes.
2. Why don't you specify the nodes that should get the collectable objects?
Because that would significantly limit the real possibilities. There's no reason your web server should need to know what your firewall is; it just knows that it needs one.
Conclusion
This is a significant change to the parser, but it only affects the parser, which isolates the complexity somewhat. Once this functionality is in place, Puppet's language will probably be one of the most powerful of the domain-specific languages dedicated to configuration management.
Unfortunately, this feature depends on both Puppetmaster Database and the Parser Redesign, so it's likely to be a while before this is available.