This is more of a description of plans and progress toward fixing #583 than a spec, but hey, take what you can get.
Goals
- Use REST for all communication, instead of XMLRPC, to eliminate the encoding/escaping overhead of XMLRPC and simplify the process of integrating with other services
- Retain the ability for a given network service to be used locally or remotely
- Simplify creating new services and new termini for those services.
Secondary Goals
- Allow external web servers to serve files directly, rather than forcing Puppet's server to handle all files (e.g., when running Mongrel behind Apache, Apache should be able to serve the files itself rather than passing those requests on to Mongrel)
Terminology
| Term | Definition |
|---|---|
| REST | Representational State Transfer. See external sources. Basically, this means using standard HTTP verbs like GET and PUT, along with semantic URIs, to build a web services infrastructure. |
| XMLRPC | The current RPC-like mechanism used within Puppet. All data must be encoded as XML, and the APIs can be completely arbitrary. |
| Indirection | The process of passing a method call on to another object. For example, we want to support retrieving node information from any of several sources, such as LDAP or iClassify, so we configure Puppet to use indirection to send the call to the correct source. |
| Terminus | The end point of an indirection. In the above example, LDAP and iClassify are termini. |
Plans
The primary plan is to reorganize our network services around the classes being served, and then to support indirection as a means of passing these objects over the network. Looking at our current network services, here are some representative classes we manipulate through network calls:
- certificates
- configurations
- nodes
- files
Let's delve into a simple example: nodes. Lots of other services need to know about nodes (e.g., configurations). Node information could be retrieved from one of multiple locations (currently, either LDAP or an external node tool). To get information about a given node, you will call get on the Node class with the name of the node you want information about. This method will, in turn, get redirected to an appropriate node source (LDAP, etc.) which will then look up the node's information and (this is the important bit) return an instance of the Node class.
This provides a clear picture of how to network this service: we always support REST as an available node source, and we configure our server to route this kind of request appropriately. For instance, we might configure our node source as 'https://puppet/node/<node_name>'. The server would then configure routing (which is merely the process of deciding which object a given HTTP verb gets called on) so that any path under '/node/' is sent to our Node class. The HTTP verb used here, 'get', would then be called on that class with the rest of the URI (in this case, the node name).
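Here is a minimal sketch of that routing step. Router, route and dispatch are hypothetical names, and the Node class is a stand-in rather than Puppet's. The verb-to-method mapping (GET calls get, PUT calls put) is an assumption about how routing might work:

```ruby
# Stand-in for a networked class; a real one would delegate to its terminus.
class Node
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Class-level 'get', reached by the router for GET /node/<name>.
  def self.get(name)
    new(name)
  end
end

# Hypothetical router: the path prefix selects the class, and the HTTP verb
# selects the class method of the same (lowercased) name.
class Router
  VERBS = %w[GET PUT POST DELETE].freeze

  def initialize
    @routes = {}
  end

  def route(prefix, klass)
    @routes[prefix] = klass
  end

  # e.g. dispatch("GET", "/node/web01") calls Node.get("web01").
  def dispatch(verb, path)
    _, prefix, key = path.split("/", 3)
    klass = @routes.fetch(prefix) { raise ArgumentError, "no route for /#{prefix}/" }
    action = verb.downcase.to_sym
    unless VERBS.include?(verb) && klass.respond_to?(action)
      raise ArgumentError, "#{verb} is not supported under /#{prefix}/"
    end
    klass.public_send(action, key)
  end
end

router = Router.new
router.route("node", Node)
puts router.dispatch("GET", "/node/web01.example.com").name
```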
For remote node sources, the terminus would need to serialize the node to YAML (or some other format), and the receiver would need to deserialize it. For local sources, of course, no serialization would be necessary.
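To illustrate the local/remote difference, here is a sketch with a hypothetical in-memory node source and a "remote" one. The remote one only simulates the network hop by round-tripping the node through YAML in-process; it makes no real HTTP request. Both return Node objects, so callers can't tell them apart:

```ruby
require "yaml"

Node = Struct.new(:name, :environment)

# Local terminus: hands back Node objects directly, no serialization.
class MemoryNodeSource
  def initialize(nodes)
    @nodes = nodes
  end

  def get(name)
    @nodes[name]
  end
end

# Remote terminus (simulated): the "server" side serializes to YAML and the
# "client" side deserializes. A real one would do an HTTP GET instead of
# calling the server object in-process.
class RemoteNodeSource
  def initialize(server)
    @server = server
  end

  def get(name)
    node = @server.get(name)
    return nil unless node
    wire = YAML.dump({ "name" => node.name, "environment" => node.environment })
    data = YAML.safe_load(wire)
    Node.new(data["name"], data["environment"])
  end
end

local = MemoryNodeSource.new({ "web01" => Node.new("web01", "production") })
remote = RemoteNodeSource.new(local)
p remote.get("web01")
```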
Details
We are creating an indirector module that networked classes mix in (via extend) to provide this indirection. A class extends the module and then configures it using the indirects method:

```ruby
class Puppet::Node
  extend Puppet::Indirector
  indirects :node, :to => :node_source
  # ...
end
```
The :to argument tells the indirector what configuration parameter to use to figure out the name of the indirection terminus. It then autoloads that terminus from, e.g., 'puppet/indirector/node/ldap', if the node source is set to ldap. Each terminus is responsible for implementing the methods necessary to make the indirection function, although at this point no validation is done or even possible to make sure that is true.
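Here is a rough sketch of what indirects might do. It uses a hypothetical SETTINGS hash in place of Puppet's configuration and an explicit TERMINI registry in place of autoloading from 'puppet/indirector/<indirection>/<terminus>'. Like the design described above, it does no validation of the terminus:

```ruby
# Hypothetical stand-in for Puppet's configuration parameters.
SETTINGS = { node_source: "ldap" }

module Indirector
  # Hypothetical registry; the real design autoloads termini instead.
  TERMINI = Hash.new { |hash, indirection| hash[indirection] = {} }

  def self.register_terminus(indirection, name, klass)
    TERMINI[indirection][name.to_s] = klass
  end

  # Class-level call: indirects :node, :to => :node_source
  def indirects(indirection, options)
    @indirection = indirection
    @terminus_setting = options.fetch(:to)
  end

  # Looks up the terminus on every call, so a changed setting takes effect.
  def terminus
    name = SETTINGS.fetch(@terminus_setting)
    klass = TERMINI[@indirection].fetch(name) do
      raise ArgumentError, "no #{@indirection} terminus named #{name}"
    end
    klass.new
  end

  def get(key)
    terminus.get(key)
  end
end

class LdapNodeTerminus
  def get(name)
    "ldap:#{name}"
  end
end

class ExternalNodeTerminus
  def get(name)
    "external:#{name}"
  end
end

Indirector.register_terminus(:node, :ldap, LdapNodeTerminus)
Indirector.register_terminus(:node, :external, ExternalNodeTerminus)

class Node
  extend Indirector
  indirects :node, :to => :node_source
end

puts Node.get("web01")
```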
Serialization
These REST interfaces are being developed in part to facilitate integration between Puppet and other tools, so it's critical that serialization not be Ruby-specific. In addition to using a standard format such as YAML, we should serialize to simple, non-Ruby data structures. For instance, a node should be serialized as a hash containing a name, an environment, and some parameters (each under an appropriate key) rather than as a Puppet::Node instance. This will require serialize and deserialize methods on each of these classes.
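Here is a sketch of serialize and deserialize for a node, using a hypothetical Node with a name, an environment and parameters. A plain hash with string keys keeps Ruby-specific tags (such as !ruby/object) out of the YAML, so a consumer in another language just sees an ordinary mapping:

```ruby
require "yaml"

class Node
  attr_reader :name, :environment, :parameters

  def initialize(name, environment, parameters = {})
    @name = name
    @environment = environment
    @parameters = parameters
  end

  # Plain data only: strings, hashes, arrays.
  def to_data
    { "name" => name, "environment" => environment, "parameters" => parameters }
  end

  def serialize
    YAML.dump(to_data)
  end

  # safe_load refuses arbitrary Ruby objects, which suits a non-Ruby format.
  def self.deserialize(text)
    data = YAML.safe_load(text)
    new(data.fetch("name"), data.fetch("environment"), data.fetch("parameters", {}))
  end
end

node = Node.new("web01", "production", { "os" => "Linux" })
text = node.serialize
puts text
```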
This also raises the question of how best to simplify serialization. Preferably, neither the indirection termini nor the consumers would need to worry about it; instead, some layer in the middle would know that a given method accepts an object to be serialized or returns a serialized object, and would act accordingly. This could probably be added to the indirects call, something like:
```ruby
indirects :node, :to => :node_source, :serialize_in => :put, :serialize_out => :get
```
I expect there's a better internal DSL we can use than that. Maybe 'indirects' should yield:
```ruby
indirects :node do
  handle :get, :put # the accepted methods
  serialize_in :put
  serialize_out :get
  to :node_source
end
```
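Here is one way the block form could be implemented, as a sketch. The names handle, serialize_in, serialize_out and to come from the proposal above. IndirectionConfig, termini, remote?, to_data/from_data and SETTINGS are hypothetical. The generated class methods serialize only when the selected terminus is remote, so consumers pass and receive objects either way:

```ruby
require "yaml"

# Hypothetical stand-in for Puppet's configuration parameters.
SETTINGS = { node_source: "rest" }

# Records the declarations made inside an 'indirects' block.
class IndirectionConfig
  attr_reader :setting, :verbs

  def initialize
    @verbs = Hash.new { |hash, kind| hash[kind] = [] }
  end

  # handle, serialize_in and serialize_out each record a list of verbs.
  %i[handle serialize_in serialize_out].each do |kind|
    define_method(kind) { |*names| @verbs[kind].concat(names) }
  end

  def to(setting)
    @setting = setting
  end
end

module Indirector
  attr_reader :termini

  # Defines one class method per handled verb. Serialization happens here,
  # and only when the selected terminus is remote.
  def indirects(_name, &block)
    config = IndirectionConfig.new
    config.instance_eval(&block)
    @termini = {}
    config.verbs[:handle].each do |verb|
      define_singleton_method(verb) do |*args|
        terminus = termini.fetch(SETTINGS.fetch(config.setting))
        remote = terminus.respond_to?(:remote?) && terminus.remote?
        if remote && config.verbs[:serialize_in].include?(verb)
          args = args.map { |obj| YAML.dump(obj.to_data) }
        end
        result = terminus.public_send(verb, *args)
        if remote && result && config.verbs[:serialize_out].include?(verb)
          result = from_data(YAML.safe_load(result))
        end
        result
      end
    end
  end
end

Node = Struct.new(:name) do
  extend Indirector

  def to_data
    { "name" => name }
  end

  def self.from_data(data)
    new(data.fetch("name"))
  end

  indirects :node do
    handle :get, :put # the accepted methods
    serialize_in :put
    serialize_out :get
    to :node_source
  end
end

# Local stub terminus: stores Node objects directly.
class MemoryNodeTerminus
  def initialize
    @store = {}
  end

  def put(node)
    @store[node.name] = node
  end

  def get(name)
    @store[name]
  end
end

# Remote stub terminus: sees only serialized text, as an HTTP one would.
class RestNodeTerminus < MemoryNodeTerminus
  def remote?
    true
  end

  def put(text)
    @store[YAML.safe_load(text).fetch("name")] = text
  end
end

Node.termini["rest"] = RestNodeTerminus.new
Node.put(Node.new("web01"))
puts Node.get("web01").name
```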
Either way, we need something like this; otherwise every network service will end up doing its own serialization. In particular, we don't want the consumers of these network services to have to think about whether a given terminus is local or remote.
This would normally be handled either in the routing layer or by the controller in an MVC model, but I'm not planning to copy that model exactly, so it's not yet clear where this layer should live.
Authorization
Authorization is currently provided in one or two layers within Puppet: each network service can configure its own authorization system, as the file server does, and there is an overall authorization system, configured via namespaceauth.conf, that affects every network service.
We'll need an equivalent overall system for REST, and it would be best if we also had a standard way of providing per-service authorization.
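Here is a sketch of the two layers, using a hypothetical rule format. A global, ordered rule list is checked for every request, playing the role namespaceauth.conf plays today. An optional per-service check, such as the file server's own rules, comes after it. The first rule matching the verb and path decides, and anything unmatched is denied:

```ruby
# One global rule: which clients may use which HTTP verbs under a path prefix.
AuthRule = Struct.new(:prefix, :verbs, :clients) do
  def applies_to?(verb, path)
    path.start_with?(prefix) && verbs.include?(verb)
  end

  def allows?(client)
    clients.any? { |pattern| File.fnmatch(pattern, client) }
  end
end

class Authorizer
  def initialize(rules)
    @rules = rules
    @service_checks = {}
  end

  # Optional per-service hook, e.g. the file server's own mount rules.
  def service_check(prefix, &block)
    @service_checks[prefix] = block
  end

  # Global layer first (first matching rule decides, default deny),
  # then the per-service check, if one is registered for the path.
  def allowed?(client, verb, path)
    rule = @rules.find { |r| r.applies_to?(verb, path) }
    return false unless rule && rule.allows?(client)
    check = @service_checks.find { |prefix, _| path.start_with?(prefix) }
    check ? check[1].call(client, verb, path) : true
  end
end

auth = Authorizer.new([
  AuthRule.new("/node/", %w[GET], ["*.example.com"]),
  AuthRule.new("/file/", %w[GET], ["*"])
])
# Hypothetical per-service rule: the file service only exposes the 'sudo' mount.
auth.service_check("/file/") { |_client, _verb, path| path.start_with?("/file/sudo/") }

puts auth.allowed?("web01.example.com", "GET", "/node/web01.example.com")
```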
Potential Issues
Transition Functionality
There are still some functional areas that are not necessarily part of either the terminus or the class, and it's unclear where this functionality would reside.
For instance, when facts are posted to the server, we always want to add some metadata, such as when the facts were updated. Also, some information is retrieved so often that we'll want to have short-lived caches; we'll be asking for node information on every fileserving request, which can easily be hundreds of times a second, and it seems silly to make each terminus implement caching separately.
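For the caching part, here is a sketch of a wrapper that any terminus could sit behind, so that no terminus implements caching on its own. CachingTerminus, the TTL and the injectable clock are assumptions, and only get is cached here:

```ruby
# Wraps any terminus and caches 'get' results for a short time.
class CachingTerminus
  def initialize(terminus, ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @terminus = terminus
    @ttl = ttl
    @clock = clock
    @cache = {} # key => [value, expires_at]
  end

  def get(key)
    now = @clock.call
    value, expires_at = @cache[key]
    return value if expires_at && now < expires_at
    value = @terminus.get(key)
    @cache[key] = [value, now + @ttl]
    value
  end
end

# Stub terminus that counts lookups, standing in for e.g. an LDAP node source.
class CountingTerminus
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  def get(key)
    @lookups += 1
    "node:#{key}"
  end
end

now = 0.0
backend = CountingTerminus.new
cache = CachingTerminus.new(backend, ttl: 5, clock: -> { now })
3.times { cache.get("web01") } # one real lookup, two cache hits
now = 6.0                      # past the TTL
cache.get("web01")             # expired, so a second real lookup
puts backend.lookups
```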
Another example is certificate signing: sometimes certificates should be signed immediately, and sometimes the request should be stored to be signed manually. This will still take two requests (put the CSR, then get the certificate), but something in between needs to notice that autosigning is enabled and then sign the certificate.
Whatever solution we choose will be affected by the fact that we can, and probably will, chain networked calls. For instance, clients could set their fact store to the central server, and the central server could set its fact store to a database. If we have any transition functionality, such as adding metadata like timestamps, which class adds that metadata: the client's class, the server's, or the remote terminus? It's easy to just say the terminus should always handle it, but there is a need for this kind of functionality that spans all termini.
I expect that these can be handled through hooks of some kind, but I'm not sure.
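To illustrate the hook idea, here is a sketch applying it to autosigning. A hook registered on the class runs after every put, whichever terminus stored the CSR. The Hooks module, the after method, SETTINGS[:autosign] and the fake "signed(...)" certificate are all hypothetical:

```ruby
# Hypothetical hook support: callbacks that run after a verb, whatever the terminus.
module Hooks
  def after(verb, &block)
    hooks[verb] << block
  end

  def hooks
    @hooks ||= Hash.new { |hash, verb| hash[verb] = [] }
  end

  def run_hooks(verb, *args)
    hooks[verb].each { |hook| hook.call(*args) }
  end
end

SETTINGS = { autosign: true } # stand-in for the autosign setting

class CertificateRequest
  extend Hooks
  STORE = {}  # stand-in for whichever terminus holds CSRs
  SIGNED = {} # stand-in for the certificate terminus

  def self.put(name, csr)
    STORE[name] = csr
    run_hooks(:put, name, csr)
  end
end

# The autosign decision lives in one hook rather than in every terminus.
CertificateRequest.after(:put) do |name, csr|
  CertificateRequest::SIGNED[name] = "signed(#{csr})" if SETTINGS[:autosign]
end

CertificateRequest.put("web01", "csr-data")
puts CertificateRequest::SIGNED["web01"]
```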
Update:
For now, it looks like we will handle the cases that need this functionality by defining an additional method that provides it. The initiating node will call that method, but everyone further along the chain will use the HTTP methods. For instance, we might use Facts.store(facts), which would add the metadata and then call Facts.put(facts).
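Here is a sketch of that approach, using a hypothetical two-hop chain simulated in-process: a forwarding terminus stands in for REST, followed by a stub database. Only store adds the metadata, and every hop after it sees a plain put. The "_timestamp" key name is an assumption:

```ruby
require "time"

# Stub terminus that records what it receives; stands in for a database.
class DatabaseFactsTerminus
  attr_reader :saved

  def initialize
    @saved = {}
  end

  def put(name, facts)
    @saved[name] = facts
  end
end

# Stub for a REST terminus: forwards 'put' to the next hop unchanged.
class ForwardingFactsTerminus
  def initialize(next_hop)
    @next_hop = next_hop
  end

  def put(name, facts)
    @next_hop.put(name, facts)
  end
end

class Facts
  class << self
    attr_accessor :terminus

    # Called only by the initiating node: add metadata, then use plain 'put'.
    def store(name, facts, now: Time.now)
      put(name, facts.merge("_timestamp" => now.utc.iso8601))
    end

    def put(name, facts)
      terminus.put(name, facts)
    end
  end
end

db = DatabaseFactsTerminus.new
Facts.terminus = ForwardingFactsTerminus.new(db)
Facts.store("web01", { "os" => "Linux" }, now: Time.utc(2008, 1, 1))
p db.saved["web01"]
```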
Multiple Termini At Once
Files, at least, will need support for more than one terminus in a given process. We need to be able to get file data from multiple file servers, which means that the indirection interface needs some ability for call-time configuration. It has been posited that we use some kind of location mechanism, such that we could call something like this to retrieve a file's metadata:
```ruby
Puppet::File.get "/sudo/sudoers", :location => "http://puppet/file"
```
We might, in the same configuration, then need something like this, for local copying:
```ruby
Puppet::File.get "/etc/motd", :location => "localhost"
```
It is unclear exactly what this interface would look like, though, or how it would be configured (e.g., it could not use the standard indirects class-level interface, since a given configuration will determine the list of termini necessary).
The termini are normally used as singletons, such that you'd only ever have a single terminus of a given type (e.g., one node source), but this case is clearly different, so it will need both a different configuration mechanism and a different calling mechanism. I expect that we need to come up with a standard calling mechanism that supports this kind of extra information, and then default to not needing it in those cases that only interact with one terminus at a time.
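Here is a sketch of call-time terminus selection, based on the :location option from the examples above. PuppetFile stands in for Puppet::File, and the termini are stubs. A location with an http or https scheme picks the REST terminus, and a scheme-less one such as "localhost" picks the local terminus. Omitting :location falls back to the configured default:

```ruby
require "uri"

# Stub termini; a real REST one would issue an HTTP GET against the location.
class LocalFileTerminus
  def get(path, _location)
    "local metadata for #{path}"
  end
end

class RestFileTerminus
  def get(path, location)
    "metadata for #{path} from #{location}"
  end
end

class PuppetFile
  TERMINI = { "local" => LocalFileTerminus.new, "rest" => RestFileTerminus.new }
  DEFAULT = "local" # stand-in for the configured default terminus

  # The optional :location picks a terminus per call; without it, the
  # default is used, so single-terminus callers need not change.
  def self.get(path, options = {})
    location = options[:location]
    TERMINI.fetch(terminus_name_for(location)).get(path, location)
  end

  def self.terminus_name_for(location)
    return DEFAULT if location.nil?
    case URI(location).scheme
    when "http", "https" then "rest"
    when nil then "local" # e.g. "localhost"
    else raise ArgumentError, "unsupported location #{location}"
    end
  end
end

puts PuppetFile.get("/sudo/sudoers", :location => "http://puppet/file")
puts PuppetFile.get("/etc/motd", :location => "localhost")
```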
Different Views of the Same Class
Configurations, in particular, are interacted with in two ways: They are retrieved, normally from the compiler, and they are stored, normally in a database, or maybe in YAML or via REST. In addition, individual resources within the configuration will be addressed, as when we are collecting resources.
We are not yet clear on how to handle either of these differing views. It might be that we need both a Configuration class and a ConfigStore class, and then the ConfigStore class would provide further specification abilities, so you could say, for instance:
```ruby
ConfigStore.find "/resources?type=sshkey"
```
We'd need to translate our simplistic collection queries into whatever system we decide upon.
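Here is a sketch of translating such a request into a filter. It assumes that stored resources are plain hashes and that every query parameter is an equality match, which is roughly what simple collection queries need. It uses Ruby's standard uri library, and ConfigStore and RESOURCES are hypothetical:

```ruby
require "uri"

class ConfigStore
  # Stand-in for stored configurations: plain resource hashes.
  RESOURCES = [
    { "type" => "sshkey", "title" => "web01", "host" => "web01" },
    { "type" => "sshkey", "title" => "db01", "host" => "db01" },
    { "type" => "host", "title" => "web01", "host" => "web01" }
  ]

  # "/resources?type=sshkey&host=web01" => resources matching every parameter.
  def self.find(request)
    uri = URI.parse(request)
    raise ArgumentError, "unknown collection #{uri.path}" unless uri.path == "/resources"
    filters = URI.decode_www_form(uri.query || "")
    RESOURCES.select do |resource|
      filters.all? { |key, value| resource[key] == value }
    end
  end
end

p ConfigStore.find("/resources?type=sshkey").map { |r| r["title"] }
```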
Documentation
We need some way to generate an indirection reference, so that people know what indirections are available, what termini are available for each, what they all do, and how to select a given terminus for your indirection.
This will be complicated by the fact that some of this information is in the indirector and some is in the indirecting class, and we could theoretically use the same indirection in multiple classes.
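Here is a sketch of generating that reference from metadata. It assumes a hypothetical registry in which each indirection records its documentation, the setting that selects its terminus, its termini and the classes that use it. Listing the users under each indirection is one way to handle an indirection shared by several classes. The sample descriptions are made up:

```ruby
# Hypothetical registry of indirections and their termini.
INDIRECTIONS = {
  node: {
    doc: "Where to find information about nodes.",
    setting: :node_source,
    termini: { "ldap" => "Search LDAP for node information.",
               "external" => "Run an external node tool." },
    used_by: ["Puppet::Node"]
  }
}

# Builds a plain-text reference: one section per indirection, termini sorted by name.
def indirection_reference(registry)
  registry.sort_by { |name, _| name.to_s }.map do |name, info|
    lines = ["#{name} (select a terminus with the '#{info[:setting]}' setting)",
             "  #{info[:doc]}",
             "  Used by: #{info[:used_by].join(', ')}"]
    info[:termini].sort_by { |terminus, _| terminus }.each do |terminus, doc|
      lines << "  - #{terminus}: #{doc}"
    end
    lines.join("\n")
  end.join("\n\n")
end

puts indirection_reference(INDIRECTIONS)
```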
Notes
File Serving
Based on a few design issues and some plans for the future, we've decided to split the file server into two services: Metadata, which handles providing a file's owner, mode, type, group, and checksum, and then a Checksum (or possibly Content) service, which can turn a checksum into a string. This is a different way of thinking about file serving, but it makes a lot more sense than how we do it now.
The current protocol involves two queries: The first query returns the metadata of a file, and the second query returns its content (if necessary). This is similar to what we want, except that both queries have the same target (e.g., http://puppet/file/sudo/sudoers). This is problematic because as we provide change control features, we're going to want this separation (I'll have to provide more detail later).
This change will need to be done in a way that still makes it possible to serve the files via Apache or something similar, although that's only a secondary goal.
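Here is a sketch of the split, assuming MD5 checksums and an in-memory table standing in for files on disk. The metadata and content URL layouts in the comments are hypothetical. Because content is addressed by checksum, a content request never depends on the file's path, which might make it simpler to hand those requests to Apache:

```ruby
require "digest/md5"

# Stand-in for files on the server: path => [content, owner, group, mode].
FILES = {
  "/sudo/sudoers" => ["root ALL=(ALL) ALL\n", "root", "root", 0o440]
}

# Metadata service (hypothetically GET /metadata/<path>): no file content.
def metadata(path)
  content, owner, group, mode = FILES.fetch(path)
  { "path" => path, "owner" => owner, "group" => group, "mode" => mode,
    "type" => "file", "checksum" => Digest::MD5.hexdigest(content) }
end

# Content service (hypothetically GET /content/<checksum>): checksum => content.
CONTENT = FILES.values.map { |content, *| [Digest::MD5.hexdigest(content), content] }.to_h

def content(checksum)
  CONTENT.fetch(checksum)
end

# A client asks for metadata first and fetches content only if its copy differs.
def sync(path, local_content)
  meta = metadata(path)
  return local_content if local_content && Digest::MD5.hexdigest(local_content) == meta["checksum"]
  content(meta["checksum"])
end

puts sync("/sudo/sudoers", nil)
```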
Conclusion
There's much more, but this is a start.