Puppet: System Administration Automated


The Problem

One of the primary goals of the Puppet library is to provide an abstraction layer between the objects you want to manage and the messy details of how those objects are implemented on a given platform. The API it exposes to users of the library does this well, but the internals of the library do little to make the abstraction easy to extend.

Some types require little or no abstraction -- Ruby has builtin, portable file support, so there is very little per-platform abstraction -- while others require complete customization -- Mac OS X uses NetInfo for user and group management, and nearly every platform has a unique way of managing services and packages.

Puppet's library internals need to be modified to make this variety of abstraction simple.

The Proposal

So far, the plan is to split Puppet types into two classes: A Model class and an Implementation class. The Model is responsible for defining the valid attributes for a type and providing validation mechanisms for those attributes. The Model will always need to be cross-platform, even if there are small differences in validation from one platform to another.

The Implementation will be responsible for actually modifying the running operating system state. A given Implementation will know how to create instances, modify them, and remove them. Each Model instance would have a corresponding Implementation instance.

It is not yet known how these two class types will interact, or where exactly the line between them will be.
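Purely as an illustration of the split described above, and not as a settled design, here is a minimal sketch of a Model that validates attribute names and holds its own Implementation instance. UserModel, RecordingUserImpl, and every method on them are hypothetical names invented for this example; the Implementation only records what it would do rather than touching the system.

```ruby
# Hypothetical sketch: none of these classes exist in Puppet.
class UserModel
  # The Model defines which attributes are valid, on every platform.
  VALID_ATTRIBUTES = [:name, :uid, :shell].freeze

  attr_reader :attributes, :implementation

  def initialize(attributes, implementation_class)
    attributes.each_key do |key|
      unless VALID_ATTRIBUTES.include?(key)
        raise ArgumentError, "invalid attribute: #{key}"
      end
    end
    @attributes = attributes
    # Each Model instance gets its own Implementation instance.
    @implementation = implementation_class.new(self)
  end

  def name
    @attributes[:name]
  end
end

# Stand-in Implementation: records actions instead of changing the system.
class RecordingUserImpl
  attr_reader :log

  def initialize(model)
    @model = model
    @log = []
  end

  def create
    @log << "create #{@model.name}"
  end

  def remove
    @log << "remove #{@model.name}"
  end
end

user = UserModel.new({ name: "luke", shell: "/bin/zsh" }, RecordingUserImpl)
user.implementation.create
puts user.implementation.log.inspect
```

A platform-specific Implementation would replace RecordingUserImpl while the Model stays unchanged.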

The Interface

One of the most useful recent changes to Puppet is that states and attributes can register valid values, and states can provide code to set those values. For instance, this is how a highly simplified File type might manage creating different file types:

Puppet::Type.newtype(:file) do
    newstate(:ensure) do
        newvalue(:file) do
            File.open(@parent.name, "w") { |f| f.print "" }
        end

        newvalue(:directory) do
            Dir.mkdir(@parent.name)
        end

        newvalue(:absent) do
            File.unlink(@parent.name)
        end
    end
end

This defines three values, file, directory, and absent, along with the code necessary to implement each of these values.

You can clearly see that this crosses the boundary between model and implementation, yet it does so in a very useful way. It gets more complicated than this because you can also register regular expressions as valid values. These two in combination provide very simple builtin validation -- if a value is not a specifically allowed value and does not match any registered regular expressions, then it is not a valid value.
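The validation rule just described can be sketched in a few lines. This is a standalone illustration, not Puppet's actual code: ValueRegistry and its valid? method are hypothetical, and only the newvalue name is borrowed from the example above.

```ruby
# Hypothetical validator illustrating the rule; not Puppet's implementation.
class ValueRegistry
  def initialize
    @values = []    # specifically allowed values, stored as strings
    @patterns = []  # registered regular expressions
  end

  # Register either a literal value or a Regexp.
  def newvalue(value)
    if value.is_a?(Regexp)
      @patterns << value
    else
      @values << value.to_s
    end
  end

  # A value is valid if it is specifically allowed or matches any pattern.
  def valid?(value)
    str = value.to_s
    @values.include?(str) || @patterns.any? { |re| re =~ str }
  end
end

mode = ValueRegistry.new
mode.newvalue(:absent)
mode.newvalue(/\A[0-7]{3,4}\z/)

puts mode.valid?(:absent)
puts mode.valid?("0644")
puts mode.valid?("banana")
```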

As much as possible, the Model and Implementation should autodiscover information about each other, and the separation between the two should not result in a significant amount of extra bookkeeping. My first thought on joining them was that the newvalue method should accept a method to call on the Implementation, instead of a block:

class Puppet::Model::File
    newstate(:ensure) do
        newvalue(:file, :mkfile)
        newvalue(:directory, :mkdirectory)
        newvalue(:absent, :mkabsent)
    end
end

class Puppet::Impl::File
    def mkabsent
        File.unlink(@model.name)
    end

    def mkdirectory
        Dir.mkdir(@model.name)
    end

    def mkfile
        File.open(@model.name, "w") { |f| f.print "" }
    end
end

You can already see plenty of duplication there, though. It makes much more sense for the Model to assume the method name (and possibly accept it as an argument), and all that's left is to provide a way to specify the method name for regular expression values. This assumptive ability has a dimension added to it by the fact that a given Puppet type often has multiple states, and each of them can support multiple values, and sometimes the same value -- for instance, a File may have its owner and group both set to root.

This means that the method that the Model assumes will likely be named something like set_<state>_<value>, which would mean that our fictional File Implementation would have methods named set_ensure_file, set_ensure_directory, and set_ensure_absent. For regular expressions, we would just require that a name also be registered for the regular expression, and the associated method would be based on the name of the regex, not the regex itself.
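Here is one way the naming convention could work, as a sketch under stated assumptions. StateDispatcher, FlatFileImpl, and the rule that regex-backed methods receive the matched value as an argument are all hypothetical; the proposal fixes none of these details.

```ruby
# Hypothetical sketch of the set_<state>_<value> convention; not Puppet code.
class FlatFileImpl
  attr_reader :calls

  def initialize
    @calls = []
  end

  def set_ensure_file;   @calls << "set_ensure_file";   end
  def set_ensure_absent; @calls << "set_ensure_absent"; end
  def set_owner_root;    @calls << "set_owner_root";    end
  def set_group_root;    @calls << "set_group_root";    end

  # Regex-backed values are named, so the method uses the name, not the regex.
  def set_mode_octal(value)
    @calls << "set_mode_octal(#{value})"
  end
end

class StateDispatcher
  def initialize(state, impl)
    @state = state
    @impl = impl
    @literals = []
    @patterns = {} # registered name => Regexp
  end

  # Register a literal value, or a named regular expression.
  def newvalue(name, pattern = nil)
    if pattern
      @patterns[name] = pattern
    else
      @literals << name.to_s
    end
  end

  # Derive the Implementation method name for a value.
  def method_for(value)
    str = value.to_s
    return "set_#{@state}_#{str}" if @literals.include?(str)
    name, _pattern = @patterns.find { |_, re| re =~ str }
    raise ArgumentError, "invalid value #{str.inspect} for #{@state}" unless name
    "set_#{@state}_#{name}"
  end

  def set(value)
    meth = method_for(value)
    if @literals.include?(value.to_s)
      @impl.send(meth)
    else
      @impl.send(meth, value.to_s) # assumption: regex methods get the value
    end
  end
end

impl = FlatFileImpl.new
owner = StateDispatcher.new(:owner, impl)
owner.newvalue(:root)
group = StateDispatcher.new(:group, impl)
group.newvalue(:root)
mode = StateDispatcher.new(:mode, impl)
mode.newvalue(:octal, /\A[0-7]{3,4}\z/)

owner.set(:root)
group.set(:root)
mode.set("0644")
puts impl.calls.inspect
```

Because the state name is part of the method name, the same value (root) on two different states resolves to two different methods.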

This can get kind of ugly, though, in that a given Implementation will have all these methods named according to the state they're for but otherwise just in a flat list. It might make more sense to have the Implementation's class structure map somewhat to the Model's (e.g., have a separate class for each state).
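As a rough sketch of the one-class-per-state alternative, the Implementation below groups its methods by state instead of encoding the state in each method name. NestedFileImpl and its nested Ensure and Owner classes are hypothetical, and the methods only log what they would do.

```ruby
# Hypothetical sketch of an Implementation with one class per state.
class NestedFileImpl
  class Ensure
    def initialize(log)
      @log = log
    end

    def file
      @log << "ensure=file"
    end

    def absent
      @log << "ensure=absent"
    end
  end

  class Owner
    def initialize(log)
      @log = log
    end

    def root
      @log << "owner=root"
    end
  end

  attr_reader :log

  def initialize
    @log = []
    @states = { ensure: Ensure.new(@log), owner: Owner.new(@log) }
  end

  # Look up the per-state handler, then call the method named after the value.
  def set(state, value)
    handler = @states.fetch(state)
    # Only allow methods the state class defines itself.
    unless handler.class.public_instance_methods(false).include?(value.to_sym)
      raise ArgumentError, "invalid value #{value} for #{state}"
    end
    handler.public_send(value)
  end
end

nested = NestedFileImpl.new
nested.set(:ensure, :file)
nested.set(:owner, :root)
puts nested.log.inspect
```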

Object Creation Vs. Modification

You may have noticed that the previous example looks even more contrived than usual. What is really being modelled on the file is whether it exists at all, and if so, what state it's in. The awkwardness comes from Puppet having been designed initially without any standard concept of object existence. I have slowly moved to using ensure as the standard way of managing existence, but it is still a separate state, and not as "special" as it should be. Puppet does treat the state somewhat specially internally: if ensure is out of sync, then it is the only state that gets synced. For example, if a file's ensure state should be file but is absent, then Puppet ignores the fact that the owner state is absent instead of root, and expects the code that creates the file to also make sure that the other states are correctly synced. As a result, most Puppet types spend code handling all of this, when really it should be done at the class level.
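The special handling of ensure described above amounts to a small selection rule. The function below is a simplified illustration, not Puppet's actual code; states_to_sync and the hash-based representation of desired and current states are assumptions made for the example.

```ruby
# Hypothetical illustration of the sync rule; not taken from Puppet.
# `should` maps state names to desired values, `is` to current values.
def states_to_sync(should, is)
  out_of_sync = should.keys.select { |state| should[state] != is[state] }
  # If ensure is out of sync, it is the only state synced; the creation
  # code is expected to bring the other states into line itself.
  return [:ensure] if out_of_sync.include?(:ensure)
  out_of_sync
end

should = { ensure: :file, owner: "root" }

missing_file = { ensure: :absent, owner: :absent }
puts states_to_sync(should, missing_file).inspect

wrong_owner = { ensure: :file, owner: "daemon" }
puts states_to_sync(should, wrong_owner).inspect
```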

It would be best if Puppet types had a clear concept of creating and removing instances of that type.

It probably makes sense to have both the model and implementation treat object creation specially, but I don't quite know how.

Conclusion

There is still more work to do to figure out the best way to structure this separation, and implementing the separation will take a significant amount of effort, but it will have a large payoff in terms of simplifying internal classes and making it easier to add new types and implementations. If you doubt this needs to be done, please take a look at how users and packages are managed currently.