Dance Puppets, DANCE!

By Thomas Vachon

What an odd title, right? What do puppets have to do with system administration? Well, in fact there is a program called puppet. What is puppet? Puppet is a client/server software system put out by Reductive Labs which allows for simple management of *nix systems (OS X included).

"So it manages things, but I can do that with some custom scripts." Well, that is a sentiment I have run into in my current position, but it was quickly overcome once I showed how puppet differs from in-house scripts. Most in-house systems are a mix of OSS software combined into a single usable instance. For instance, using rsync to sync up scripts to a list of servers, and ssh to execute them and read back any output. That works in a homogeneous environment where all conventions are followed, but what happens if someone spins up a new site where they didn't, or more likely could not, follow such conventions? This is the problem I found myself in. I inherited an EC2 environment, and due to the way EC2 servers are built and the OS version we had to run, none of our pre-made scripts would work. Thus began my adventures in configuration management.

My first experiment was similar to our in-house system: a combination of rsync and SSH which synced up the root file system to a copy of it on the "admin" server. It would then execute any necessary commands via ssh. The "system" did have package management, using a combination of dpkg --get-selections and ssh commands, but it was far from easily manageable and required one run per server type. Overall, while the system worked, it was far from scalable.

Thus began my adventure of looking at configuration management solutions. The three that stood out were cfengine, puppet, and bcfg2. While cfengine and puppet are the most closely related, there are some significant differences in design and philosophy which set them apart. Most importantly, in cfengine you have to clearly define the different OSes, which puppet just handles. More info can be found here. bcfg2 was not selected for a variety of reasons. It lacks a good way to bundle servers into classes, although some workarounds have been developed. More importantly, its configuration language is not easily understandable by multiple people. It is written in XML and would require extensive comments to make it clear to a multi-person operations team.

This left me with puppet. Puppet comes highly recommended by several sources including Digg and Google, both of whose recommendations are not easy to come by. Digg likes to take OSS software and build upon it. They did not do this for puppet, which is a testament to its features and flexibility. Google uses puppet to manage all their Linux desktops and will be expanding it in the near term to whole data centers. So what is so great about puppet? Some of the best things are its multiplatform abilities, code reusability, and just plain readability/codeability.

Ok, so you've ranted about puppet, what's the big deal? I can do this type of stuff in my sleep. Sure you can, but can you install a package on hundreds of servers in a matter of minutes? If you have scripts pre-written, no problem. But what if you want to install a new server and make its package versions match EXACTLY every other server, and you only have 1 hour to do it? Well, you are almost certainly screwed, unless you have a puppet system set up. A timed install of a server using apache2, rails, mod_rails, and about 15 other gems takes my puppet install all of 10 minutes. This is the beauty of puppet.
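
To give a feel for the "match EXACTLY" part, here is a minimal sketch of pinning versions with the package resource. The version strings are placeholders for illustration only, and the second resource assumes rubygems is already present on the server.

  # Pin an OS package to one exact version (placeholder version string).
  package { "apache2":
    ensure => "2.2.8-1",
  }

  # Gems can be pinned the same way through the gem provider.
  package { "rails":
    ensure   => "2.1.0",
    provider => gem,
  }

Because the version lives in the manifest rather than in someone's head, every server that includes these resources should converge on the same versions.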

So what does the configuration look like? Well, it's very similar to cfengine, as puppet is an outgrowth of it, but puppet is written in Ruby, so you have the power of ERB templating at your fingertips. Let's start with a simple manifest to ensure SSH is running on boot and its configuration files are in place.

File: ssh.pp

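  class sshd {
    # Server-side configuration; restart ssh whenever the file changes.
    file { "/etc/ssh/sshd_config":
      owner  => root,
      group  => root,
      mode   => "0444",
      source => "puppet:///files/sshd_config",
      notify => Service["ssh"],
    }
    # Client-side configuration.
    file { "/etc/ssh/ssh_config":
      owner  => root,
      group  => root,
      mode   => "0444",
      source => "puppet:///files/ssh_config",
      notify => Service["ssh"],
    }
    service { "ssh":
      ensure => running,  # keep the daemon running
      enable => true,     # start it at boot
    }
  }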

Wow, that's kinda cool, right? But what does it all mean? Well, puppet is broken down into 3 major building blocks: the node, which is the server entry in a file called nodes.pp; the class, which is a container for a bunch of stuff; and finally the resource, which is the meat and potatoes of the system. The resources seen in this example are the "file" resource and the "service" resource.
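
To make the node piece concrete, here is a rough sketch of what an entry in nodes.pp could look like. The hostname is made up for illustration, and the sketch assumes the sshd class has already been loaded by the puppetmaster (for example, imported from the main site manifest).

File: nodes.pp

  # A specific server, matched by its hostname (hypothetical name).
  node "web01.example.com" {
    include sshd
  }

  # Fallback for any server without its own node entry.
  node default {
    include sshd
  }

The node ties a server to one or more classes, and each class pulls in its resources.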

The file resource can take a bunch of options; the most interesting one here is source. The puppet:/// prefix tells the client to contact the embedded WEBrick server (which the puppetmaster runs) and grab the file from there. It then places it at the file path specified in the first line of the resource. The notify line says: if this file is updated, restart ssh.
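
The source option copies a file verbatim. Since ERB templating came up earlier, here is a hedged sketch of the alternative: the content option combined with the template() function. The template name sshd_config.erb is hypothetical, and it is assumed to live in the puppetmaster's template directory.

  file { "/etc/ssh/sshd_config":
    owner   => root,
    group   => root,
    mode    => "0444",
    # Render the file from an ERB template instead of copying it as-is.
    content => template("sshd_config.erb"),
    notify  => Service["ssh"],
  }

A file resource takes either source or content, not both. How variables are referenced inside the template depends on the puppet version, so check the documentation for the release you run.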

The service resource (which has to be declared so the notify lines have something to refer to; keeping it in the same class is the simplest way) says in this case that ssh has to be running. Strictly speaking, ensure => running only covers the running state. Starting at boot is controlled by the separate enable attribute, and installing the software is the job of a package resource. Note, the service name HAS to match the name your OS uses for it, and the same goes for package names. Now you say, well how is THAT portable? Well, watch this: change your service declaration to match

  service { "ssh":
    # $operatingsystem is a fact reported by the client.
    name => $operatingsystem ? {
      'Debian' => "ssh",
      'CentOS' => "sshd",
    },
    ensure => running,
  }

This queries the underlying OS and picks the appropriate service name. You can do similar things for the “path” attribute if you need the ssh configurations in a different spot.
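
As a sketch of the same trick applied to a file, the resource below selects its path per OS. The Solaris location is only an assumption for illustration; substitute whatever your platform actually uses.

  file { "sshd_config":
    path => $operatingsystem ? {
      'Solaris' => "/usr/local/etc/ssh/sshd_config",  # assumed location
      default   => "/etc/ssh/sshd_config",
    },
    owner  => root,
    group  => root,
    mode   => "0444",
    source => "puppet:///files/sshd_config",
    notify => Service["ssh"],
  }

The default entry catches every OS not listed explicitly, so the selector does not fail on a platform you forgot about.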

So that is a quick overview of how puppet works. I will be going more in-depth on how I have chosen to deploy it in a later post.