June 12, 2009

Thoughts on Puppet

Over the last month or so, I’ve been working on introducing the Puppet configuration management system into our environments. We’ve now got a large number of development environments, and a handful of production boxes, migrated over, so I feel like I’ve learned enough to post some opinions on it.

A quick overview of what Puppet does, for anyone who’s not familiar with it. Puppet runs a central server which contains a database detailing the configuration (in terms of files, packages, services, and other manageable “resources”) of each of your servers. Each managed server runs a daemon which periodically checks in with the central server and ensures that every resource the central server says it should have is properly set up and configured. If it’s not, the daemon takes care of setting it up for you.

You can use puppet to manage as much or as little of a server’s configuration as you like. The “hello world” example manages just one single file, but it’s perfectly possible to take a server from a bare OS to a fully configured production box without any intervention at all.
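For a flavour of what that looks like, here is a minimal sketch of a “hello world”-style manifest that manages a single file. The path, ownership and content are made up for illustration; they are not taken from our setup.

```puppet
# Hypothetical example: manage exactly one file and nothing else.
# On each run the daemon compares the file on disk with this
# description and rewrites it only if it differs.
file { '/tmp/hello.txt':
  ensure  => file,            # must exist as a regular file
  owner   => 'root',
  mode    => '0644',
  content => "Hello, world\n",
}
```

Running this a second time should change nothing, which is the whole point: the manifest describes the end state rather than the steps to get there.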

We have about a dozen physical servers, hosting about 50 virtual servers (mostly Solaris zones, but a handful of Ubuntu VMware guests). Our boxes are quite distinct: we have probably 20-25 different configurations, with a fairly wide variation between them – Oracle boxes, J2EE app servers, SAMP/LAMP servers, and a bunch of other odds and sods.

Puppet can’t really do bare-metal provisioning – at least, not without hackery – so we use PXE boot plus preseeding to install our Linuxes and just plain “put the DVD in and follow the instructions” to install Solaris. It can provision Solaris zones, with a bit of faffing, but VMware guests are more tricky. So the point at which Puppet starts to add value, for us, is once there’s enough OS in place to do (either by script or by hand) “apt-get install puppet”.

On our Ubuntu boxes, we’re managing virtually the entire configuration through puppet. There are a couple of reasons for this:
  • Linux package installation is simpler, more consistent, and has better support in the Puppet plug-in community than on Solaris
  • Our Linux nodes are simpler and less mission-critical than our Solaris ones
  • Our Linux nodes are easier to switch between, so we can build a new node from scratch using Puppet, and shift it into production without affecting service; most of the Sun stuff requires either downtime or stress to bring a newly-provisioned environment into production.

The thing that stops us moving to a completely puppet-managed system is that we already have a relatively rich and functional set of scripts for capistrano-style building and deploying new versions of the applications. It didn’t seem worth the bother of trying to pull that into puppet just yet.

On the Solaris boxes, we’ve started from a very basic set of stuff to manage (/etc/resolv.conf, since we’re in the throes of a DNS re-configuration), and built up gradually, focussing on the infrastructure things (sendmail, NIS config, ZFS housekeeping) and adding in a few higher-level bits and bobs (MySQL configs, for example) as we go along. I still intend to get the Solaris boxes to the same state as the Ubuntus; it’s just taking a while longer.

One of my favourite Puppet features so far has been its integration with Nagios. We’ve used Puppet extensively to define the Nagios checks that a service should have, so that now we can guarantee that (for instance) any box running sendmail will also include a mail queue check – or that anything running MySQL will have not only a cron job to back it up nightly, but a Nagios check to verify the backup actually worked. This has shown up a fairly large number of gaps in our existing monitoring – which, to be fair, had been defined largely by waiting for something to break, then adding a check for it after the fact on the affected host. We’ve also uncovered countless instances of inconsistent or just plain broken configuration as we went through (like the host whose sendmail config was bust, resulting in a queue of ~1500 bits of cron-mail, which all arrived in my inbox after the first Puppet run)!
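One common way to tie a check to a service is with exported resources, which rely on the storedconfigs database. The sketch below is an illustration of that pattern rather than a copy of our manifests: the class names, the check command and the “generic-service” template are all assumptions, and details such as how facts are referenced vary between Puppet versions.

```puppet
# Hypothetical class: any node that includes it gets sendmail
# AND exports a matching Nagios check for itself.
class sendmail {
  package { 'sendmail':
    ensure => installed,
  }

  service { 'sendmail':
    ensure  => running,
    enable  => true,
    require => Package['sendmail'],
  }

  # '@@' exports the resource to storedconfigs instead of applying it here.
  @@nagios_service { "check_mailq_${::fqdn}":
    host_name           => $::fqdn,
    service_description => 'Mail queue',
    check_command       => 'check_nrpe!check_mailq',  # assumed command name
    use                 => 'generic-service',         # assumed Nagios template
  }
}

# Hypothetical class for the monitoring host: collect every
# exported nagios_service from all nodes.
class nagios::server {
  Nagios_service <<| |>>
}
```

The appeal of the pattern is that the check lives next to the service definition, so a box cannot end up running sendmail without also being monitored for it.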

One of the slightly surprising features is that, to date, puppet probably hasn’t really saved us very much time. I think there are two reasons for this:

  • Firstly, we’re still in the process of defining and building up our configurations. We wouldn’t expect to see particularly big returns until we want to replicate one of those configurations onto a new machine (we have several bits of hardware due to be replaced over the next 12 months, so I expect to start seeing benefits during that time).
  • Secondly, a lot of what we’re doing at the moment is really fixing stuff that’s been broken all along, but just hasn’t caused a problem yet. So I guess you could say we’re investing time and effort now, to offset the risk of some catastrophic surprise later.

A further surprise is how enjoyable I’ve found the process of building the Puppet manifests. I’ve often bemoaned the fact that I spend all of my time doing boring ops stuff nowadays, and never get to write any code. But the last three weeks or so have seen me head-down over Puppet manifests for days at a time, just like the “good old days” of building apps. The Puppet DSL, once you’ve got used to it, is actually a surprisingly rich and elegant way to define a system; it’s a proper language for proper coders ( ;-) ).
And the ability to “refactor” server configurations, taking a duplicative, inconsistent, and error-prone setup and turning it into a nice D.R.Y. declaration of what should and shouldn’t be present, is tremendously rewarding when you’ve spent the last 6 months trying to fix it by hand (a process akin to painting the Forth Bridge, only without the views).

There’s also something very rewarding about the feedback cycle of checking in a change, sitting back, watching the emails come in as each box updates itself, and then watching nagios magically update as the new services come online.

Things that are not so great about puppet:

  • The difficulty of specifying very big dependency graphs; it’s easy to end up with resources that have hideously long “require => [File[“foo”], Service[“bar”], User[“bang”], ...]” lines. This should get a lot easier in the next release of Puppet, though, when it will be fairly straightforward to require whole classes.
  • The data in the storedconfigs database is an awesome resource (it’s the kind of CMDB that an ITIL-fanatic would sell their grandmother for) but the puppetshow front-end is, well, a bit sad, really – it seems to have been half-developed and then abandoned. A shiny new web app is apparently in development, though, so hopefully this too should be a transient problem.
  • The ability to reference other SCMs for files will be a big win when it arrives, because it means that puppet will be able to hook into our developers’ CVS and SVN repos, allowing them to update bits of app config and have puppet deploy them automatically, without their needing to get to grips with how the whole of puppet works, or to have access to the puppet manifests.
  • I worry that some of the more “elegant”* bits of config that I’m generating will turn out to be a bit write-only when other people come to start using them. Hopefully at least some of this risk will go away if and when the proposed central module repository gets off the ground – at that point, I can ditch my own hand-rolled modules for something centrally maintained.

* read “byzantine”
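To make the dependency point above concrete, here is a rough sketch of the two styles. The resource titles and the “mysql” class are invented for illustration, and the fragment assumes they are declared elsewhere in the manifests; the class-level form depends on a Puppet version that supports it.

```puppet
# Style 1: list every prerequisite resource by hand.
# Each new prerequisite makes this line longer.
service { 'myapp':
  ensure  => running,
  require => [
    File['/etc/myapp.conf'],
    Service['mysql'],
    User['myapp'],
  ],
}

# Style 2: depend on a whole class instead, so that whatever
# the (hypothetical) mysql class manages comes first.
service { 'myapp-worker':
  ensure  => running,
  require => Class['mysql'],
}
```

The second form is shorter, but it also hides which specific resources are really needed, which is part of the “write-only” worry.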

Over the next few months, I hope to get all of our systems puppetized to at least some degree – even if it’s just installing the daemon on each one so that we can roll out changes more easily in the future. And I also expect that 4 or 5 of our bigger apps will get to the point where all of their infrastructure requirements are managed by puppet. It should be an interesting process :-)

