June 16, 2011

Partial and Curried Functions

On with the scala show! Partially-applied functions and curried functions took me a while to get my head around, but really they’re quite simple, and partials at least are super-useful.

Partially-applied functions are just functions where you pre-bind one of the parameters. e.g.

scala> val message:(String,String,String)=>String = "Dear " + _ +", " + _ + " from " + _
message: (String, String, String) => String = <function3>

scala> message("Alice","hello","Bob")
res24: String = Dear Alice, hello from Bob

scala> val helloMessage=message(_:String,"hello",_:String)
helloMessage: (String, String) => String = <function2>

scala> helloMessage("Alice","Bob")
res25: String = Dear Alice, hello from Bob

scala> val aliceToBobMessage=message("Alice",_:String,"Bob")
aliceToBobMessage: (String) => String = <function1>

scala> aliceToBobMessage("greetings")
res27: String = Dear Alice, greetings from Bob

What’s happening here is reasonably self-explanatory. We create a function “message” which takes 3 parameters, then we create two more functions, “helloMessage” and “aliceToBobMessage”, which are just aliases to the “message” function with some of the parameters pre-filled.

Since functions are first-class objects that you can pass around just like anything else, this means you can do stuff like

scala> val times:(Int,Int)=>Int = _*_
times: (Int, Int) => Int = <function2>

scala> val times2=times(2,_:Int)
times2: (Int) => Int = <function1>

scala> (1 to 10) map times2
res38: scala.collection.immutable.IndexedSeq[Int] = Vector(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)

Currying is, at first sight, just a different (more complicated) way to achieve the same thing:

scala> val message:(String)=>(String)=>(String)=>String = (to:String)=>(message:String)=>(from:String)=>{ "Dear " + to +", " + message + " from " + from}
message: (String) => (String) => (String) => String = <function1>

scala> message("Alice")("Hello")("Bob")
res28: String = Dear Alice, Hello from Bob

scala> val aliceToBobMessage=message("Alice")(_:String)("Bob")
aliceToBobMessage: (String) => String = <function1>

scala> aliceToBobMessage("greetings")
res29: String = Dear Alice, greetings from Bob

What that giant line of verbiage at the start is doing is creating a function that takes one string and returns a function that takes another string, which in turn returns a function that takes a third string, which finally returns a string. Phew.
This mapping from a function that takes n parameters to a chain of n functions that each take 1 parameter is known as “currying” (after Haskell Curry). Why on earth would you want to do this, rather than the much simpler partial application example above?
Several of the core scala API methods use curried functions – for example foldLeft in the collections classes

def foldLeft[B](z: B)(op: (B, A) => B): B = 
        foldl(0, length, z, op)

- here we’re exposing a curried function, and then proxying on to a non-curried implementation. Why do it like this?
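
Incidentally, one nice property of the curried form: you can bind the whole first argument list and get a function back, without needing the “_:Type” placeholders that partial application required. A quick sketch (mine, not from the scala source):

val sumFromZero = List(1, 2, 3).foldLeft(0) _   // ((Int, Int) => Int) => Int
sumFromZero(_ + _)                              // 6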

It turns out that curried functions have an advantage (see update below) that partial ones don’t. If I curry a function, and then bind some variables to form what is effectively the same as a partial function, I can also modify the type of the remaining unbound parameters. Example required!

scala> val listMunger:(String)=>(List[Any])=>String = (prefix:String)=>(contents:List[Any])=>prefix + (contents.reduce(_.toString +_.toString))
listMunger: (String) => (List[Any]) => String = <function1>

scala> val ints = List(1,2,3)
ints: List[Int] = List(1, 2, 3)

scala> listMunger("ints")(ints)
res31: String = ints123

scala> val strings = List("a","b","c")
strings: List[java.lang.String] = List(a, b, c)

scala> listMunger("strings")(strings)
res32: String = stringsabc

scala> val stringListMunger=listMunger("strings")(_:List[String])
stringListMunger: (List[String]) => String = <function1>

scala> stringListMunger(strings)
res33: String = stringsabc

scala> stringListMunger(ints)
<console>:11: error: type mismatch;
 found   : List[Int]
 required: List[String]
       stringListMunger(ints)

That last error is the important bit. We’ve specialized the parameter type for the unbound parameter in stringListMunger so it won’t accept a list of anything other than Strings. Note that you can’t arbitrarily re-assign the type; it has to be a subtype of the original (otherwise the implementation might fail).
OK; so now all I have to do is think of a real-world example where this would be useful…

Update: Gah, I was wrong. You can do exactly the same type-specialization with a partial:

scala> val x:(Int,List[Any])=>Int = (_,_)=>1
x: (Int, List[Any]) => Int = <function2>

scala> val y:(List[Int])=>Int = x(1,_)
y: (List[Int]) => Int = <function1>

So now I still have no idea when you’d want to curry a function, rather than just leaving it with multiple arguments and partially applying when required. This blog entry suggests that it really exists to support languages like OCaml or Haskell that only allow one parameter per function – so maybe it’s only in scala to allow people to use that style if they like it. But then what’s it doing in the APIs?
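
One concrete benefit does occur to me, though: scala infers types one parameter list at a time, left to right, so a curried method like foldLeft can accept a bare “_ + _” where a single-parameter-list version would need annotations. A sketch (foldLeft2 is my own invention, purely for comparison):

// Curried: the first argument list fixes B = Int, so the function
// literal in the second list needs no type annotations.
List(1, 2, 3).foldLeft(0)(_ + _)  // 6

// Hypothetical uncurried equivalent:
def foldLeft2[A, B](xs: List[A], z: B, op: (B, A) => B): B =
  xs.foldLeft(z)(op)

// foldLeft2(List(1, 2, 3), 0, _ + _)  // error: missing parameter type
foldLeft2(List(1, 2, 3), 0, (acc: Int, x: Int) => acc + x)  // works, but noisier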


June 11, 2011

Further Scala: Implicit conversions

My attempts to learn scala continue…

Implicit conversions are a cool, if slightly unsettling (from a java programmer’s POV), scala feature. If I have an instance of one class and I try to call a method on it which is defined in a different class, then, if there’s an “implicit” method in scope which will convert between the two, scala will silently use it.
e.g.

scala> var x = 12
x: Int = 12

scala> x.substring(1,2)
<console>:9: error: value substring is not a member of Int
       x.substring(1,2)

scala> implicit def foo(i:Int):String={i.toString}
foo: (i: Int)String

scala> 12.substring(1,2)
res10: java.lang.String = 2

WITCHCRAFT! BURN THE COMPILER!

This lends itself to a very, very useful trick: the ability to enhance classes with additional methods. Say you had a java Map class, and you wanted the ability to merge it with another Map according to some sort of merge function on the values. You’d probably do it like this:

class MergeableMap implements Map {

  private final Map delegate;

  public MergeableMap(Map delegate){
    this.delegate = delegate;
  }

  public Map merge(Map otherMap, ValueMergingFunction mergeFunction){
    ....
  }

  ... delegate implementations of all Map methods here...
}

Trouble is, (a) writing all the delegate methods is tedious, and (b) every time you want to use it, you’ve got to do

MergeableMap m = new MergeableMap(myMapVariable);
m.merge(myOtherMap, ...);

Implicits in scala make this a lot easier:

class MergeableMap[A, B](self: Map[A, B]) {
  def merge(m1: Map[A, B], merger: (B, B) => B): Map[A, B] = {
    // ... implementation here ...
  }
}

implicit def map2mergeableMap[A,B](m:Map[A,B]):MergeableMap[A,B] = new MergeableMap(m)

myMap.merge(myOtherMap, myMergeFunction)
myMap.get(...)

There’s no need to implement the other delegate methods, since we can just call them on the original Map class – but when we call merge(), compiler-based voodoo works out that we want a mergeable map, and swaps it in for us. Magical.
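
For completeness, a minimal working version of the whole trick might look something like this (my sketch – the (B, B) => B signature for the merge function is an assumption):

class MergeableMap[A, B](self: Map[A, B]) {
  // fold the other map into this one, combining values for duplicated keys
  def merge(other: Map[A, B], merger: (B, B) => B): Map[A, B] =
    other.foldLeft(self) { case (acc, (k, v)) =>
      acc + (k -> acc.get(k).map(merger(_, v)).getOrElse(v))
    }
}

implicit def map2mergeableMap[A, B](m: Map[A, B]): MergeableMap[A, B] =
  new MergeableMap(m)

// Map("a" -> 1, "b" -> 2).merge(Map("b" -> 10, "c" -> 3), _ + _)
// gives Map(a -> 1, b -> 12, c -> 3)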


June 09, 2011

Long Ranges in Scala

So, following on from yesterday’s prime-factoring annoyance, a more basic requirement: how to iterate from 1 to ~10^17 (i.e. close to the limit of a Long)?

A Range won’t cut it…

scala> val big_long=12345678901234567L
big_long: Long = 12345678901234567

scala> (1L to big_long)
java.lang.IllegalArgumentException: 1 to 12345678901234567 by 1: seqs cannot contain more than Int.MaxValue elements.

How about a BigIntRange?

scala> (BigInt(1) to BigInt(big_long))
java.lang.IllegalArgumentException: 1 to 12345678901234567 by 1: seqs cannot contain more than Int.MaxValue elements.
    at scala.collection.immutable.NumericRange$.count(NumericRange.scala:229)
    

OK, how about a stream?

scala>  lazy val naturals: Stream[Long] = Stream.cons(1, naturals.map(_ + 1))
naturals: Stream[Long] = <lazy>
scala>  naturals.takeWhile(_<big_long).find(_ == -1L)
Exception in thread "Poller SunPKCS11-Darwin" java.lang.OutOfMemoryError: Java heap space

hmm… running out of options a bit now. Maybe I could construct a Stream without using the map() call (since I suspect that’s what’s eating up the heap), or use a for-comprehension with an efficient generator?
Or…hang on…

scala> var i=0L
i: Long = 0
scala> while (i < big_long){i = i+1L}
{...time passes...exceptions are not thrown...}

So, it turns out that “the java way” works. Which, I guess, is the benefit of a mixed language; you can always fall back to the tried-and-tested solutions if the clever bits fail. And of course, you can hide the clunkiness pretty well:

 object LongIter extends Iterator[Long]{
    var l:Long = 0
    def hasNext:Boolean={true}
    def next:Long={
      l=l+1L
      l
    }
  }
  object LongIterable extends Iterable[Long]{
    val iter = LongIter
    def iterator = {iter}
  }

//ugliness ends, now we can pretend that LongIterable was there all along...

   LongIterable.find(_ == 12345678901234567L)

I suspect that this doesn’t really conform to the iterator/iterable contract (obviously, hasNext is a lie – it always returns true), but it does appear to work tolerably well. Well enough, in fact, that I’m surprised I’ve not yet found a more idiomatic syntax for creating an iterator whose next() function is some arbitrary function of the previous value.
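
For the record, a bounded variant with an honest hasNext is easy enough (my sketch):

class LongRange(max: Long) extends Iterator[Long] {
  var l: Long = 0
  def hasNext: Boolean = l < max
  def next(): Long = {
    l = l + 1L
    l
  }
}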

...reads source of Iterator ...

  Iterator.iterate(0L)(_+1L).find(_ == 12345678901234567L)

phew. Finally.


June 08, 2011

Learning Scala, a step at a time

So, I have decided to make a more serious effort to learn scala. It fits almost all of my ‘ideal language’ features; it’s object-oriented and it’s statically typed (like java), it’s reasonably terse (unlike java) and it has good support for a more functional style of programming, without forcing you to use it all the time. It also runs on the JVM, which is a good thing if you care about observability, manageability, and all the other stuff that ops people care about.

The downsides, as far as I’ve found so far, are that (a) the tooling is nothing like as good as it is for java. The Eclipse Scala plugin appears to be the best of the current bunch, but even there, support for things like refactoring (which is kind of the point of a statically-typed language) is pretty sketchy. And (b) it’s a bit of an unknown quantity in production. Yes, there are some big, high-scale users (Twitter springs to mind), but they’re the kind that can afford to have the best engineers in the world. They could make anything scale. Maybe scala will scale just as well in a more “average” operational environment, but there doesn’t seem to be a huge amount of evidence one way or another at the moment, which may make it a hard sell to a more conservative operations group.

On with the show. The best resources I’ve found so far for learning the language are:

- Daniel Spiewak’s series Scala for Java Refugees. Absolutely brilliant series of articles that do exactly what they say on the tin – put scala firmly into the domain of existing java programmers. Which brings me to…

- Tony Morris’ Scala Exercises for beginners. I had a hard time with these, not so much because they were difficult, but because, coming from a non-comp-sci background, I found it hard to motivate myself to complete exercises as dry as “implement add(x: Int, y: Int) without using the ‘+’ operator”. I also found the slightly evangelical tone of some of the other blog entries (TL;DR: “All you java proles are dumb”) a bit annoying. But working through the exercises undoubtedly improved my understanding of Scala’s support for FP, so it was worth it. On the Intermediate exercises, though, I don’t even know where to start.

- Akka. Akka is an actor framework for scala and java, so it’s not really a scala resource per se, but it does such a great job of explaining actor-based concurrency that it’s worth reading through and playing with, because you’ll come away with a much clearer idea of how to use either Akka or Scala’s own Actors.

- Project Euler – A series of maths-based programming challenges – find the prime factors of a number, sum the even Fibonacci numbers less than a million, and so on. Being maths-ish, they’re a good match for a functional style of programming, so I find it helpful to start by just hacking together a java-ish solution, and then seeing how much smaller and more elegant I can make it. It’s a great way to explore both the syntax and the Collections APIs, which are at the heart of most of scala’s FP support. It’s also a nice way to learn ScalaTest (the unit testing framework for scala), since my hack-and-refactor approach is reliant on having a reasonable set of tests to catch my misunderstandings.
It’s also exposed some of my concerns about the operational readiness of scala. I came up with what I thought was a fairly neat implementation of a prime-factoring function: obtain the lowest factor of a number (which is ipso facto prime), divide the original by that factor, and repeat until you can’t factor any further:

lazy val naturals: Stream[Long] = Stream.cons(1, naturals.map(_ + 1))

def highestFactor(compound: Long): Long = {
  naturals.drop(1).takeWhile(_ < compound / 2).find(
      compound % _ == 0).map(
          highestFactor(compound / _)).getOrElse(compound)
}

(source)
– which works beautifully for int-sized numbers, but give it a prime towards the upper bound of a Long and it runs out of heap space in the initial takeWhile (before it’s even had a chance to start recursively calling itself). It seems that Stream.takeWhile doesn’t permit the taken elements to be garbage-collected, which is counter-intuitive. I wouldn’t be at all surprised to find that I’m Doing It Wrong, but then again, shouldn’t the language be trying hard to stop me?
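
With hindsight, I suspect the culprit is less takeWhile and more the “naturals” val itself: it holds a strong reference to the head of the Stream, so every element the Stream memoizes stays reachable. An Iterator, which doesn’t memoize, should avoid the problem – an untested sketch:

def highestFactor(compound: Long): Long = {
  // same algorithm, but with no memoizing Stream pinned in the heap
  Iterator.iterate(2L)(_ + 1L).takeWhile(_ < compound / 2).find(
      compound % _ == 0).map(
          highestFactor(compound / _)).getOrElse(compound)
}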


Java String.substring() heap leaks

Here’s an interesting little feature that we ran into a short while ago…

Suppose I have something like a Map which will exist for a long time (say, an in-memory cache), and a large, but short-lived, String. I want to extract a small piece of text from that long string and use it as a key in my map:

Map<String,Integer> longLivedMap = ...
String veryLongString = ...
String shortString = veryLongString.substring(5, 8); // "abc", say
longLivedMap.put(shortString, 123);

Question: How much heap space have we just consumed by adding “abc”=>123 into our map? You might think that it would be just a handful of bytes – the 3-character String, the Integer, plus the overhead for the types. But you would be entirely wrong. Java Strings are backed by char arrays, and whilst the String.substring() method returns a new String, it is backed by the same char array as the originating String. So now the entire veryLongString char[] has a long-lived reference and can’t be garbage-collected, even though only 3 chars of it are actually accessible. Rapid heap-exhaustion, coming right up!

The solution is pretty straightforward; if you want to hold a long-lived reference to a string, call new String(string) first. Something like

String shortString = veryLongString.substring(5, 8);
longLivedMap.put(new String(shortString), 123);

It would be counter-productive to do this on every substring, since most of the time the substring and the original source will go out of scope at the same time, so sharing the underlying char[] is a sensible way to reduce overhead.

Since you (hopefully) won’t have many separate places in your code where you’re storing long-lived references like this (just cache implementations of one sort or another), you can create the new strings inside the implementation of those classes instead. Now your caches are all light, and your heap is all right. Good Night.
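
Since this blog is all scala at the moment, here’s what that might look like as a small cache wrapper that does the defensive copy on the way in (a sketch; the names are my own):

class StringKeyedCache[V] {
  private val map = scala.collection.mutable.Map[String, V]()
  // copy the key, so we never pin a caller’s (possibly enormous) backing char[]
  def put(key: String, value: V) { map(new String(key)) = value }
  def get(key: String): Option[V] = map.get(key)
}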


March 10, 2011

QCon day 1

A lot of good stuff today, but I’m just going to jot down a couple of my favourite points:

Craig Larman talked about massive-scale Scrum-based projects. I don’t suppose I’m going
to be running (or even part of) a 1500-person dev team very much, but some of his points
are applicable at any scale:
  • The only job title on the team is “Team Member”. There might be people with specialist skills,
    but no-one can say “it’s not my job to fix that”
  • If you don’t align your dev. teams’ organisation with your customers, then Conway’s law means your
    architecture will not align with your customers either, and you won’t be able to react when their needs
    change
  • Don’t have management-led transformation projects. How will you know when they’re done? Instead,
    management’s role is just to remove impediments that the dev team runs up against – the “servant-leader”
    model
Dan North spoke about how he moved from what he thought was a pretty cutting edge, agile environment
(Thoughtworks, consulting to large organisations starting to become leaner/more agile) to a really agile
environment (DRW, releasing trading software 10s of times per day), and how if you have a team that
is technically strong, empowered, and embedded in the domain (i.e. really close to the users), you can do
away with many of the traditional rules of Agile. A couple that really struck me were:
  • Assume your code has a half-life. Don’t be afraid to just rewrite it, or bin it. The stuff that stays in
    can get better over time, but it doesn’t have to be perfect from day 1
  • Don’t get emotionally attached to the software you create. Get attached to the capabilities you enable
    for your users
  • Remember, anything is better than nothing.
Juergen Hoeller talked about what’s new in Spring 3.1. No amazing surprises here, but some nice stuff:
  • Environment-specific beans – avoid having to munge together different config files for system-test vs.
    pre-production vs. live, have a single context with everything defined in it (Even nicer, arguably, when you
    do it via Java Config and the @environment annotation)
  • c: namespace for constructor args. Tasty syntactic sugar for your XML, and the hackery they had to go through
    to get it to work is impressive (and explains why it wasn’t there from the start)
  • @cacheable annotation, with bindings for EHCache and GemFire (not for memcached yet, which is a bit of a surprise)
Liz Keogh talked about perverse incentives. Any time you have a gap between the perceived value that a metric
measures, and the actual value that you want to create, you make an environment where arbitrage can occur. People can’t
help but take advantage of the gap, even when they know at some level that they’re doing the “wrong thing”.
  • Focus on the best-performing parts of the organisation as well as the worst-performing. Don’t just say “This
    project failed; what went wrong”; make sure you also say “This project succeeded better than all the others; what went right?”
  • Don’t try and create solutions to organisation problems, or you’ll inevitably make perverse incentives. Instead,
    make Systems (Systems-thinking, not computer programs) that allow those solutions to arise.
Chris Read and Dan North talked about Agile operations. Surprisingly for me, there wasn’t a great deal of novel stuff
here, but there were a couple of interesting points:
  • Apply an XP-ish approach to your organisational/process issues: Pick the single biggest drag on you delivering value and
    do the simplest thing to fix it. Then iterate.
  • Fast, reliable deploys are a big force multiplier for development. If you can deploy really fast, with low risk,
    then you’ll do it more often, get feedback faster, allow more experimentation, and generally waste less. The stuff that Dan
    and Chris work on (trading data) gets deployed straight from dev workstations to production in seconds; automated testing
    happens post-deploy

Yoav Landman talked about Module repositories. Alas, this was not the session I had hoped for; I was hoping for some
takeaways that we could apply to make our own build processes better, but this was really just a big plug for Artifactory,
which looks quite nice but really seems to solve a bunch of problems that I don’t run into on a daily basis. I’ve never needed
to care about providing fine-grained LDAP authorisation to our binary repo, nor to track exactly which version of hibernate was
used to build my app 2 years ago. The one problem I do have in this space (find every app which uses httpclient v3.0,
upgrade, and test it) is made somewhat easier by a tool like Artifactory, but that problem very rarely crops up, so it
doesn’t seem worth the effort of installing a repo manager to solve it. Also it doesn’t integrate with any SCM except Subversion,
which makes it pretty useless for us.


March 08, 2011

"Designing Software, Drawing Pictures

Not a huge amount of new stuff in this session, but a couple of useful things:

The goal of architecture, in particular up-front architecture, is first and foremost to communicate the vision for the system, and secondly to reduce the risk of poor design decisions having expensive consequences.

The Context->Container->Component diagram hierarchy:

The Context diagram shows a system, and the other systems with which it interacts (i.e. the context in which the system operates). It makes no attempt to detail the internal structure of any of the systems, and does not specify any particular technologies. It may contain high-level information about the interfaces or contracts between systems, if
appropriate.

The Container diagram introduces a new abstraction, the Container, which is a logical unit that might correspond to an application server, a (J)VM, a database, or some other well-isolated element of a system. The container diagram shows the containers within the system, as well as those immediately outside it (from the context diagram), and details the
communication paths, data flows, and dependencies between them.

The Component diagram looks within each container at the individual components, and outlines the responsibilities of each. At the component level, techniques such as state/activity diagrams start to become useful in exploring the dynamic behaviour of the system.

(There’s a fourth level of decomposition, the class diagram, at which we start to look at the high-level classes that make up a component, but I’m not sure I really regard this as an architectural concern)

The rule-of-thumb for what is and what isn’t architecture:

All architecture is design, but design is only architecture if it’s costly to change, poorly understood, or high risk. Of course, this means that “the architecture” is a moving target; if we can reduce the cost of change, develop a better understanding, and reduce the risk of an element then it can cease to be architecture any more and simply become part of the design.


March 07, 2011

Five things to take away from Nat Pryce and Steve Freeman's "TDD at the system scale" talk

  • When you run your system tests, build as much as possible of the environment from scratch.
    At the very least, build and deploy the app, and clear out the database before each run
  • For testing assemblies that include an asynchronous component, you want to wrap
    your assertions in a function that will repeatedly “probe” for the state you want
    until either it finds it, or it times out. Something like this
       doSomethingAsync();
       probe(interval,timeout,aMatcher,anotherMatcher...);    

    Wrap the probe() function into a separate class that has access to the objects you
    want to probe, to simplify things (a sketch of such a probe follows this list).

  • Don’t use the logging APIs directly for anything except low-level debug() messages, and maybe
    not even then. Instead, have a “Monitoring” topic, and push structured messages/objects onto
    that queue. Then you can separate out production of the messages from routing, handling, and
    persisting them. You can also have your system tests hook into these messages to detect hard-to-observe state changes
  • For system tests, build a “System Driver” that can act as a facade to the real system, giving
    test classes easy access to a properly-initialised test environment – managing the creation and
    cleanup of test data, access to monitoring queues, wrappers for probes, etc.
  • We really need to start using a proper queueing provider
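
A sketch of what such a probe might look like (names and signature are mine, not from the talk):

object Probe {
  // re-evaluate `condition` every `intervalMillis` until it holds,
  // or give up once `timeoutMillis` has elapsed
  def probe(intervalMillis: Long, timeoutMillis: Long)(condition: => Boolean): Boolean = {
    val deadline = System.currentTimeMillis + timeoutMillis
    while (System.currentTimeMillis < deadline) {
      if (condition) return true
      Thread.sleep(intervalMillis)
    }
    condition // one final check at the deadline
  }
}

// e.g. assert(Probe.probe(100, 5000) { receivedMessages.contains(expected) })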

February 24, 2011

Solaris IGB driver LSO latency

Yes, it’s another google-bait entry, because I found it really difficult to find any useful information about this online. Hopefully it’ll help someone else find a solution faster than I did.

We migrated one of our applications from an old Sun V40z server, to a newer X4270. The migration went very smoothly, and the app (which is CPU-bound) was noticeably faster on the shiny new server. All good.

Except that when I checked nagios, to see what the performance improvement looked like, I saw that every request to the server was taking exactly 3.4 seconds. Pingdom said the same thing, but a simple “time curl …” for the same URL came back in about 20 milliseconds. What gives?
More curiously still, if I changed the URL to one that didn’t return very much content, then the delay went away. Only a page that had more than a few KBs’ worth of HTML would reveal the problem.

Running “strace” on the nagios check_http command line showed the client receiving all of the data, but then just hanging for a while on the last read(). The apache log showed the request completing in 0 seconds (and the log line was printed as soon as the command was executed).
A wireshark trace, though, showed a 3-second gap between packets at the end of the conversation:

23    0.028612    137.205.194.43    137.205.243.76    TCP    46690 > http [ACK] Seq=121 Ack=13033 Win=31936 Len=0 TSV=176390626 TSER=899836429
24    3.412081    137.205.243.76    137.205.194.43    TCP    [TCP segment of a reassembled PDU]
25    3.412177    137.205.194.43    137.205.243.76    TCP    46690 > http [ACK] Seq=121 Ack=14481 Win=34816 Len=0 TSV=176391472 TSER=899836768
26    3.412746    137.205.243.76    137.205.194.43    HTTP    HTTP/1.1 200 OK  (text/html)
27    3.412891    137.205.194.43    137.205.243.76    TCP    46690 > http [FIN, ACK] Seq=121 Ack=15517 Win=37696 Len=0 TSV=176391472 TSER=899836768

For comparison, here’s the equivalent packets from a “curl” request for the same URL (which didn’t suffer from any lag)

46    2.056284    137.205.194.43    137.205.243.76    TCP    49927 > http [ACK] Seq=159 Ack=15497 Win=37696 Len=0 TSV=172412227 TSER=898245102
47    2.073105    137.205.194.43    137.205.243.76    TCP    49927 > http [FIN, ACK] Seq=159 Ack=15497 Win=37696 Len=0 TSV=172412231 TSER=898245102
48    2.073361    137.205.243.76    137.205.194.43    TCP    http > 49927 [ACK] Seq=15497 Ack=160 Win=49232 Len=0 TSV=898245104 TSER=172412231
49    2.073414    137.205.243.76    137.205.194.43    TCP    http > 49927 [FIN, ACK] Seq=15497 Ack=160 Win=49232 Len=0 TSV=898245104 TSER=172412231

And now, it’s much more obvious what the problem is. Curl is counting the bytes received from the server, and when it’s got as many as the content-length header said to expect, it closes the connection (packet 47, sending FIN). Nagios, meanwhile, isn’t smart enough to count bytes, so it waits for the server to send a FIN (packet 27), which is delayed by 3-and-a-bit seconds. Apache sends that FIN immediately, but for some reason it doesn’t make it to the client.

Armed with this information, a bit more googling picked up this mailing list entry from a year ago. This describes exactly the same set of symptoms. Apache sends the FIN packet, but it’s caught and buffered by the LSO driver. After a few seconds, the LSO buffer is flushed, the client gets the FIN packet, and everything closes down.
Because LSO is only used for large segments, requesting a page with only a small amount of content doesn’t trigger this behaviour, and we get the FIN immediately.

How to fix? The simplest workaround is to disable LSO:

#  ndd -set /dev/ip ip_lso_outbound 0

(n.b. I’m not sure whether that persists over reboots – it probably needs adding to a file in /kernel/drv somewhere). LSO is beneficial on network-bound servers, but ours isn’t, so we’re OK there.

An alternative is to modify the application code to set the TCP PSH flag when closing the connection, but (a) I’m not about to start hacking with apache’s TCP code, and (b) it’s not clear to me that this is the right solution anyway.

A third option, specific to HTTP, is just to use an HTTP client that does it right. Neither nagios nor (it seems) pingdom appear to know how to count bytes and close the connection themselves, but curl does, and so does every browser I’ve tested. So you might just conclude that there’s no need to fix the server itself.


February 16, 2011

Deploying LAMP websites with git and gitosis

Requirement of the day: I’m providing a LAMP environment to some other developers elsewhere in the organisation. They’ll do all the PHP programming, but I’ll look after the server, keep it patched, upgrade it when necessary, and so on.

At some point in the future, we’ll doubtless need to rebuild the server (new OS version, hardware disaster, need to clone it for 10 other developers to each have their own, etc…), so we want as little configuration as possible on the server. Everything should be built from configuration that lives somewhere else.

So, the developers basically need to be able to do 3 things:
– update apache content – PHP code, CSS/JS/HTML, etc.
– update apache VHost config – a rewrite here, a location there. They don’t need to touch the “main” apache config (modules, MPM settings, etc)
– do stuff (load data, run queries, monkey about) with mysql.

Everything else (installation of packages, config files, cron jobs, yada yada) is mine, and is managed by puppet.

So, we decided to use git and gitosis to accomplish this. We’re running on Ubuntu Lucid, but this approach should translate pretty easily to any unix-ish server.

1: Install git and gitosis. Push two repositories – apache-config and apache-htdocs

2: As the gitosis user, clone the apache-config repository into /etc/apache2/git-apache-conf, and the apache-htdocs repository into /usr/share/apache2/git-apache-htdocs

3: Define a pair of post-receive hooks to update the checkouts when updates are pushed to gitosis.
The htdocs one is simple:

cd /usr/share/apache2/git-apache-htdocs && env -i git pull --rebase

the only gotcha here is that because the GIT_DIR environment variable is set in a post-receive hook, you must unset it with “env -i” before trying to pull, else you’ll get the “fatal: Not a git repository: ’.’” error.

The apache config one is a bit longer but hopefully self-explanatory:

cd /etc/apache2/git-apache-conf && env -i git pull --rebase && sudo /usr/sbin/apache2ctl configtest && sudo /usr/sbin/apache2ctl graceful

Add a file into /etc/apache2/conf.d with the text “Include /etc/apache2/git-apache-conf/*” so that apache picks up the new config.

We run a configtest before restarting apache to get more verbose errors in the event of an invalid config. Unfortunately if the config is broken, then the broken config will stay checked-out – it would be nice (but much more complex) to copy the rest of the config somewhere else, check out the changes in a commit hook, and reject the commit if the syntax is invalid.

And that’s it! Make sure that the gitosis user has rights to sudo apachectl, and everything’s taken care of. Except, of course, mysql – developers will still need to furtle in the database, and there’s not much we can do about that except for making sure we have good backups.

You might be wondering why we chose to involve gitosis at all, and why we didn’t just let the developers push directly to the clone repositories in /etc/apache2 and /usr/share/apache2/htdocs. That would have been a perfectly workable approach, but my experience is that in team-based (as opposed to truly decentralised) development, it’s helpful to have a canonical version of each repository somewhere, and it’s helpful if that isn’t on someone’s desktop PC. Gitosis provides that canonical source.
Otherwise, one person pushes some changes to test, some more to live, and then goes on holiday for a week. Their team-mate is left trying to understand and merge together test and live with their own, divergent, repo before they can push a two-line fix.
More experienced git users than I might be quite comfortable with this kind of workflow, but to me it still seems a bit scary. My years of CVS abuse mean I like to have a repo I can point to and say “this is the truth, and the whole truth, and nothing but the truth” :-)

