Favourite blogs for Connection reset by beer


December 10, 2013

Cloud Cannon

Writing about web page http://bigeasyband.co.uk

I was interested to try out Cloud Cannon as a web site development / hosting platform. Using Dropbox as the starting point for your content suits my workflow well, and their extra features - auto-minification, simple editing of DIVs for other people - suit me too. I hope they can make a go of it.


July 26, 2013

Shibboleth IdP authentication context class

Yeah, with a blog title like that you’d better get your brain ready.

We recently had an SP whose authentication attempts were failing, and returning no user. We found that the difference in the SAML request was that it was requesting a particular authentication context class, which describes what level of authentication a user might be doing, e.g. username/password; username/password over HTTPS; certificate; etc.

The RemoteUser login handler by default doesn’t specify any class, which means it’ll be used if the SP doesn’t ask for anything in particular, but will never match if it does. No problem; we change the value in handler.xml to PasswordProtectedTransport, since we take passwords over HTTPS. But this particular SP still doesn’t work because it’s requesting simply Password and nothing else. The IdP supports multiple values so we just add that value too – it’s a generalisation of the PasswordProtectedTransport so it’s still true to say that we support both.

That fixed it for the SP, but now when logging in through any other SP, the authentication method in the final SAML response is Password, not PasswordProtectedTransport. This might not bother most SPs, but it’s inaccurate so I’d like it to report the “best” method we provide by default. It doesn’t seem to matter what order I specify them in handler.xml. A bit of searching through the documentation gives the answer:

If the IdP is configured for one or more of the methods requested by the SP, then that method may be used. If the Service Provider does not specify a particular method, and the user does not have an existing session, then the default method identified on the relying party’s configuration is used. If no default is identified there, the IdP will choose one of the available methods; the way this choice is made is unspecified and you should not rely on it being the same from release to release.

- https://wiki.shibboleth.net/confluence/display/SHIB2/IdPUserAuthn

So without any default specified in the relying party, the IdP is just picking at random, probably the first alphabetically. I’ve specified PasswordProtectedTransport in the relying party and that’s fixed that; the final SAML response reports that it used PasswordProtectedTransport as the authentication method.
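In case it helps anyone else: in IdP 2.x this default lives in relying-party.xml. Something like the following should do it (a sketch – the provider value here is a placeholder, so check the attribute against the wiki page above for your version):

```xml
<rp:DefaultRelyingParty provider="https://idp.example.ac.uk/idp/shibboleth"
    defaultAuthenticationMethod="urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport">
    <!-- existing profile configurations stay as they were -->
</rp:DefaultRelyingParty>
```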


February 20, 2013

Blogbuilder 3.26 and 3.27

Over the past month we have released 2 new versions of Blogbuilder, with a number of improvements and long-standing bug fixes:

  • You can now choose a fixed width version of any of the existing blog designs, and also easily move the sidebar to the right side of the page, by selecting from the options on the Appearance admin page.
  • We've added the ability to add social sharing buttons to each of your entries, using the "Show 'Like' buttons" option. This will add Facebook Like, Tweet, and Google +1 buttons to the entry.
  • We've adjusted the layout of each blog on small-screen mobile devices for easier reading.
  • And we've fixed issues including:
    • Adding tags to entries on the 'Admin entries' page.
    • Links disappearing in IE8 on the Create entry drop-down.
    • Listing entries by tags with Chinese characters, and untagged entries.
    • Apostrophes in image descriptions causing problems with inserting images.
    • Departments being listed in the wrong faculty in the directory.
    • Blockquotes not being inserted correctly.



September 05, 2012

this is a test post

Another test. Ho hum.


April 25, 2012

A new era

This is a test post. Test, test, test.


November 13, 2011

New winter kit.

My new, shorter commute means I can now largely do away with my bike-specific kit, and ride in to work in “normal” clothes. Doubtless a relief for those who don’t have to suffer the sight of my lycra-centric outfits any more, but it has required some thought as to exactly what to replace them with.

So I was very pleased when the nice folk at Berghaus asked me to review some outdoor clothing which is just the ticket for a few miles riding through town on an autumn morning. First up, this:
Ardennes softshell
A splendid softshell jacket. It’s warm, windproof, waterproof-ish (the seams aren’t sealed, so I wouldn’t rely on it for a long time in really foul weather, but it’s stood up to 20 minutes or so of steady drizzle just fine). The cut is slightly more relaxed than a dedicated biking jacket (meaning it looks just fine down the pub), but has enough length in the arms and the back to avoid any annoying cold spots when leant over the handlebars. If I were being really picky, I could note that it doesn’t have pit-zips or any other kind of vents, which means it can get a bit warm if you’re working hard (e.g. when racing roadies on the way into work) – so far the weather’s still too warm to wear it for “proper” long rides. It’s also not very reflective, so you need something high-vis to slip over the top now the nights are dark.
But for riding to work it’s been ideal, and it’s also been great for standing on the sidelines of windswept football pitches watching the kids – which at this time of year is a pretty stern test of any fleece!

Item #2 on the list is a new rucksack – specifically, a Terabyte 25l.
Terabyte 25l
As the name suggests, this is optimised for carrying your laptop to work, rather than your ice axes into the northern Cairngorms or your inner tubes round Glentress. It features an impressively solid-feeling padded sleeve which will comfortably take anything from an iPad to a big-ish laptop and hold it securely, as well as a main compartment big enough for a packed lunch and my running kit, and the usual assortment of well-laid-out pockets and attachments. I particularly like the little strap on the back of the pack for attaching an extra bike light to. It’s comfortable even when loaded up, and plenty stable enough to bike with. Highly recommended.


October 28, 2011

Scala: 3 months in

So, I’ve been using scala in production for a few months now. Is it everything I expected it to be?

First, a little bit of background. The project I’m working on is a medium-sized app, with a Scala back-end and a WPF/C# fat client. The interface layer is a set of classes autogenerated from a rubyish DSL, communicating via a combination of client-initiated JSON-RPC over HTTP, and JSON-over-AMQP messages from server to client. It’s split into about half a dozen projects, each with its own team. (I should point out that I have little involvement in the architecture at this level. It’s a given; I just implement on top of it.)

The server side is probably a few thousand classes in all, but each team only builds 500 or so classes in its own project. It integrates with a bunch of COTS systems and other in-house legacy apps.

In functionality terms, it’s really fairly standard enterprisey stuff. Nothing is that complicated, but there’s a lot of stuff, and a lot of corner-cases and “oh yeah, we have to do it like that because they work differently in Tokyo” logic.

So, Scala. Language: Generally lovely. Lends itself to elegant solutions that are somehow satisfying to write in a way which java just isn’t. Everything else? Not so much. Here are my top-5 peeves:

  • The compiler is so slow. For a long time, the Fast(er) Scala Compiler simply wouldn’t compile our sources. It threw exceptions (no, it didn’t flag errors, it just died) on perfectly legitimate bits of code. This meant using the slow compiler, which meant that if you changed one line of code and wanted to re-run a test, you had to wait while your 8-core i7 churned for 2-3 minutes rebuilding the whole world. In a long-past life, I cut my teeth writing mainframe cobol with batch compilation. This was like that, all over again. TDD? Not so much. Recently the FSC has improved to the point where it can actually compile code, which is nice, but sometimes it gets confused and just forgets to recompile a class, which is hella confusing when you’re trying to work out why your fix doesn’t seem to be having any effect.
    I would gladly exchange a month of writing WPF automation code (that’s a big sacrifice, for those who haven’t endured it) for the ability to have eclipse’s hot-code-replace, unreliable as it is, working with scala.
  • The IDEs blow. Eclipse + ScalaIDE just plain doesn’t work – apart from choking on our maven config, even if you manually build up the project dependencies, it will randomly refuse to compile classes, highlight errors that don’t exist (highlighting errors on lines that contain no code is a personal favourite), and ignore actual errors. IDEA + Scala Plugin is better – this is now my day-to-day IDE, despite starting out as a dyed-in-the-wool Eclipse fanboy, but it’s slow and clunky, requires vast amounts of heap, and the code completion varies from arbitrary to capricious.
  • Debugging is a pain. Oh yes,
    data.head.zipWithIndex.map((t) => data.map((row: List[Int]) => row(t._2)).reverse)    

feels great when you write it, but just try stepping through that sucker to find out what’s not working. You end up splitting it up into so many temporary variables and function calls that you might as well have just written a big nested for-loop with a couple of if-elses in it. Sure, you can take them all out again when you’ve finished, but are you really helping the next guy who’s going to have to maintain that line?

  • The java interop is a mixed blessing. Yes, it’s nice that you have access to all that great java code that’s out there, but the impedance mismatch is always lurking around causing trouble. Of recent note:
    – Scala private scope is not sufficiently similar to java private scope for hibernate field-level access to work reliably
    – Velocity can’t iterate scala collections, nor access scala maps.
    – Mockito can’t mock a method signature with default parameter values.

The list goes on. None of these are blockers in any sense – once you’ve learned about them, they’re all pretty easy to work around or avoid. But each one represents an hour or so’s worth of “WTF”-ing and hair-pulling, and these are just the ones that are fresh enough to remember. Scala-Java interop is a constant stream of these little papercuts.

  • Trivial parallelisation is of minimal value to me. Sure, there are loads of cases where being able to spread a workload out over multiple threads is useful. But, like a great many enterprise apps, this one is largely not CPU-bound, and where it is, it needs to operate on a bunch of persistent objects within the context of a JTA transaction. That means staying on one thread. Additionally, since we’re running on VMWare, we don’t have large numbers of cores to spread onto, so parallelisation wouldn’t buy us more than a factor of 2 or 3 in the best case. In a related vein, immutable classes are all well and good, but JPA is all about mutable entities. There have been a few bits of the design where baked-in immutability has led to a cleaner model, but they’re surprisingly few and far between.
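On the debugging point above: here’s one way that rotate one-liner might be unpacked into named steps so a debugger can stop at each intermediate value (a sketch – the variable names are mine, not from the project):

```scala
object DebugSplit {
  // The rotate one-liner, split up so each intermediate is inspectable.
  def rotate90(data: List[List[Int]]): List[List[Int]] = {
    val indexedHeader = data.head.zipWithIndex // pair each header cell with its column index
    indexedHeader.map { case (_, col) =>
      val column = data.map(row => row(col))   // pull out column `col`
      column.reverse                           // the reversed column becomes a rotated row
    }
  }
}
```

Functionally identical to the one-liner, but now there’s somewhere to put a breakpoint – which is exactly the temporary-variable splatter the paragraph above complains about.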

Not that long after I started work on this project, there was a video doing the rounds, showing how Notch, the creator of Minecraft, used Eclipse to create a pretty functional Wolfenstein clone in java in a 48-hour hackathon. Whilst that kind of sustained awesomeness is out of my reach, it illustrates the degree to which the incredibly sophisticated tooling that’s available for Java can compensate for the language’s shortcomings. If you haven’t watched it, I recommend skimming through it to get a sense of what it means to have a toolset that’s completely reliable, predictable, and capable of just getting out of the way and letting you create.

If I were starting this project again, I’d swap the elegance and power of the scala language for the flow of java with a good IDE. Even if it did mean having to put up with more FooManagerStrategyFactory classes.


September 07, 2011

5 Operations metrics that Java web developers should care about

Sometimes it’s nice, as a developer, to ignore the world outside and focus in on “I’m going to implement what the specification says I should, and anything else is somebody else’s problem”. And sometimes that’s the right thing to do. But if you really want to make your product better, then real production data can provide valuable insights into whether you’re building the right thing, rather than just building the thing right. Operations groups gather this kind of stuff as part of normal business*, so here are a handful of ops. metrics that I’ve found particularly useful from a Java Web Application point of view. Note that I’m assuming here that you as the developer aren’t actually responsible for running the application in production – rather, you just want the data so you can make better informed decisions about design and functionality.

How should you expose this information? Well, emailing round a weekly summary is definitely the wrong thing to do. The agile concept of “information radiators” applies here: Big Visible Charts showing this kind of data in real time will allow it to seep into the subconscious of everyone who passes by.

Request & Error rates

How many requests per second does your application normally handle? How often do you get a 5-fold burst in traffic? How often do you get a 50-fold burst? How does your application perform when this happens? Knowing this kind of thing allows you to make better decisions about which parts of the application should be optimised, and which don’t need it yet.
Requests per minute graph
Request rates for one node on a medium-scale CMS. Note the variance throughout the day. The 5AM spike is someone’s crazy web crawler spidering more than it ought to.

Error rates – whether you count the number of HTTP 500 responses, or the number of stack traces in your logs – are extremely useful for spotting those edge-cases that you thought should never happen, but which actually turn out to be disappointingly common.
Error rates graph
Whoa. What are all those spikes? Better go take a look in the logs…

GC Performance

GC is a very common cause of application slowdowns, but it’s also not unusual to find people blaming GC for problems which are entirely unrelated. Using GC logging, and capturing the results, will allow you to get a feel for what “normal” means in the context of your application, which can help both in identifying GC problems, and also in excluding it from the list of suspects. The most helpful metrics to track, in my experience, are the minimum heap size (which will vary, but should drop down to the same value after each full GC) and the frequency of full GCs (which should be low and constant).

weekly GC performance
A healthy-looking application. Full GCs (the big drops) are happening about once per hour at peak, and although there’s a bit of variance in the minimum heap levels, there’s no noticeable upwards trend.

Request duration

Request duration is a fairly standard non-functional requirement for most web applications. What tends not to be measured and tracked quite so well is the variance in request times in production. It’s not much good having an average page-load time of 50 milliseconds if 20% of your requests are taking 10 seconds to load. Facebook credit their focus on minimising variance as a key enabler for their ability to scale to the sizes they have.
Render performance graph
The jitter on this graph gives a reasonable approximation of the variance, though it’s not quite as good as a proper statistical measure. Note that the requests have been split into various different kinds, each of which has different performance characteristics.
smokeping screenshot
Request speed with proper variance overlaid

Worker pool sizes

How many apache processes does your application normally use? How often do you get to within 10% of the apache worker limit? What about tomcat worker threads? Pooled JDBC connections? Asynchronous worker pools? All of these rate-limiting mechanisms need careful observation, or you’re likely to run into hard-to-reproduce performance problems. Simply increasing pool sizes is inefficient, will more likely than not just move the bottleneck to somewhere else, and will leave you with less protection against request-flooding denial of service attacks (deliberate or otherwise).

Apache workers graph
Apache worker pool, split by worker state. Remember that enabling HTTP keepalive on your front-end makes a huge difference to client performance, but will require a significantly larger pool of connections (most of which will be idle for most of the time)

If you have large numbers of very long-running requests bound by the network speed to your clients (e.g. large file downloads), consider offloading them to your web tier using mod_x_sendfile or similar. For long-running requests that are bound by server-side performance (giant database queries or complex computations), consider making them asynchronous, and having the client poll periodically for status.

Helpdesk call statistics

The primary role of a helpdesk is to ensure that second and third-tier support isn’t overwhelmed by the volume of calls coming in. And many helpdesks measure performance in terms of the number of calls which agents are able to close without escalation. This can sometimes create a perverse incentive, whereby you release a new, buggy version of your software, users begin calling to complain, but the helpdesk simply issue the same workaround instructions over and over again – after all, that’s what they’re paid for. If only you’d known, you could have rolled back, or rolled out a patch there and then, instead of waiting for the fortnightly call volumes meeting to highlight the issue. If you can get near-real-time statistics for the volume of calls (ideally, just those related to your service) you can look for sudden jumps, and ask yourself what might be causing them.

Helpdesk call volumes

* “But but… my operations team doesn’t have graphs like this!” you say? Well, if only there were some kind of person who could build such a thing… hold on a minute, that’s you, isn’t it? Building the infrastructure to collect and manage this kind of data is pretty much a solved problem nowadays, and ensuring that it’s all in place really should be part and parcel of the overall development effort for any large-scale web application project. Of course, if operations don’t want to have access to this kind of information, then you have a different problem – which is a topic for a whole ‘nother blog post some other time.


August 05, 2011

There's no such thing as bad weather…

...only the wrong clothes. Continuing the camping-kit theme, let’s talk about waterproofs. If you’re going camping in the UK, it’s going to rain sooner or later. There are a few things you can do to make this not be a problem:

- a tarp, or gazebo, or event shelter, or other tent-without-a-floor-or-walls, allows you to sit around, cook, and generally be outside without getting rained on or feeling quite as penned-in as sitting inside the tent does. Watch out in high winds though.

- Wet grass will soak trainers and leather boots. Get some wellies, some crocs, or some flip-flops

- An umbrella is pretty handy

- Most importantly, get some decent waterproof clothing. For summer camping, I like a lightweight jacket rather than a big heavy winter gore-tex – drying stuff out in a tent is hard (especially if it’s damp outside), so the less fabric there is to dry, the easier it’ll be. My jacket of choice at the moment is a Montane Atomic DT, but my lovely wife has been testing a North Face Resolve.
It uses a 2-layer construction with a mesh drop liner, dries fast (and the mesh liner means it feels dry even when slightly damp), breathes well, and is slightly fitted so it doesn’t look too much like a giant sack. Packs down nice and small, and, of course, it keeps the rain out. It’s cut fairly short, so if you’re out in the rain for a long time you’ll either need waterproof trousers or a tolerance for wet legs. For 3-season use, I’d say it’s ideal.

Update: We’ve had a bit longer to test the TNF Resolve jacket, and so far (after 3 months) the signs are still good. It’s stood up to some pretty torrential rain without showing any signs of weakness, and I am informed that the colour is excellent (I wouldn’t know about that, obviously). The DWR coating is still working well, which is always a good sign. The previously-mentioned fitted cut means that it doesn’t flap about when it’s blowy, but it’s not tight or restrictive, even when wearing a fleece underneath. The only shortcoming, which is common to a lot of lightweight jackets, is that the peak on the hood isn’t stiffened at all, so if it’s really windy, it tends to get blown onto your face a bit. This isn’t really an issue unless you’re up in the hills in really foul weather, though, and of course adding a wire to the hood would make the jacket a lot less packable, so for a 3-season jacket it’s a worthwhile trade-off.


August 04, 2011

How to back up your gmail on ubuntu

Quick-and-dirty solution:

1. install python and getmail
2. make a getmail config like this:

[retriever]
type = SimplePOP3SSLRetriever
server = pop.gmail.com
username = your_user_name@gmail.com
password = your_password

[destination]
type = Mboxrd
path = ~/gmail-archive/gmail-backup.mbox

[options]
# print messages about each action (verbose = 2)
# Other options:
# 0 prints only warnings and errors
# 1 prints messages about retrieving and deleting messages only
verbose = 2
message_log = ~/.getmail/gmail.log 

3. enable pop on your gmail account
4. add a cronjob like this:

13,33,53 * * * * /usr/bin/getmail -q -r /path/to/your/getmail/config

You’ll end up with a .mbox file which grows and grows over time. Mine is currently at 5GB. I have no idea whether thunderbird or evolution can open such a big file, but mutt can (use “mutt -f ~/gmail-archive/gmail-backup.mbox -R”, unless you really like waiting whilst mutt rewrites the mbox on each save), or it’s easy enough to grep through if you just need to find the text of a specific message. If you needed to break it into chunks, you could always just use split, and accept that you’ll lose a message where the split occurs (or use split, and then patch together the message that gets split in half).


July 15, 2011

Gnome 3 / gnome-shell on Ubuntu Natty

OK, so I’m not the first to do this by a long chalk, but here’s what worked for me:

1: Install the gnome 3 PPAs (c.f. http://nirajsk.wordpress.com/category/gnome-3/) :

sudo add-apt-repository ppa:gnome3-team/gnome3
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install gnome-shell
sudo apt-get install gnome-shell-extensions-user-theme

For some reason, the default theme (Adwaita) was missing, as was the default font (Cantarell). (Could be because I didn’t install gnome-themes and gnome-themes-extra – see here.) You can download Adwaita from here. Get Cantarell from here, copy it to /usr/share/fonts/ttf, and run sudo fc-cache -rv to update.
I wasn’t too keen on the fat titlebars in adwaita, so I used this to shrink them down:

sed -i "/title_vertical_pad/s/value=\"[0-9]\{1,2\}\"/value=\"0\"/g" /usr/share/themes/Adwaita/metacity-1/metacity-theme-3.xml

You can set the theme and the font using gnome-tweak-tool (aptitude install it if it didn’t arrive with gnome-shell). I’m still looking for a nice icon set; the ubuntu unity ones are a bit orange, and the gnome ones are an uninspiring shade of khaki. For now I’ve settled on the KDE-inspired Oxygen (aptitude install oxygen-icon-theme) which is OK, but still doesn’t quite look right.

There’s an adwaita theme for chrome which is nice, and makes everything a bit more consistent.

The London Smoke gnome-shell theme is really nice

Switching from pidgin to empathy gets you nice, clickable message notifications, although I’d rather they were at the top of the screen than the bottom.

Aero-style snap-to-window-edge works fine, except that for nvidia graphics cards with TwinView, you can’t snap to the edges that are between two monitors. Right-click context menus look weird in a way that I can’t quite put my finger on, but they function as expected.

Other than that, it pretty much just works. The only glitch I’ve yet to work out is that spawning a new gnome-terminal window freezes the whole UI for a second or so. Not got to the bottom of why that might be yet; if I find it I’ll post something. In gnome 2 there was a similar problem with the nvidia drivers, which could be “solved” by disabling compiz, but that’s not an option here. Update: it seems to be connected to using a semi-transparent background for the terminal; if I use a solid background, the problem goes away.

Things I like better than Unity, after a week of playing:
– Clickable notifications from empathy. The auto-hiding notification bar at the bottom of the screen is great, although I’ve found it can sometimes be a bit reluctant to come out of hiding when you mouse over it.
– alt-tab that shows you the window title (good when you’ve got 20 terminals open). I like the ability to tab through multiple instances of the same app in order, too. To get alt-shift-tab to work, I used these instructions.
– menus on the application. Fitts’ law can go hang; I like my menus to be connected to the window they control.


July 06, 2011

Camping time!

It’s summer time, and that means it’s camping season again. Camping is ace, and it’s doubly ace if you have kids. Almost without exception, kids love weaselling about in the countryside, so once you’ve got them onto the campsite, they’ll amuse themselves and you can get on with the serious business of drinking beer, talking rubbish and playing with fires. What could be finer?

There’s a curious kind of Moore’s Law at the moment, as far as tents are concerned.
Every year, technology trickles down from the top end, so low- and mid-range tents get better and better. My first tent, long long ago, was pretty much bottom-of-the-range and cost about £50 (which was a lot of money for a fourteen-year-old in nineteen-eighty-something). It weighed approximately a tonne, leaked like a sieve, and stood me in good stead for a few years’ worth of adventures. Now, for half that price you can get one of these pop-up tents:
pop-up tent

You don’t so much “pitch” it as just take it out of the bag and stand back. It sleeps two close friends, or one with too much gear, it’s pretty waterproof, and if you peg out the guy-ropes it’s surprisingly sturdy in the wind.

Downsides? It doesn’t pack down small, and I’m not sure it would be my first choice for a week in the Lake District in November – in cold, wet conditions the single-skin design means you’ll end up damp from condensation on the inside of the tent – but for summer weekend trips it’s brilliant; fun, easy, and it costs roughly 1/25th as much as an Ultra Quasar (though if you do have £600 to spare, I can highly recommend one of those as an alternative!).

ProTip: If you want to extend the range of weather you can use this in, get a tarp and pitch it over the top of the tent, overhanging the front by a meter or two. You get an extra layer of waterproofing, and a porch so you don’t have to bring your soggy boots into the tent.


One-liner of the day: quick-and-dirty network summary

If you’ve got a solaris box with a load of zones on, you might sometimes have an issue whereby you can see that the box is pushing a lot of traffic over the network, but you’re unsure which zone(s) are responsible. Here’s a super-hacky way to get an overview:

 snoop -c 1000 | awk '{print $1}' | sort | uniq -c | sort -n

Basically: catch the first 1000 packets, find the source for each one (assuming most of your traffic is outbound; if it’s inbound then print $3), and then count the distinct hosts (i.e. your zone names) and list them.

If you have a slow nameserver you may want to add “-r” to the snoop command and then map IPs to names afterwards.


June 26, 2011

Scala: fun with Lists

So, I’m slowly working my way through some of the programming exercises at Project Euler as a means of learning scala (and hopefully a little bit of FP at the same time).
Problem #11 is my most recent success, and it’s highlighted a number of interesting points. In brief, the problem provides you with a 20×20 square of 2-digit integers, and asks for the largest product of any 4 adjacent numbers in any direction (up, down, left, right, diagonally). Kind of like a word-search, but with extra maths.

For no better reason than the afore-mentioned desire to get a bit more acquainted with a functional style of programming, I chose to implement it with no mutable state. I’m not sure if that was really sensible or not. It led to a solution that’s almost certainly less efficient, but potentially simpler than some of the other solutions I’ve seen.

So, step one: given that we’re going to represent the grid as a List[List[Int]], let’s see if we can get the set of 4-element runs going horizontally left-to-right:

    def toQuadList()= {
      def lrQuadsFromLine(line: List[Int], accumulator: List[List[Int]]): List[List[Int]] = {
        if (line.length < 4)
          accumulator
        else
          lrQuadsFromLine(line.tail, accumulator :+ line.slice(0, 4))
      }
      data.flatMap(line => lrQuadsFromLine(line, List.empty))
    }

(“data” is the List[List[Int]] representing the grid). Here we define a recursive function that takes a List[Int], obtains the first 4-number run (line.slice), adds it to an accumulator, and then pops the first item from the list and calls itself with the remainder.
Then we just call flatMap() on the original List-of-Lists to obtain all such quadruples. We could call map(), but that would give us a list of lists of quadruples – flatMap mushes it down into one list.
It would be nice to find a way to make the if/else go away, but other than just transliterating it into a match, I can’t see a way to do it.
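(One possibility, for what it’s worth: the standard library’s sliding can replace the hand-rolled recursion entirely, which makes the if/else disappear too. This is a sketch of mine, not the original code:)

```scala
object SlidingQuads {
  // sliding(4) yields every 4-element window in order; the filter discards
  // the single short window produced when a row has fewer than 4 elements.
  def toQuadList(data: List[List[Int]]): List[List[Int]] =
    data.flatMap(_.sliding(4).filter(_.size == 4))
}
```

sliding returns an Iterator, which flatMap flattens back into a single List, just as in the recursive version.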

Given that list, finding the largest product is trivial.

   def findMaxProduct = data.map((x: List[Int]) => x.reduce(_ * _)).max

- take each quadruple in turn and transform it into a single number by multiplying its elements together. Then call .max to find the largest.

So now we can do one of the 6 directions. However, a bit of thought at this point can save us some time: the Right-to-Left result is guaranteed to be the same as the left-to-right, since multiplication doesn’t care about order (the set of quadruples will be the same in each direction). Similarly, the downwards result will be the same as the upwards one.

Next time-saver: Calculating the downwards result is the same as rotating the grid by 90 degrees and then calculating the left-to-right result. So we just need a rotate function, and we get the vertical results for free:

   def rotate90Deg() = {
      data.head.zipWithIndex.map((t) => data.map((row: List[Int]) => row(t._2)).reverse)
    }

Doing this without mutable state took me some pondering, but the solution is reasonably concise. Doing zipWithIndex on an arbitrary row of the grid (I used .head because it’s easy to access) pairs each element with its index number, so we can now build up a List containing the _n_th element from each of the original lists. (The outer map() iterates over the columns in the grid, the inner one over the rows.)
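(Incidentally, the standard library can do the same job: transpose turns rows into columns, and reversing each resulting row completes the clockwise rotation. Again, a sketch rather than the original code:)

```scala
object Rotate {
  // Equivalent to the zipWithIndex version: transpose swaps rows and columns,
  // and reversing each new row gives the 90-degree clockwise rotation.
  def rotate90Deg(data: List[List[Int]]): List[List[Int]] =
    data.transpose.map(_.reverse)
}
```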

So now we have the horizontal and vertical totals, we need to do the diagonals. It would be nice if we could once again re-use the left-to-right quadruple-finding code, so we need to get the grid into a form where going left to right gets the sequences that would previously have been diagonals. We can do this by offsetting subsequent rows by one each time, then rotating; like this:

1,2,3
4,5,6
7,8,9

?,?,1,2,3
?,4,5,6,?
7,8,9,?,?

7,?,?
8,4,?
9,5,1
?,6,2
?,?,3

You can see that the horizontal sequences are now the original diagonals. Initially, I started using Option[Int] for the items in the grid, so I could use None for the “question marks”. However, after more time than it ought to have taken me, I realised that using zero for those would work perfectly, as any zero in a quadruple will immediately set that quad’s product to zero, thus excluding it from our calculation (which is what we want).
The function to do the offsetting is still rather complex, but not too bad (it owes a lot to the recursive toQuadList function above):

    def diagonalize() = {
      def shiftOneRow(rowsRemaining: List[List[Int]], lPad: List[Int], rPad: List[Int], rowsDone: List[List[Int]]): List[List[Int]] = {
        rowsRemaining match {
          case Nil => rowsDone
          case _ => {
            val newRow: List[Int] = lPad ::: rowsRemaining.head ::: rPad
            shiftOneRow(rowsRemaining.tail,
              lPad.tail,
              0 :: rPad,
              rowsDone ::: List(newRow))
          }
        }
      }
      shiftOneRow(data, List.fill(data.size)(0), List.empty, List.empty)
    }

We define a recursive function that takes a list of rows yet to be padded, a list of zeros to pad the left-hand-side with, another for the right-hand-side, and an accumulator of rows already padded. As we loop through, we remove from the remaining rows, add to the done rows, remove from the left-padding, and add to the right padding. It kind of feels like there ought to be a neater way to do this, but I’m not quite sure what yet.
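One candidate for a neater version, sketched here with zipWithIndex so each row’s padding is computed directly from its index (the pad widths come out slightly different from the recursive version, but the extra zeros are harmless):

```scala
val data = List(List(1, 2, 3),
                List(4, 5, 6),
                List(7, 8, 9))

// Row i gets (n - 1 - i) zeros on the left and i zeros on the right,
// so each successive row is shifted one place further left.
val n = data.size
val diagonalized = data.zipWithIndex.map { case (row, i) =>
  List.fill(n - 1 - i)(0) ::: row ::: List.fill(i)(0)
}
// List(List(0, 0, 1, 2, 3),
//      List(0, 4, 5, 6, 0),
//      List(7, 8, 9, 0, 0))
```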

To get the “other” diagonals, just flip the grid left-to-right before diagonalizing it.
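(flipLR itself isn’t shown here; in this scheme it’s just a matter of reversing each row, something like this sketch:)

```scala
// Mirror the grid left-to-right by reversing every row.
def flipLR(data: List[List[Int]]): List[List[Int]] =
  data.map(_.reverse)

// flipLR(List(List(1, 2, 3))) == List(List(3, 2, 1))
```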

Once that’s done, all that remains is to glue it together. Because I’ve been thinking in OO for so long, I chose to wrap this behaviour in a couple of classes; one for the list-of-quadruples, and one for the grid itself. Then I can finish the job with:

 println(
      List(new Grid(data).toQuadList.findMaxProduct,
      new Grid(data).rotate90Deg.toQuadList.findMaxProduct,
      new Grid(data).diagonalize.rotate90Deg.toQuadList.findMaxProduct,
      new Grid(data).flipLR.diagonalize.rotate90Deg.toQuadList.findMaxProduct).max)

June 20, 2011

The one where I fail to go on any exciting expeditions

Writing about web page http://www.gooutdoors.co.uk/thermarest-prolite-small-p147594

So, in the spirit of previous product reviews, this should have been an entry that described various recent feats of derring-do, and casually slipped in a plug for the latest bit of camping equipment that I’ve been sent to review. Unfortunately, the last few weeks have been characterised by some especially foul weather. Last week’s planned camping trip to do the Torq rough ride was abandoned, in favour of an early morning drive down on Sunday, followed by three-and-a-half hours of squelching round the “short” route in driving rain.
Then tonight’s Second Annual Summer-Solstice-Mountain-Bike-Bivi-Trip was abandoned when it became apparent that the chances of a scenic sunrise were pretty much zero, whereas the chance of a night in a plastic bag in the rain on a hill were looking close to 100%.

All of which means that my shiny new toy has so far not seen much action outside of the back garden. But it seems churlish not to write something about it; so here goes. It’s a sleeping mat; specifically, a Thermarest Pro-Lite. Mats might seem like a pretty mundane item, but if you’re trying to travel light, either running or on a bike, then they’re a tricky thing to get right. Too minimalist and you don’t get any sleep, and you feel like crap the next morning. Too heavy, and they start to make a serious contribution to your pack weight, which matters a lot if you’re having to run up a hill or ride down one.
For a long time, my preferred option was a cut-down foam karrimat, which was a reasonable compromise, but suffered from being a bit on the bulky side, and also not terribly warm. I have an old thermarest as well, which is fabulously comfy – great for winter camping, but far too bulky for fast & light summer trips.

There will be some pics here, when it stops raining for long enough… for now, here’s a stock photo…

So, the pro-lite: Point 1; I got the small one; it’s very small. (Note the artful perspective on the photo!) If you want something that will reach all the way down to your toes (or even your knees), this isn’t it; buy the large size. I don’t mind this, though; in the summertime I’m happy with something that just reaches from head to thigh. Equally, it’s quite slim. My shoulders are fairly scrawny, and this just about reaches from one to the other. If you’ve been spending longer in the gym (or the cake shop) than on the track or the turbo, then you might want a bigger size.

Point 2: It is just unbelievably compact. Really. It rolls up into something about the size of a 750ml drink bottle. Foam karrimats can’t come near to this. This makes more of a difference than you might think, because it means you can get away with a pack that’s 5L or so smaller (and therefore lighter), and still fit the mat inside (keeping your mat inside your pack is a winning plan, because you can keep it dry). It’s also great if you’re backpacking on a bike, because the smaller your pack, the less it affects the bike’s handling.

Point 3: It’s as light as a foam mat. Unlike my old thermarest, there’s not a lot inside this mat, so it’s super-lightweight. Mine weighed in at a smidge over 300g according to the kitchen scales.

Point 4: Back-garden tests strongly suggest that it’s a lot more comfy than my old foam mat. I’ll report back once it’s stopped raining long enough to try it out for real!

Update #1 : Still not had the chance to take this backpacking, but a car-camping trip confirms that it’s very comfortable indeed – just as good as my old thermarest, though in a big tent you have to be careful not to roll off it in the night!


June 16, 2011

Partial and Curried Functions

On with the scala show! Partially-applied functions and curried functions took me a while to get my head around, but really they’re quite simple, and partials at least are super-useful.

Partially-applied functions are just functions where you pre-bind one of the parameters. e.g.

scala> val message:(String,String,String)=>String = "Dear " + _ +", " + _ + " from " + _
message: (String, String, String) => String = <function3>

scala> message("Alice","hello","Bob")
res24: String = Dear Alice, hello from Bob

scala> val helloMessage=message(_:String,"hello",_:String)
helloMessage: (String, String) => String = <function2>

scala> helloMessage("Alice","Bob")
res25: String = Dear Alice, hello from Bob

scala> val aliceToBobMessage=message("Alice",_:String,"Bob")
aliceToBobMessage: (String) => String = <function1>

scala> aliceToBobMessage("greetings")
res27: String = Dear Alice, greetings from Bob

What’s happening here is reasonably self-explanatory. We create a function “message” which takes 3 parameters, then we create two more functions, “helloMessage” and “aliceToBobMessage”, which are just aliases to the “message” function with some of the parameters pre-filled.

Since functions are first-class objects that you can pass around just like anything else, this means you can do stuff like

scala> val times:(Int,Int)=>Int = _*_
times: (Int, Int) => Int = <function2>

scala> val times2=times(2,_:Int)
times2: (Int) => Int = <function1>

scala> (1 to 10) map times2
res38: scala.collection.immutable.IndexedSeq[Int] = Vector(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)

Currying is, at first sight, just a different (more complicated) way to achieve the same thing:

scala> val message:(String)=>(String)=>(String)=>String = (to:String)=>(message:String)=>(from:String)=>{ "Dear " + to +", " + message + " from " + from}
message: (String) => (String) => (String) => String = <function1>

scala> message("Alice")("Hello")("Bob")
res28: String = Dear Alice, Hello from Bob

scala> val aliceToBobMessage=message("Alice")(_:String)("Bob")
aliceToBobMessage: (String) => String = <function1>

scala> aliceToBobMessage("greetings")
res29: String = Dear Alice, greetings from Bob

What that giant line of verbiage at the start is doing is creating a function which takes one string and returns a function that takes another string, which returns a function that takes another string, which returns a string. Phew.
This mapping from a function that takes n parameters to a chain of n functions that each take one parameter is known as “currying” (after Haskell Curry). Why on earth would you want to do this, rather than the much simpler partial application example above?
Several of the core scala API methods use curried functions – for example foldLeft in the collections classes

def foldLeft[B](z: B)(op: (B, A) => B): B = 
        foldl(0, length, z, op)

- here we’re exposing a curried function, and then proxying on to a non-curried implementation. Why do it like this?
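For what it’s worth, one concrete upside of the curried form is type inference: Scala infers types left-to-right across parameter lists, so the compiler can fix B from the first argument list before it looks at op, which lets you leave the types off the function literal:

```scala
val words = List("a", "bb", "ccc")

// Curried foldLeft: B is inferred as Int from the first argument list (0),
// so (acc, w) need no type annotations:
val total = words.foldLeft(0)((acc, w) => acc + w.length)  // 6

// With a single parameter list, the literal would typically need annotating:
//   fold(0, (acc: Int, w: String) => acc + w.length)
```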

It turns out that curried functions have an advantage (see update below) that partial ones don’t. If I curry a function, and then bind some variables to form what is effectively the same as a partial function, I can also modify the type of the remaining unbound parameters. Example required!

scala> val listMunger:(String)=>(List[Any])=>String = (prefix:String)=>(contents:List[Any])=>prefix + (contents.reduce(_.toString +_.toString))
listMunger: (String) => (List[Any]) => String = <function1>

scala> val ints = List(1,2,3)
ints: List[Int] = List(1, 2, 3)

scala> listMunger("ints")(ints)
res31: String = ints123

scala> val strings = List("a","b","c")
strings: List[java.lang.String] = List(a, b, c)

scala> listMunger("strings")(strings)
res32: String = stringsabc

scala> val stringListMunger=listMunger("strings")(_:List[String])
stringListMunger: (List[String]) => String = <function1>

scala> stringListMunger(strings)
res33: String = stringsabc

scala> stringListMunger(ints)
<console>:11: error: type mismatch;
 found   : List[Int]
 required: List[String]
       stringListMunger(ints)

That last error is the important bit. We’ve specialized the parameter type for the unbound parameter in stringListMunger so it won’t accept a list of anything other than Strings. Note that you can’t arbitrarily re-assign the type; it has to be a subtype of the original (otherwise the implementation might fail).
OK; so now all I have to do is think of a real-world example where this would be useful…

Update Gah, I was wrong. You can do exactly the same type-specialization with a partial:

scala> val x:(Int,List[Any])=>Int = (_,_)=>1
x: (Int, List[Any]) => Int = <function2>

scala> val y:(List[Int])=>Int = x(1,_)
y: (List[Int]) => Int = <function1>

So now I still have no idea when you’d want to curry a function, rather than just leaving it with multiple arguments and partially applying when required. This blog entry suggests that it really exists to support languages like OCaml or Haskell that only allow one parameter per function – so maybe it’s only in scala to allow people to use that style if they like it. But then what’s it doing in the APIs?


June 14, 2011

Blogbuilder 3.25

We've just released a new version of Warwick Blogs (the last one was nearly 2 years ago!) with a number of improvements and bug fixes:

  • We've improved the RSS and Atom feeds from your blogs, and also added JSON support (add ?json=json to the URL, and you can add callback and assign parameters to do JSONP). For example: http://blogs.warwick.ac.uk/news/?json=json
  • We've significantly improved our support for newer web browsers (IE9, Chrome, Firefox 4) and you should find fewer problems using these browsers with Warwick Blogs
  • We've added OAuth support to Warwick Blogs with the following details:
  • Request token: https://websignon.warwick.ac.uk/oauth/requestToken?scope=urn%3Ablogs.warwick.ac.uk%3Ablogbuilder%3Aservice
  • Authorisation: https://websignon.warwick.ac.uk/oauth/authorise
  • Access token: https://websignon.warwick.ac.uk/oauth/accessToken
  • You'll need a consumer key and secret to use OAuth to Warwick Blogs in your own application; you can contact the IT Services Helpdesk (helpdesk@warwick.ac.uk) to request one
  • We've modified the Atom API to allow setting of arbitrary permissions by adding the special elements <blogbuilder:read-permission> and <blogbuilder:comment-permission> - these can be set to webgroups, names of groups on the blog in question, or to the special strings Anyone, Staff, Students or Alumni.
  • We've increased the text limit for the biography and contact details sections of the profile page significantly (32,000 characters)
  • We've added more "Back to Blog Manager" and "Back to my blog" links to the Admin section to make it easier to navigate
  • We've fixed issues with uploading files with spaces in and inserting media into the editor

As always, if you have any problems you can comment below or email the IT Services helpdesk at helpdesk@warwick.ac.uk


June 11, 2011

Further Scala: Implicit conversions

My attempts to learn scala continue…

Implicit conversions are a cool, if slightly unsettling (from a java programmers POV) scala feature. If I have an instance of one class, and I try and call a method on it which is defined in a different class, then if there’s an “implicit” method in scope which will convert between the two, scala will silently use it.
e.g.

scala> var x = 12
x: Int = 12

scala> x.substring(1,2)
<console>:9: error: value substring is not a member of Int
       x.substring(1,2)

scala> implicit def foo(i:Int):String={i.toString}
foo: (i: Int)String

scala> 12.substring(1,2)
res10: java.lang.String = 2

WITCHCRAFT! BURN THE COMPILER!

This lends itself to a very very useful trick; the ability to enhance classes with additional methods. Say you had a java Map class, and you wanted the ability to merge it with another Map according to some sort of merge function on the values. You’d probably do it like this:

class MergeableMap implements Map{

  private final Map delegate;

  public MergeableMap(Map delegate){
    this.delegate = delegate;
  }

  public Map merge(Map otherMap, ValueMergingFunction mergeFunction){
    ....
  }

  ... delegate implementations of all Map methods here...
}

Trouble is, (a) writing all the delegate methods is tedious, and (b) every time you want to use it, you’ve got to do

MergeableMap m = new MergeableMap(myMapVariable);
m.merge(myOtherMap, ...);

Implicits in scala make this a lot easier:

class MergeableMap[A, B](self: Map[A, B]) {
  def merge(m1: Map[A, B], merger: (B, B) => B): Map[A, B] = {
... implementation here...
  }
}

implicit def map2mergeableMap[A,B](m:Map[A,B]):MergeableMap[A,B] = new MergeableMap(m)

myMap.merge(myOtherMap, myMergeFunction)
myMap.get(...)

there’s no need to implement the other delegate methods, since we can just call them on the original Map class – but when we call merge() compiler-based voodoo works out that we want a mergeable map, and swaps it in for us. Magical.
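For completeness, a sketch of what the elided merge body might look like (here the merge function just combines the two values for any key present in both maps):

```scala
class MergeableMap[A, B](self: Map[A, B]) {
  // Union of both key sets; clashing keys get their values combined by merger.
  def merge(m1: Map[A, B], merger: (B, B) => B): Map[A, B] =
    (self.keySet ++ m1.keySet).map { k =>
      k -> ((self.get(k), m1.get(k)) match {
        case (Some(a), Some(b)) => merger(a, b)
        case (Some(a), _)       => a
        case (_, Some(b))       => b
        case _                  => sys.error("unreachable: k is in one of the maps")
      })
    }.toMap
}
```

Usage would then be something like Map("a" -> 1, "b" -> 2).merge(Map("b" -> 10), _ + _), giving Map(a -> 1, b -> 12).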


June 09, 2011

Long Ranges in Scala

So, following on from yesterday’s prime-factoring annoyance; a more basic requirement: How to iterate from 1 to ~10^17? (i.e. close to the limit of Long)

A Range won’t cut it…

scala> val big_long=12345678901234567L
big_long: Long = 12345678901234567

scala> (1L to big_long)
java.lang.IllegalArgumentException: 1 to 12345678901234567 by 1: seqs cannot contain more than Int.MaxValue elements.

How about a BigIntRange?

scala> (BigInt(1) to BigInt(big_long))
java.lang.IllegalArgumentException: 1 to 12345678901234567 by 1: seqs cannot contain more than Int.MaxValue elements.
    at scala.collection.immutable.NumericRange$.count(NumericRange.scala:229)
    

OK, how about a stream?

scala>  lazy val naturals: Stream[Long] = Stream.cons(1, naturals.map(_ + 1))
naturals: Stream[Long] = <lazy>
scala>  naturals.takeWhile(_<big_long).find(_ == -1L)
Exception in thread "Poller SunPKCS11-Darwin" java.lang.OutOfMemoryError: Java heap space

hmm… running out of options a bit now. Maybe if I could construct a Stream without using the map() call (since I suspect that’s what’s eating up the heap), or use for-comprehension with an efficient generator?
Or…hang on…

scala> var i=0L
i: Long = 0
scala> while (i < big_long){i = i+1L}
{...time passes...exceptions are not thrown...}

So, it turns out that “the java way” works. Which, I guess, is the benefit of a mixed language; you can always fall back to the tried-and-tested solutions if the clever bits fail. And of course, you can hide the clunkiness pretty well:

 object LongIter extends Iterator[Long]{
    var l:Long = 0
    def hasNext:Boolean={true}
    def next:Long={
      l=l+1L
      l
    }
  }
  object LongIterable extends Iterable[Long]{
    val iter = LongIter
    def iterator = {iter}
  }

//ugliness ends, now we can pretend that LongIterable was there all along...

   LongIterable.find(_ == 12345678901234567L)

I suspect that this doesn’t really conform to the iterator/iterable contract (obviously, hasNext is hard-coded to true), but it does appear to work tolerably well. Well enough, in fact, that I’m surprised I’ve not yet found a more idiomatic syntax for creating an iterator whose next() function is some arbitrary function on the previous value.

...reads source of Iterator ...

  Iterator.iterate(0L)(_+1L).find(_ == 12345678901234567L)

phew. Finally.


June 08, 2011

Learning Scala, a step at a time

So, I have decided to make a more serious effort to learn scala. It fits almost all of my ‘ideal language’ features; it’s object-oriented and it’s statically typed (like java), it’s reasonably terse (unlike java), and it has good support for a more functional style of programming, without forcing you to use it all the time. It also runs on the JVM, which is a good thing if you care about observability, manageability, and all the other stuff that ops people care about.

The downsides, as far as I’ve found so far, are that (a) the tooling is nothing like as good as it is for java. The Eclipse Scala plugin appears to be the best of the current bunch, but even there, support for things like refactoring (which is kind of the point of a statically-typed language) is pretty sketchy. And (b) it’s a bit of an unknown quantity in production. Yes, there are some big, high-scale users (Twitter springs to mind), but they’re the kind that can afford to have the best engineers in the world. They could make anything scale. Maybe scala will scale just as well in a more “average” operational environment, but there doesn’t seem to be a huge amount of evidence one way or another at the moment, which may make it a hard sell to a more conservative operations group.

On with the show. The best resources I’ve found so far for learning the language are:

- Daniel Spiewak’s series Scala for Java Refugees. Absolutely brilliant series of articles that do exactly what they say on the tin – put scala firmly into the domain of existing java programmers. Which brings me to….

- Tony Morris’ Scala Exercises for beginners. I had a hard time with these, not so much because they were difficult, but because coming from a non-comp-sci background, I found it hard to motivate myself to complete exercises as dry as “implement add(x: Int, y: Int) without using the ’+’ operator”. I also found the slightly evangelical tone of some of the other blog entries (TL;DR: “All you java proles are dumb”) a bit annoying. But working through the exercises undoubtedly improved my understanding of Scala’s support for FP, so it was worth it. On the Intermediate exercises, though, I don’t even know where to start.

- Akka. Akka is an actor framework for scala and java, so it’s not really a scala resource per se, but it does such a great job of explaining actor-based concurrency that it’s worth reading through and playing with, because you’ll come away with a much clearer idea of how to use either Akka or Scala’s own Actors.

- Project Euler – A series of maths-based programming challenges – find the prime factors of a number, sum the even Fibonacci numbers less than a million, and so on. Being maths-ish, they’re a good match for a functional style of programming, so I find it helpful to start by just hacking together a java-ish solution, and then seeing how much smaller and more elegant I can make it. It’s a great way to explore both the syntax, and the Collections APIs, which are at the heart of most of scala’s FP support. It’s also a nice way to learn ScalaTest (the unit testing framework for scala), since my hack-and-refactor approach is reliant on having a reasonable set of tests to catch my misunderstandings.
It’s also exposed some of my concerns about the operational readiness of scala. I came up with what I thought was a fairly neat implementation of a prime-factoring function; obtain the lowest factor of a number (ipso facto prime), divide the original by that number, and repeat until you can’t factor any further:

 lazy val naturals: Stream[Long] = Stream.cons(1, naturals.map(_ + 1))

  def highestFactor(compound: Long): Long = {
   naturals.drop(1).takeWhile(_ < compound/ 2).find(
        (compound % _ == 0)).map(
            highestFactor(compound/_)).getOrElse(compound)
  }

(source)
– which works beautifully for int-sized numbers, but give it a prime towards the upper bounds of a Long and it runs out of heap space in the initial takeWhile (before it’s even had a chance to start recursively calling itself). It seems that Stream.takeWhile doesn’t permit the taken elements to be garbage-collected, which is counter-intuitive. I wouldn’t be at all surprised to find that I’m Doing It Wrong, but then again, shouldn’t the language be trying hard to stop me?
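(With hindsight, an Iterator sidesteps the problem, since unlike a Stream it doesn’t hold on to the elements it has already produced; a sketch of the same function:)

```scala
// Same algorithm, but the candidate factors come from a fresh Iterator,
// so consumed elements can be garbage-collected as we go.
def highestFactor(compound: Long): Long =
  Iterator.iterate(2L)(_ + 1L)
    .takeWhile(_ <= compound / 2)
    .find(compound % _ == 0)
    .map(f => highestFactor(compound / f))
    .getOrElse(compound)

// highestFactor(600851475143L) == 6857  (Project Euler problem 3)
```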


Java String.substring() heap leaks

Here’s an interesting little feature that we ran into a short while ago…

Suppose I have something like a Map which will exist for a long time (say, an in-memory cache), and a large, but short-lived String. I want to extract a small piece of text from that long string, and use it as a key in my map

Map<String,Integer> longLivedMap = ...
String veryLongString = ...
String shortString = veryLongString.substring(5, 8);
longLivedMap.put(shortString,123);

Question: How much heap space have we just consumed by adding “abc”=>123 into our map? You might think that it would be just a handful of bytes – the 3-character String, the Integer, plus the overhead for the types. But you would be entirely wrong. Java Strings are backed by char arrays, and whilst the String.substring() method returns a new String, it is backed by the same char array as the originating String. So now the entire veryLongString char[] has a long-lived reference and can’t be garbage collected, even though only 3 chars of it are actually accessible. Rapid heap-exhaustion, coming right up!

The solution is pretty straightforward; if you want to hold a long-lived reference to a string, call new String(string) first. Something like

String shortString = veryLongString.substring(5, 8);
longLivedMap.put(new String(shortString),123);

It would be counter-productive to do this on every substring, since most of the time the substring and the original source will go out of scope at the same time, so sharing the underlying char[] is a sensible way to reduce overhead.

Since you (hopefully) won’t have many separate places in your code where you’re storing long-lived references like this (just cache implementations of one sort or another), you can create the new strings inside the implementation of those classes instead. Now your caches are all light, and your heap is all right. Good Night.
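That last idea, sketched as a (hypothetical) cache wrapper that defensively copies its String keys:

```scala
import scala.collection.mutable

// A long-lived map that copies each String key on the way in, so it never
// pins a caller's large backing char[].
class StringKeyedCache[V] {
  private val delegate = mutable.Map[String, V]()

  def put(key: String, value: V): Unit =
    delegate.put(new String(key), value) // the copy drops the shared char[]

  def get(key: String): Option[V] = delegate.get(key)
}
```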


March 10, 2011

QCon day 1

A lot of good stuff today, but I’m just going to jot down a couple of my favourite points:

Craig Larman talked about massive-scale Scrum-based projects. I don’t suppose I’m going
to be running (or even part of) a 1500-person dev team very much, but some of his points
are applicable at any scale:
  • The only job title on the team is “Team Member”. There might be people with specialist skills,
    but no-one can say “it’s not my job to fix that”
  • If you don’t align your dev. teams’ organisation with your customers, then Conway’s law means your
    architecture will not align with your customers either, and you won’t be able to react when their needs
    change
  • Don’t have management-led transformation projects. How will you know when they’re done? Instead,
    management’s role is just to remove impediments that the dev team runs up against – the “servant-leader”
    model
Dan North spoke about how he moved from what he thought was a pretty cutting edge, agile environment
(Thoughtworks, consulting to large organisations starting to become leaner/more agile) to a really agile
environment (DRW, releasing trading software 10s of times per day), and how if you have a team that
is technically strong, empowered, and embedded in the domain (i.e. really close to the users), you can do
away with many of the traditional rules of Agile. A couple that really struck me were:
  • Assume your code has a half-life. Don’t be afraid to just rewrite it, or bin it. The stuff that stays in
    can get better over time, but it doesn’t have to be perfect from day 1
  • Don’t get emotionally attached to the software you create. Get attached to the capabilities you enable
    for your users
  • Remember, anything is better than nothing.
Juergen Hoeller talked about what’s new in Spring 3.1. No amazing surprises here, but some nice stuff:
  • Environment-specific beans – avoid having to munge together different config files for system-test vs.
    pre-production vs. live; have a single context with everything defined in it (even nicer, arguably, when you
    do it via Java Config and the @Profile annotation)
  • c: namespace for constructor args. Tasty syntactic sugar for your XML, and the hackery they had to go through
    to get it to work is impressive (and explains why it wasn’t there from the start)
  • @Cacheable annotation, with bindings for EHCache and GemFire (not for memcached yet, which is a bit of a surprise)
Liz Keogh talked about perverse incentives. Any time you have a gap between the perceived value that a metric
measures, and the actual value that you want to create, you make an environment where arbitrage can occur. People can’t
help but take advantage of the gap, even when they know at some level that they’re doing the “wrong thing”.
  • Focus on the best-performing parts of the organisation as well as the worst-performing. Don’t just say “This
    project failed; what went wrong?”; make sure you also say “This project succeeded better than all the others; what went right?”
  • Don’t try and create solutions to organisation problems, or you’ll inevitably make perverse incentives. Instead,
    make Systems (Systems-thinking, not computer programs) that allow those solutions to arise.
Chris Read and Dan North talked about Agile operations. Surprisingly for me, there wasn’t a great deal of novel stuff
here, but there were a couple of interesting points:
  • Apply an XP-ish approach to your organisational/process issues: Pick the single biggest drag on you delivering value and
    do the simplest thing to fix it. Then iterate.
  • Fast, reliable deploys are a big force multiplier for development. If you can deploy really fast, with low risk,
    then you’ll do it more often, get feedback faster, allow more experimentation, and generally waste less. The stuff that Dan
    and Chris work on (trading data) gets deployed straight from dev workstations to production in seconds; automated testing
    happens post-deploy

Yoav Landman talked about Module repositories. Alas, this was not the session I had hoped for; I was hoping for some
takeaways that we could apply to make our own build processes better, but this was really just a big plug for Artifactory,
which looks quite nice but really seems to solve a bunch of problems that I don’t run into on a daily basis. I’ve never needed
to care about providing fine-grained LDAP authorisation to our binary repo, nor to track exactly which version of hibernate was
used to build my app 2 years ago. The one problem I do have in this space (find every app which uses httpclient v3.0,
upgrade, and test it) is made somewhat easier by a tool like Artifactory, but that problem very rarely crops up, so it
doesn’t seem worth the effort of installing a repo manager to solve it. Also it doesn’t integrate with any SCM except Subversion,
which makes it pretty useless for us.


March 08, 2011

Designing Software, Drawing Pictures

Not a huge amount of new stuff in this session, but a couple of useful things:

The goal of architecture, in particular up-front architecture, is first and foremost to communicate the vision for the system, and secondly to reduce the risk of poor design decisions having expensive consequences.

The Context->Container->Component diagram hierarchy:

The Context diagram shows a system, and the other systems with which it interacts (i.e. the context in which the system operates). It makes no attempt to detail the internal structure of any of the systems, and does not specify any particular technologies. It may contain high-level information about the interfaces or contracts between systems, if
appropriate.

The Container diagram introduces a new abstraction, the Container, which is a logical unit that might correspond to an application server, (J)VM, databases, or other well-isolated element of a system. The container diagram shows the containers within the system, as well as those immediately outside it (from the context diagram), and details the
communication paths, data flows, and dependencies between them.

The Component diagram looks within each container at the individual components, and outlines the responsibilities of each. At the component level, techniques such as state/activity diagrams start to become useful in exploring the dynamic behaviour of the system.

(There’s a fourth level of decomposition, the class diagram, at which we start to look at the high-level classes that make up a component, but I’m not sure I really regard this as an architectural concern)

The rule-of-thumb for what is and what isn’t architecture:

All architecture is design, but design is only architecture if it’s costly to change, poorly understood, or high risk. Of course, this means that “the architecture” is a moving target; if we can reduce the cost of change, develop a better understanding, and reduce the risk of an element then it can cease to be architecture any more and simply become part of the design.


March 07, 2011

Five things to take away from Nat Pryce and Steve Freeman's "TDD at the system scale" talk

  • When you run your system tests, build as much as possible of the environment from scratch.
    At the very least, build and deploy the app, and clear out the database before each run
  • For testing assemblies that include an asynchronous component, you want to wrap
    your assertions in a function that will repeatedly “probe” for the state you want
    until either it finds it, or it times out. Something like this
       doSomethingAsync();
       probe(interval,timeout,aMatcher,anotherMatcher...);    

    Wrap the probe() function into a separate class that has access to the objects you
    want to probe to simplify things.

  • Don’t use the logging APIs directly for anything except low-level debug() messages, and maybe
    not even then. Instead, have a “Monitoring” topic, and push structured messages/objects onto
    that queue. Then you can separate out production of the messages from routing, handling, and
    persisting them. You can also have your system tests hook into these messages to detect hard-to-observe state changes
  • For system tests, build a “System Driver” that can act as a facade to the real system, giving
    test classes easy access to a properly-initialised test environment – managing the creation and
    cleanup of test data, access to monitoring queues, wrappers for probes, etc.
  • We really need to start using a proper queueing provider
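The probe() function from the second bullet might be sketched like this (all names hypothetical):

```scala
// Repeatedly evaluate the given conditions until they all hold,
// or give up when the timeout expires.
def probe(intervalMillis: Long, timeoutMillis: Long)(conditions: (() => Boolean)*): Unit = {
  val deadline = System.currentTimeMillis + timeoutMillis
  while (!conditions.forall(_.apply())) {
    if (System.currentTimeMillis > deadline)
      throw new AssertionError("probe timed out before all conditions were satisfied")
    Thread.sleep(intervalMillis)
  }
}

// doSomethingAsync()
// probe(100, 5000)(() => queue.size == 1, () => messageLog.contains("done"))
```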