All entries for Monday 19 March 2007

March 19, 2007

Ian Murdock to join Sun


This is pretty awesome, I think. Ian Murdock, the ‘ian’ in Debian, is joining Sun to head OS platform strategy. Which hopefully means ‘making a package manager that doesn’t totally blow’, and maybe also ‘throwing CDE into the sea once and for all’.

Twitter: Not just yet

Follow-up to Twitter from Secret Plans and Clever Tricks

So, I’ve tried twitter for a few days, and my view is: it’s not ready yet.

I love the idea of twitter, which works in 2 ways:
  • short blog entries – If you’re only blogging 140 characters at a time, you can do so much more informally. If blog entries are journals, twitters are more like the scribbles in the margins. Whether or not you get any benefit from the ‘social’ side of things, these little notes have some value, and I really enjoy creating them.
  • the social thing. Well, I’ve never really had a chance to test this out, because none of my friends are on twitter.

Now, of course I could solve this problem by telling some of my friends about this great new service, but the problem is that it’s not a great service. I couldn’t recommend to a friend that they start using it, because the performance is awful. The load time for the ‘post a new message’ page, from my uber-fat-pipe connection over JANET, is somewhere between 10 seconds and a minute. Early in the morning (when the US is asleep) you can sometimes get it to load in less than that, but in the evenings, it’s often even worse.

Look: right now (8pm GMT)

cusaab:~$ time curl > /dev/null
real    1m12.486s

1’12”. Not great. A one-off?

cusaab:~$ time curl > /dev/null
real    0m53.920s
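Two samples isn't much to go on, so here's a rough sketch of how the same check could be repeated a few times in a row. The function name time_runs is made up for illustration, the URL is left as a placeholder rather than the one I actually measured, and it assumes a date that understands +%s (GNU and BSD do; stock Solaris date may not).

```shell
# Time a command several times and print the elapsed whole seconds per run.
# Usage: time_runs COUNT COMMAND [ARGS...]
# Assumes `date +%s` is available (not guaranteed on every Unix).
time_runs() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s)
        # Discard the command's output; only the elapsed time matters here.
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}

# Example (URL is a placeholder variable, set it to the page being tested):
#   time_runs 5 curl -s "$URL"
```

Whole-second resolution is crude, but when the page takes the best part of a minute to load, it's plenty.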

If I recommended this to anyone I knew, I’d doubtless get some puzzled emails back asking whether maybe I’d misquoted the URL. The IM interface has been up and down, mostly down, over the last few days, and I haven’t felt like committing my money to the SMS interface, given the shoddiness of everything else.

So, I’ll be keeping an eye on twitter, but until they manage to sort out their scaling issues, I can’t see myself updating it very often. Which is a shame, because I’d really like to be able to get more out of it. But for now I think I’ll go back to keeping my scraps in google notebook. Oh well…

zones fun

We ran into an interesting ‘feature’ of Solaris zones today. What should have been a fairly routine reboot of a zone turned into something of an epic struggle. The zone in question contained an Oracle instance, and when we went to reboot it, it hung at the ‘shutting down’ stage. There was no response from the zone’s console, but prstat -z {zone} showed 2 processes: the zsched scheduler (expected) and a zombie oracle process. The zombie wouldn’t respond to either a kill -9 (obviously) or a preap. Unfortunately, once the zone is in this state it would seem that there’s absolutely nothing you can do apart from reboot the entire box, or clone the zone, leaving the old one in its half-dead state. Fortunately it was a development box, so a reboot wasn’t too disruptive, but it was still a pain in the nuts.

It would seem that this is a moderately well-recognized problem: if a process is stuck spinning in kernel code when its parent dies, then it will stay a zombie forever, and an unreapable zombie will prevent a zone from shutting down. It’s an illustration of the trade-offs between Solaris zones, where you share a kernel and just protect the different userlands from each other, and ‘real’ VM solutions like VMware, where you incur the extra resource costs of an emulated machine, but get the benefit of real isolation between VMs.
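For spotting this sort of thing before a reboot rather than during one, something like the following sketch might help. It just filters ps-style output for processes in state Z. The function name list_zombies is invented here, and it assumes the input has three columns (PID, state, command), which is what I'd expect from Solaris ps with -o pid,s,comm, though that's worth checking against your own ps.

```shell
# Print the PID and command name of every zombie found in ps-style input.
# Expects three whitespace-separated columns on stdin: PID, state, command.
# A header line is harmless, since its second column is not "Z".
list_zombies() {
    awk '$2 == "Z" { print $1, $3 }'
}

# On Solaris the input could come from something like this
# (ZONE is a placeholder for the zone name):
#   ps -z "$ZONE" -o pid,s,comm | list_zombies
```

It won't make an unreapable zombie any more reapable, but at least you'd know it was there before typing reboot.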

Fortunately for us, it’s pretty easy to avoid. We’ll add to our list of SOPs “always shut down Oracle by hand before bouncing its zone” and hopefully not run into the problem again. If it happened in production we’d probably either clone the zone, or move everything onto the standby box. Either option would be unpleasant, so we’ll try to avoid getting into that situation. Given that zone reboots are rare, it doesn’t seem like too much of an imposition to stick all the non-core services into maintenance mode before restarting.
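The SOP could be wrapped up in a few lines of shell so nobody has to remember the order. This is only a sketch: bounce_zone and run are names made up for illustration, the stop command is passed in rather than hard-coded because how you stop Oracle depends on your install, and DRY_RUN=1 just prints what would be run. zlogin and zoneadm are the standard Solaris tools for running a command in a zone and rebooting it.

```shell
# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop services inside a zone, then reboot it only if the stop succeeded.
# Usage: bounce_zone ZONE STOP_COMMAND
bounce_zone() {
    zone=$1
    stop_cmd=$2
    # Step 1: shut the database down by hand, inside the zone.
    if ! run zlogin "$zone" "$stop_cmd"; then
        echo "stop command failed in $zone; not rebooting" >&2
        return 1
    fi
    # Step 2: only now bounce the zone.
    run zoneadm -z "$zone" reboot
}

# Example (zone name and stop command are placeholders):
#   DRY_RUN=1 bounce_zone devzone "su - oracle -c dbshut"
```

Refusing to reboot when the stop step fails is the whole point: a zone that still has Oracle running in it is exactly the one that bit us.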

Once the box rebooted we had all kinds of fun. Partly that was because a previously-disabled ipfilter sprang into life, firewalling random bits of the box off from its end users. It was also because there were a bunch of ZFS mounts that had been forced into place before (by me; ooops) which refused to remount on boot, thus preventing the /system/filesystem/local service from starting, and dumping the box into single-user mode. But that, by comparison, was easily fixed :-)
