All 45 entries tagged Tech


January 15, 2009

NIS on app servers is EVIL

OK, so I’m finally (I think) getting to the bottom of our longest-running performance issue.

We have an apache server which occasionally seems to be unable to handle requests. To begin with, the symptoms were something like this: at certain times of day, the number of apache worker processes (we’re using the prefork MPM) would go through the roof, but no requests ever completed. Restarting the server seemed to help sometimes; other times we’d restart and the server would just sit there not spawning any httpds at all. It was all a bit of a mystery.

The times at which this happened seemed to coincide with when our backups were running, so my first thought was file-locking – perhaps the backups were preventing apache from getting a lock on a mutex file, or something like that. But disabling the backups didn’t have any effect. Then I wondered if it might be a memory shortage (since we’d had similar problems on another server recently, caused by it running out of swap space due to a leaky httpd). Again, investigations didn’t show anything up.

Then, I looked in the conf file, and found a couple of proxying redirects, like this:

RewriteRule (.*)$1 [P] 

Alarm bells went off immediately; this is going to require a host name lookup on every request. Now, that ought not to matter, since (on Solaris) nscd should be caching those lookups – but nscd is suspected to have ‘issues’, particularly under heavy concurrent loads.
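(A useful sanity check here, assuming nothing beyond stock Solaris: nscd will report its per-database cache statistics, hit rates included, which gives some idea of whether it’s helping or hurting.)

```shell
# dump nscd's configuration and cache statistics
# (hosts, passwd, group... hit rates per database)
nscd -g
```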

So, step 1; replace host names with IP addresses. Sure, we might one day need to update them if we ever change DNS, but that’s not something that happens often.

This certainly helped matters, but didn’t exactly fix them. We got fewer, shorter-lived slowdowns, but they were still there. However, something had changed. Whereas before we were getting loads of httpd processes, now we’d have barely any, until suddenly we’d get 200 being spawned at once (shortly followed by the problem going away).

Running pstack on the top-level apache whilst it was stuck like this was revealing:

 feb65bfa door     (4, 8047658, 0, 0, 0, 3)
 feaff286 _nsc_try1door (feb8f428, 8047788, 804778c, 8047790, 8047758) + 6c
 feaff4f0 _nsc_trydoorcall_ext (8047788, 804778c, 8047790) + 178
 feb0c247 _nsc_search (feb8f228, feaf767c, 6, 80477f4) + b5
 feb0af3f nss_search (feb8f228, feaf767c, 6, 80477f4) + 27
 feaf7c0f _getgroupsbymember (81bd1c0, 867b220, 10, 1) + dc
 feb00c5b initgroups (81bd1c0, ea61, 8047c88, 808586e) + 5b
 080858a5 unixd_setup_child (0, 0, 0, 0, 0, 867b4b0) + 41
 0806d0a3 child_main (10, 1, 1, 0) + e7
 0806d52b make_child (0, f, 7, e, 4, 0) + d7
 0806df01 ap_mpm_run (80b3d70, 80dfe20, 80b5b50, 80b5b50) + 93d
 08072f67 main     (4, 8047e40, 8047e54) + 5cb
 08067cb4 _start   (4, 8047ed8, 8047ef5, 8047ef8, 8047efe, 0) + 80

The top-level apache is trying to fork a new worker. But in order to do that, it needs to set the user and group privileges on the new process, and in order to do that, it needs to find the groups that the user belongs to. Since this server uses NIS to specify groups, apache has to make a call to NIS (via nscd), to list all the groups (despite the fact that the web server user isn’t actually a member of any NIS groups – it has to make the call anyway, to verify that this is the case).

So, for some reason, NIS is being slow. Maybe as a result of the high traffic levels that the backups are pushing around, the NIS requests are taking a very long time to process, and that’s preventing apache from forking new workers. When NIS finally comes back, apache has loads of requests stacked up in the listen backlog, so it spawns as many workers as it can to process them – hence the sudden jump just before everything starts working again.

To test this theory out, I wrote a teeny script that just did

time groups webservd

every 30 seconds, and recorded the result. To my dismay, lookups could take anything from 1 second to 5 minutes. Clearly, something’s wrong. Unsurprisingly, the slow lookups coincided with the times that apache was slow. Running the same check on the NIS server itself revealed no such slowness; lookups were consistently returning in <1 second.
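Fleshed out, the teeny script looked something like this – a reconstruction, with the log path and function name being my own inventions:

```shell
#!/bin/sh
# Time a group lookup for the web-server user and record the result.

USER_TO_CHECK=webservd
LOGFILE=/var/tmp/groups-timing.log

check_once() {
    START=`date +%s`
    groups "$1" > /dev/null 2>&1
    END=`date +%s`
    echo "`date '+%H:%M:%S'` groups lookup for $1 took `expr $END - $START`s"
}

# the real script looped forever:
#   while true; do check_once $USER_TO_CHECK >> $LOGFILE; sleep 30; done
check_once "$USER_TO_CHECK"
```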

So, a fairly simple solution: make the web server a NIS slave. This appears to have solved the problem (though it’s only had a day or so of testing so far). Why a busy network should cause NIS lookups to be slow on this particular server (other servers in the same subnet were unaffected) I have no idea. It’s not an especially great solution though, particularly if I have to apply it to lots of other servers (NIS replication times scale with the number of slaves, unless we set up slaves-of-slaves).

A nicer long-term solution would be to disable NIS groups entirely. On an app/web server there’s no great benefit to having non-local groups; it’s not as if we’ve got a large number of local users to manage. Alternatively, using a threaded worker model would sidestep the problem by never needing to do NIS lookups except at startup.
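If we do go that way, it’s a one-line change per server – tell the name service switch to stop consulting NIS for group lookups (leaving passwd and the rest alone):

```
# /etc/nsswitch.conf -- before:
#   group: files nis
# after:
group:  files
```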

October 31, 2007

Netbeans surprises me

Follow-up to Netbeans 5: Still not switching from Secret Plans and Clever Tricks

I’ve never been able to get on with Netbeans as a java IDE. Somehow, if you’re used to Eclipse it’s just too weird and alien, and things that ought to be simple seem hard. I’m sure that if you’re used to it, it’s very lovely, but I just can’t get started with it.

However, one thing Eclipse is not very good at, IME, is Ruby development. There are plugins, but I’ve never had much success with them; debugging support is patchy-going-on-broken, syntax highlighting / completion is super-basic, and it’s generally only one (small) step up from Emacs with ruby-mode and pabbrev.

(Note that I’m not talking about Rails development here, I’m talking about using Ruby to write stuff that would previously have been done in perl – sysadmin scripts, monitors, little baby apps and so on. Things of a couple of hundred lines or so – nothing very big, but enough that an unadorned text editor is a bit of a struggle.)

There are other Ruby IDEs of course, but they’re almost all (a) OS X-specific, (b) Windows-specific, (c) proprietary, or (d) crap. I’d like something free, that runs on Linux, but doesn’t suck, please.

Now, Sun have been making a big noise about their Ruby support for the last 12 months or so, so I thought I’d grab a copy of the Ruby-specific Netbeans 6 bundle and try it out.

And, surprise surprise, it’s really good. Out of the box it almost just works – the only minor hackery I had to do was a manual install of the fastdebug gem, but the error message linked me to a web page explaining what I had to do and why. Debugging works, you can do simple refactorings, syntax highlighting and code completion are reasonably sophisticated. And it looks nice, performs well, and is all fairly intuitive to use, even for a dyed-in-the-wool Eclipse-er like me.

So, three cheers for the Netbeans team, for filling the gaping void in the Ruby IDE space. Development still seems to be pretty active, so hopefully we can expect even more goodness in the months to come.

September 24, 2007

solaris NFS performance weirdness

Spanky new Sun X4600 box. Solaris 10u4. Multipathed e1000g GB interfaces. NFS-mounted volume, totally default.

$ nfsstat -m /package/orabackup
/package/orabackup from nike:/vol/orabackup/dionysus-sbr
 Flags:         vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600
 Attr cache:    acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

$ /opt/filebench/bin/filebench
filebench> load webserver
filebench> run 60
IO Summary:      255700 ops 4238.6 ops/s, (1366/138 r/w)  23.1mb/s,   4894us cpu/op,  66.0ms latency

mutter, grumble… remount the NFS vol with vers=3

$ nfsstat -m /package/orabackup
/package/orabackup from nike:/vol/orabackup/dionysus-sbr
 Flags:         vers=3,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600
 Attr cache:    acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

$ /opt/filebench/bin/filebench
filebench> load webserver
filebench> run 60

IO Summary:      4397877 ops 72839.3 ops/s, (23495/2351 r/w) 396.4mb/s,    221us cpu/op,   3.1ms latency

What the … ? The default configuration for an NFSv4 mount on this box appears to be 20 times slower than the equivalent NFSv3 mount. How can this be right? Either there’s something very weird going on with our network topology, or there’s something badly broken about the way the mount is configured. Either way, it’s beyond me to work out what it is. NFSv3 ain’t broken (well, not very) so, unless Sun support can offer some illumination, we’ll be sticking with that.
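For reference, pinning the mount to v3 on Solaris is just the vers mount option; the same thing can go in /etc/vfstab to make it stick across reboots:

```shell
# remount the volume forcing NFSv3
umount /package/orabackup
mount -F nfs -o vers=3 nike:/vol/orabackup/dionysus-sbr /package/orabackup

# or permanently, via /etc/vfstab:
# nike:/vol/orabackup/dionysus-sbr  -  /package/orabackup  nfs  -  yes  vers=3
```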

August 29, 2007

wishlist #2: Sun iLOM RConsole over ssh

Another thing that would make my life easier: Sun have a fantastic java GUI console redirection tool that you can use with their iLOM to get the system graphical console output. But it works by communicating over a variety of ports that our firewall doesn’t allow, so I can’t use it from home.

Surely there should be an option somewhere to tunnel it all over ssh instead? There must be an easier solution than VNC-ing (over ssh) onto my desktop at work and running the rconsole from there?

(Yes, I could ssh straight in to the ilom and get the text-only console. But it seems a shame to let the lovely graphical version go to waste…)
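In the meantime, the nearest workaround short of VNC is plain ssh port-forwarding through a machine that can reach the service processor. Something like the below – the host names are made up, and the port numbers beyond 443 are guesses, which is rather the point: the redirection tool uses several.

```shell
# Tunnel the iLOM web interface and a console-redirect port through
# a work host that can reach the service processor (ports are assumptions)
ssh -N -L 8443:my-ilom:443 \
       -L 7578:my-ilom:7578 \
       user@gateway.example.com
# ...then point the browser at https://localhost:8443/ and hope the
# java console client is content to connect via localhost
```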

May 22, 2007

Three cheers for the Fair Share Scheduler


The more I use my Solaris Zones boxes, the more (mostly) I like them. Yes, there are some niggles about how you cope with container failures, how you clone zones between boxes, the odd un-killable process, and so on, but for the most part, they just do exactly what you’d expect them to, all the time.

Take the FSS for instance. This little widget takes care of allocating CPU between your zones. A big problem in server consolidation, at least in web-land, is the “spiky” nature of CPU usage; Web-apps tend to be relatively low consumers of CPU most of the time, but occasionally will want a much larger amount.
If you’re consolidating apps, you don’t want one busy app to steal CPU off all the others, but if all the others are idle, then you might as well let the busy app take whatever it needs.

The FSS solves this problem elegantly. Each zone is allocated “shares”, representing the minimum proportion of CPU that it’s guaranteed if it needs it. So if I have 3 zones, and give them each 1 share, then if every zone is working flat out, each gets 1/3 of the CPU time. But if one zone goes idle, the other two get 50% each. If only one zone is busy, it gets 100%. Better still, if one zone has 100% of CPU, and another zone becomes busy, the first is reined in instantly to give the other one the CPU it’s entitled to.
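Configuration is pleasantly terse, too. A sketch (the zone name is illustrative): make FSS the default scheduling class, then give each zone its shares –

```shell
# make FSS the default scheduler class (fully effective after a reboot)
dispadmin -d FSS

# give a zone one share of the CPU
zonecfg -z appzone1 'add rctl; set name=zone.cpu-shares; add value (priv=privileged,limit=1,action=none); end'

# or adjust a running zone on the fly
prctl -n zone.cpu-shares -v 1 -r -i zone appzone1
```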

And does it work in real life? Oh yes…here’s one of our apps getting a bit overloaded. You can see the box load go up to 20 (on an 8-way box; in this case it was about 99% busy for 15-20 minutes), and the zone that’s causing all the trouble gets pretty unresponsive. But the other zone doesn’t even register the extra CPU load. Awesome.

[Graphs: maia CPU load; maia forums response time; maia RTV response time]

March 19, 2007

Twitter: Not just yet

Follow-up to Twitter from Secret Plans and Clever Tricks

So, I’ve tried twitter for a few days, and my view is: it’s not ready yet.

I love the idea of twitter, which works in 2 ways:
  • short blog entries – If you’re only blogging 140 characters at a time, you can do so much more informally. If blog entries are journals, twitters are more like the scribbles in the margins. Whether or not you get any benefit from the ‘social’ side of things, these little notes have some value, and I really enjoy creating them.
  • the social thing. Well, I’ve never really had a chance to test this out, because none of my friends are on twitter.

Now, of course I could solve this problem by telling some of my friends about this great new service, but the problem is that it’s not a great service. I couldn’t recommend that a friend start using it, because the performance is awful. The load time for the ‘post a new message’ page, from my uber-fat-pipe connection over JANET, is somewhere between 10 seconds and a minute. Early in the morning (when the US is asleep) you can sometimes do better than that, but in the evenings it’s often even worse.

Look: right now (8pm GMT)

cusaab:~$ time curl > /dev/null
real    1m12.486s

1m12s. Not great. A one-off?

cusaab:~$ time curl > /dev/null
real    0m53.920s

If I recommended this to anyone I knew, I’d doubtless get some puzzled emails back again asking whether maybe I’d misquoted the URL? The IM interface has been up and down, mostly down, over the last few days, and I haven’t felt like committing my money to the SMS interface, given the shoddiness of everything else.

So, I’ll be keeping an eye on twitter, but until they manage to sort out their scaling issues, I can’t see myself updating it very often. Which is a shame, because I’d really like to be able to get more out of it. But for now I think I’ll go back to keeping my scraps in Google Notebook. Oh well…

February 20, 2007

Spring and the golden XML hammer


This article describes, as best practice, one of the things that I’m really coming to dislike about the Spring Framework – the tendency to use XML for object construction for no better reason than ‘because I can’.

Now, I love Spring; it’s revolutionised the way I, and many others, write code, and for the better. But it does have a tendency to produce reams of XML. As a data format, I think XML is OK. It’s precise, and the tooling is good, though it’s a good deal more verbose than something like JSON or YAML, which, IMO, have 80% of the functionality with 20% of the overhead.

For aspects of an application which are genuinely configuration, such as the mapping of URLs to controllers, or configuration of persistence contexts, XML is better than code; no doubt about it. For the construction of object graphs, XML is sometimes better than code. But this example is just pushing it too far. It describes setting up an observer/observable pair, using the side-effects of spring’s MethodInvokingFactoryBean to call the addListener() method, rather than doing it in code.

Now, this is just clunky. Instead of one line of code that says

townCrier.addListener(townResident1);

we have this:

<bean id="registerTownResident1"
    class="org.springframework.beans.factory.config.MethodInvokingFactoryBean">
    <property name="targetObject"><ref local="townCrier"/></property>
    <property name="targetMethod"><value>addListener</value></property>
    <property name="arguments">
      <ref bean="townResident1"/>
    </property>
</bean>
Ten lines of XML. No static type-checking (I hope you’ve got a bunch of tests that verify your contexts…). The addListener invocation, the thing we’re actually trying to achieve here, is kind of buried; the bean that’s generated is never used, and the whole thing is far from obvious in its intent.

The only notional advantage I can see is that you can add and remove listeners without touching the code. But how much of an advantage is that? In most situations, where you’re using a method-invoking synchronous observer/subject pattern like this, listeners are part of the application, and not part of the configuration; you wouldn’t remove one without first consulting a developer anyway. When you’ve got genuinely replaceable listeners, then it’s more common IME to have some kind of an abstraction like a JMS queue or a message bus in between subject and listener, so that the listeners are registered with the queue, not the subject itself.

If it were up to me, I’d probably have a class that’s invoked when the context is built (via an ApplicationListener, maybe), which explicitly built up the subject/observer relations. If I had some configurable relationships, I might pass in a list of observers, but that’s about as far as it would go:

// set by IOC
public void setChangeEventListeners(List<ChangeEventListener> listeners) {
    this.changeListenersToRegister = listeners;
}

// invoked once the context is up
// (changeSubject, auditLog: fields wired elsewhere in this sketch)
public void registerListeners() {
    // configure a subject with a list of observers
    for (ChangeEventListener listener : this.changeListenersToRegister) {
        changeSubject.addListener(listener);
    }

    // now hard-code a subject that won't need to change frequently
    auditLog.addListener(new log4j.Category("AUDIT_LOG"));

    // ... and so on
}

- this object starts to look a bit vague and ill-defined, doing a little with lots of objects, but that’s because really it’s just a part of the context/configuration; it’s not a part of the domain per se.

There are a few other options that, in some situations, might be better than this;

  • Give the subject a constructor that takes a list of observers, and let it wire them at construction time – then pass the list from within your XML context
  • If you can’t modify the subject itself, make a custom FactoryBean that takes the list of observers, constructs the subject and adds all the observers to it
  • One that requires a bit of divergence from the standard Spring usage. Have a context that’s defined by a bit of scripting code – JRuby, or BSH, or javascript/rhino, rather than by XML. That way you make your method calls more explicit, and allow developers to easily see what relationships are being built up, whilst still keeping some clear separation between the configuration and the java code. If you had loads of Observer/subject configuration to maintain, you could define a little DSL for it (or store it in a database) and have a custom context to parse the DSL and configure the beans.

January 19, 2007

New tools

I’ve been playing with some new bits of technology. Not very new, but new to me, anyway. JSON and BeautifulSoup.

BeautifulSoup is a Python library, now ported into all good dynamic languages (I’m using the Ruby version), which parses HTML. Its defining feature is that it’s very relaxed about well-formedness. If your markup is fully-validating XHTML, all good. If it’s horrible HTML 3.2 tag soup with unbalanced divs and unclosed tables, that’s cool too. Soup will make a pretty good job of it, parsing what it can with a DOM, and falling back to regexes, special cases, and hacks for the rest. Having parsed the markup, it gives you a nice DOM tree, which you can traverse or search as you’d expect.

I’m using it to scrape some information from a webpage, which I then expose as data via a web service, which is where the JSON bit comes in. JSON is a data-transfer language, functionally equivalent to XML, but expressed as Javascript arrays and hashes.
So rather than having to parse a heap of XML, which is awkward and platform-dependent in javascript, you can just eval the JSON string (escaping as needed if you don’t trust your source), and have a pre-loaded object graph spring into existence.

My JSON is loaded after page-load via a Prototype Ajax.Request. The onComplete function evals the JSON, then sets up the innerHTML for a div based on the objects it got back. It’s really very straightforward – even for a javascript newbie like me.

December 08, 2006

Solaris SMF manifest for a multi-instance jboss service

Today I have mostly been writing SMF manifests. We typically run several JBoss instances per physical server (or zone), using the JBoss service binding framework to take care of port allocations. I couldn’t find a decent SMF manifest that would be safe to use in a context where you’ve got lots of JBosses running, so I wrote my own. Here it is…

It’s still a tad rough around the edges.
  • It assumes you’ll name your SMF instances the same as your JBoss server instances
  • The RMI port for shutdowns is specified as a per-instance property – in theory one could parse it out of the service bindings file, but doing that robustly is just too much like hard work at the moment.
  • It assumes that you’ll want to run the service as a user called jboss, whose primary group is webservd – adjust to suit.
  • The jvm_opts instance property allows you to pass specific options (for example, heap size) into the JVM
  • It assumes that you’ll have a log directory per instance, located in /var/jboss/log/{instance name}-{rmi port}. The PID file is stored there, and the temp. file dir is set to there too (using /tmp for temporary files is a bad idea if you hoover your temp dir periodically, as you’ll delete useful stuff)
  • The stop method waits for the java process to terminate (otherwise restart won’t work). The start method doesn’t wait for the server to be ready and to have opened its HTTP listener, just for the VM to be created. I might add that next, although given that svcadm invocations are asynchronous there doesn’t seem much point.

The manifest itself:

<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<service_bundle type='manifest' name='export'>
  <service name='application/jboss' type='service' version='0'>
    <instance name='default' enabled='true'>
      <dependency name='network' grouping='require_all' restart_on='error' type='service'>
        <service_fmri value='svc:/milestone/network:default'/>
      </dependency>
      <dependency name='sysconfig' grouping='require_all' restart_on='error' type='service'>
        <service_fmri value='svc:/milestone/sysconfig:default'/>
      </dependency>
      <dependency name='fs-local' grouping='require_all' restart_on='error' type='service'>
        <service_fmri value='svc:/system/filesystem/local:default'/>
      </dependency>
      <exec_method name='start' type='method' exec='/usr/local/jboss/bin/svc-jboss start' timeout_seconds='180'>
        <method_context>
          <method_credential user='jboss' group='webservd'/>
        </method_context>
      </exec_method>
      <exec_method name='stop' type='method' exec='/usr/local/jboss/bin/svc-jboss stop' timeout_seconds='180'>
        <method_context>
          <method_credential user='jboss' group='webservd'/>
        </method_context>
      </exec_method>
      <property_group name='jboss' type='application'>
        <propval name='instance-rmi-port' type='astring' value='1099'/>
        <propval name='jvm-opts' type='astring' value='-server -Xmx1G -Xms1G'/>
      </property_group>
    </instance>
    <stability value='Evolving'/>
    <template>
      <common_name>
        <loctext xml:lang='C'>JBoss J2EE application server</loctext>
      </common_name>
    </template>
  </service>
</service_bundle>

... and the service method


#!/sbin/sh
. /lib/svc/share/smf_include.sh

# General config -- locations are illustrative; adjust to suit
JAVA=/usr/java/bin/java
JBOSS_HOME=/usr/local/jboss
JBOSS_CLASSPATH=${JBOSS_HOME}/bin/run.jar

# instance-specific stuff:
# sed the instance name out of the FMRI
JBOSS_SERVICE=`echo $SMF_FMRI | sed 's/.*:\(.*\)/\1/'`
JBOSS_SERVICE_RMI_PORT=`svcprop -p jboss/instance-rmi-port $SMF_FMRI`
SERVICE_JVM_OPTS=`svcprop -p jboss/jvm-opts $SMF_FMRI`

# Derived stuff
JBOSS_LOG_DIR=/var/jboss/log/${JBOSS_SERVICE}-${JBOSS_SERVICE_RMI_PORT}
PIDFILE=${JBOSS_LOG_DIR}/jboss.pid
JBOSS_CONSOLE=${JBOSS_LOG_DIR}/console.log
JBOSS_VAR="-Djboss.server.temp.dir=${JBOSS_LOG_DIR}/tmp"
JAVA_OPTS="${JBOSS_VAR} -Djava.awt.headless=true"

if [ -z "$SMF_FMRI" ]; then
        echo "JBOSS startup script must be run via the SMF framework"
        exit $SMF_EXIT_ERR_NOSMF
fi

if [ -z "$JBOSS_SERVICE" ]; then
        echo "Unable to parse service name from SMF FMRI $SMF_FMRI"
        exit $SMF_EXIT_ERR_NOSMF
fi

start() {
        echo "starting jboss.."
        if [ ! -z "$SERVICE_JVM_OPTS" ]; then
                JAVA_OPTS="$JAVA_OPTS $SERVICE_JVM_OPTS"
        fi
        $JAVA -classpath $JBOSS_CLASSPATH $JAVA_OPTS org.jboss.Main -c ${JBOSS_SERVICE} >$JBOSS_CONSOLE 2>&1 & echo $! >${PIDFILE}
}

stop() {
        echo "stopping jboss.."
        $JAVA -classpath $JBOSS_CLASSPATH org.jboss.Shutdown -s jnp://localhost:${JBOSS_SERVICE_RMI_PORT}
        PID=`cat ${PIDFILE}`
        echo "waiting for termination of process $PID ..."
        pwait $PID
        rm $PIDFILE
}

case $1 in
start)
        start
        ;;
stop)
        stop
        ;;
restart)
        echo "Restarting jboss"
        stop
        start
        ;;
*)
        echo "Usage: $0 { start | stop | restart }"
        exit 1
        ;;
esac

Postscript: I wrote above that parsing the service-bindings file to find the RMI port is too hard; this turns out not to be true. Praise be to Blastwave!

pkg-get install xmlstarlet

xml sel -t -v "/service-bindings/server[@name='${INSTANCE_NAME}']/service-config[@name='jboss:service=Naming']/binding/@port" service-bindings.xml 

November 22, 2006

Tuning Java 5 garbage collection for mixed loads

Once again, I find myself glaring balefully at the output of garbage collection logs and wondering where my CPU is going. Sitebuilder2 has a very different GC profile to most of our apps, and whilst it’s not causing user-visible problems, it’s always good to have these things under control.

So, SB2 has an interesting set of requirements. Simplistically, we can say it does 3 things:

1) Serve HTML pages to users
2) Serve files to users
3) Let users edit HTML/Files/etc

these 3 things have interestingly different characteristics. HTML requests generate a moderate amount of garbage, but almost always execute much quicker than the gap between minor collections. So, in principle, as long as our young generation is big enough we should get hardly any old gen. garbage from them. Additionally, HTML requests need to execute quickly, else users will get bored and go elsewhere.

Requests for small files are rather similar to the HTML requests, but most of our file serving time is spent drip-feeding whacking great files (10MB and up) to slow clients. This kind of file-serving generates quite a lot of garbage, and it looks as if a lot of it sticks around for long enough that it ends up in the old gen. Certainly the requests themselves take much longer than the time between minor collects, so any objects which have a lifetime of the HTTP request will end up as heap garbage. Large file serving, though, is mostly unaffected by the odd GC pause. If your 50MB download hangs for a second or two halfway through, you most likely won’t notice.

Edit requests are a bit of a mishmash. Some are short and handle only a little data, others (uploading the aforementioned big files, for instance) are much longer running. But again, the odd pause here and there doesn’t really matter. There are orders of magnitude fewer edit requests than page/file views.

So, the VM is in something of a quandary. It needs a large heap to manage the large amounts of garbage generated from having multiple file-serving requests going on at any given time. And it needs to minimise the number of Full GCs so as to minimise pauses for the HTML server. But, the cost of doing a minor collection goes as a function of the amount of old generation allocated, so a big, full heap implies a lot of CPU sucked up by the (parallel) minor collectors. It also means longer-running minor collections, and a greater chance of an unsuccessful minor collect, leading to a full GC.
(For reference, on our 8-way (4 proc) opteron box, a minor collect takes about 0.05s with 100MB of heap allocated, and about 0.7s with 1GB of heap allocated.)

So, an obvious solution presents itself. Divide and Conquer.

Have a VM (or several) dedicated to serving HTML. These should have a small heap, and a large young generation, so that parallel GCs are generally fast, and even a full collection is not going to take too long. This VM will be very consistent, since pauses should be minimal.

Secondly, have a VM for serving big files. This needs a relatively big heap, but it can be instructed to do full GCs fairly frequently to keep things under control. There will be the occasional pause, but it doesn’t matter too much. Minor collections on this box will become rather irrelevant, since most requests will outlive the minor GC interval.

Finally, have a VM for edit sessions. This needs a whacking big heap, but it can tolerate pauses as and when required. Since the frequency of editor operations is low, the frequency of minor collects (and hence their CPU overhead) is also low.

The only downside is that we go from having 2 active app server instances to 6 (each function gets a pair of VMs so we can take one down without affecting service). But that really only represents an extra few hundred MB of memory footprint, and a couple of dozen more threads on the box. It should, I hope, be a worthwhile trade-off.
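To make that concrete, the three flavours of VM might be launched along these lines on Java 5. The sizes, and the use of the RMI gcInterval properties to force periodic full collections, are illustrative guesses rather than our production settings:

```shell
# HTML-serving VMs: small heap, proportionally big young gen, fast minor GCs
java -server -Xms256m -Xmx256m -Xmn160m -XX:+UseParallelGC ...

# File-serving VMs: bigger heap; force a full GC every 5 minutes to keep
# the old gen (and therefore minor-collect cost) under control
java -server -Xms1g -Xmx1g -Xmn128m \
     -Dsun.rmi.dgc.client.gcInterval=300000 \
     -Dsun.rmi.dgc.server.gcInterval=300000 ...

# Edit-session VMs: whacking big heap; pauses tolerated
java -server -Xms2g -Xmx2g -XX:+UseParallelGC ...
```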
