September 10, 2008

apache "fork: Unable to fork new process" errors on solaris

(note google-bait title; I hope this helps someone else out).

So, we had a problem where, every now and then, a sudden rush of requests to our webserver would lead to apache saying

“fork: Unable to fork new process”

in the error logs, once it tried to spawn more than ~400 httpds – and for a little while no-one got any webbage. I spent some time looking into why this should be, and never really got anywhere. In each case, a hard restart of apache would fix it. I could see that there was a problem with our apache: each httpd process starts out life at ~8MB, but after a few months of running (with lots of “apachectl graceful” reloads, but no full restarts) it would be more like 100MB. But, looking into the process, almost all of that was shared memory:

# pmap -ax 4987
 4987:   /opt/coolstack/apache2/bin/httpd -k start
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
0803F000      36      36      12       - rwx--    [ stack ]
08050000     320     312       -       - r-x--  httpd
080AF000      12      12       8       - rwx--  httpd
080B2000       4       4       4       - rwx--  httpd
080B3000  116456  116324     104       - rwx--    [ heap ]
FDFA0000     184     184      16       - rw-s-    [ anon ]
FE000000     504     184       -       - rw-s-    [ anon ]
FE080000      64      16       -       - rwx--    [ anon ]
FE0A0000      24      24       -       - r-x--  mod_proxy_http.so
... other inconsequential items...
-------- ------- ------- ------- -------
total Kb  124144  121736     220       -

so this shouldn’t matter. Even if there were 1000 httpds, each with an anonymous allocation of 220KB, that isn’t going to make a dent in our server, which has about 50GB of VM in total.
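
For what it’s worth, here’s roughly how you could add that up across all the running httpds – the awk field position is just taken from the pmap output above, so treat this as a sketch rather than gospel:

# very rough total of private (Anon) memory across all httpds;
# the Anon column is the 5th field of pmap's "total" line, as above
for pid in $(pgrep httpd); do
    pmap -x $pid | awk '/^total/ {print $5}'
done | awk '{ sum += $1 } END { print sum, "KB of anon in total" }'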

Additionally, Solaris maintains a cache for ZFS file systems (the ARC), which will, by default, use up almost all of the RAM on the box. However, the cache allocations are special; a call to fork() or malloc() is allowed to eat into cache memory whenever it needs to. But I could see on our box that the ZFS cache was sitting at about 20GB – so if ZFS is still using all this RAM, why can’t apache?
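
For the record, this is the sort of thing you can watch from the shell – the kstat name is as it appears on a Solaris 10 box, and may differ on other releases:

# current size of the ZFS ARC, in bytes
kstat -p zfs:0:arcstats:size

# kernel's-eye view of where the RAM has gone (needs root)
echo "::memstat" | mdb -k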

Well, predictably enough, my failure to analyse the problem came back to bite us. One day, instead of just apache being unable to fork, the whole box locked up. I couldn’t even run ‘ps’ to find a pid to kill. So, we transferred the service as quickly as possible onto a standby box, and left the wedged server to itself.

Clearly, a deeper understanding was required. I went to chat with our resident Solaris guru, who explained what was going on.

When Unix fork()s a process, the child starts out sharing the parent’s pages copy-on-write, and the OS doesn’t yet know how much of that memory will actually end up private to the child and how much will remain shared. So, it has to reserve backing store for the entire space (i.e. ~100MB per process, in the case of our apaches).

In Linux, the OS will ‘overcommit’, and allow processes to carry on forking even when more virtual memory has been promised than RAM and swap can actually back. In the unlikely event that all the processes really do try to use the space they’ve been allocated, the ‘OOM killer’ comes into play – kernel code which picks a process using a large amount of memory and kills it. This makes things very efficient, but a little unpredictable.
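
For comparison, that behaviour is tunable on Linux – these are the standard sysctls (nothing to do with Solaris):

# Linux only: 0 = heuristic overcommit (default), 1 = always allow, 2 = never overcommit
cat /proc/sys/vm/overcommit_memory

# used when overcommit_memory is 2: commit limit = swap + overcommit_ratio% of RAM
cat /proc/sys/vm/overcommit_ratio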

On Solaris, by comparison, overcommit is not allowed. If you want to fork() a 100MB process, there must be 100MB of free virtual memory (RAM plus swap) left on the system. So it’s now easy to see why our httpds were failing to fork: 100MB * 400 processes = 40GB – once you’ve added in the 10GB of oracle SGA, 5GB of java heap, and sundry other processes, that’s everything all gone.
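
You can see these reservations with swap(1M) – the “reserved” figure is space that’s been promised to processes but never touched, and it’s exactly what fork() has to find room for:

# allocated + reserved is what counts against new fork()/malloc() reservations;
# "available" is what's left to promise
swap -s

# and the physical swap devices backing it all
swap -l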

Meanwhile, what about that 20GB of ZFS cache? Well, it turns out that the ARC is allowed to share space with reserved-but-not-used VM. Since all the apaches were only actually using a tiny bit of their reservation, there was plenty of space for the ARC to sit in.

So, there are a few possible solutions:

1) Allocate a shedload of swap space. Since it’ll never actually get used, it wouldn’t really hurt to have, say, 100GB of swap sitting idle – except that we’d need to get some more disks. (Mechanically it’s easy enough; there’s a sketch just after this list.)

2) Stop apache leaking. This would be the ideal solution; a webserver that takes 100MB of heap does seem a bit on the excessive side, even to a hardened java programmer like me ;-). But whether it’s possible or not, I don’t know. The standby server has a more up-to-date version of apache, so maybe the problem will magically fix itself…

3) Periodically restart apache. Ugghh. Really? This isn’t windows, you know…periodic hard restarts of user-facing services, with all the associated risk and downtime, are really not something I want to get into.

4) Front apache with squid (or haproxy, varnish, an F5, whatever), and periodically swap between two separate apache instances, allowing either one to be killed off as required. Better, but a helluva lot of extra infrastructure just to fix a leaky webserver.

5) Use lighttpd. Hmm…..
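
On the first option, the mechanics of adding swap are simple enough – a sketch, assuming a ZFS pool called rpool (the pool name and sizes here are made up; adjust to taste):

# carve a zvol out of the pool and add it as swap
zfs create -V 32G rpool/swap2
swap -a /dev/zvol/dsk/rpool/swap2

# or, on UFS, a plain swap file does the job
mkfile 32g /export/swapfile
swap -a /export/swapfile

# check it took
swap -l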

Update: not quite the same as our problem, but I’m reproducing it here for the benefit of anyone else suffering; a reader contacted me to observe that a recent Sun patch had upped the ServerLimit directive to 2048, and that this had led to very high (>100MB/process) memory use. I can see how this could be the case, particularly if you’re using a multithreaded MPM like worker, so it’s worth watching out for.
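
If you want to check what your own build is capped at, the relevant worker-MPM directives look something like this – the numbers below are purely illustrative, not a recommendation:

# httpd.conf (worker MPM) -- keep the process/thread caps down to something
# your virtual memory can actually back; numbers here are illustrative only.
# Note: MaxClients must be <= ServerLimit * ThreadsPerChild.
<IfModule mpm_worker_module>
    ServerLimit            16
    ThreadsPerChild        25
    MaxClients            400
    # recycle children now and then, to bound per-process growth
    MaxRequestsPerChild 10000
</IfModule>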


- 5 comments

  1. Andrew Ingram

    I quite like nginx

    10 Sep 2008, 18:40

  2. Chris May

I do too. Though I worry that its bonkers config file format might scare our sysadmins :-)

    10 Sep 2008, 21:02

  3. Andrew Ingram

The config file is one of the big wins for me over apache; I hate the apache config files, with their indecisiveness over whether they’re XML or not. I was under the impression that lighttpd had a similar JSON-like format to nginx?

    10 Sep 2008, 21:34

  4. Chris May

Apache’s config file format pre-dates XML by a few years (it came from NCSA HTTPd, which dates back to the early ’90s), and the syntax has remained backwards-compatible ever since. So I can forgive its eccentricities in return for its invariance :-)
    (I wasn’t suggesting that lighttpd was any different in terms of accessibility for apache admins, btw; they’re both, as you say, quite similar).

    Whether we could ever migrate our extensive set of mod_rewrite configs over to either is another question…

    10 Sep 2008, 22:35

  5. Chris May

    Hmm; been looking at nginx vs. lighttpd a bit more. A big blocker for me at the moment would be the apparent lack of a decent server-status module. lighttpd has one which is very similar to the apache version, but nginx has only a very brief overview. In particular, nginx’s server-status doesn’t show current requests. This has proved absolutely invaluable in the past – when things start to bog down, being able to log in and see exactly what the server’s waiting on is a real time-saver for debugging.

    Fortunately, apache 2.2.6 doesn’t seem to be leaking like 2.2.3 did, so hopefully I won’t have to care about this :-)

    15 Sep 2008, 11:11

