apache "fork: Unable to fork new process" errors on solaris
(Note the google-bait title; I hope this helps someone else out.)
So, we had a problem where, every now and then, a sudden rush of requests to our webserver would lead to apache saying
“fork: Unable to fork new process”
in the error logs, once it tried to spawn more than ~400 httpds – and for a little while, no-one got any webbage. I spent some time looking into why this should be, and never really got anywhere; in each case, a hard restart of apache would fix it. I could see that there was a problem with our apache: each httpd process starts out life at ~8MB, but after a few months of running (with lots of “apachectl graceful”s, but no full restarts) would be more like 100MB. But, looking into the process, almost all of that was shared memory:
# pmap -ax 4987
4987:   /opt/coolstack/apache2/bin/httpd -k start
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
0803F000      36      36      12       - rwx--  [ stack ]
08050000     320     312       -       - r-x--  httpd
080AF000      12      12       8       - rwx--  httpd
080B2000       4       4       4       - rwx--  httpd
080B3000  116456  116324     104       - rwx--  [ heap ]
FDFA0000     184     184      16       - rw-s-  [ anon ]
FE000000     504     184       -       - rw-s-  [ anon ]
FE080000      64      16       -       - rwx--  [ anon ]
FE0A0000      24      24       -       - r-x--  mod_proxy_http.so
... other inconsequential items ...
-------- ------- ------- ------- -------
total Kb  124144  121736     220       -
so this shouldn’t matter. Even if there were 1000 httpds, with an anonymous allocation of 220K each, that isn’t going to make a dent on our server, which has about 50GB of VM in total.
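Incidentally, a cheap way to keep an eye on this sort of growth is a ps one-liner; a rough sketch (RSS includes shared pages, so it overstates the private footprint, but run it regularly and the trend is obvious):

```shell
# Rough leak check: total up the RSS of every httpd on the box.
# 'ps -eo' works on both Solaris and Linux; on Solaris the 'comm'
# column can be a full path, so match on the trailing component.
ps -eo rss,comm | awk '$2 ~ /httpd$/ { n++; s += $1 }
    END { printf "%d httpd processes, %d KB RSS in total\n", n, s }'
```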
Additionally, Solaris maintains a cache for ZFS file systems (the ARC), which will, by default, use up almost all of the RAM on the box. However, the cache allocations are special: a call to fork() or malloc() is allowed to eat into cache memory whenever it needs to. But I could see on our box that the ZFS cache was sat at about 20GB – so if ZFS is still happily using all this RAM, why can’t apache?
Well, predictably enough, my failure to analyse the problem came back to bite us. One day, instead of just apache being unable to fork, the whole box locked up. I couldn’t even run ‘ps’ to find a pid to kill. So, we transferred the service as quickly as possible onto a standby box, and left the wedged server to itself.
Clearly, a deeper understanding was required. I went to chat with our resident solaris guru, who explained what was going on.
When unix fork()s a process, the child initially shares all of the parent’s pages copy-on-write, so the OS doesn’t yet know how much of that memory will end up staying shared, and how much the child will write to (and so need its own copy of). It therefore has to reserve enough virtual memory for the entire space – i.e. 100MB per process, in the case of our apaches.
In Linux, the OS will ‘overcommit’, and allow processes to carry on forking even when all the virtual memory has already been allocated. In the unlikely event that the processes actually touch all the space they’ve been allocated, the ‘OOM killer’ comes into play: a kernel routine which picks a process using a large amount of RAM and kills it. This makes things very efficient, but a little unpredictable.
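The Linux policy is visible (and tunable) through /proc; a quick sketch of the knobs, for comparison:

```shell
# 0 = heuristic overcommit (the default), 1 = always overcommit,
# 2 = strict accounting against swap + a fraction of RAM (i.e. the
#     closest Linux gets to the Solaris behaviour)
cat /proc/sys/vm/overcommit_memory

# The kernel also publishes its commit limit and the total VM
# currently promised to processes:
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo
```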
On solaris, by comparison, overcommit is not allowed. If you want to fork() a 100MB process, there must be 100MB of free virtual memory (RAM + swap) left on the system. So it’s now easy to see why our httpds were failing to fork: 100MB × 400 processes = 40GB – and once you’ve added in the 10GB of oracle SGA, 5GB of java heap, and sundry other processes, that’s everything all gone.
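You can get a feel for strict accounting on any box by capping a subshell’s address space with ulimit; a sketch (the 50MB cap and dd’s 100MB buffer are just stand-ins for the real reservation sizes):

```shell
# Cap the address space at 50 MB, then ask dd for a 100 MB buffer.
# The allocation is refused up front rather than optimistically
# granted -- the same answer Solaris gives a fork() that can't be
# fully backed by free VM.
( ulimit -v 51200
  dd if=/dev/zero of=/dev/null bs=100M count=1 2>/dev/null \
      || echo "refused: no room to reserve 100MB" )
```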
Meanwhile, what about that 20GB of ZFS cache? Well, it turns out that this is allowed to share space with reserved-but-not-used VM. Since all the apaches were only actually using a tiny bit of their reservation, there was plenty of space for the ARC cache to sit in.
So, there are a couple of solutions:
1) Allocate a shedload of swap space, knowing that it’ll never actually get used; it wouldn’t really hurt to have, say, 100G of swap sitting idle – except that we’d need to get some more disks.
2) Stop apache leaking. This would be the ideal solution; a webserver that takes 100MB of heap does seem a bit on the excessive side, even to a hardened java programmer like me ;-). But whether it’s possible or not, I don’t know. The standby server has a more up-to-date version of apache, so maybe the problem will magically fix itself…
3) Periodically restart apache. Ugghh. Really? This isn’t windows, you know…periodic hard restarts of user-facing services, with all the associated risk and downtime, are really not something I want to get into.
4) Front apache with squid (or haproxy, varnish, an F5, whatever), and periodically swap between two separate apache instances, allowing either one to be killed off as required. Better, but a helluva lot of extra infrastructure just to fix a leaky webserver.
5) Use lighttpd. Hmm…..
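For what option 1 is worth, the Solaris side of adding swap is cheap in effort if not in disks; a sketch, assuming a ZFS root pool (the pool and volume names here are made up for illustration):

```shell
# Carve a 100G zvol out of the pool and add it as swap.  It exists
# only to back reservations, so it should sit almost entirely idle.
zfs create -V 100G rpool/swap2
swap -a /dev/zvol/dsk/rpool/swap2
swap -s   # 'used' should barely move; 'available' jumps by ~100G
```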
Update: not quite the same as our problem, but I’m reproducing it here for the benefit of anyone else suffering; a reader contacted me to observe that a recent Sun patch had upped the ServerLimit directive to 2048, and that this had led to very high (>100MB/process) memory use. I can see how this could be the case, particularly if you’re using a multithreaded MPM like worker, so it’s worth watching out for.
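If you have taken that patch and are on the worker MPM, it’s probably worth pinning the limits explicitly rather than inheriting the new defaults; a sketch for httpd.conf (the numbers are illustrative, not recommendations):

```apache
# Cap worker at 16 processes * 25 threads = 400 concurrent clients,
# rather than letting a patched ServerLimit default decide for you.
<IfModule mpm_worker_module>
    ServerLimit         16
    ThreadsPerChild     25
    MaxClients         400
</IfModule>
```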