Tuning Java 5 garbage collection for mixed loads
Once again, I find myself glaring balefully at the output of garbage collection logs and wondering where my CPU is going. Sitebuilder2 has a very different GC profile to most of our apps, and whilst it’s not causing user-visible problems, it’s always good to have these things under control.
So, SB2 has an interesting set of requirements. Simplistically, we can say it does 3 things:
1) Serve HTML pages to users
2) Serve files to users
3) Let users edit HTML/Files/etc
these 3 things have interestingly different characteristics. HTML requests generate a moderate amount of garbage, but almost always execute much quicker than the gap between minor collections. So, in principle, as long as our young generation is big enough we should get hardly any old gen. garbage from them. Additionally, HTML requests need to execute quickly, else users will get bored and go elsewhere.
Requests for small files are rather similar to the HTML requests, but most of our file serving time is spent drip-feeding whacking great files (10MB and up) to slow clients. This kind of file-serving generates quite a lot of garbage, and it looks as if a lot of it sticks around for long enough that it ends up in the old gen. Certainly the requests themselves take much longer than the time between minor collects, so any objects which have a lifetime of the HTTP request will end up as heap garbage. Large file serving, though, is mostly unaffected by the odd GC pause. If your 50MB download hangs for a second or two halfway through, you most likely won’t notice.
Edit requests are a bit of a mishmash. Some are short and handle only a little data, others (uploading the aforementioned big files, for instance) are much longer running. But again, the odd pause here and there doesn’t really matter. There are orders of magnitude fewer edit requests than page/file views.
So, the VM is in something of a quandry. It needs a large heap to manage the large amounts of garbage generated from having multiple file serving requests going on at any given time. And it needs to minimise the number of Full GCs so as to minimise pauses for the HTML server. But, the cost of doing a minor collection goes as a function of the amount of old generation allocated, so a big, full heap implies a lot of CPU sucked up by the (parallel) minor collectors. It also means longer-running minor collections, and a greater chance of an unsuccessful minor collect, leading to a full GC.
(For reference, on our 8-way (4 proc) opteron box, a minor collect takes about 0.05s with 100MB of heap allocated, and about 0.7S with 1GB of heap allocated)
So, an obvious solution presents itself. Divide and Conquer.
Have a VM (or several) dedicated to serving HTML. These should have a small heap, and a large young generation, so that parallel GCs are generally fast, and even a full collection is not going to take too long. This VM will be very consistent, since pauses should be minimal.
Secondly, have a VM for serving big files. This needs a relatively big heap, but it can be instructed to do full GCs fairly frequently to keep things under control. There will be the occasional pause, but it doesn’t matter too much. Minor collections on this box will become rather irrelevant, since most requests will outlive the minor GC interval.
Finally, have a VM for edit sessions. This needs a whacking big heap, but it can tolerate pauses as and when required. Since the frequency of editor operations is low, the frequency of minor collects (and hence their CPU overhead) is also low.
The only downside is that we go from having 2 active app server instances to 6 (each function has a pair of VMs so we can take one down without affecting service). But that really only represents a few extra hundred MB of memory footprint, and a couple of dozen more threads on the box. It should, I hope, be a worthwhile trade off.