Garbage collection and Hibernate performance tuning
We've been experiencing some performance problems recently with BlogBuilder. Occasionally we've been having periods of up to 10 seconds where BlogBuilder has completly stopped responding. It didn't take too long to work out that this was the the garbage collection (GC) in the JVM. Only problem is…how do you fix it?
1) First things first, analyse the problem.
After a bit of faffing around I found that the easiest way to get some information about what is going on with GC is to set the following flag when you start up your application, in my case JBoss.
-Xloggc:some_garbage_collection_log_file.log -XX:+PrintGCTimeStamps
This will print out some nice stats such as these:
0.000: [Full GC 31662K->1547K(3114944K), 0.1660533 secs]
46.941: [Full GC 302615K->20372K(3114944K), 0.8736640 secs]
107.826: [Full GC 237781K->33703K(3114944K), 0.9595437 secs]
168.812: [Full GC 997600K->68269K(3114944K), 2.0233878 secs]
230.857: [Full GC 1045518K->85888K(3114944K), 1.9429575 secs]
278.238: [GC 1072896K->102427K(3114944K), 0.3734575 secs]
You can just look at these and see how long garbage collection is taking, in my case around 1 or 2 seconds per full GC. The problem comes once the application has been running for a while.
84564.203: [Full GC 1178937K->867583K(3114944K), 11.3700822 secs]
84618.071: [GC 1854591K->897218K(3114944K), 0.5409472 secs]
84635.640: [Full GC 1253982K->904732K(3114944K), 11.9113516 secs]
84697.271: [GC 1891740K->937871K(3114944K), 0.5528494 secs]
84707.583: [Full GC 1125289K->935547K(3114944K), 12.2230228 secs]
84760.543: [GC 1922555K->960498K(3114944K), 0.4644272 secs]
84779.833: [Full GC 1589435K->981758K(3114944K), 14.6843600 secs]
84854.553: [Full GC 1919262K->970964K(3114944K), 13.7111592 secs]
At a glance we can see things are going wrong, but to really get a good overview of what is happening, HPjTune comes to the rescue. This small java app analyses these logs and prints out some nice graphics for you.
As can be seen from this, the GC duration rises steadily and gets very spikey occasionally as well. From other information jTune provides, you can also find out the memory that is getting used up and freed up on each GC.
2) Solutions.
There are two main possible solutions to sorting out GC problems. These are:
- Use a better GC algorithm
- Reduce the memory used and number of objects created by your application
Using a better GC algorithm.
Tuning Garbage Collection with the 1.4.2 Java[tm] Virtual Machine tells you pretty much everything you need to know. The conclusion is that if we want GC to stop blocking, we need to use a parallel GC algorithm. This will only work on a multi-processor box though (which is ok because blogs does run on one), but it is difficult to test because we don't have a comparable test machine.
Reduce the memory used
The less memory there is used, the less memory the GC algorithm has to work through to clean it up. Simple. We use up a lot of memory in BlogBuilder because we have a very large second level Hibernate cache (using EHCache). I'm beginning to think it is too big and too long lived. Getting to the root cause of why we need such a big cache lead me to look a bit closer at some logs. It turns out BlogBuilder is pumping out a lot more SQL than it needs.
BlogBuilder is generating SQL inefficiently -> We need big caches so we don't re-run the SQL -> We get long GC periods -> We get bad performance.
It turns out you just can't get around not fixing the initial performance bottle necks, the caches will save you to a certain degree, but once you get a lot of data (and with 3000 blogs, 20000 entries and 40000 comments, we have a lot of data), you need to go back to basics and get the individual pages running efficiently.
3) Improve Hibernate SQL efficiency
BlogBuilder already has all collection relationships set as lazy so that data is only ever loaded when it is needed (this is a huge performance gain), but even so, when the data has to be loaded, there is no getting around the fact that some SQL is going to be issued.
The one optimisation I'd glaringly missed when first developing BlogBuilder was the batch-size settings for collections. If we have a permissions object which has two groups on it, staff and students, then Hibernate might issue three sql queries, one for the permission and one for each of the two groups. In practice, Hibernate is quite clever and will generally just issue two sql queries, one for the permission and then just a single query to get both groups. Good.
However, what setting batch-size does for you is instructs Hibernate to be even more aggressive in optimising the sql.
Now, say you have three permissions objects and each has two groups, you might normally have to issue either nine or six sql statements. If you set batch-size="10" on the groups relationship, Hibernate will try and load all groups across all three permissions in one go, meaning just four queries. Brilliant!
Using this and a couple of other methods, I reduced the number of queries for a single un-cached page down to one third of the original number.
4) Conclusion.
At the end of the day, you must track down performance problems to their root cause, don't just work around them with a higher level fix. Hopefully this will mean that I can be less reliant on the caching for good performance, so I can reduce the cache size and timeouts, leading to better GC performance.
7 comments by 1 or more people
[Skip to the latest comment]Matt Fleming
So did you enable the parallel garbage collector? How did it work out?
21 Apr 2005, 22:02
To be honest in the end the parallel GC was not actually that important. What really made a different was turning off the every minute RMI GC that was going on.
22 Apr 2005, 09:15
A couple more tweaking thoughts...
I'm studying GC and memory setting optimization myself right now, and I read about what Kieran was talking about (in the comments section) about RMI invoked garbage collection being a notorious problem.
There are a couple other things for the "general tuning" study above that are worth mentioning. First of all, the Parallel GC might or might not be the best choice. All the literature seems to suggest it's optimal for a 4+ CPU system. I think more web servers out there are going to tend to be 2 CPU systems. For that I would highly suggest the low-pass concurrent garbage collector. It is specifically intended for a multiprocessor system (including 2 CPUs) to tackle the "server pauses for long periods of time during full GC" issue. The server flag for that is "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC". (Actually, UseConcMarkSweep is okay I think for single CPU as well. It's point is to stop blocking the main app during the full GC at the slight cost of performance. If you have a 2+ CPU system, you can add UseParNewGC to create a parallel young-generation GC which may add a slight performance boost.)
The other really important thing is to look at (a) how much RAM your application is using and (b) how much of the HEAP is allocated to the "young generation" area. THE BEST MEMORY TWEAKING YOU CAN DO is to run your app and gather and look at the memory usage and get a feel for "about what's the max heap the system ever allocates during its hardest (90th percentile) loads" AND THEN SET YOUR -Xms###m FLAG TO THAT LEVEL. Then play around with the -XX:NewRatio=# parameter (start with 5 or 6, look at the graphs, see if you need to lower it to 4 or 3).
You see, most of the time almost all of your Java-created objects should live and die in the young generation zone. "Eden" it's called. If Eden is too small then short-lived objects will get promoted to the Tenured Generation zone and force too much of a load on the full (expensive) garbage collection.
It's not too hard to learn about these heap areas, look at some statistics (it's actually a little fun) and adjust your settings so that JVM memory settings are no longer a guessing game. The article Kieran suggests at the top is very readable and actually quite interesting.
17 Aug 2005, 14:11
Ian Johnson
Dual Opteron Java 1.5 options:
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
reduced our biggest GC pauses from 100+ ms several times a minute to max 3–5ms (but more often) All major GC were at startup and no more were seen after the first 20 seconds)
06 Dec 2005, 23:27
Dharmesh
Sorry, My question is :
I m firing simple single table query say : select id, name from master_table order by name.
master_table contains 45000 records and when Hibernate fires qeuries it talkes 1/2 hr and responds with outofmemory error.
I set batch–size = 1000, but even then also performance issue is there.
Could anyone has exact solution to do hibernate work?
Regards,
Dharmesh
24 May 2006, 09:02
Hibernate isn't really designed to do batch processing of large sets. They say something about this in the documentation here:
http://www.hibernate.org/hib_docs/v3/reference/en/html_single/#batch
24 May 2006, 11:17
anjul tyagi
Dear Expert,
i am facing the problem while executing the hibernate query. whenever i start the query execution my database server acquiring full memory. and responding very late. and my database is PostgresSQL. si can any body tell me where is the problem either in my code or its a database issue.
thanks
Anjul Tyagi
15 Mar 2007, 17:39
Add a comment
You are not allowed to comment on this entry as it has restricted commenting permissions.