All 2 entries tagged Sql
February 08, 2006
First post of the year (bad boy)...
I've spent a lot of this year so far jumping between lots of different things. I've started dipping into the new SiteBuilder code which is far more familiar as it is now Spring/Hibernate based rather than Struts/EJB.
I've also as usual been working on Single Sign-On and BlogBuilder.
As the complexity of BlogBuilder grows and our page views grows (now averaging more than 50,000 proper real people page views per day), it has become more and more important to optimise BlogBuilder for better performance.
Hongfeng our resident Oracle expert pointed me in the direction of quite a lot of particularly bad and slow pieces of SQL that were being generated out of BlogBuilder. The problem with BlogBuilder is that it is very very dynamic. We do not serve any static pages as every single page is customised to the currently logged in user as every blog/image/entry has its own permissions. There are also just a lot of different views on the blogs data; daily views, monthly views, favourites views, entries by tags, blogs by group, images by day, etc…
When using Hibernate 2 I did most of these queries with HQL, and it worked quite well, but I'm starting to feel the strain as some of the queries got more and more complicated.
With Hibernate 3 I can now take advantage of the Criteria API, which is quite nice for building complicated queries, but it still has some problems so I've now got a mix of HQL, Criteria and plain old SQL when a particularly complicated aggregation is needed.
Don't forget to turn on query caching and specifically tell your criteria and queries to cache as although the documentation says that for most queries caching doesn't make much difference, I've found it can make a huge difference.
Another little trick is to be careful with date range queries. If you want to do something like find items based on the current time, round your time to the nearest hour or minute rather than passing in a date with second or millisecond accuracy as this will prevent those queries being cached for more than a second…not a lot of good.
Another trick when moving from Hibernate 2 to Hibernate 3 is that you used to have to do "query.iterate().next()" to get a result when you knew there was just a single result (such as a count query), but now there is the uniqueResult() method. It is important to switch over because the uniqueResult() calls get cached, but the iternate().next() ones don't.
October 06, 2005
I had to clear up some old bad data that was left over from a bit of bad code. Unfortunately the bad data didn't rear its ugly head until recently and a lot of data had built up. It was also very hard to detect the bad data because of many places it could be referenced from…only if it had no references to it from any of 7 places would it need to be deleted.
This means doing a really horrible query either like this:
select id from atable where id not in (select id from anothertable) and id not in (select id from yetanothertable) and id not in (select id from moretables) ..... .....
This is very, very, very slow.
A more efficient way of doing this is this:
select id from atable a where not exists (select id from anothertable b where a.id = b.id) and not exists (select id from yetanothertable c where a.id = c.id) and not exists (select id from moretables d where a.id = d.id) ..... .....
However, when you are dealing with potentially 100,000's of rows it is still quite slow…but it does get there. The next problem is actually deleting the data once you've managed to select it. As a little test I thought I'd try and delete the whole lot, but that just didn't work…too slow. Even if I did have the patience to leave it running for hours, I couldn't let it lock up the database like that for that long.
So, the only solution was to do it in batches. I wrote a quick java program that would iterate through an do the deletes in small batches of a 100 or so at a time. My first mistake was trying to reuse some Spring/Hibernate code I already had instead of going straight to old school and using JDBC.
Although in theory you can get a Connection object from the hibernate Session, via session.connection(), it really is NOT the same as just getting a good old fashioned JDBC connection. The deletes were taking absolutely ages, so I profiled it and noticed that hibernate was still trying to do some of its funky stuff in the background, really slowing things down.
Plan B (or is it D by now?). Spring comes with a handy little JdbcTemplate which lets you do real JDBC but without a lot of the exception and connection/statement/resultset closing pains. Finally…it worked.
So, lesson of the day:
- not exists type queries are faster than not in queries
- Bulk deletes can be verrrrrrry slow
- Batching deletes is better, but with real JDBC not hibernate SQL calls