LDAP connection pooling
We recently had problems with load on our single sign on (SSO) server. Being the start of term, things are generally busier than the rest of the year and we often see higher load than normal. However, this was too far from normal to be right.
A bit of investigation showed that our JBoss instance had literally 100s and 100s of threads. Lsof is a very handy utility in cases like this.
lsof -p <procid>
This revealed 100s of open connections to our LDAP servers. Not good.
Looking at the LDAP code we have, there are two places where we make LDAP connections, or as they are known in Java; contexts.
Hashtable env = new Hashtable(); env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory"); env.put(Context.PROVIDER_URL, "ldap://ourldap.warwick.ac.uk"); LdapContext ctx = new InitialLdapContext(env,null); // do something useful with ctx ctx.close()
This is pretty much how our code worked in both places. Importantly I'd checked that the contexts were always closed…and they were.
This is where LDAP connection pooling came into the picture. It turned out that one piece of code (not written by us), used this:
This turns on connection pooling. However, we didn't use pooling in the other bit of code. So, one of the other wasn't working. Trying out pooling on both bits of code didn't improve things either, basically because it is a multi–threaded application with 100's of requests a minute, if you just keep creating new LdapContext's from a LdapCtxFactory, you are using a new LdapCtxFactory every time.
Thankfully our SSO application uses Spring so it was simple enough to create an XML entry for the LdapCtxFactory and the environment config and plug the same LdapCtxFactory into the two places it was needed. At least now we were using the same factory.
We could now do this:
Map env = new Hashtable(); env.putAll(getLdapEnv()); env.put("java.naming.security.principal", user); env.put("java.naming.security.credentials", pass); LdapContext ldapContext = (LdapContext) getLdapContextFactory().getInitialContext((Hashtable) env);
Where the base LDAP environment and LdapCtxFactory was injected into where it was needed. Then just the username and password to bind as is passed in dynamically.
To really know if pooling is working you need to turn on the debugging for the ldap connection pooling by adding a java option to your test/application/server. There are other handy options for tweaking the pooling behaviour as well.
-Dcom.sun.jndi.ldap.connect.pool.debug=fine -Dcom.sun.jndi.ldap.connect.pool.initsize=20 -Dcom.sun.jndi.ldap.connect.pool.timeout=10000
The bugging will give you messages like this if pooling isn't working:
Create com.sun.jndi.ldap.LdapClient@c87d32[nds.warwick.ac.uk:389] Use com.sun.jndi.ldap.LdapClient@c87d32 Create com.sun.jndi.ldap.LdapClient@c81a32[nds.warwick.ac.uk:389] Use com.sun.jndi.ldap.LdapClient@c81a32 Create com.sun.jndi.ldap.LdapClient@a17d35[nds.warwick.ac.uk:389] Use com.sun.jndi.ldap.LdapClient@a17d35 Create com.sun.jndi.ldap.LdapClient@1a7e35[nds.warwick.ac.uk:389] Use com.sun.jndi.ldap.LdapClient@1a7e35
New connections are just being created every time with no reuse. What you should see is:
Use com.sun.jndi.ldap.LdapClient@17bd5d1 Release com.sun.jndi.ldap.LdapClient@17bd5d1 Create com.sun.jndi.ldap.LdapClient@cce3fe[nds.warwick.ac.uk:389] Use com.sun.jndi.ldap.LdapClient@cce3fe Release com.sun.jndi.ldap.LdapClient@cce3fe Use com.sun.jndi.ldap.LdapClient@1922b38 Release com.sun.jndi.ldap.LdapClient@1922b38 Use com.sun.jndi.ldap.LdapClient@17bd5d1 Release com.sun.jndi.ldap.LdapClient@17bd5d1
As you can see, there are actually two differences here from a fully working connection pool and a well and truely broken one.
- There are very few creates and lots of reuse in the good code
- There are lots of releases after connection use in the good code
This is where we came across our second problem. Although in theory the connection pooling was working and I could see some reuse, it was still creating a lot of connections and I wasn't seeing barely any 'Release' messages.
Chris hit the nail on the head with pointing out that NamingEnumerations could well be just like PreparedStatements and ResultSets for JDBC. It is all fine and well closing the connection/context itself, but if you don't close the other resources, it won't actually get released.
The proof of this shows up again in lsof or netstat. A context that has been closed but still has an open NamingEnumeration shows up like this:
java 21533 jboss 80u IPv6 0x32376e2cf70 0t70743 TCP ssoserver:60465->ldapserver.warwick.ac.uk:ldap (ESTABLISHED)
However, when it is closed, it should wait to be closed, like this:
java 21533 jboss 80u IPv6 0x32376e2cf70 0t70743 TCP ssoserver:60465->ldapserver.warwick.ac.uk:ldap (TIME_WAIT)
Upon closing all NamingEnumerations, we finally got the perfect results. 100s of requests a minute and only ever around 10–15 ldap connections open at any one time.
So, lessons learnt.
- When creating contexts, share the factory to use pooling
- Make sure you close everything. If it has a close()...use it!
- Occasionally take a look at the open connections and threads that you application has…it might surprise you.
<bean id="ldapContextFactory" class="com.sun.jndi.ldap.LdapCtxFactory" singleton="true"/> <bean id="ldapEnv" class="java.util.Hashtable"> <constructor-arg> <map> <entry key="java.naming.factory.initial"><value>com.sun.jndi.ldap.LdapCtxFactory</value></entry> <entry key="java.naming.provider.url"><value>ldaps://ourldap.ac.uk</value></entry> <entry key="java.naming.ldap.derefAliases"><value>never</value></entry> <entry key="com.sun.jndi.ldap.connect.timeout"><value>5000</value></entry> <entry key="java.naming.ldap.version"><value>3</value></entry> <entry key="com.sun.jndi.ldap.connect.pool"><value>true</value></entry> </map> </constructor-arg> </bean>
We now do connection pooling with LDAPS so we use the additional system property: