In writing our Sakai UserDirectoryProvider (UDP) I took care to ensure that the JAAS context/connection to the LDAP servers was torn down and rebuilt for every request. Why? To work around an issue with Java caching IP #s for load-balanced LDAP servers.
As I didn't (still don't) have QA LDAP servers to work with, I had to hope that it worked. In retrospect I could have played with my DNS settings to make some servers fail to resolve...
During a scheduled outage of an LDAP server today (during finals week? I don't get it.) it became clear that tearing down the entire secure connection context isn't enough to avoid the sticky-IP problem. As a result, post-Kerberos AuthN work against the LDAP pool wasn't working correctly for a good number of users, and they couldn't completely log into Sakai. We're not sure the TTL for the DNS entry was set low enough before the outage, but assuming it was, I expected each new from-the-bottom connection to do a full DNS request and get only the good IPs.
It looks like we'll have to set a couple of JVM properties, networkaddress.cache.ttl being the first one. The default is for the JVM to cache successful DNS resolutions forever. So even though the JAAS context was built up fresh each time, the JVM would still have been handing it the wrong machine to work with. Ugh.
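One wrinkle worth noting: networkaddress.cache.ttl is actually a Java *security* property, not a system property, so it lives in the JRE's java.security file or gets set programmatically via java.security.Security before the first lookup is cached. A minimal sketch of the programmatic route (the 60/10 second values here are illustrative, not our tuned settings):

```java
import java.security.Security;

public class DnsTtlConfig {
    public static void main(String[] args) {
        // Must run before the JVM performs (and caches) its first DNS lookup.
        // Cache successful resolutions for 60 seconds instead of forever:
        Security.setProperty("networkaddress.cache.ttl", "60");
        // Cache failed resolutions only briefly, so a recovered server
        // isn't remembered as dead:
        Security.setProperty("networkaddress.cache.negative.ttl", "10");

        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

The alternative is editing java.security on disk, which has the advantage of not depending on your code running early enough in the container's startup.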
Some Stanford teams actually roll through IP #s for LDAP connections themselves, recovering from bad contexts by sidestepping the DNS load balancer altogether.
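I haven't seen their code, but the basic move is easy to sketch: resolve every A record for the pool name yourself with InetAddress.getAllByName and iterate over the addresses, instead of letting the JVM hand you its one cached pick. A hypothetical helper (the class name is mine, and the hostname in main is a stand-in for a real LDAP pool name):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LdapHostRoller {
    // Resolve all A records for the pool's DNS name so the caller can
    // try each IP in turn, skipping any that yield a bad context.
    public static String[] resolveAll(String host) throws UnknownHostException {
        InetAddress[] addrs = InetAddress.getAllByName(host);
        String[] ips = new String[addrs.length];
        for (int i = 0; i < addrs.length; i++) {
            ips[i] = addrs[i].getHostAddress();
        }
        return ips;
    }

    public static void main(String[] args) throws UnknownHostException {
        // "localhost" is just a placeholder for the load-balanced pool name.
        for (String ip : resolveAll("localhost")) {
            System.out.println(ip);
        }
    }
}
```

Of course this still depends on the DNS cache TTL being sane, otherwise getAllByName just replays the same stale answer.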