A while ago, when we were designing Sitebuilder 2, one of our design goals was that the app server should be as stateless as it possibly could be. We wanted to end up in a situation where we could scale the app simply by bringing new servers online, with no need to replicate between them. We also wanted to be able to bring individual servers down for maintenance without users noticing, simply redirecting requests onto the remaining servers.
As far as the viewing of pages is concerned, this has worked out pretty well. There are occasional blips when our version of mod_jk fails to realise that a server has dropped out of the cluster, but they're rare and we could work around them quite easily if need be.
But for editing we haven't quite realised our goals. It started out very well, but we were seduced by a bit of technology that nearly did what we needed, but not quite. That technology is based on Spring Web Flow (SWF).
What SWF does (amongst other things) is to let you associate an arbitrary bunch of objects with a business process. So if you were, say, uploading and unpacking a zip file onto the server, then the process might include 3 steps – you upload the file, then you choose which files go where, then you're told which ones were successfully uploaded and which weren't.
There's quite a bit of state associated with that process, and SWF does kind of solve the problem of how you could do step 1 on server 1, step 2 on server 2, and step 3 on server 3. It does this by using 'client continuations': basically, all the server-side objects needed for the process are serialized, and the resulting bytes are written into a field on the form and hence re-submitted when the user moves on to the next step.
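To make the idea concrete, here's a minimal sketch of a client continuation in plain Java. This is not SWF's actual code: the class names (`UploadState`, `ClientContinuation`), the field name `_flowState` and the choice of URL-safe Base64 are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical conversational state for the zip-upload example.
class UploadState implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> chosenFiles = new ArrayList<>();
}

class ClientContinuation {
    // Serialize the state and Base64-encode it so it can live in a form field.
    static String encode(Serializable state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getUrlEncoder().encodeToString(bytes.toByteArray());
    }

    // Rebuild the state from the value the browser posted back.
    static Object decode(String fieldValue) {
        byte[] raw = Base64.getUrlDecoder().decode(fieldValue);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Render the hidden field; "_flowState" is an assumed name.
    static String hiddenField(Serializable state) {
        return "<input type=\"hidden\" name=\"_flowState\" value=\"" + encode(state) + "\"/>";
    }
}
```

Any server that receives the form can call `decode` on the posted value and carry on with the next step, which is the whole appeal: the state travels with the request rather than living on one node.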
So far so good. But the first hurdle comes when you've got a lot of server-side state - like, say, a 200MB zip file full of MP3s. If you try serializing that back to the client, you'll have a lot of network IO, plus you'll have to post all of your forms as multiparts, which is a bit bogus.
So, when we serialized our objects, we wrote all the files out to some shared file storage, so that any node in the cluster could pick them up (for purposes of disaster planning, our shared storage never fails ;-) ).
So far so good; now we have clients with almost all of the server-side state they need to continue the process, and the rest of the state is shared amongst all the nodes.
But spotting the files to store on the server is kind of tricky; sometimes they're buried at the bottom of an object graph in hard-to-find places. So some clever chap hit upon the realisation that, if we're relying on a shared file system for some of our state, we might as well rely on it for all of the state. So instead of serializing the objects and sending them back to the client, we serialize them all to disk, and just send the client a pointer to the file on disk. All good?
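The "pointer to a file on disk" approach might look roughly like the sketch below. It isn't our real code: `SharedDiskStateStore`, the `.ser` suffix and the use of a random UUID as the pointer are assumptions for the sake of the example, and `sharedDir` stands in for a directory that every node in the cluster can see.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

class SharedDiskStateStore {
    // Assumed to be mounted at the same path on every node in the cluster.
    private final Path sharedDir;

    SharedDiskStateStore(Path sharedDir) {
        this.sharedDir = sharedDir;
    }

    // Serialize the state to the shared directory and return an opaque token
    // that can be sent to the client instead of the state itself.
    String save(Serializable state) {
        String token = UUID.randomUUID().toString();
        try (ObjectOutputStream out = new ObjectOutputStream(
                Files.newOutputStream(sharedDir.resolve(token + ".ser")))) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return token;
    }

    // Load the state for a token posted back by the client.
    Object load(String token) {
        // Re-parse as a UUID so a tampered token cannot name an arbitrary path.
        String safeToken = UUID.fromString(token).toString();
        try (ObjectInputStream in = new ObjectInputStream(
                Files.newInputStream(sharedDir.resolve(safeToken + ".ser")))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The form now only has to carry the token, so it no longer matters how big the state is or where in the object graph the files are hiding. The cost is that every step depends on the shared storage being there.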
Well, then along comes the next problem. Someone adds a non-serializable attribute to one of these objects. Of course, it's buried at the bottom of a huge graph of objects whose main job is something completely different to holding conversational state, so no one spots it until it gets deployed live, and suddenly all kinds of edit operations are throwing NotSerializableExceptions. Great. Sorry, everyone.
So, we write some mildly heroic custom infrastructure that looks through the objects it's about to serialize, spots any non-serializable classes, and calls a special beforeSerialize / afterDeserialize hook to allow the object to convert itself properly, passing it whatever service objects it might need.
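For the curious, the shape of that hook is something like the following. This is a much-simplified sketch, not the real infrastructure: it only calls the hooks on the root object rather than walking the whole graph, and `Page`, `PageService`, `SerializationHooks`, `EditState` and `HookedSerializer` are all made-up names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical domain object that is deliberately NOT Serializable.
class Page {
    final String id;
    Page(String id) { this.id = id; }
}

// Hypothetical stand-in for "whatever service objects it might need".
class PageService {
    Page load(String id) { return new Page(id); }
}

interface SerializationHooks {
    void beforeSerialize();
    void afterDeserialize(PageService services);
}

class EditState implements Serializable, SerializationHooks {
    private static final long serialVersionUID = 1L;
    transient Page page;      // skipped by Java serialization
    private String pageId;    // serializable stand-in for the page

    EditState(Page page) { this.page = page; }

    // Swap the non-serializable object for something that can be written.
    public void beforeSerialize() { pageId = page.id; }

    // Rebuild the non-serializable object using the supplied service.
    public void afterDeserialize(PageService services) { page = services.load(pageId); }
}

class HookedSerializer {
    static byte[] write(Object state) {
        if (state instanceof SerializationHooks) {
            ((SerializationHooks) state).beforeSerialize();
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object read(byte[] data, PageService services) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            Object state = in.readObject();
            if (state instanceof SerializationHooks) {
                ((SerializationHooks) state).afterDeserialize(services);
            }
            return state;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The object swaps its awkward field for a plain identifier on the way out, and uses the service to rebuild it on the way back in.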
Now is it all fixed? Well, no, actually it's not. If we release a new version of the code that changes one of the serialized classes, then anyone who's in the middle of an editing process when we release is going to find that their nicely serialized state is no longer compatible, and is going to be staring at something like: local class incompatible: stream classdesc serialVersionUID = -6586187098630577013, local class serialVersionUID = -8144714009347234947. Great.
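In case it's not obvious where those two long numbers come from: when a Serializable class doesn't declare a serialVersionUID, Java computes one from the class's structure, so changing the class can change the number. The sketch below just prints the value Java would use for two made-up classes (`UnpinnedState`, `PinnedState`); it's an illustration of the mechanism, not a fix.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// No explicit serialVersionUID: Java derives one by hashing the class's
// name, interfaces, fields and methods, so edits to the class can change it.
class UnpinnedState implements Serializable {
    String fileName;
}

// Explicit serialVersionUID: the value stays put until someone edits it by hand.
class PinnedState implements Serializable {
    private static final long serialVersionUID = 1L;
    String fileName;
}

class SerialVersionDemo {
    // Returns the serialVersionUID that serialization will use for this class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }
}
```

Calling `SerialVersionDemo.uidOf(PinnedState.class)` gives back the declared `1L`, while `uidOf(UnpinnedState.class)` gives a computed hash. Declaring the value only helps when the change is a compatible one, such as adding a field; something like altering a field's type can still break state that was serialized before the release.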
There has got to be a better solution to this; and I can't help thinking that it probably just involves a form with a big bunch o' hidden fields with all the previously-submitted data in. Oh well, back to the old skool we go…