I’ve been playing with some new bits of technology. Not very new, but new to me, anyway. JSON and BeautifulSoup.
BeautifulSoup is a Python library, now ported into all good dynamic languages (I’m using the Ruby version), which parses HTML. Its defining feature is that it’s very relaxed about well-formedness. If your markup is fully-validating XHTML, all good. If it’s horrible HTML 3.2 tag soup with unbalanced divs and unclosed tables, that’s cool too. Soup will make a pretty good job of it, parsing what it can into a DOM, and falling back to regexes, special cases, and hacks for the rest. Having parsed the markup, it gives you a nice DOM tree, which you can traverse or search as you’d expect.
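To give a feel for that tolerance, here is a minimal sketch. It uses the original Python library rather than the Ruby port, and it assumes the current `bs4` package with Python's built-in `html.parser` backend. The markup and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Invented tag soup: two unclosed <p> tags, an unclosed <b>,
# and a stray </span> that was never opened.
TAG_SOUP = """
<div id="content">
<p>Unclosed paragraph with a <a href="/one">first link</a>
<p>Another <b>bold <a href="/two">second link</a>
</div>
</span>
"""

# Assumption: the stdlib "html.parser" backend; other backends
# (lxml, html5lib) may repair the broken nesting differently.
soup = BeautifulSoup(TAG_SOUP, "html.parser")

# Search the tree: every <a> element, wherever it ended up.
links = soup.find_all("a")
hrefs = [a["href"] for a in links]
texts = [a.get_text() for a in links]

# Look up a single element by attribute.
content = soup.find("div", id="content")

print(hrefs)
print(texts)
print(content is not None)
```

The exact shape of the repaired tree depends on which parser backend is doing the work, so it's safer to search for the elements you care about than to rely on how the unclosed tags got nested.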