Friday 19 January 2007

New tools

I’ve been playing with some new bits of technology. Not very new, but new to me, anyway. JSON and BeautifulSoup.

BeautifulSoup is a Python library, now ported to all good dynamic languages (I’m using the Ruby version), which parses HTML. Its defining feature is that it’s very relaxed about well-formedness. If your markup is fully validating XHTML, all good. If it’s horrible HTML 3.2 tag soup with unbalanced divs and unclosed tables, that’s cool too. Soup will make a pretty good job of it, parsing what it can cleanly and falling back to regexes, special cases, and hacks for the rest. Having parsed the markup, it gives you a nice DOM-like tree, which you can traverse or search as you’d expect.

I’m using it to scrape some information from a webpage, which I then expose as data via a web service, which is where the JSON bit comes in. JSON is a data-interchange format that covers much the same ground as XML, but is expressed as JavaScript array and hash (object) literals. So rather than having to parse a heap of XML, which is awkward and platform-dependent in JavaScript, you can just eval the JSON string (validating it first if you don’t trust your source, since eval will run whatever code it is handed), and have a pre-loaded object graph spring into existence.
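To make the eval step concrete, here is a minimal sketch. The payload and its field names are invented for illustration, and the JSON.parse alternative is an addition of mine for engines that provide it, not something the setup described here depends on.

```javascript
// Hypothetical payload; the field names are made up for this example.
const jsonText = '{"title": "New tools", "tags": ["json", "scraping"], "count": 2}';

// The eval approach. The wrapping parentheses make the engine read the
// braces as an object literal rather than as a block statement.
// Only reasonable when the source is trusted, because eval runs any code.
const viaEval = eval("(" + jsonText + ")");

// Where available, JSON.parse accepts JSON data only and throws a
// SyntaxError on anything else, so untrusted text never gets executed.
const viaParse = JSON.parse(jsonText);

// Small helper (hypothetical name) that turns a parse failure into null.
function parseOrNull(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return null;
  }
}
```

Either way you end up with an ordinary object, so viaEval.tags[1] and viaParse.count can be read directly, while parseOrNull("alert(1)") rejects text that is code rather than data.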

My JSON is loaded after the page has loaded, via a Prototype Ajax.Request. The onComplete function evals the JSON, then sets the innerHTML of a div based on the objects it got back. It’s really very straightforward, even for a JavaScript newbie like me.
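The handler might look something like the sketch below. It is not my actual code: the items, name and url fields, the /items.json URL and the results element id are all hypothetical, and it uses JSON.parse in place of eval. The handler takes any object with a responseText property, so it can be tried outside a browser with a stand-in div.

```javascript
// Escape text before putting it into markup, since scraped strings
// may well contain characters like < or &.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a list from the parsed objects (hypothetical "items" shape).
function renderItems(data) {
  const rows = data.items.map(function (item) {
    return '<li><a href="' + escapeHtml(item.url) + '">' +
      escapeHtml(item.name) + "</a></li>";
  });
  return "<ul>" + rows.join("") + "</ul>";
}

// Returns an onComplete-style handler bound to a target element.
function makeOnComplete(target) {
  return function (transport) {
    const data = JSON.parse(transport.responseText);
    target.innerHTML = renderItems(data);
  };
}

// In a browser with Prototype loaded, the wiring would be roughly:
//   new Ajax.Request("/items.json", {
//     method: "get",
//     onComplete: makeOnComplete($("results"))
//   });

// Stand-in objects so the handler can be exercised without a browser.
const fakeDiv = { innerHTML: "" };
makeOnComplete(fakeDiv)({
  responseText: '{"items": [{"name": "Fish & chips", "url": "/a?b=1"}]}'
});
```

Keeping the markup-building in its own function means the Ajax callback stays a two-liner, and the escaping happens in one place.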

