January 19, 2007

New tools

I’ve been playing with some new bits of technology. Not very new, but new to me, anyway. JSON and BeautifulSoup.

BeautifulSoup is a Python library, now ported into all good dynamic languages (I’m using the Ruby version), which parses HTML. Its defining feature is that it’s very relaxed about well-formedness. If your markup is fully-validating XHTML, all good. If it’s horrible HTML 3.2 tag soup with unbalanced divs and unclosed tables, that’s cool too. Soup will make a pretty good job of it, parsing what it can with a DOM, and falling back to regexes, special-cases, and hacks for the rest. Having parsed the markup, it gives you a nice DOM tree, which you can traverse or search as you’d expect.
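
To give a flavour of tolerant parsing, here is a minimal Python sketch. It is not BeautifulSoup (nor the Ruby port I’m using): to stay dependency-free it swaps in the standard library’s event-based html.parser, which also copes with unclosed tags but does not build a tree. The LinkCollector class and the TAG_SOUP markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag just opened.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Deliberately bad markup: the table, rows, cells and anchors are never closed.
TAG_SOUP = '<div><table><tr><td><a href="/one">One<td><a href="/two">Two</div>'

collector = LinkCollector()
collector.feed(TAG_SOUP)
collector.close()

links = collector.links
print(links)
```

The parser never complains about the missing close tags; it just reports each start tag as it finds it, which is all a simple scraper needs.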

I’m using it to scrape some information from a webpage, which I then expose as data via a web service, which is where the JSON bit comes in. JSON is a data-transfer language, functionally equivalent to XML, but expressed as JavaScript arrays and hashes.
So rather than having to parse a heap of XML, which is awkward and platform-dependent in JavaScript, you can just eval the JSON string (validating it first, or using a proper JSON parser, if you don’t trust your source), and have a pre-loaded object graph spring into existence.
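
The same idea, sketched in Python rather than JavaScript: a JSON string goes in and a ready-made graph of dictionaries and lists comes out. This uses the standard json module instead of eval, so untrusted input is rejected rather than executed. The payload and its field names are hypothetical, standing in for whatever the web service returns.

```python
import json

# Hypothetical payload, standing in for the web service's response body.
json_text = '{"title": "Example page", "links": ["/one", "/two"], "meta": {"count": 2}}'

# json.loads only accepts JSON data; unlike eval, it cannot execute code.
page = json.loads(json_text)

# The result is an ordinary object graph: dicts, lists, strings and numbers.
print(page["title"])
print(page["links"][1])
print(page["meta"]["count"])

# Anything that is not plain JSON raises an error instead of being run.
try:
    json.loads('__import__("os").getcwd()')
    rejected = False
except json.JSONDecodeError:
    rejected = True
print(rejected)
```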

My JSON is loaded after the page has loaded, via a Prototype Ajax.Request. The onComplete function evals the JSON, then sets up the innerHTML for a div, based on the objects it got back. It’s really very straightforward – even for a JavaScript newbie like me.

14 comments

  1. Robert O'Toole

    I’ve been using JSON to load data into the SIMILE timeline tool. I have a servlet that converts RSS from a blog into the JSON array, so that I can load a feed into a Sitebuilder page from blogs. It works well.

    22 Jan 2007, 14:32

  2. Chris May

    If we made JSON feeds available in Sitebuilder for things like table-of-tags / table-of-contents / recent changes (and potentially in blogs too), would that be of any use/interest to you? I have a golden hammer now, I need to find some nails :-)

    22 Jan 2007, 14:42

  3. Robert O'Toole

    It’s certainly easier to use (and hence more reliable) than the XML DOM. It would probably make me less reliant on my server-side XSL converter, as I would process the feeds in JavaScript. JSON feeds from blogs and forums would also get around the XML-loading security barrier in the browser, which would be really good.

    Yes, I would use it.

    How about in Flash? Is it easier to load JSON data?

    22 Jan 2007, 15:02

  4. Chris May

    I don’t know much about flash*, but it seems as if it’s do-able. Google have an actionscript library that includes JSON manipulation. However, given that Flash has quite a lot of XML-handling stuff, I doubt it’s much easier.

    If we provided feeds in the web arch. applications, they’d be alongside XML/RSS – so you could choose whichever one is most convenient.

    * other than that it’s the work of the devil, and to be avoided at all costs :-)

    22 Jan 2007, 15:15

  5. Steven Carpenter

    * other than that it’s the work of the devil, and to be avoided at all costs :-)

    I’m not reacting to that. :-p

    22 Jan 2007, 15:25

  6. Steven Carpenter

    On a more productive note, there’s a JSON handler class available to Flash, but like Chris says the work of the devil can already manipulate and read XML, so unless there’s a specific reason to use it over the XML parser I’d stick to that.

    22 Jan 2007, 15:30

  7. Robert O'Toole

    Will Flash load xml across “domains”? For example, a blog feed loaded into a Flash app on a Sitebuilder page?

    For security reasons, a Macromedia Flash movie playing in a web browser is not allowed to access data that resides outside the exact web domain from which the SWF originated.

    As an enhancement to Macromedia Flash Player 7, domains must be identical for data to be read. With this change a sub-domain can no longer read data from a parent domain and vice versa.


    ...cross-domain policy files. A policy file is a simple XML file that gives the Flash Player permission to access data from a given domain without displaying a security dialog.


    Steve – have you tried this?

    22 Jan 2007, 16:01

  8. Steven Carpenter

    Yes – we already have those in place where necessary. Flash won’t load data across domains without them, so I’m guessing this would go for JSON too.

    22 Jan 2007, 16:10

  9. Nick Howes

    In addition to JSON output, I wonder if Sitebuilder has any places where it could take it as input. Perhaps page types could use it for storing metadata, rather than adding yet more columns to the creaking tables. Though of course you lose the advantage of being able to index and query within the data.

    I found JSON useful for a website where I wanted a datasource that the owner could edit easily without having to manipulate any database tables; making a small JSON file worked quite well, with a PHP library to read it in (though it annoyingly chokes on tab characters!)

    23 Jan 2007, 00:01

  10. Chris May

    I don’t think I’d be keen on storing JSON data. I think if I had to store some kind of semi-structured data I’d choose XML rather than JSON, since there’s some limited support for querying XML within Oracle. But I can’t really think of that many cases where I wouldn’t just want to create either new columns, or a new 1:1 table instead.

    Of course, users who want a chunk of JSON persisted with the page have an easy mechanism for doing so right now – just stick it in the head, inside a script tag!

    Taking JSON as an input to APIs is moderately appealing – so one could imagine an API for updating a page that took a JSON representation of the metadata. But I think I’d need to see a client that could use it before I actually wrote such an API. XML or YAML seems like it would probably get more of the market as far as scripted updates are concerned.

    23 Jan 2007, 09:49

  11. Nick Howes

    Looking at my second paragraph, I realise I got my acronyms mixed up because I actually used YAML there, not JSON. Schoolboy error!

    24 Jan 2007, 20:41

  12. Chris May

    The point still stands, I think; YAML and JSON are very similar in syntax (as (in a recursive-bracketty way) are Lisp s-expressions). If you want to express a small amount of data in a way that moderately technical end users can understand and edit, both work pretty well.

    I’d still favour XML for persistence though, because the tool support is better. At least until I can get Oracle JSONDB and a JPath query engine :-)

    24 Jan 2007, 22:07

  13. Robert O'Toole

    Is Oracle XML easy? I’ve been using SQLXML in SQLServer. It does an acceptable job of automatically generating an xml schema from a simple query (one without too many joins), but anything more complex required me to use the horrendous templating system for mapping fields in the resultset to xml elements and attributes. Is Oracle any easier? If I could do the XML queries in Oracle then I could stop using my SQLServer, which would be good.

    25 Jan 2007, 11:22

  14. Chris May

    Don’t know :-) Whilst I know that it exists, I’ve never actually used it. I prefer to just keep relational data in my RDBMS, and if I really have to query XML then I’ll use XPath on a file! If you want to have a play, I’m sure we could set one up.

    25 Jan 2007, 15:10
