All entries for Monday 14 March 2005

March 14, 2005

Building Apps with RSS, Atom, and the Atom API

Ben Hammersley (sporting his UtiliKilt)

  • RSS / RDF / Atom are syndication formats. RSS comes in lots of different versions, largely due to its politicized development process.
  • Atom is currently only at v0.5 and is changing quite fast.
  • Atom mandates a lot more than RSS does.
  • RSS 2.0 is great for machine-readable lists; RSS 1.0 is good for super-complex, interlinked document mining. Atom learns from the experience of RSS 1.0 and 2.0 and sits somewhere between them.
  • [ a diversion about the philosophical origins of the word 'atom' ensues …]
  • 5 atomic facts about a document: who (created it), when (it was created), where (it is), what (it's called), what (it contains). An Atom document must state those 5 facts, because if you don't capture them up front you can never get them back with certainty. Note to self: how does Atom deal with modifications?
  • Key concepts: a Resource is the document plus all the data about it; a Representation is some view (e.g. HTML) of the document in an application.
  • Two types of document: an entry and a feed. An entry is the resource in XML: the content plus all the metadata.
  • Constructs: extension points in the Atom schema. Types: text, person, date, link, category, identity, service. Ben asserts that these are sufficient.
  • I wonder what we could do with an Atom entry for every sitebuilder page? It would be a simple alternate rendition.
  • Ben Hammersley fancies himself to be the Wittgenstein of web syndication (© John Dale 2005 :-) Where are his bees?
  • Feeds: a collection of documents (entries) plus its own metadata. A feed is a query over resources. Note to self: support for projections?
  • Feeds of feeds are supported, since a feed is also a resource (an entry can describe a feed).
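
To make the "five atomic facts" concrete, here is a minimal sketch of an entry built with Python's standard library. It assumes the element names and namespace of the final Atom 1.0 format (RFC 4287), which postdates these notes and differs from the v0.5 drafts discussed in the talk; the helper `atom_entry` and the sample values are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0 (RFC 4287) namespace, not the 0.x drafts from the talk.
ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom elements without a prefix


def q(name):
    """Qualify an element name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{name}"


def atom_entry(who, when, where, title, content):
    """Hypothetical helper: build an entry stating the five facts."""
    entry = ET.Element(q("entry"))
    author = ET.SubElement(entry, q("author"))
    ET.SubElement(author, q("name")).text = who      # who created it
    # when: Atom 1.0 requires <updated>; the optional <published> would record creation time
    ET.SubElement(entry, q("updated")).text = when
    ET.SubElement(entry, q("id")).text = where       # where: a permanent identifier...
    ET.SubElement(entry, q("link"), rel="alternate", href=where)  # ...and a link to it
    ET.SubElement(entry, q("title")).text = title    # what it's called
    ET.SubElement(entry, q("content"), type="text").text = content  # what it contains
    return ET.tostring(entry, encoding="unicode")


print(atom_entry("Ben Hammersley", "2005-03-14T10:00:00Z",
                 "http://example.org/entries/1", "Hello, Atom", "First post"))
```

Anything that can read XML can recover all five facts from the result, which is the point Ben was making about capturing them up front.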

Atom API

  • History: BloggerAPI, Metaweblog API (XML-RPC / SOAP)
  • Atom uses REST
  • [A brief interlude about REST.] Ben's interpretation of PUT/POST is not what I understood: he interprets POST = create, PUT = replace; I would have said PUT = create, POST = update. Must investigate (doubtless I am wrong!)
  • Note to self: could we define individual warwickgroups as Atom entries?
  • An Atom API call is an Atom entry document sent over HTTP with the appropriate verb (method).
  • Atom endpoints, 4 kinds per system: PostURI (one per system), EditURI (one per resource), FeedURI (one per query), ResourcePostURI (one per system).
  • Adding a <link rel=""> element to a page makes the PostURI discoverable by Atom-API-aware clients; similarly service.edit (edit an entry), service.feed (get the feed), etc.
  • Features are inherited from lower-level protocols, e.g. internationalisation (use XML), authentication and caching (use HTTP), encryption (use SSL). This keeps the spec small, but means clients must be multi-protocol aware.
  • Versions: each version is its own resource (or you could make each diff its own resource), and you use a link element with an app-specific tag to indicate that each resource in a version history belongs to the same 'document'.
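
The discovery-then-publish flow above, using Ben's reading of the verbs (POST to the PostURI to create, PUT to an EditURI to replace), can be sketched with the standard library. This only builds the requests rather than sending them; the rel value `service.post`, the `application/atom+xml` content type, the example URIs and the helper names are all assumptions for illustration, not taken from the spec.

```python
from html.parser import HTMLParser
import urllib.request


class LinkRelFinder(HTMLParser):
    """Collect <link rel="..." href="..."> elements from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = {}  # rel value -> href

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            a = dict(attrs)
            if a.get("rel") and a.get("href"):
                self.links[a["rel"]] = a["href"]


def discover(html_text):
    """Return the rel -> href map an Atom-API-aware client would look at."""
    finder = LinkRelFinder()
    finder.feed(html_text)
    finder.close()
    return finder.links


def atom_request(method, uri, entry_xml):
    """Build (but do not send) an HTTP request carrying an Atom entry."""
    return urllib.request.Request(
        uri,
        data=entry_xml.encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/atom+xml"},
    )


page = ('<html><head><link rel="service.post" type="application/atom+xml" '
        'href="http://example.org/atom/post"></head><body></body></html>')
entry = '<entry xmlns="http://www.w3.org/2005/Atom"><title>Hi</title></entry>'

post_uri = discover(page)["service.post"]
create = atom_request("POST", post_uri, entry)                          # create a new entry
replace = atom_request("PUT", "http://example.org/atom/edit/1", entry)  # replace via its EditURI
print(create.get_method(), create.full_url)
print(replace.get_method(), replace.full_url)
```

Passing either request object to `urllib.request.urlopen` would actually send it.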

part 2

  • Document-centrism: input → content stored as Atom entries → view.
  • Inputs: Atom API, file creation, other interfaces; outputs: HTML, XML, RSS …
  • Ben demos his simplest-possible Atom CMS, which produces HTML and RSS representations of queries over Atom resources. It's in use at the Guardian/Observer for building their media blog.
  • Using Apache and HTTP content negotiation is a neat way of having one URI serve multiple renditions of the same content.
  • Atom doesn't specify how to lock resources; it defers to the application. It's not obvious to me how you would tell the server that you wished to lock a resource, though it's easy enough to communicate a lock failure back (HTTP 503 Service Unavailable or something similar). Maybe you could represent locks as entries in their own right?
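
Apache does the content negotiation for you, but the decision it makes is roughly this: read the client's `Accept` header and pick the best available rendition of the one URI. The sketch below is a simplified stand-in, not Apache's algorithm. It honours q-values and `type/*` wildcards but ignores the finer precedence rules in the HTTP spec, and the function names and media types are illustrative.

```python
def parse_accept(header):
    """Return [(media_type, q), ...] from an Accept header, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((fields[0], q))
    # sorted() is stable, so equal q-values keep the client's order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(accept_header, available):
    """Choose one of `available` (listed in server preference order), or None (-> 406)."""
    for media, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media == "*/*":
            return available[0] if available else None
        if media.endswith("/*"):
            prefix = media[:-1]  # "text/*" -> "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
        elif media in available:
            return media
    return None


renditions = ["text/html", "application/atom+xml", "application/rss+xml"]
print(negotiate("application/atom+xml, text/html;q=0.5", renditions))
print(negotiate("text/*;q=0.8, application/rss+xml", renditions))
```

The server keeps one URI and one stored Atom entry; only the rendition chosen per request differs.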

Sitting around

So, I'm killing time before lunch, noting some cool things:

  • bundles: group your tags! Go to the settings/tags page but replace 'tags' with 'bundle' in the URL.
  • in-page categorisation. Kind of hard to describe, but way cool. There's a flash demo here.

ETech back-channels

Where else is stuff happening?

irc: / #etech.


Web Services Mash-up

Alan Taylor, Cal Henderson (Flickr), Eric Benson

AT:

  • Amazon Light: glues Amazon to Yahoo, Google, Gmail, Blogger and libraries (using Jon Udell's LibraryLookup OPAC integration). Start with a book, then use the other services to do stuff with/about that book.
  • Need to be very aware of the Terms of Service for API use. Providers may change these without regard to your requirements!
  • 'small w' web services – alpha/beta code; not enterprise-level. Not necessarily SOAP/XML-RPC – could be anything.
  • What's available?

* Amazon, Google, Technorati, Flickr

* eBay, PayPal, O'Reilly Safari, GigaBlast (RSS-based web search engine).

* RSS feeds from news/media corps/orgs.

* screenscrapes / include files.

  • Use with care – people don't always build these interfaces with 3rd party application development in mind (esp. RSS feeds / screen-scrapes which are optimised for readers)
  • Mash 'em up: look for shared keys in tags/metadata to join different sources, e.g. Google categories -> Flickr tags.
  • Adding your own information/content is safer and easier than trying to join two third-party services.
  • Most services require a developer token to use.
  • Flickr API
  • Amazon

* Amazon are working on a 'remote shopping cart': keep your own look and feel while the user builds their basket, then hand them over to Amazon for checkout.

* XSL transforms: you can reskin Amazon using a set of XSL stylesheets.

  • Amazon own Alexa and IMDb, though the APIs for those two are semi-private.
  • Google: Search, cache, spellcheck. Advanced Search queries make for a fairly powerful API frontend

* SOAP only; 1000 requests/day max

  • Technorati: query, keywords, tags, pinging (info on the Technorati dev wiki).
  • Considerations: Terms of use, attribution/credit/licensing; act conservatively with others' resources; handle failure gracefully when you have a long pipeline/chain of requests.
  • Cache: You must cache your calls if you want to be even remotely nice
  • Talk to the data owners; they'll find out sooner or later anyway and it's better to be talking to them from the start.
    Real-world examples
  • Mappr
  • Dropcash – fundraising tool. Uses typekey for authentication and paypal for payment
  • – what books are people blogging about?
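
Joining two sources on shared keys, as in the Google categories -> Flickr tags example, mostly comes down to normalising the keys and building an index over one side. A minimal sketch over in-memory data; the record shape (`id`, `tags`), the normalisation rule and the sample data are hypothetical, and a real mash-up would fetch (and cache) these records from the respective APIs.

```python
from collections import defaultdict


def normalise(key):
    """Crude key normalisation so 'San Diego', 'san_diego' and 'sandiego' meet."""
    return "".join(ch for ch in key.lower() if ch.isalnum())


def join_on_tags(left, right):
    """Return sorted (left_id, right_id, shared_key) triples for records sharing a tag."""
    index = defaultdict(set)  # normalised tag -> ids of right-hand records
    for record in right:
        for tag in record["tags"]:
            index[normalise(tag)].add(record["id"])
    matches = set()
    for record in left:
        for tag in record["tags"]:
            key = normalise(tag)
            for other_id in index.get(key, ()):
                matches.add((record["id"], other_id, key))
    return sorted(matches)


categories = [{"id": "Regional/San_Diego", "tags": ["San Diego", "Travel"]}]
photos = [
    {"id": "photo-1", "tags": ["sandiego", "beach"]},
    {"id": "photo-2", "tags": ["coventry"]},
]
print(join_on_tags(categories, photos))
```

Building the index over one side keeps the join linear in the number of tags rather than comparing every pair of records.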

CH: (is a good speaker and very droll.)

  • Flickr: the centre of a big distributed DB. Only the UI is photo-centric.
  • Once you provide the API, people build new ways to get data in and out
  • You do need to store data that people care about
  • everything happens over HTTP (except the odd bit of SMTP)
  • Flickr does SOAP/XML-RPC/REST.
  • REST is the coolest at the moment. Way simple. 90% of Flickr API users use REST.
  • Flickr use a wrapper for their REST responses to indicate status (why?)
  • Page-scraping (HTML-over-HTTP) – volatile, bandwidth-greedy
  • Being transport-agnostic is a good thing – once you've defined your domain, it's easy to offer multiple formats.
  • Beware of 'shitty coders': people who scrape the site and pull in large numbers of pages, and API abusers, e.g. the Windows Flickr screensaver that checks for new photos every 2 seconds. With 100 users that's 50 hits/second.
  • Provide client bindings/libraries which do caching; naive programmers will use the bindings.
  • Cache on the host too.
  • Monitor closely
  • Authentication: Flickr use query params for authentication, which sends passwords in clear text (lame-o).
  • HTTP Basic is the next step up from that; with HTTPS you can keep using query params, since they're encrypted on the wire.
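
"Provide client bindings which do caching" is easy to sketch. The toy binding below caches each call for a fixed time and also unwraps a status envelope of the kind Flickr wraps its REST responses in (`<rsp stat="ok">`, with an `<err>` element on failure, as I understand their documented format). One plausible answer to the "(why?)" is that the envelope lets a client tell an API-level error apart from a perfectly successful HTTP exchange. `CachingClient`, `ApiError` and the injected `fetch` callable are hypothetical names; pass in whatever actually performs the HTTP GET.

```python
import time
import xml.etree.ElementTree as ET


class ApiError(Exception):
    """Raised when the response envelope says the call failed."""


class CachingClient:
    """Toy client binding: caches each URL's parsed response for `ttl` seconds
    and unwraps an <rsp stat="..."> status envelope.

    `fetch` is any callable taking a URL and returning the response body as text.
    """

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        self._cache = {}  # url -> (expiry time, parsed <rsp> element)

    def call(self, url):
        now = self.clock()
        cached = self._cache.get(url)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: no request hits the server
        rsp = ET.fromstring(self.fetch(url))
        if rsp.get("stat") != "ok":
            err = rsp.find("err")
            if err is None:
                raise ApiError("call failed with no <err> details")
            raise ApiError(f"{err.get('code')}: {err.get('msg')}")
        self._cache[url] = (now + self.ttl, rsp)  # only successes are cached
        return rsp


# Usage with a stand-in fetch function instead of real HTTP:
def fake_fetch(url):
    return '<rsp stat="ok"><photos page="1" pages="3"/></rsp>'


client = CachingClient(fake_fetch, ttl=60)
print(client.call("http://api.example.com/rest?method=example").find("photos").get("pages"))
```

A naive screensaver built on a binding like this would hit the server at most once per `ttl` per URL, however often it asked.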

Learning points from flikr:

  • Be Open – don't hide any of your data
  • Be protocol agnostic
  • Be careful of abuse – use dev. tokens so you can block abusers easily
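
Dev tokens make abuse manageable because every request can be attributed to someone. A minimal sketch of a per-token gate, assuming a blocklist plus a simple fixed-window request limit (the windowing scheme and the class name `TokenGate` are my own choices for illustration, not anything Flickr described). It also works through the screensaver arithmetic from the talk: 100 clients polling every 2 seconds is 100 / 2 = 50 requests per second.

```python
class TokenGate:
    """Hypothetical per-developer-token gate: a blocklist plus a fixed-window limit."""

    def __init__(self, limit_per_window, window_seconds):
        self.limit = limit_per_window
        self.window = window_seconds
        self.blocked = set()
        self.counts = {}  # (token, window number) -> requests seen; never pruned in this toy

    def block(self, token):
        """Cut off an abusive token entirely."""
        self.blocked.add(token)

    def allow(self, token, now):
        """Return True if a request with `token` at time `now` (seconds) may proceed."""
        if token in self.blocked:
            return False
        key = (token, int(now // self.window))
        seen = self.counts.get(key, 0)
        if seen >= self.limit:
            return False
        self.counts[key] = seen + 1
        return True


def request_rate(clients, poll_interval_seconds):
    """Aggregate load from clients that each poll once per interval."""
    return clients / poll_interval_seconds


print(request_rate(100, 2))  # the screensaver example: 50.0 requests/second
```

Because the limit is per token, one badly written client can be throttled or blocked without affecting everyone else using the API.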


  • The Robot Co-op make 43 Things: creating communities based on shared wishes.
  • Trying to enable machines to talk about humans, e.g. Flickr is just computers talking to more computers, but it's a community because they're talking about things people care about.
  • 43 Things read API: people, goals, tags, entries, cities, teams
  • Write API: add/update/remove goals, write entries
  • I wonder if we could make a Warwick43Things ?
  • Most people aren't worrying too much about how clean their REST APIs are – query strings are OK! If the data is interesting enough, users will learn the API
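
In that spirit, a query-string API call is nothing more than a URL. The sketch below builds one with the standard library and shows the server side reading it back; the base URL, method name and parameter names are hypothetical and are not the real 43 Things API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def api_url(base, method, api_key, **params):
    """Build a plain query-string API call: <base>/<method>?api_key=...&..."""
    query = urlencode({"api_key": api_key, **params})
    return f"{base.rstrip('/')}/{method}?{query}"


url = api_url("http://api.example.com/service", "search_goals", "MY-KEY", q="learn to surf")
print(url)
# The server just parses it back out again:
print(parse_qs(urlsplit(url).query))
```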


– How can you make money off this stuff? [A] You don't make money off web APIs; you make a service that people want to pay for, and then provide the APIs to attract more people, e.g. Amazon / eBay uploader / storefront tools.

– What will happen with paying for higher-volume access? [A] It's on the way, but it's happening on a one-off basis. I wonder if we should talk to Flickr re. galleries in WB?

  • The Flickr color wheel works by periodically scraping all new images off Flickr, caching them locally, and scanning them for color balance.
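
One way the "scan for color balance" step might work (this is a guess for illustration, not how the actual color wheel is implemented): average an image's pixels, convert that mean color to a hue, and drop the image into a slice of the wheel. Pixels are assumed to be already-decoded (r, g, b) tuples in 0-255, and `wheel_segment` is a hypothetical name.

```python
import colorsys


def average_rgb(pixels):
    """Mean (r, g, b) of a non-empty list of 0-255 pixel tuples."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def wheel_segment(pixels, segments=12):
    """Slice of a `segments`-slice color wheel for the image's mean color.

    Slice k is centred on hue k / segments, so pure red lands in slice 0.
    Returns None for unsaturated (grey) images, which have no meaningful hue.
    """
    r, g, b = average_rgb(pixels)
    h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if s == 0:
        return None
    return round(h * segments) % segments


print(wheel_segment([(255, 0, 0)] * 4))           # an all-red image
print(wheel_segment([(255, 0, 0), (0, 0, 255)]))  # red + blue average to magenta
```

Averaging is the crudest possible summary; an image that is half red and half blue lands in the magenta slice even though it contains no magenta at all.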

I'm at ETech

W00T. I'm in sunny* San Diego for the O'Reilly Emerging Technology 2005 conference. Hopefully, I should learn all kinds of new and cool tricks to do with HTTP, REST interfaces, RDF, RSS/Atom, and all that jazz. Or maybe I'll just get to spend a week somewhere warm with palm trees. Either way it's a win.

Right now, though, I'm just wondering how it is that Apple can get away with spoiling the otherwise-flawless PowerBook Ti with a battery life that is nothing more than tEh SUX0r. 2 hours indeed. Bah.

* Actually not that sunny. But it's 67F, dry, with sunny spells, which is a fair bit better than Coventry right now.
