All entries for Monday 14 March 2005
March 14, 2005
Writing about web page http://conferences.oreillynet.com/cs/et2005/view/e_sess/6244
Ben Hammersley (sporting his UtiliKilt)
- RSS / RDF / Atom are syndication formats. RSS has lots of different formats, largely due to its politicized development process
- Atom: is currently only at v 0.5; is changing quite fast.
- Atom mandates a lot more stuff than RSS
- RSS 2.0 great for machine readable lists. RSS1 good for super-complex interlinked document mining. Atom learns from the experience of RSS1 & 2 and sits somewhere between them.
- [ a diversion about the philosophical origins of the word 'atom' ensues …]
- 5 atomic facts about a document: who (created it), when (it was created), where (it is), what (it's called), what (it contains). An atom document must state those 5 facts about a document, because if you don't capture them up front you can never get them back with certainty. Note to self: how does atom deal with modifications?
- Key concept: Resource - the document + all the data about it. Representation – some view (e.g. html) of the document in an application.
- 2 types of document: an entry and a feed. An entry is the resource in XML - the content and all the metadata.
- Constructs – extension points in the atom schema. 6 types: text, person, date, link, category, identity, service. Ben asserts that these 6 are sufficient.
- I wonder what we could do with an atom entry for every sitebuilder page? It would be a simple alternate rendition
- Ben Hammersley fancies himself to be the Wittgenstein of web syndication ( © John Dale 2005 :-)) Where are his bees?
- Feeds - a collection of documents (entries) plus its own metadata. A feed is a query over resources. Note to self: support for projections?
- Feeds of Feeds are supported, since a feed is also a resource. (an entry can describe a feed)
- History: BloggerAPI, Metaweblog API (XML-RPC / SOAP)
- Atom uses REST
- [ a brief interlude about REST ] Ben's interpretation of PUT/POST is not what I understood: He interprets POST=create, PUT=replace; I would have said PUT=create, POST=update. Must investigate (doubtless I am wrong!)
- note to self: could we define individual warwickgroups as atom entries?
- An atom API call is an atom entry document sent over HTTP with the appropriate verb (method)
- atom endpoints: 4 per system; postURI (one per system), editURI (one per resource), FeedURI (one per query), ResourcePostURI (one per system)
- Adding a link rel="service.post" element to a page makes the PostURI discoverable by atom API-aware clients. similarly for service.edit (edit an entry), service.feed (get feed) etc.
- Features: inherits from lower-level protocols e.g. internationalisation (use XML), authentication and caching (use HTTP), encryption (use SSL). Keeps the spec small; means clients must be multi-protocol aware.
- Versions: Each version is its own resource (or you could create each diff as its own resource), and you use a link element with an app-specific tag to indicate that each resource in a version history is related to the same 'document'.
- Documentcentrism – input—>content stored as atom entries—>view
- inputs: atom api, file creation, other interfaces ; output: html, XML, RSS …
- Ben demos his simplest-possible atom CMS, which does HTML and RSS representations of queries over atom resources. In use at the Guardian/Observer, for building their media blog.
- Using apache + http content negotiation is a neat way of having 1 URI serve multiple different renditions of content.
- Atom doesn't specify how to do locking of resources – defers to the application. Not obvious to me how you would communicate to the server that you wished to lock a resource, though it's easy enough to communicate back a lock failure (HTTP 503 Service Unavailable or something). Maybe you could represent locks as entries in their own right?
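To make the "five atomic facts" and "an API call is an entry document sent over HTTP" points concrete, here is a minimal sketch in Python. It is not Ben's code. The element names and namespace follow the later published Atom 1.0 format rather than the draft discussed in the talk, the mapping of "where" onto both id and link is an assumption, and the PostURI is a made-up example address. The request is only constructed, never sent.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

# Assumption: Atom 1.0 namespace, not the draft version from the talk.
ATOM_NS = "http://www.w3.org/2005/Atom"


def build_entry(who, when, where, title, content):
    """Build an entry stating who, when, where, what it's called, what it contains."""
    ET.register_namespace("", ATOM_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    author = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author, f"{{{ATOM_NS}}}name").text = who          # who
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = when       # when
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = where           # where (as identifier)
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", {"href": where})     # where (as location)
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title        # what it's called
    body = ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "text"})
    body.text = content                                             # what it contains
    return ET.tostring(entry, encoding="utf-8")


def create_request(post_uri, entry_xml):
    """Prepare (but do not send) a POST of the entry to a PostURI."""
    return Request(
        post_uri,
        data=entry_xml,
        method="POST",
        headers={"Content-Type": "application/atom+xml"},
    )


entry_xml = build_entry(
    who="A. Blogger",
    when="2005-03-14T12:00:00Z",
    where="http://example.org/blog/etech-notes",
    title="ETech notes",
    content="Notes from the Atom session.",
)
# Hypothetical PostURI; a real client would discover it from link rel="service.post".
request = create_request("http://example.org/atom/post", entry_xml)
print(request.get_method(), request.full_url)
```

Replacing an existing entry would, on Ben's reading, be the same document sent with PUT to that resource's editURI.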
So, I'm killing time before lunch: Noting some cool things
del.icio.us bundles – group your tags! Go to the settings/tags page but replace 'tags' with 'bundle'
del.icio.us in-page categorisation. Kind of hard to describe, but way cool. There's a flash demo here
Alan Taylor, Cal Henderson (Flickr), Eric Benson
- Amazon light – glues amazon to yahoo, google, gmail, blogger, libraries (using Jon Udell's LibraryLookup OPAC integration). Start with a book then use the other services to do stuff with/about that book.
- Need to be very aware of the Terms of Service for API use. Providers may change this without regard to your requirements!
- 'small w' web services – alpha/beta code; not enterprise-level. Not necessarily SOAP/XML-RPC – could be anything.
- What's available?
* Amazon, Google, Technorati, Flickr
* eBay, salesforce.com, PayPal, O'Reilly Safari, Weather.com, GigaBlast (RSS-based web search engine).
* RSS feeds from news/media corps/orgs.
* screenscrapes / include files.
- Use with care – people don't always build these interfaces with 3rd party application development in mind (esp. RSS feeds / screen-scrapes which are optimised for readers)
- Mash-em-up: Look for shared keys from tags/metadata to join different sources. e.g. google categories->Flickr tags
- Adding your own information/content is safer/easier than trying to join 2 third-party services.
- most services require a dev. token to use
- Flickr API
* Amazon are working on a 'remote shopping cart' – keep your own look and feel while the user builds their basket, then dumps you into amazon for checkout
* XSL transforms – can reskin amazon based on a set of XSL stylesheets
- Amazon owns Alexa & IMDB, though the APIs for those two are semi-private
- Google: Search, cache, spellcheck. Advanced Search queries make for a fairly powerful API frontend
* SOAP only; 1000 requests/day max
- Technorati: Query, keywords, tags, pinging (info technorati dev wiki)
- Cache: You must cache your calls if you want to be even remotely nice
- Talk to the data owners; they'll find out sooner or later anyway and it's better to be talking to them from the start.
- Dropcash – fundraising tool. Uses TypeKey for authentication and PayPal for payment
- AllConsuming.net – what books are people blogging about ?
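The "you must cache your calls" advice above can be sketched as a small time-to-live cache wrapped around whatever function actually makes the API request. This is only an illustration: the class name, the 300-second default and the example URL are all invented here, and a real client would also need to respect each provider's terms of service.

```python
import time


class CachedClient:
    """Reuse a stored response for the same URL until its time-to-live expires."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch        # function that performs the real API call
        self._ttl = ttl_seconds    # how long a response stays fresh
        self._clock = clock        # injectable so the cache can be tested
        self._store = {}           # url -> (expiry_time, response)

    def get(self, url):
        now = self._clock()
        hit = self._store.get(url)
        if hit is not None and hit[0] > now:
            return hit[1]          # still fresh: no remote call
        response = self._fetch(url)
        self._store[url] = (now + self._ttl, response)
        return response


calls = []


def fake_fetch(url):
    """Stand-in for a real HTTP call; records each time it is hit."""
    calls.append(url)
    return f"response for {url}"


client = CachedClient(fake_fetch, ttl_seconds=60)
client.get("http://api.example.org/search?q=atom")
client.get("http://api.example.org/search?q=atom")  # served from the cache
print(len(calls))
```

Only the first call reaches the stand-in service; the second is answered locally, which is the whole point when the provider caps you at a fixed number of requests per day.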
CH: (is a good speaker and very droll.)
- Flickr: The centre of a big distributed DB. Only the UI is photo-centric
- Once you provide the API, people build new ways to get data in and out
- You do need to store data that people care about
- everything happens over HTTP (except the odd bit of SMTP)
- Flickr does SOAP/XML-RPC/REST
- REST is the coolest at the moment. Way simple. 90% of Flickr users use REST.
- Flickr use a wrapper for their REST responses to indicate status (why?)
- Page-scraping (HTML-over-HTTP) – volatile, bandwidth-greedy
- Being transport-agnostic is a good thing – once you've defined your domain, it's easy to offer multiple formats.
- Beware of 'shitty coders' – people who scrape the site and pull in large numbers of pages, especially API abusers e.g. the Windows Flickr screensaver that checks for new photos every 2 seconds. With 100 users that's 50 hits/second.
- provide client bindings/libs which do caching – naive programmers will use the bindings
- Cache on the host too.
- Monitor closely
- Authentication: Flickr use query params for authentication – sends passwords in clear text (lame-o)
- HTTP Basic is the next step up from that; HTTPS means you can keep using query params since they're encrypted on the wire.
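On the status wrapper mentioned above: one plausible reason for it (a guess, not something stated in the talk) is that a client can tell success from failure by looking at the payload alone, whatever the HTTP layer did. A rough sketch of unwrapping such an envelope follows. The rsp/stat/err names are modelled on what Flickr's REST responses are understood to look like, so treat them as assumptions, and the sample documents are invented.

```python
import xml.etree.ElementTree as ET


class ApiError(Exception):
    """Raised when the envelope reports a failure."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap(response_text):
    """Return the payload elements if the envelope says ok, else raise ApiError."""
    root = ET.fromstring(response_text)
    if root.get("stat") == "ok":
        return list(root)          # everything inside the wrapper
    err = root.find("err")
    code = err.get("code") if err is not None else "unknown"
    message = err.get("msg") if err is not None else "no error detail"
    raise ApiError(code, message)


# Invented sample responses, for illustration only.
good = '<rsp stat="ok"><photos total="2"/></rsp>'
bad = '<rsp stat="fail"><err code="1" msg="Something went wrong"/></rsp>'

payload = unwrap(good)
print(payload[0].tag, payload[0].get("total"))

try:
    unwrap(bad)
except ApiError as error:
    print("failed:", error)
```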
Learning points from Flickr:
- Be Open – don't hide any of your data
- Be protocol agnostic
- Be careful of abuse – use dev. tokens so you can block abusers easily
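The dev-token point can be illustrated with a small gate that counts recent calls per token and refuses blocked or over-eager ones. This is a sketch of the general idea, not how Flickr does it; the class name, limits and token strings are all hypothetical.

```python
import time
from collections import defaultdict, deque


class TokenGate:
    """Allow at most max_calls per token within a sliding window of per_seconds."""

    def __init__(self, max_calls, per_seconds, clock=time.monotonic):
        self._max = max_calls
        self._window = per_seconds
        self._clock = clock                  # injectable for testing
        self._recent = defaultdict(deque)    # token -> timestamps of recent calls
        self.blocked = set()                 # tokens banned outright

    def allow(self, token):
        if token in self.blocked:
            return False
        now = self._clock()
        calls = self._recent[token]
        # Forget calls that have fallen out of the window.
        while calls and now - calls[0] >= self._window:
            calls.popleft()
        if len(calls) >= self._max:
            return False                     # over the limit: refuse this call
        calls.append(now)
        return True


gate = TokenGate(max_calls=10, per_seconds=60)
results = [gate.allow("screensaver-token") for _ in range(12)]
print(results.count(True), results.count(False))

gate.blocked.add("screensaver-token")        # block a persistent abuser by token
print(gate.allow("screensaver-token"))
```

Because every request carries a token, the limit and the block apply to one misbehaving application without touching anyone else.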
- Robot co-op make 43 things – creating communities based on shared wishes.
- trying to enable machines to talk about humans e.g. Flickr is just computers talking to more computers, but it's a community because they're talking about things people care about
- 43 things read API: people, goals, tags, entries, cities, teams
- write api: add/update/remove goals, write entries
- I wonder if we could make a Warwick43Things ?
- Most people aren't worrying too much about how clean their REST APIs are – query strings are OK! If the data is interesting enough, users will learn the API
– How can you make money off this stuff? [A] You don't make money off web apis, you make a service that people want to pay for, and then provide the APIs to attract more people. e.g. Amazon / Ebay uploader / storefront tools
– What will happen with paying for higher-volume access? [A] It's on the way, but it's happening on a one-off basis. I wonder if we should talk to Flickr re. galleries in WB ?
- Flickr color wheel works by periodically scraping all new images off Flickr, caching them locally, and scanning them for color-balance.
W00T. I'm in sunny* San Diego for the O'Reilly Emerging Technology 2005 conference. Hopefully, I should learn all kinds of new and cool tricks to do with HTTP, REST interfaces, RDF, RSS/Atom, and all that jazz. Or maybe I'll just get to spend a week somewhere warm with palm trees. Either way it's a win.
Right now, though, I'm just wondering how it is that Apple can get away with spoiling the otherwise-flawless Powerbook Ti with a battery life that is nothing more than tEh SUX0r. 2 hours indeed. Bah.
* Actually not that sunny. But it's 67F, dry, with sunny spells, which is a fair bit better than Coventry right now.