All entries for Wednesday 07 November 2007

November 07, 2007

Using someone else's infrastructure: utilising Amazon S3 and EC2

  • starting point: one machine, running the whole stack
  • scaling from 1->n app servers is hard
    – backing up large datasets is hard
    – scaling filesystems to large numbers of clients is hard
    – inter-app communication is hard
    – managing traffic spikes is hard
    – managing load fluctuations is hard
  • Amazon offerings: S3 at 15c/GB stored per month; EC2 at 10c per CPU-hour
  • S3: redundant storage, as much as you like. 5GB limit per object (larger files have to be split). APIs are HTTP or BitTorrent
  • S3 buckets are a single-level hierarchy (each bucket must have a unique name). Each bucket contains key-value pairs
  • APIs for most languages
  • EC2: Linux 2.6.16 Xen images. Servers can be small, medium or large; large is 4 dual-core processors, 15GB RAM, 1.6TB storage. Storage is not persistent: when the instance is spun down the storage is lost
  • ACLs for host/port access
  • command-line tools for stopping/starting instances
  • SQS: reliable messaging service. 256KB message payload. Pay per message (10c/1000 messages). Basic permissions model
  • Uses:
    – backup (s3sync.rb: like rsync)
    – S3 asset host: Use S3 to hold large files rather than serving them locally.
    – S3 authenticated downloads: S3 can make files publicly available for a short period of time (~2 s) if the user presents a specific token
    – Rails: attachment_fu does this for you automatically
    – load fluctuation: start additional servers with cron for busy periods, or even use a monitoring system to detect high load and spin up new machines
    – to make storage persistent, have a slave database which syncs periodically to S3

    – Problems: dynamic IP addresses make it hard to manage the load balancing; can’t specify the DB server very well.
    – Solution: run the DB server in-house and use EC2 only for app servers for surplus capacity; VPN through from your DC to EC2. There are still problems with latency between your DC and EC2
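
The time-limited download token mentioned under Uses can be illustrated with a generic HMAC-signed expiring URL. This is only a sketch of the general idea, not S3's actual signing format; the secret, the host name, the `expires`/`token` parameter names and the function names are all made up for illustration.

```python
import hashlib
import hmac
from urllib.parse import urlencode

SECRET = b"example-secret"  # hypothetical shared secret, never sent to the client


def sign(path: str, expires: int) -> str:
    """Return a hex token binding the path to its expiry timestamp."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_url(path: str, now: int, lifetime: int) -> str:
    """Build a download URL that stops working `lifetime` seconds after `now`."""
    expires = now + lifetime
    query = urlencode({"expires": expires, "token": sign(path, expires)})
    # files.example.com is a placeholder host, not a real endpoint
    return f"https://files.example.com{path}?{query}"


def is_valid(path: str, expires: int, token: str, now: int) -> bool:
    """Accept the request only if the token matches and has not expired."""
    return now <= expires and hmac.compare_digest(token, sign(path, expires))
```

Because the expiry time is part of the signed message, a client cannot extend the link's lifetime by editing the `expires` parameter.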

  • – entirely built with AWS. No datacentre of their own
  • High quality and low cost; occasional problems with latency/integrity and vendor lock-in
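
The 5GB-per-object limit noted above means a larger file (a database dump, say) has to be stored as several objects. A minimal sketch of planning the split follows; it assumes "5GB" means 5 GiB, and the `.partNNNN` key naming and the `plan_parts` function are hypothetical, not part of any S3 toolkit.

```python
MAX_OBJECT_BYTES = 5 * 1024**3  # assumption: "5GB" taken as 5 GiB


def plan_parts(key, size, limit=MAX_OBJECT_BYTES):
    """Return (part_key, start, end) tuples covering `size` bytes, end exclusive."""
    if size <= limit:
        # Small enough to store as a single object under its own key
        return [(key, 0, size)]
    parts = []
    for index, start in enumerate(range(0, size, limit)):
        end = min(start + limit, size)
        parts.append((f"{key}.part{index:04d}", start, end))
    return parts
```

Each tuple gives the key to upload under and the byte range of the source file to read for that part; reassembly is concatenation in key order.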

Designing Tag navigation

James Kalbach, LexisNexis

  • Metadata: Owner-created (controlled vocabulary), technically generated (search-generated), user-generated (tagging). Which to use is situational; each has pros and cons
  • tagging != tag cloud; there are other ways to present it
  • Why do people tag? 1: to find stuff for themselves 2: To share with others
  • 3 stages of tagging: creating tags; using and managing your own tags; using other people’s tags
  • UIs need to encourage tagging for it to work; make the field (a) visible, and (b) large if you want rich and abundant tags (which you do)
  • make suggestions for tags
  • provide multiple views onto tags – favourite, recent, popular
  • don’t make tags be the only way of searching for resources
  • allow combinations of tags (e.g. related tags)
  • allow slicing+dicing: grouped tag clouds
  • Good tag-driven sites: LibraryThing; Buzzillions
  • tag clouds are inaccessible unless they include some kind of textual indicator of how often each tag is used, rather than just relying on font size
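
The accessibility point can be sketched as a tag cloud that puts each tag's count in the link text as well as in the font size. This is one possible rendering, not anything shown in the talk; the `/tags/` URL path, the percentage range and the log scaling are assumptions, and counts are assumed to be at least 1.

```python
import math
from html import escape
from urllib.parse import quote


def tag_cloud(counts, min_pct=80, max_pct=200):
    """Render tags as HTML links sized by log-scaled count, with the count in the text."""
    if not counts:
        return ""
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    spread = (hi - lo) or 1.0  # avoid dividing by zero when all counts are equal
    links = []
    for tag in sorted(counts):
        n = counts[tag]
        size = round(min_pct + (math.log(n) - lo) / spread * (max_pct - min_pct))
        links.append(
            f'<a href="/tags/{quote(tag)}" style="font-size:{size}%">'
            f"{escape(tag)} ({n})</a>"
        )
    return "\n".join(links)
```

A screen reader then announces "python (100)" rather than just "python", so the weighting survives without the visual styling.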

OpenID: emerging from web 2.0

David Recordon, Martin Paljak

  • Decentralised, lightweight
  • reduce the number of usernames and passwords needed online
  • supported by lots of geeky tools, and increasing numbers of development toolsets. Starting to get penetration in larger service companies.
  • end-user tools from sxipper, symantec, verisign.
  • Estonian smartcard system – used for all kinds of e-services. Uses OpenID behind the scenes to manage SSO
  • Gives users more control over their identity data. Services only need to get identity, not personal information, so users don’t need to deal with multiple privacy policies.
  • Need the right hardware and software to use it. card + PIN verification
  • Developers don’t like it, in part because of the cost of getting an SSL-enabled site (need a distinct IP address and a certificate)
  • Mobile-ID: Same data from the smartcard, on a GSM SIM; but the implementation is totally different. Websites allow you to enter a phone number as an ID; you get sent a confirmation text, use a PIN to reply (PIN stays on the phone), can then continue logged in
  • Anonymity: anonymity is a privilege; provides partial anonymity
  • OpenID 2: multiple identities. Can have an OpenID with no personally-identifiable information in the ID. Provides anonymity whilst still allowing sites to assert that these are real, unique people.
  • Other EU countries are deploying OpenID. OpenID is designed for interop.

Kwame Ferreira: Mobile to Web and back

1st wave of social networks – forums, message boards
2nd wave – make friends through friends
3rd wave – tool assisted, as networks start to get more meaning

  • The real mobile social network is the phone’s address book
  • The networked computer is where the social networking revolution is currently happening, but 85% of the world doesn’t have this. As phones become networked mobile computers, there will be an explosive expansion of social networking
  • Cellphones revolutionised communication; portable computing will do the same
  • social networks are currently too geeky; they require you to be sat down and on your own. Portable computing allows you to rely on bodies rather than on screens.
  • Killer app for geeks: Where the application layer can make inferences based on social data
  • Killer app for normal people: When you meet someone, how can the mobile help? could the phone gather social network data from people who are close?
  • Casual Computing: Quicker, more engaging. People won’t dedicate as much attention to mobile devices as they do when sat in front of a computer
  • search centric – as device storage increases, search-based navigation is more important

Microformats – The nanotechnology of the web

Jeremy Keith

Centralised vs. Decentralised: Smart vs. Dumb

- The internet is a dumb, decentralised network, but each component is just smart enough to get by, and this leads to the whole being very resilient to complexity
- RDF vs HTML: RDF is too complex and hard to publish. Anyone can make HTML
- If markup were robots, XML/RDF would be mechas – big, cool, but not for everyone; HTML would be a nanobot – dumb, ubiquitous, and reliant on network effects to achieve anything
- People first, machines second. Given a trade-off between ease of publication and ease of machine-consumption, favour publication heavily.
- Microformats are an 80/20 solution; some use cases just aren’t do-able with microformats. That’s a deliberate design choice. Go for the low-hanging fruit

  • Microformat building blocks:
    – the rel attribute on links and anchors: rel="license", XFN (rel="friend met colleague" etc)
    – the class attribute: hCard, class="vcard", class="fn url"; hCalendar
    – hCalendar is an example where a microformat has had to give way on the ease-of-publication vs. ease-of-consumption scale; dates are specified twice:
    <abbr title="2007-11-05" class="dtstart">November 5th</abbr>

(2007-11-05 is an ISO standard format for dates)
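
To show what the doubled-up date buys a consumer, here is a small sketch that pulls the machine-readable date out of the abbr pattern using only the Python standard library. It is an illustration, not a full hCalendar parser; the `DtstartParser` class name is made up, and it assumes the title holds a date-only ISO value.

```python
from datetime import date
from html.parser import HTMLParser


class DtstartParser(HTMLParser):
    """Collect (machine_date, human_text) pairs from <abbr class="dtstart"> elements."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._title = None  # title of the dtstart element currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "abbr" and "dtstart" in classes:
            self._title = attrs.get("title")
            self._text = []

    def handle_data(self, data):
        if self._title is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "abbr" and self._title is not None:
            # Assumption: date-only value such as 2007-11-05
            parsed = date.fromisoformat(self._title)
            self.events.append((parsed, "".join(self._text).strip()))
            self._title = None
```

Feeding it the example above yields the date 2007-11-05 alongside the human-readable text "November 5th": the reader sees one form, the machine reads the other.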

- Microformats are seeds
- Expose – CSS
- Discover – Firefox Operator
- Convert – Technorati XSL service

  • problems:
    – Spam
    – trust
    – these are problems with HTML generally; microformats don’t make this easier or harder
    – grey goo – explosion of formats – there’s an established process for developing new microformats, to control adoption.
    – Microformats vs. POSH (Plain Old Semantic HTML)
  • Community: wiki, irc, email, blog (tag microformats)
  • The future
    – portable social networks
    – syndicated contact details
    – semantic web (little-s little-w). No RDF!
