All entries for Wednesday 07 November 2007
November 07, 2007
- starting point: one machine, running the whole stack
- scaling from 1->n app servers is hard
– backing up large datasets is hard
– scaling filesystems to large numbers of clients is hard
– inter-app communication is hard
– managing traffic spikes is hard
– managing load fluctuations is hard
- Amazon offerings: S3 at 15c/GB; EC2 at 10c per CPU-hour
- s3: redundant storage; as much as you like. 5GB per object (large files have to be split). APIs are HTTP or BitTorrent
- S3 buckets are a single-level hierarchy (each one must have a unique name). Each bucket contains key-value tuples
- APIs for most languages
- EC2: linux 2.6.16 Xen images; Can have small, medium or large servers; large is 4 dual-core processors, 15GB RAM, 1.6TB storage. Storage is not persistent; when the instance is spun down the storage is lost
- ACLs for host/port access
- commandline toolsets for stopping/starting instances
- SQS: reliable messaging service. 256KB message payload. pay per-message (10c/1000 messages). Basic permissions model
– backup (s3sync.rb: like rsync)
– S3 asset host: Use S3 to hold large files rather than serving them locally.
– S3: authenticated downloads; s3 can make files publicly available for a short period of time (~2 s) if the user presents a specific token
– rails: attachment_fu does this for you automatically
- load fluctuation: start additional servers with cron for busy periods; or even use a monitoring system to detect high load and spin up new machines
- to make storage persistent; have a slave database which syncs periodically to s3
- Problems: dynamic IP addresses make it hard to manage the load balancing; can’t specify the DB server very well.
- Solution: run DB server in-house, just use EC2 for app servers for surplus capacity; VPN through from your DC to EC2. Still problems with latency between your DC and EC2
- http://g.ho.st – entirely built with AWS. No datacentre of their own
- High quality and low cost; occasional problems with latency/integrity and vendor lock-in
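The authenticated-download idea noted above (a file becomes reachable only while the user presents a specific, short-lived token) can be sketched as follows. This is a minimal illustration of the general technique using an HMAC over the path and an expiry time; it is not Amazon's actual S3 signing scheme, and `SECRET_KEY`, `sign_download`, `make_download_url` and `is_valid` are hypothetical names invented for the example.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical shared secret, for illustration only.
SECRET_KEY = b"example-secret"


def sign_download(path: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """Return a hex token that binds a file path to an expiry timestamp."""
    message = f"{path}\n{expires_at}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def make_download_url(path: str, lifetime_s: int, now: Optional[int] = None) -> str:
    """Build a URL that should only be honoured until now + lifetime_s."""
    now = int(time.time()) if now is None else now
    expires_at = now + lifetime_s
    token = sign_download(path, expires_at)
    return f"{path}?expires={expires_at}&token={token}"


def is_valid(path: str, expires_at: int, token: str, now: Optional[int] = None) -> bool:
    """Accept the request only if the token matches and has not expired."""
    now = int(time.time()) if now is None else now
    if now > expires_at:
        return False
    expected = sign_download(path, expires_at)
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(expected, token)
```

Because the expiry time is part of the signed message, a client cannot extend the lifetime by editing the `expires` parameter; any change invalidates the token.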
James Kalbach, LexisNexis
- Metadata: Owner-created (controlled vocabulary), technically generated (search-generated), user-generated (tagging). Which to use is situational; each has pros and cons
- tagging != tag cloud; there are other ways to present it
- Why do people tag? 1: to find stuff for themselves 2: To share with others
- 3 stages of tagging: creating tags; using and managing your own tags; using other people’s tags
- UIs need to encourage tagging for it to work; make the field (a) visible, and (b) large if you want rich and abundant tags (Which you do)
- make suggestions for tags
- provide multiple views onto tags – favourite, recent, popular
- don’t make tags be the only way of searching for resources
- allow combinations of tags (e.g. del.icio.us related tags)
- allow slicing+dicing: grouped tag clouds
- Good tag-driven sites: LibraryThing; Buzzillions
- tag clouds are inaccessible unless they include some kind of textual indicator of the number of tags rather than just relying on fontsize
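The accessibility point above (a cloud should state each tag's count in text rather than relying on font size alone) could be implemented along these lines. This is a sketch under assumptions: the function name `render_tag_cloud`, the `tag-cloud` class and the 100% to 200% size range are all invented for illustration.

```python
from html import escape
from typing import Dict


def render_tag_cloud(tag_counts: Dict[str, int], min_pct: int = 100, max_pct: int = 200) -> str:
    """Render tags as an HTML list, scaling font size and printing the count as text."""
    if not tag_counts:
        return '<ul class="tag-cloud"></ul>'
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    items = []
    for tag in sorted(tag_counts):
        count = tag_counts[tag]
        # Linear scaling between min_pct and max_pct.
        size = min_pct + (count - lo) * (max_pct - min_pct) // span
        # The "(count)" text carries the same information as the font size,
        # so screen readers and text-only browsers do not lose it.
        items.append(f'<li style="font-size: {size}%">{escape(tag)} ({count})</li>')
    return '<ul class="tag-cloud">' + "".join(items) + "</ul>"
```

For example, `render_tag_cloud({"python": 3, "aws": 1})` yields a list in which each item shows both a scaled font size and its count in parentheses.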
David Recordon, Martin Paljak
- Decentralised, lightweight
- reduce the number of usernames and passwords needed online
- supported by lots of geeky tools, and increasing numbers of development toolsets. starting to get penetration in larger service companies.
- end-user tools from sxipper, symantec, verisign.
- Estonian smartcard system – used for all kinds of e-services. Uses openID behind the scenes to manage SSO
- Gives users more control over their identity data. Services only need to get identity, not personal information, so users don’t have to deal with multiple privacy policies.
- Need the right hardware and software to use it. card + PIN verification
- Developers don’t like it, in part because of the cost of getting an SSL-enabled site (need a distinct IP address and a certificate)
- Mobile-ID: Same data from the smartcard, on a GSM SIM; but the implementation is totally different. Websites allow you to enter a phone number as an ID; you get sent a confirmation text, use a PIN to reply (PIN stays on the phone), can then continue logged in
- Anonymity: anonymity is a privilege; open.id.ee provides partial anonymity
- OpenID 2: multiple identities. Can have an openID with no personally-identifiable information in the ID. provides anonymity whilst still allowing sites to assert that these are real, unique people.
- Other EU countries deploying openID. OpenID is designed for interop.
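One way to picture the "ID with no personally-identifiable information" idea from the OpenID 2 note is an opaque identifier that an identity provider derives separately for each site. The sketch below is not the OpenID protocol and is not taken from the talk; it only illustrates how a provider could hand out identifiers that are stable and unique per person while revealing nothing about them. `PROVIDER_SECRET` and `pairwise_id` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical secret held only by the identity provider, never by the sites.
PROVIDER_SECRET = b"provider-only-secret"


def pairwise_id(user_id: str, site: str, secret: bytes = PROVIDER_SECRET) -> str:
    """Derive an opaque identifier for one user as seen by one site.

    The same (user, site) pair always yields the same value, so the site can
    recognise a returning, unique person. Different sites receive different
    values, so they cannot correlate accounts with each other.
    """
    message = f"{user_id}|{site}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:16]
```

The provider still knows who the person is (for example via the smartcard), which is why this gives partial rather than total anonymity.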
1st wave of social networks – forums, message boards
2nd wave – make friends through friends
3rd wave – tool assisted, as networks start to get more meaning
- The real mobile social network is the phone’s address book
- The networked computer is where the social networking revolution is currently happening, but 85% of the world doesn’t have this. As phones become networked mobile computers, there will be an explosive expansion of social networking
- Cellphones revolutionised communication; portable computing will do the same
- social networks are currently too geeky; they require you to be sat down and on your own. Portable computing allows you to rely on bodies rather than on screens.
- Killer app for geeks: Where the application layer can make inferences based on social data
- Killer app for normal people: When you meet someone, how can the mobile help? could the phone gather social network data from people who are close?
- Casual Computing: Quicker, more engaging. People won’t dedicate as much attention to mobile devices as they do when sat in front of a computer
- search centric – as device storage increases, search-based navigation is more important
Writing about web page http://www.microformats.org
Centralised vs. Decentralised: Smart vs. Dumb
- The internet is a dumb, decentralised network, but each component is just smart enough to get by, and this leads to the whole being very resilient to complexity
- RDF vs HTML: RDF is too complex and hard to publish. Anyone can make HTML
- If markup were robots, XML/RDF would be mechas – big, cool, but not for everyone; HTML would be a nanobot – dumb, ubiquitous, and reliant on network effects to achieve anything
- People first, machines second. Given a trade-off between ease of publication and ease of machine-consumption, favour publication heavily.
- Microformats are an 80/20 solution; some use cases just aren’t do-able with microformats. That’s a deliberate design choice. Go for the low-hanging fruit
- Microformat building blocks:
– the rel attribute on links and anchors: rel="license", XFN (rel="friend met colleague" etc.)
– the class attribute: hCard, class="vcard", class="fn url"; hCalendar
– hCalendar is an example where a microformat has had to give way on the ease-of-publication vs. ease-of-consumption scale; dates are specified twice;
<abbr title="2007-11-05" class="dtstart">November 5th</abbr>
(2007-11-05 is the ISO 8601 standard format for dates)
- microformats are seeds
- expose – CSS.
- Discover – Firefox Operator
- Convert – technorati XSL service
– these are problems with HTML generally; microformats don’t make this easier or harder
– grey goo – explosion of formats – there’s an established process for developing new microformats, to control adoption.
– Microformats vs. POSH (Plain Old Semantic HTML)
- Community: wiki, irc, email, blog (tag microformats)
- The future
– portable social networks
– syndicated contact details
– semantic web (little-s little-w). No RDF!
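To make the building blocks above concrete, here is a rough sketch of how a consumer might pull the machine-readable date out of the hCalendar `abbr` pattern and read `rel` values from links, using only Python's standard library. The class name `MicroformatScanner`, the helper `scan` and the sample markup are invented for illustration; real microformat parsers handle many more cases.

```python
from datetime import date
from html.parser import HTMLParser


class MicroformatScanner(HTMLParser):
    """Collect dtstart dates and rel values from a fragment of HTML."""

    def __init__(self):
        super().__init__()
        self.start_dates = []  # list of datetime.date
        self.rel_links = []    # list of (href, [rel values])

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        # hCalendar: the human-readable text sits inside the element,
        # the ISO 8601 date for machines sits in the title attribute.
        if tag == "abbr" and "dtstart" in classes and attr.get("title"):
            self.start_dates.append(date.fromisoformat(attr["title"]))
        # rel-license, XFN, etc.: rel holds space-separated values.
        if tag == "a" and attr.get("rel"):
            self.rel_links.append((attr.get("href"), attr["rel"].split()))


def scan(html_text: str) -> MicroformatScanner:
    """Parse html_text and return the scanner holding what was found."""
    scanner = MicroformatScanner()
    scanner.feed(html_text)
    return scanner


# Hypothetical sample markup for the sketch.
SAMPLE = """
<div class="vevent">
  <abbr title="2007-11-05" class="dtstart">November 5th</abbr>
  <a href="http://example.org/alice" rel="friend met colleague">Alice</a>
</div>
"""
```

Calling `scan(SAMPLE)` fills `start_dates` with the parsed date and `rel_links` with the link and its three XFN values; the publisher only had to add a class, a title and a rel attribute to ordinary HTML.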