All entries about Etech
March 17, 2005
Danny O'Brien, Merlin Mann
(DoB: last year's summary)
- hackers ♥ plain text
- my other app is in ~/bin (keep your own scripts; make lots of glue)
- network, network, network. Share everything. Social organisation rules
This year: It's not just for α-geeks
- decent email search – gmail, lookout, tiger spotlight
- social file sharing for everyone – flickr, novell iFolder, Groove
- easy web scraping – RSS
- keyboard macros for win/linux: c.f. quicksilver
(MM: What's popular on 43Folders right now?)
- GTD - especially implementing it in real life
- lo-fi hacks: wherever possible, think about the simplest way to solve a problem – paper
- make failure difficult (the 'forehead ticket hack')
- quicksilver: like a GUI bash – minimise distractions
- more people wanting to tailor their own working environment
(DoB: What crossed over, and why?)
- text editors – didn't make it into the mainstream
- keyboards: people like them. When you're using the keyboard you can get into the 'zone', mousing disrupts flow
- novice users never get 'flow' so the keyboard doesn't matter for them
- why big screens? (c.f. Mary Czerwinski). Big screens are more productive and reduce cognitive load – no overlapping windows
- there's no muscle-memory to alt-tab – this is why quicksilver is so much better
- The dark secret of life hacks: turn off the sodding computer. Eliminate navigation, eliminate distraction
This year's killer apps
- google suggest (p=1)
- passive informants – dashboard, emacs remembrance, IRC - background info (p=0.7)
- unified notification (Growl API) (p=0.6)
- the nightmare of desktop search (p=0.3)
MM: what is a life hack?
A) It's a way of patching around a problem. What happens if you fix, rather than patch? What comes after the hacks?
Writing about web page http://www.lessig.org/blog/
- Stop telling the world that we've invented something new and different: There's nothing new here
- cultures have always evolved by remixing
- In order to evolve and remix individuals need to be free to act creatively
- When the tools of remix change, the freedoms must change too
- Right now reuse of technology is not sufficiently free, because a remix requires a copy
- Either we reform the law, or we reform the technology
- connect: to the old guard; call "piracy" wrong; we are not defending the right to copy against the copyright holder's wishes
- teach: show people what technological remix is about
- change: teach people how to change, not abolish, laws that don't work
Gavin Bell, Matt Biddulph, Tom Coates (BBC)
Identifiers: If you give something an ID, it becomes addressable. Bar codes --> MS Aura project, ASINs --> navigable site + API, postcodes --> maps, UpMyStreet.com
BBC are rolling out UIDs for their programmes: PIPs – a database of information about all the BBC's programmes
Packaging and grouping programmes is complex – they fit into a large variety of different categories. Programmes are sometimes nested e.g. a particular event within Grandstand.
The data is scattered between about 15 assorted systems, with no common key
Logistical problems – last-minute scheduling changes, legal rights, size of the BBC
core element: the episode
extra info: brand ('the office','absolutely fabulous'), groups ('season 1','book of the week'), versions.
The web product (done for Radio 3 now)
- A page for every episode of every programme that the BBC creates
- structure was a simplified reflection of the data structure to enable navigation between episodes and brands
- every episode uniquely identifiable and addressable for ever
- persistent schedules
- starts to put the infrastructure in place for av-on-demand
Data architecture: inputs
- most consistent source of data is EPG (Electronic programme guide) – but the data here is very sparse – just time and title
- SMEF - Standard Media Exchange Format – very detailed logical model for broadcast info data
- Local production data may be added in automatically, and can be added by hand
- Read-only ReST
- Have an RDF rendition because the data is highly linked
Writing about web page http://conferences.oreillynet.com/cs/et2005/view/e_sess/5947
- Hansard went online in about 1997
- its website is utterly utterly lame
- TheyWorkForYou.com are trying to fix some of this by grabbing the output of Hansard and re-purposing it
- They parse hansard text and create hyperlinks – e.g. to people, wikipedia, a user-defined glossary
- exposes RSS for everything to allow other people to build on top of it
- supports email alerting for all kinds of events
- generates voting records for each MP
- links to the register of interests for each MP
- In theory parliamentary copyright doesn't allow reuse that may bring parliament into disrepute – but in practice they backed off from issuing a cease & desist about unmoderated comments
- runs on LAMP (except with FreeBSD instead of Linux).
- Input files are very dirty HTML with no semantic markup
- parse process works by maintaining a set of diff patches against each day's record to clean up textual errors
- gathering the stats picks up errors in hansard that would otherwise go uncorrected
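The patch-maintenance idea can be sketched with Python's difflib (my own reconstruction for illustration, not TheyWorkForYou's actual code; the sample text is invented):

```python
import difflib

# Raw scraped text for one day's record, containing a textual error,
# alongside the hand-corrected version.
raw = ["The Hon. Member for Khull spoke first.", "Order, order."]
corrected = ["The Hon. Member for Hull spoke first.", "Order, order."]

# Store only the diff next to the raw record; re-running the parse
# re-applies the corrections by restoring the "b" side of the diff.
patch = list(difflib.ndiff(raw, corrected))

def apply_corrections(patch):
    # difflib.restore(delta, 2) yields the corrected ("b") sequence
    return list(difflib.restore(patch, 2))

print(apply_corrections(patch))
```

The nice property is the one the talk mentions: the raw record stays pristine, so when Hansard itself is corrected upstream, the stored diffs that no longer apply flag exactly where the source changed.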
March 16, 2005
Matthew Haughey, Creative Commons
CC is 2 things:
- a set of liberal licenses
- a search engine to find licensed work
When designing the search infrastructure, it had to be decentralised and metadata-aware, and buildable by a small organisation with existing toolkits. RDF is a good fit
Metadata format: HTML head generally ignored by search engines, robots.txt hacks too limited and hacky, supporting files (too much faff for end users). So they went for RDF documents embedded in HTML comments. This means that you can search for media with a particular re-usability status.
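A rough sketch of what the embedded metadata looks like and how a crawler pulls it back out (illustrative markup and regex only, not the real CC crawler; the RDF here is cut down):

```python
import re

# A page carrying CC licence metadata as RDF inside an HTML comment.
html = """<html><body>
<a href="http://creativecommons.org/licenses/by/2.0/">Some rights reserved</a>
<!-- <rdf:RDF xmlns:cc="http://web.resource.org/cc/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <cc:Work rdf:about="">
    <cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/"/>
  </cc:Work>
</rdf:RDF> -->
</body></html>"""

# A crawler can recover the RDF block with a simple scan of the comments:
def extract_rdf(page):
    m = re.search(r"<!--\s*(<rdf:RDF.*?</rdf:RDF>)\s*-->", page, re.DOTALL)
    return m.group(1) if m else None

rdf = extract_rdf(html)
print("licenses/by/2.0" in rdf)
```

The comment trick is what makes "search by re-usability status" possible: the licence URI is machine-readable without search engines choking on unfamiliar markup.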
Built a prototype crawler + search engine out of python + postgresql. It's not scalable but it's a proof of concept
Rebuilt the app on Lucene. Lucene is teH r0X0R for building open-source indexing tools; there's an app called Nutch which runs on top of Lucene, which they used. The CC search engine is about 500 lines of java on top of Nutch. The search engine now indexes about 10 million pages.
This talk is not very good.
Writing about web page http://conferences.oreillynet.com/cs/et2005/view/e_sess/6117
Premise: The way we currently try to apply categorisation to the web is wrong, because we're trying to re-apply categorisation techniques from the pre-web world
Assertion: It will get worse before it gets better; things get broken before they get fixed
Parable: Travel agents – Travelocity takes the place of traditional travel agents: they made the mistake of trying to do everything that a travel agent does including helping people make choices about which holiday to take. In fact people don't want an online agent to do this: travelocity made the mistake of transposing the offline categorisation into the online world.
It's not possible to avoid cultural assumptions in categorisation: Dewey Decimal classification has 10 religion categories – 9 for aspects of Christianity and 1 for 'other'. Similarly for the Library of Congress geographic terms.
Physical ontologies are optimised for physical access – librarians invent classifications to help them find books. If there is no shelf then there's no need for a librarian's ontology.
Hierarchical ontologies are fundamentally not suitable for non-physical information – because they're predicated on an object being in one place at one time – which isn't true.
hierarchy—>hierarchy+links. When the number of links becomes large enough, you don't need the hierarchy any more.
browse (hierarchy) —>search(network of links)
when does ontological organisation work well?
- Small domain, formal categories, stable entities, restricted entities, clear edges
- Coordinated expert users (searchers/browsers), expert catalogers, authoritative source
n.b. the web is the diametric opposite of this!
When categorisations are collapsed, there's always some signal loss. Clay's example: if I tag something "queer" and you tag it "homosexual" we probably mean something subtly different. When categorisations are fixed, there will always be errors introduced with time. e.g. "dresden is in east germany"
great minds don't think alike
Usage (number per user) of tags on delicious follows a power law – indicating an organic organisation. Similarly the number of items per tag for an individual user. Looking at the number and distribution of tags for a given URL gives an indication of how clear 'the community' is about the categorisation of the item.
Key point: In a folksonomy, each categorisation is worth less individually than a 'professional' categorisation would have been – but when aggregated they have much more value.
User and Time are important attributes of tags. You need to know who tagged a resource and when, in order to assign a value to the tag. The semantics in a folksonomy are in the users not in the system. When del.icio.us sees OSX it doesn't know that it's an operating system; it just knows that things tagged as OSX are also often tagged as 'Mac' or 'Apple'
Does the world make sense, or do we make sense of the world?
Clay Shirky (NYU) Phone as platform
NYU have been looking at use of phones in teaching;
- PacManhattan / ConQuest – 'big games' – controllers use phones to control players on a macro scale.
- dodgeball: started as a site for rating nightspots; but evolved into a more general social network. Tells you who else is where you are
- Mobjects – squeezable controller for driving a bluetooth phone. Squidge it and it sends an SMS. Heartbeat – very similar idea.
- phone is beginning to be used as a platform rather than as a device unto itself. standardisation of comms protocols (bluetooth) is making this easier although phone mfrs are not used to 'hackers'
- Server infrastructure is the key – expose back-end data in ways which phones can use
- voice is increasingly underused.
Tom Igoe ITP / NYU
Physical computing: crossover between art & programming – making computer-control of physical objects simpler. Lots of cool toys / tools, particularly in the space of communicating emotions over a network. I can't easily describe them in a blog post, so hopefully there'll be a slideshow available on the web soon…
Tom Hoffman / Tim Lauer wiki in the classroom
Teaching middle-school children. No managed file space or tech support – looked into wikis. Instiki is a ruby-based wiki that can run on a workstation / laptop. Since the school is mac-based students can use Rendezvous to discover the pages. Since the teachers run the wikis on their own laptops they can run them at home just as easily.
- Benefits: easy, responsive, completed project can be published as static HTML
- Problems: if the teacher's machine is asleep it doesn't work; not all teachers grasped that their laptop had to be in the lesson. It's not a technology supported by the LEA
Trying to get student information into the wiki is difficult because the student records are silo-ed (sound familiar ? :-)) Tim got around this by building an open-source, open-api platform for school student records: SchoolTool. SchoolTool is based around a set of ReST APIs. relationships are modelled as xlinks.
Collective action is sometimes touted as a magic bullet – the idea that any difficult problem can be solved with a large enough group of people. It is true that in the right context, a crowd can be smarter than the smartest person in it. e.g. the average of a large set of guesses about a single value ('guess the weight of the ox'), or the odds of a horse winning – with a large sample, horses with 3/1 odds win about 1/4 of the time.
Works well when the problem has single 'right' answer. Collective wisdom isn't appearing out of consensus, it arises from the variation in answers. Also there's not much interaction between each person.
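The ox-guessing effect is easy to simulate (invented numbers, Python for illustration – the point is that independent errors cancel in the average):

```python
import random
import statistics

random.seed(42)
true_weight = 1198  # the ox's actual weight (made-up figure)

# Each person guesses independently, with a lot of individual noise.
guesses = [true_weight + random.gauss(0, 100) for _ in range(1000)]

# The crowd's single answer is the mean of all guesses.
crowd_error = abs(statistics.mean(guesses) - true_weight)

# Compare against how far off a typical individual is.
typical_individual_error = statistics.mean(abs(g - true_weight) for g in guesses)

print(crowd_error < typical_individual_error)
```

This also shows why the variation matters and why interaction hurts: if herding correlates everyone's errors, they stop cancelling and the average is no better than one person's guess.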
Contrast with Linux: A large group works on problems but ultimately 1 person writes the code – the decision-making process is highly centralised. Or alternatively the anthill – lots of dumb agents with lots of interconnection and simple rules. However, humans are not ants. We don't do the same efficient interaction; in some scenarios the more we interact the less intelligence the group has overall.
The reasons for this are all basically Herding – 'it's better to fail conventionally than to succeed unconventionally'. People like the comfort of the crowd. Leads to 'information cascade'.
The root of all problems in the world is that man cannot simply sit by himself in his room
- Keep ties loose. Loose coupling minimises the disruptive influence of others around you.
- Keep a wide range of inputs, so that you get the maximum amount of diversity/randomness injected into the solution space.
Jon Bostrom Nokia: Mobile computing on the edge
Advantages of edge computing:
- ease of use
- dynamic evolution / low centralised control
… this is turning into a bit of a Nokia sales pitch….yawn…
Writing about web page http://conferences.oreillynet.com/cs/et2005/view/e_sess/5910
Neil Gershenfeld Bits & Atoms
The state-of-the-art in fabrication is the chip factory: Actually right now it's not very sophisticated: You spread some stuff around and cook it. Compared to biology the big difference is that the things you're making don't know anything about being made – whereas when you make an animal the cells know how to make more cells – the specification of the structure lives within the structure itself.
For traditional manufacturing, errors in mfg are proportional to noise in the process. In signal theory (e.g. networks) a certain amount of noise can be tolerated without having any effect on errors in the system. If we can make fabrication processes where the object being fabbed knows the specification we can get the same kind of noise-toleration (e.g. genes can cope with errors and still make an organism)
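The simplest illustration of that noise-toleration threshold is a repetition code (my own toy example, not Gershenfeld's scheme): below a certain noise level, errors in the channel cause no errors at all in the decoded result.

```python
# Toy error-correcting code: repeat each bit 3 times. A single flipped
# copy per bit is voted away by the majority, so that much channel
# noise has zero effect on the decoded message.
def encode(bits):
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(coded):
    out = []
    for i in range(0, len(coded), 3):
        triple = coded[i:i + 3]
        out.append(1 if sum(triple) >= 2 else 0)
    return out

msg = [1, 0, 1, 1]
sent = encode(msg)
sent[4] ^= 1  # noise: flip one copy of the second bit
print(decode(sent) == msg)  # True – the error was absorbed
```

The claim in the talk is that a fabrication process where the object carries its own specification gets the same property: process noise below the threshold produces a perfect artefact, just as genetic error-correction still makes a viable organism.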
20K buys you a field fab-lab: a laser cutter, sign cutter, micron-scale milling machine, and a microcontroller programming setup. Microfabrication is now in the same place that computing was about 25 years ago (e.g. when minicomputers like the PDP were around). The PC equivalent of a microfabricator is not far off
Fabrication labs at this scale are a disruptive technology. Neil's group have been introducing them into developing countries to see what can be achieved: Answer – all sorts of cool small-scale solutions to local problems
Cory Doctorow All complex ecosystems have parasites
- AOL chooses to allow spam through despite the cost, because if you solve spam you break email. Uncontrollability is a key element of a fault-tolerant system like email
- DVD has been developed to be controllable; CDs were not. The result is that if you invest in CDs, you can re-use them as MP3s, ringtones, etc,etc… With an investment in DVDs you never get any increase in the value.
- The DVD control model is fragile and unscalable; trying to extend it out to other devices – wider DRM - won't work, or will cripple the industry if it does. DRM isn't working now – any movie is available over P2P, despite the huge costs of implementation.
Justin Chapweske, Onion Networks
2 billion dollars a year is spent on HTTP optimisation: load balancers, caches, etc. This is at least partly because HTTP is sub-optimal for the size of the web
- HTTP is very bad at transferring large (multi-GB) files – packet loss, broken 32-bit apps, etc.
- One solution is to use very high-quality transports, but it would be better to have a fault-tolerant transport (like RAID for storage)
- swarming is RAID for the web: tolerates failures of transport and failures of servers
- swarming features: it's a content-delivery system. data is signed and encrypted so you don't need to trust the host you download from. runs over standard protocols – it's an extension to HTTP
- standard java http stack replacement available (open-source)
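A minimal sketch of the integrity half of swarming – hash-checked chunks fetched from untrusted peers (my own illustration, not Onion Networks' code; it shows plain hashing of a manifest, not the signing/encryption layer):

```python
import hashlib

# Publisher side: split the content into chunks and publish a manifest
# of their hashes alongside the file (stand-in for the signed metadata).
content = b"x" * 10_000
CHUNK = 4096
chunks = [content[i:i + CHUNK] for i in range(0, len(content), CHUNK)]
manifest = [hashlib.sha1(c).hexdigest() for c in chunks]

# Downloader side: chunks can arrive from any peer, in any order; each
# one is verified against the manifest before being accepted, so you
# never need to trust the host you downloaded from.
def accept(index, data, manifest):
    return hashlib.sha1(data).hexdigest() == manifest[index]

print(accept(0, chunks[0], manifest))    # good chunk from peer A
print(accept(1, b"tampered", manifest))  # corrupt chunk from peer B
```

Because each chunk verifies independently, the failure of any one server or transport just means re-fetching that chunk elsewhere – the RAID analogy in the talk.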
Jimmy Wales – Wikipedia & the future of social software
- 500K entries
- taxonomy: 350K categories, hierarchical, dynamic
- 500MM page views / month
- the original dream of the net – people sharing information freely
- problems – quality control, author fatigue
- solution: wiki[pedia|cities] – a social computing successor to 'free homepages'. Uses a free content license so that people can take their content with them if they want to leave
- sites are maintained by communities rather than by an individual, thus mitigating the risks of quality control and author fatigue
- wiki software doesn't enforce social rules – for example the 'votes for deletion' page
- wikipedia is a social innovation, not a technological one.
- software which enables collaboration is the future of the net
Panel discussion Folksonomy
Why do companies allow end-users to participate in tagging?
In flickr's case it was primarily done for the individual user and then aggregated; in wikipedia's case it was primarily done for the community. SB (flickr) – folksonomies are not a replacement for a formal taxonomy, they are an addition. JS (del.icio.us) – also started from the assumption that tags were a personal thing, and just let the folksonomy emerge.
Some tags have nothing to do with categorisation, e.g. toread on del.icio.us, even though they are interesting as a social behaviour
flickr / del.icio.us are different to wikipedia, because they start with individual spaces and then aggregate them, whereas WP starts with a shared space and uses negotiation/governance to manage it. The individual approach is less optimal for the social stuff – e.g. people tagging pictures of their trip to mexico as 'etech' because they went just before the conference – right for the individual, but breaks the aggregation.
JS: Although you can map tags between del.icio.us / flickr / technorati, it's not always appropriate – the tags mean different things in the different applications
Q: How do you provide feedback to people to improve their tagging? In wikipedia it's easy; in flickr it doesn't matter – the primary purpose of a flickr tag is personal. Also the volume of pics is so great that you don't need a perfect vocabulary. In del.icio.us, there are some tools to help you see which of your tags are also used by others.
SB: formal taxonomies are ultimately limited because (as far as we can tell) the real world isn't easily classified.
What kind of problems will amazon face in delivering retail services to mars? Or to put it another way, why is it that we don't think global e-commerce is possible?
We already do some things at massive scale – the internet, mobile phones, chips (multi-billion transistors that all work). There are 1 quadrillion ants on the planet (allegedly)
What do we need to solve the problem of massive scalability? Not just technology, though that may be a necessary precursor. There are only a few systems that can scale up to millions of parallel nodes.
Amazon scale: 47 MM users, 7 websites, 50% is non-US sales. 2.8MM units/day ordered at peak time. 32 orders/second peak. 2MM packages dispatched
Scale ought to be seen as an advantage – the more you scale the more you can sell
Can we use the same engineering techniques to build really large systems that we use for current big systems? Management becomes a big deal; how to cope with unreliability
Real Life scales well - systems need to learn from biology for high fault-tolerance. Biological systems go through continuous refresh - cells are designed to die and be born without affecting the organism as a whole.
Outside monitors are not a good indicator of 'health'. Systems should be designed for continuous change, not stability.
Turing's 3 categories of systems:
- organised (current apps)
- unorganised (networks)
- self-organising (biological)
– need to move to self-organisation for massive scalability
Can't expect complete top-down control – since applications won't be deterministic. Real life is not a state machine
Functional units need to be self-organising feedback-centric machines
comparison point: Why are epidemics so robust wrt message loss / node failure? Can be mathematically modelled in a rigorous way. It works because each node can operate independently if it needs to. As the number of nodes becomes really large then you only need to know a subset of the system in order to succeed.
Fault detection protocols – monitor on a particular node A how long since another node B updated its state. B does not need to contact A directly because the state will eventually replicate around the whole system. Need clear partitioning of data but then the system becomes highly reliable.
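A toy version of that gossip-style propagation (my own sketch, not a production fault-detection protocol): each node keeps a heartbeat counter per peer, and gossip merges tables by keeping the freshest counter seen. B's liveness reaches A via intermediaries, without B ever contacting A directly.

```python
import random

random.seed(7)

class Node:
    def __init__(self, name, peers):
        self.name = name
        # heartbeat counter for every node in the system, initially 0
        self.table = {p: 0 for p in peers}

    def beat(self):
        # a node only ever increments its own counter
        self.table[self.name] += 1

    def gossip_with(self, other):
        # symmetric anti-entropy merge: both sides keep the max counter
        for n in self.table:
            best = max(self.table[n], other.table[n])
            self.table[n] = other.table[n] = best

names = ["A", "B", "C", "D"]
nodes = {n: Node(n, names) for n in names}

# Each round, every node beats and gossips with one random peer.
for _ in range(20):
    for node in nodes.values():
        node.beat()
        node.gossip_with(nodes[random.choice(names)])

# A has heard about B even without any direct A-B exchange being required.
print(nodes["A"].table["B"] > 0)
```

Failure detection then falls out for free: if A's counter for B stops advancing for long enough, A can suspect B, and every other node will reach the same conclusion from its own copy of the gossiped state.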
Writing about web page http://intertwingly.net/slides/2005/etcon/
Sam's slides are all online so I don't need to annotate everything. Everyone who develops web apps should read them
- understand unicode: it is an attractive nuisance. Inexperienced developers will screw it up. c.f. the recent punycode domain-name hijacking bugs in mozilla
- The default encoding for HTML is iso-8859-1, for XML it's utf-8. This is why you can't put HTML directly into RSS; win-1252 (the default encoding on windows) isn't compatible with either (27 differences, mostly around quotes and the euro symbol)
- URIs: Encoding is not defined – it's up to you to document yours clearly to your clients. Equality of URIs is not well-defined; the CLR Uri.Equals method is broken.
- layering is problematic e.g. the rules for encoding a URI don't apply in an XML document (you can't encode a ~ as %7E)
- RSS/Atom: Lots of unanswered questions / ill-defined points in the spec
- Layering is the problem, not the solution. Layered designs inherit the bugs from all layers.
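Both traps above – the win-1252/latin-1 mismatch and fuzzy URI equality – are visible in a few lines (Python used purely for illustration):

```python
import urllib.parse

# The euro sign is one of the code points where windows-1252 and
# iso-8859-1 disagree: cp1252 has it, latin-1 simply doesn't.
euro = "\u20ac"
print(euro.encode("cp1252"))  # b'\x80'
print(euro.encode("utf-8"))   # b'\xe2\x82\xac'
try:
    euro.encode("iso-8859-1")
except UnicodeEncodeError:
    print("no euro in latin-1")

# URI equality: %7E and ~ denote the same character, but a byte-wise
# comparison of the two URIs says they differ.
a = "http://example.org/%7Etim"
b = "http://example.org/~tim"
print(a == b)                                              # False
print(urllib.parse.unquote(a) == urllib.parse.unquote(b))  # True
```

So a document served as latin-1 but authored in cp1252 corrupts exactly the characters people paste from word processors, and two "equal" URIs can fail a naive cache or comparison – both of which feed the Q&A point that humans paper over what machines cannot.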
Q: If everything is broken, how come it still works? A: because people are very fault-tolerant. The more machine-machine communication you have the more problematic it becomes.
Q: Are web services genuinely better than HTTP, or just newer? A: because the client stacks have (sometimes) been written with the spec to hand, they're generally more reliable.
Q: How do you avoid the attractive nuisance problem e.g. when writing the atom spec? A: make the spec force people to think about the problems e.g. by specifying the content-encoding for specific element types.