Amazon: Interplanetary e–commerce
What kind of problems will amazon face in delivering retail services to mars? Or to put it another way, why is it that we don't think global e-commerce is possible?
We already do some things at massive scale – the internet, mobile phones, chips (multi-billion transistors that all work). There are 1 quadrillion ants on the planet (allegedly)
What do we need to solve the problem of massive scalability? Not just technology, though that may be a necessary precursor. There are only a few systems that can scale up to millions of parallel nodes.
Amazon scale: 47 MM users, 7 websites, 50% is non-US sales. 2.8MM units/day ordered at peak time. 32 orders/second peak. 2MM packages dispatched
Scale ought to be seen as an advantage – the more you scale the more you can sell
Can we use the same engineering techniques to build really large systems that we use for current big systems? Management becomes a big deal; how to cope with unreliability
Real Life scales well - systems need to learn from biology for high fault-tolerance. Biological systems go through continuous refresh - cells are designed to die and be born without affecting the organism as a whole.
Outside monitors are not a good indicator of 'health'. system should be designed for continuous change, not stability.
Turings 3 categories of systems:
- organised (current apps)
- unorganised (networks)
- self-organising (biological)
– need to move to self-organisation for massive scalability
Can't expect complete top-down control – since applications won't be deterministic. Real life is not a state machine
Functional units need to be self-organising feedback-centric machines
comparison point: Why are epidemics so robust wrt message loss / node failure? Can be mathematically modelled in a rigorous way. It works because each node can operate independently if it needs to. As the number of nodes becomes really large then you only need to know a subset of the system in order to succeed.
Fault detection protocols – monitor on a particular node A how long since another node B updated it's state. B does not need to contact A directly because the state will eventually replicate around the whole system. Need clear partitioning of data but then the system becomes highly reliable.