Dynamo concepts in depth. Pavlo Baron, codecentric AG Friday, August 31, 12
Pavlo Baron pavlo.baron@codecentric.de @pavlobaron
The shopping cart case
The 2 AM alarm call case
The Tower of Babel case
The Neo vs. Smiths case
The Pavlo case
So Dynamo isn’t about speed. It’s about immediate, reliable writes. It’s about operation relaxation. It’s about distribution and fault tolerance. It’s about almost linear scalability.
Time and timestamps
Clocks
V(i), V(j): competing
Conflict resolution:
1: siblings, client
2: merge, system
3: voting, system
Vector clocks
Node 1: 1,0,0  2,2,0  3,2,0  4,3,3
Node 2: 1,1,0  1,2,0  1,3,3  4,4,3
Node 3: 1,0,1  1,2,2  1,2,3  4,3,4
Vector clocks
Node 1: 1,0,0,0
Node 2: 1,1,0,0  1,2,0,0  1,3,0,3
Node 3: 1,0,1,0  1,0,2,0
Node 4: 1,0,0,1  1,2,0,2  1,2,0,3
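The clock comparison behind "competing" versions can be sketched in a few lines of Python. This is a minimal illustration, not how any particular Dynamo-style store implements it; the function names (increment, merge, compare) and the node labels n1..n3 are assumptions made for the example.

```python
from typing import Dict

VClock = Dict[str, int]  # node name -> counter (hypothetical representation)


def increment(clock: VClock, node: str) -> VClock:
    """Return a copy of the clock with the given node's counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VClock, b: VClock) -> VClock:
    """Component-wise maximum of two clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def descends(a: VClock, b: VClock) -> bool:
    """True if a has seen everything b has seen (a >= b in every component)."""
    return all(a.get(n, 0) >= c for n, c in b.items())


def compare(a: VClock, b: VClock) -> str:
    """Classify a relative to b: equal, after, before, or concurrent."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"  # competing versions: needs conflict resolution


# 1,1,0 on Node 2 and 1,0,1 on Node 3 from the three-node slide
v_node2 = {"n1": 1, "n2": 1, "n3": 0}
v_node3 = {"n1": 1, "n2": 0, "n3": 1}
print(compare(v_node2, v_node3))
print(merge(v_node2, v_node3))
```

Neither of the two clocks dominates the other, so they are reported as concurrent; which of the three resolution strategies applies (siblings, merge, voting) is a separate decision.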
O(1) for data lookups / delta tracking #
Merkle Trees
N, M: nodes
HT(N), HT(M): hash trees
M needs update:
obtain HT(N)
calc delta(HT(M), HT(N))
pull keys(delta)
Merkle Trees (diagram)
Node a.1: a; ab, ac; abc, abd, acb, acc
Node a.2: abe, abd, ada, adb; ab, ad; a
Merkle Trees (diagram)
Node a.1: a; ab; abc, abd
Node a.2: abd, ada, adb; ab, ad; a
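The three steps (obtain HT(N), calc delta, pull keys) can be sketched as follows. This is a simplified illustration under stated assumptions: keys are grouped into a fixed, power-of-two number of buckets, the tree is binary, and the helper names (bucket_of, build_tree, delta, pull) are invented for the example. It only pulls new or changed keys from N and does not handle deletions.

```python
import hashlib
from typing import Dict, List


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def bucket_of(key: str, buckets: int) -> int:
    """Assign a key to a leaf bucket (assumption: hash modulo bucket count)."""
    return int(h(key.encode()), 16) % buckets


def build_tree(store: Dict[str, object], buckets: int = 4) -> List[List[str]]:
    """Return hash levels: levels[0] are leaf hashes, levels[-1] is [root].

    buckets must be a power of two so every level pairs up evenly.
    """
    leaves = []
    for b in range(buckets):
        items = sorted((k, repr(v)) for k, v in store.items()
                       if bucket_of(k, buckets) == b)
        leaves.append(h(repr(items).encode()))
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h((prev[i] + prev[i + 1]).encode())
                       for i in range(0, len(prev), 2)])
    return levels


def delta(tree_m: List[List[str]], tree_n: List[List[str]]) -> List[int]:
    """Indices of leaf buckets that differ, found by descending from the root."""
    top = len(tree_m) - 1
    differing = [0] if tree_m[top][0] != tree_n[top][0] else []
    for level in range(top - 1, -1, -1):
        differing = [c for p in differing for c in (2 * p, 2 * p + 1)
                     if tree_m[level][c] != tree_n[level][c]]
    return differing


def pull(store_m: Dict[str, object], store_n: Dict[str, object],
         buckets: int = 4) -> None:
    """Bring M up to date by copying only keys from differing buckets of N."""
    changed = set(delta(build_tree(store_m, buckets), build_tree(store_n, buckets)))
    for key, value in store_n.items():
        if bucket_of(key, buckets) in changed:
            store_m[key] = value


store_n = {"abc": 1, "abd": 2, "acb": 3, "acc": 4}
store_m = {"abc": 1}
pull(store_m, store_n)
print(store_m == store_n)
```

When the root hashes match, delta returns an empty list without looking at any lower level, which is what makes the comparison cheap when replicas are already in sync.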
“Equal” nodes based decentralized distribution
Consensus, agreement, voting, quorum
Consistent hashing - the ring
X bit integer space: 0 <= N <= 2 ^ X
or: 2 x Pi: 0 <= A <= 2 x Pi
x(N) = cos(A), y(N) = sin(A)
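A ring lookup over an X bit integer space might look like the sketch below. The class name Ring, the vnode labelling scheme, the use of SHA-256 and the choice X = 32 are all assumptions for illustration; real systems differ in hash function and in how vnodes are assigned.

```python
import bisect
import hashlib
import math
from typing import List


class Ring:
    """Hypothetical consistent hashing ring with virtual nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 8, bits: int = 32):
        self.size = 2 ** bits  # the X bit integer space
        self._points = sorted((self._hash(f"{node}#{i}"), node)
                              for node in nodes for i in range(vnodes))
        self._positions = [pos for pos, _ in self._points]

    def _hash(self, value: str) -> int:
        return int(hashlib.sha256(value.encode()).hexdigest(), 16) % self.size

    def angle(self, key: str) -> float:
        """Position of the key as an angle A, with 0 <= A < 2 x Pi."""
        return 2 * math.pi * self._hash(key) / self.size

    def preference_list(self, key: str, n: int = 3) -> List[str]:
        """Walk clockwise from the key's position, collecting n distinct nodes."""
        start = bisect.bisect_right(self._positions, self._hash(key))
        result: List[str] = []
        for step in range(len(self._points)):
            node = self._points[(start + step) % len(self._points)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


ring = Ring(["a", "b", "c", "d"])
print(ring.preference_list("foo", 3))
```

The point of the ring layout is that adding a node only inserts new points: a key either keeps its previous owner or moves to the new node, never between two old nodes.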
Quorum
V: vnodes holding a key
W: write quorum
R: read quorum
DW: durable write quorum
W > 0.5 * V
R + W > V
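The two inequalities can be checked directly, and the reason behind R + W > V can be verified by brute force: only then does every possible read set share at least one replica with every possible write set. The function names below are illustrative, not taken from any library.

```python
from itertools import combinations


def is_strict_quorum(v: int, w: int, r: int) -> bool:
    """Both conditions from the slide: W > 0.5 * V and R + W > V."""
    return 2 * w > v and r + w > v


def always_overlap(v: int, w: int, r: int) -> bool:
    """Brute force: does every write set of size W intersect every read set of size R?"""
    replicas = range(v)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))


for v, w, r in [(3, 2, 2), (3, 2, 1), (5, 3, 3)]:
    print(v, w, r, is_strict_quorum(v, w, r), always_overlap(v, w, r))
```

With V = 3, W = 2, R = 1 a write can land on two replicas while the read hits the third, so a read may miss the latest write even though a majority accepted it.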
Insert key (sloppy quorum): Key = “foo”, # = N, W = 2; replicate; ok
Add node (diagram: key ranges on the ring marked copy or leave)
Lookup key (sloppy quorum): Key = “foo”, # = N, R = 2 -> Value = “bar”
Remove node (diagram: key ranges on the ring marked copy or leave)
Gossip – node down/up (diagram: Node 1, Node 2, Node 3 and Node 4 exchange update and read messages; gossiped state: 4 down, then 4 up)
Eventual consistency
BASE: Basically Available, Soft-state, Eventually consistent. Opposite to ACID.
Read your write consistency (diagram: FE1, FE2; operations: write, read, write, read; values: v2, v2, v1, v1; data store: v1, v2, v3)
Session consistency (diagram: FE with Session 1 and Session 2; operations: write, read, write, read; values: v2, v2, v1, v1; data store: v1, v2, v3)
Monotonic read consistency (diagram: FE1, FE2; operations: five reads; values: v2, v2, v3, v3, v4; data store: v1, v2, v3, v4)
Monotonic write consistency (diagram: FE1, FE2; operations: write, write, read, read; values: v1, v2, v3, v3; data store: v1, v2, v3, v4)
Eventual consistency (diagram: FE1, FE2; operations: four reads and one write; values: v1, v2, v2, v3, v3; data store: v1, v2, v3)
Hinted handoff
N: node, G: group including N
node(N) is unavailable: replicate to G or store data(N) locally; hint handoff for later
node(N) is alive: hand off data to node(N)
Direct replica fails: Key = “foo”, # = N -> handoff, hint = true; Key = “foo”, N, replicate
Replica recovers: handoff
All replicas fail: Key = “foo”, # = N -> handoff, hint = true
All replicas recover: handoff, replicate
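The hinted handoff flow (target unavailable, store the write elsewhere with a hint, deliver it once the target is alive again) can be sketched with a toy in-memory model. The Node class, its fields and the write and handoff functions are hypothetical names for this example; failure detection, durability and the choice of fallback node are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Toy replica: its own data plus hints held on behalf of other nodes."""
    name: str
    alive: bool = True
    data: Dict[str, str] = field(default_factory=dict)
    hints: List[Tuple[str, str, str]] = field(default_factory=list)  # (target, key, value)


def write(key: str, value: str, target: Node, fallback: Node) -> str:
    """Write to the target; if it is unavailable, park the write on the fallback with a hint."""
    if target.alive:
        target.data[key] = value
        return target.name
    fallback.hints.append((target.name, key, value))
    return fallback.name


def handoff(holder: Node, nodes: Dict[str, Node]) -> None:
    """Deliver parked writes to every target that is alive again; keep the rest."""
    remaining = []
    for target_name, key, value in holder.hints:
        target = nodes[target_name]
        if target.alive:
            target.data[key] = value
        else:
            remaining.append((target_name, key, value))
    holder.hints = remaining


n = Node("N", alive=False)
g = Node("G")
accepted_by = write("foo", "bar", target=n, fallback=g)
n.alive = True  # node(N) recovers
handoff(g, {"N": n, "G": g})
print(accepted_by, n.data, g.hints)
```

The write is accepted even though N is down, which is the availability gain; the price is that N serves stale or missing data for "foo" until the handoff has run.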
Latency is an adjustment screw
Availability is an adjustment screw
CAP – the variations
CA – irrelevant
CP – eventually unavailable, offering maximum consistency
AP – eventually inconsistent, offering maximum availability
CAP – the tradeoff: A vs. C
CP (diagram: Replica 1, Replica 2; operations: read, write, read; values: v1, v2)
CP (partition) (diagram: Replica 1, Replica 2; operations: read, write, read; values: v1, v2)
AP (diagram: Replica 1, Replica 2; operations: write, read, replicate, read; values: v1, v2)
AP (partition) (diagram: Replica 1, Replica 2; operations: write, read, hint handoff, read; values: v1, v2)
Frequent structure changes
Thank you
Many graphics I’ve created myself. Some images originate from istockphoto.com, except a few taken from Wikipedia and product pages.