Channel Into Universe Of Eventually Perfect Distributed Systems @lenadroid
Fundamental techniques and building blocks
Are fundamentals still important?
Your System, Your Trade-Offs
Hm… the gap between theory and practice
There are challenges
There are challenges on the road to correctness and understanding
Simple problems become hard
Ordering is Hard
Lamport Clock
[Diagram: processes A and B with Lamport timestamps 1, 2, 3 on their events; the events stamped 2 and 3 could be concurrent]
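The Lamport clock slide can be made concrete with a minimal sketch. The class and method names here are illustrative assumptions, not taken from the talk. It shows the usual rule (increment on local events, take the maximum plus one on receive) and why a smaller timestamp does not prove that one event happened before another.

```python
class LamportClock:
    """Logical clock: a single counter per process (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or send: advance the counter and return the timestamp."""
        self.time += 1
        return self.time

    def receive(self, message_time):
        """On receive: move past both the local time and the sender's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a1 = a.tick()        # A: local event, timestamp 1
a2 = a.tick()        # A: sends a message stamped 2
b1 = b.tick()        # B: independent local event, timestamp 1
b2 = b.tick()        # B: another independent event, timestamp 2
b3 = b.receive(a2)   # B: receives A's message, max(2, 2) + 1 = 3
a3 = a.tick()        # A: local event after the send, timestamp 3

# b2 (timestamp 2) and a3 (timestamp 3) are causally unrelated:
# the timestamps order them, but the events could be concurrent.
print(b2, a3)
```

If event x happened before event y, then x has the smaller timestamp, but the converse does not hold. That gap is what vector clocks close.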
Vector Clock
[Diagram: processes A, B, and C annotate events with vector clocks such as { B:1 }, { A:1, B:1 }, {A:2, B:1}, {A:2, B:1, C:1}, and {A:2, B:1, C:2}; the remaining labels are truncated in the source]
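A small sketch of how vector clocks are compared, assuming clocks are plain dictionaries from node name to counter, with missing entries treated as zero. The function names are hypothetical, and the sample values are only loosely based on the diagram labels.

```python
def increment(clock, node):
    """Return a copy of the clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local, remote, node):
    """On receive: element-wise maximum, then count the receive event itself."""
    keys = set(local) | set(remote)
    merged = {k: max(local.get(k, 0), remote.get(k, 0)) for k in keys}
    return increment(merged, node)


def happened_before(x, y):
    """True if x <= y in every entry and x < y in at least one."""
    keys = set(x) | set(y)
    return (all(x.get(k, 0) <= y.get(k, 0) for k in keys)
            and any(x.get(k, 0) < y.get(k, 0) for k in keys))


def concurrent(x, y):
    """Distinct clocks are concurrent when neither happened before the other."""
    return not happened_before(x, y) and not happened_before(y, x)


print(happened_before({"A": 2, "B": 1}, {"A": 2, "B": 1, "C": 1}))  # True
print(concurrent({"A": 3, "B": 1}, {"A": 2, "B": 2}))               # True
```

Unlike a Lamport timestamp, a pair of vector clocks tells concurrent events apart from ordered ones, at the cost of one counter per node.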
Agreement In Distributed Systems
Two Phase Commit
Blocking Failure in Two Phase Commit
[Diagram: two nodes have crashed, and one of them may have committed; the remaining nodes voted OK, have not committed, and can't decide. Nodes are blocked!]
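The blocking scenario can be sketched in a few lines. This is a simplified model with hypothetical names: it ignores timeouts, logging, and asking other participants, and it only captures what a single participant can conclude on its own.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


def coordinator_decision(votes):
    """Phase 2 rule: commit only if every participant voted YES."""
    return "commit" if all(v is Vote.YES for v in votes) else "abort"


def participant_outcome(my_vote, decision_received):
    """What one participant can conclude by itself.

    decision_received is "commit", "abort", or None when the coordinator
    crashed before its decision reached this participant.
    """
    if decision_received is not None:
        return decision_received
    if my_vote is Vote.NO:
        return "abort"    # a NO voter knows the transaction cannot commit
    return "blocked"      # a YES voter cannot rule out either outcome


print(participant_outcome(Vote.YES, None))  # blocked
```

A participant that voted YES and never hears the decision cannot safely commit or abort, because a crashed node may already have acted on either outcome.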
Two Phase: hm… it's blocking when there are failures. Next: Three Phase.
[Diagram: Three Phase Commit with a crashed node; one node is pre-committed, the other nodes are not pre-committed]
[Diagram: after the crash, one node commits while the other nodes abort]
Two Phase: hm… it's blocking when there are failures. Three Phase: might be inconsistent in an asynchronous environment.
FLP Impossibility Result: “Distributed consensus is impossible in an asynchronous system where at least one node can fail.”
Two Phase: hm… it's blocking when there are failures. Three Phase: might be inconsistent in an asynchronous environment. FLP. Consensus algorithms: Paxos (Classical Paxos, Fast Paxos, Multi-Paxos, Vertical Paxos, Cheap Paxos), Zab, Raft, Chandra-Toueg
Paxos
Trade-offs: Weak or strong leader? Quorum size? Number of failures tolerated?
Optimizations: Proposal copying, distinguished proposer, combining roles, strategies for proposal numbers
Discovering New Trade-offs and Optimizations: quorum intersection revised, quorum-based value selection, proposal number uniqueness, and many more… cl.cam.ac.uk/techreports/UCAM-CL-TR-935.pdf by Dr. Heidi Howard
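One way to get a feel for "quorum intersection revised" is a brute-force check. The sketch below assumes the slide refers to the observation that only phase-one and phase-two quorums need to overlap, rather than every pair of quorums. The function name and the five-node cluster are illustrative.

```python
from itertools import combinations


def always_intersect(n, q1, q2):
    """True if every quorum of size q1 shares a node with every quorum of size q2."""
    nodes = range(n)
    return all(set(a) & set(b)
               for a in combinations(nodes, q1)
               for b in combinations(nodes, q2))


# Five nodes, classic majorities: any two quorums of size 3 overlap.
print(always_intersect(5, 3, 3))  # True

# Relaxed sizes: a phase-one quorum of 4 always meets a phase-two quorum of 2 ...
print(always_intersect(5, 4, 2))  # True

# ... even though two phase-two quorums of size 2 may be disjoint.
print(always_intersect(5, 2, 2))  # False
```

For fixed-size quorums the check succeeds exactly when q1 + q2 > n, which makes the trade-off explicit: a smaller quorum in the common phase is paid for with a larger quorum in the other.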
Consistent Replication?
dl.acm.org/citation.cfm?id=3183713.3196937
Conflict-Free Replicated Data Types
[Diagram: replica states (0, 0, 0), (1, 0, 0), and (0, 1, 1) exchange partial updates (1, _, _) and (_, 1, 1) and converge to (1, 1, 1)]
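The tuples in the diagram resemble the state of a grow-only counter, one of the simplest CRDTs. The sketch below assumes that reading. It keeps one slot per replica and merges by element-wise maximum; the function names are hypothetical.

```python
def increment(state, replica):
    """A replica only ever increments its own slot."""
    updated = list(state)
    updated[replica] += 1
    return tuple(updated)


def merge(x, y):
    """Element-wise maximum: commutative, associative, and idempotent."""
    return tuple(max(a, b) for a, b in zip(x, y))


def value(state):
    """The counter's value is the sum of all slots."""
    return sum(state)


start = (0, 0, 0)
left = increment(start, 0)                  # (1, 0, 0)
right = increment(increment(start, 1), 2)   # (0, 1, 1)
print(merge(left, right))                   # (1, 1, 1)
```

Because the merge does not depend on order or on how many times it is applied, replicas can exchange states in any sequence and still converge.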
[Diagram: add X and delete X operations reach replicas in different orders; the resulting state is unclear (?)]
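One known answer to the add/delete question is an observed-remove set, where a delete only cancels the adds it has already seen, so a concurrent add wins. The talk may have had a different resolution in mind. This is a minimal sketch with hypothetical names.

```python
import uuid


class ORSet:
    """Observed-remove set: every add gets a unique tag (illustrative sketch)."""

    def __init__(self):
        self.adds = set()      # (element, tag) pairs ever added
        self.removes = set()   # (element, tag) pairs that were observed and removed

    def add(self, element):
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Cancel only the add-tags this replica has observed so far.
        self.removes |= {pair for pair in self.adds if pair[0] == element}

    def merge(self, other):
        self.adds |= other.adds
        self.removes |= other.removes

    def contains(self, element):
        return any(e == element for (e, _tag) in self.adds - self.removes)


a, b = ORSet(), ORSet()
a.add("X")
b.merge(a)       # b observes the first add
b.remove("X")    # b deletes what it has seen
a.add("X")       # a concurrently adds X again under a new tag
a.merge(b)
b.merge(a)
print(a.contains("X"), b.contains("X"))  # True True
```

Both replicas end in the same state regardless of merge order, and the choice that adds win over concurrent deletes is an explicit trade-off rather than an accident of timing.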
[Diagram: Cosmos DB with the values x: { a }, x: { b, c }, and x: { a, b, c }]
Failure Detection
Completeness: can all nodes discover all the failures? Accuracy: how precise can a node be in its failure suspicions?
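The tension between completeness and accuracy shows up even in a very small heartbeat-based detector. The sketch below uses hypothetical names and passes time in explicitly so it is deterministic. The timeout value is an arbitrary assumption.

```python
class HeartbeatFailureDetector:
    """Suspects any node whose last heartbeat is older than the timeout."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def suspected(self, now):
        return {node for node, seen in self.last_seen.items()
                if now - seen > self.timeout}


detector = HeartbeatFailureDetector(timeout=3)
for node in ("A", "B", "C"):
    detector.heartbeat(node, now=0)
detector.heartbeat("A", now=2)
detector.heartbeat("A", now=4)

# B has crashed; C is alive but slow. At time 4 both are suspected.
print(sorted(detector.suspected(now=4)))   # ['B', 'C']

detector.heartbeat("C", now=5)             # C's delayed heartbeat arrives
print(sorted(detector.suspected(now=6)))   # ['B']
```

A short timeout finds the crashed node quickly (completeness) but wrongly suspects the slow one (accuracy); a longer timeout reverses the trade-off.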
Understanding Trade-offs Helps ✓ To make the right choices ✓ To know what correct means for us ✓ To verify and maintain correctness in practice
Maintaining Correctness In Real Systems
Model Checking
Verifying and Maintaining Correctness in Practical Real-World Systems
Kafka System Tests: 450+ system tests, 6800+ unit tests, 600+ integration tests. See testing.confluent.io/confluent-kafka-system-test-results and confluent.io/blog/apache-kafka-tested
Cassandra Tests: replay testing, dynamic test generation, property-based testing and fuzzing, distributed tests and fault injection, upgrade testing. See cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html and cassandra.apache.org/blog/2018/08/21/testing_apache_cassandra.html
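To illustrate the idea behind property-based testing, here is a hand-rolled sketch using only the standard library. It is not taken from the Kafka or Cassandra test suites, and real projects would more likely use a dedicated library such as Hypothesis or QuickCheck. The merge function under test is a hypothetical element-wise maximum over per-replica counters.

```python
import random


def merge(x, y):
    """Function under test: element-wise maximum of two replica states."""
    return tuple(max(a, b) for a, b in zip(x, y))


def random_state(rng, replicas=3, max_count=5):
    """Generate a random replica state instead of hand-picking test cases."""
    return tuple(rng.randint(0, max_count) for _ in range(replicas))


def check_merge_properties(trials=1000, seed=0):
    """Assert algebraic properties on many random inputs; return the trial count."""
    rng = random.Random(seed)   # fixed seed keeps failures reproducible
    for _ in range(trials):
        x, y, z = (random_state(rng) for _ in range(3))
        assert merge(x, y) == merge(y, x)                       # commutative
        assert merge(merge(x, y), z) == merge(x, merge(y, z))   # associative
        assert merge(x, x) == x                                 # idempotent
    return trials


print(check_merge_properties())  # 1000
```

The test states what must always hold rather than listing individual cases, so generated inputs can reach combinations a hand-written example would miss.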
+ Model-checking + Property-based testing and fuzzing + Performance and upgrade testing + Unit and integration testing + Fault injection + Attention to exception handling logic + Code reviews
Take-Aways
✓ Know your trade-offs ✓ Create understandable systems ✓ Invest in correctness, it doesn’t come for free ✓ Don’t trust: test and verify ✓ Automate, but be ready when things fail ✓ Remember the real problem you are solving
#SystemsYouUnderstand