HOW TO MAKE CHORD CORRECT, Pamela Zave, AT&T Laboratories-Research (PowerPoint presentation)

  1. HOW TO MAKE CHORD CORRECT Pamela Zave AT&T Laboratories—Research Florham Park, New Jersey, USA

  2. CHORD IS A DISTRIBUTED HASH TABLE: AN AD-HOC PEER-TO-PEER NETWORK IMPLEMENTING A KEY-VALUE STORE
     The identifier of a node (assumed unique) is an m-bit hash of its IP address; keys are also m bits (the example uses m = 6). Nodes are arranged in a ring, each node having a successor pointer to the next node (in integer order, with wraparound at 0). The ring-maintenance protocol preserves the ring structure as nodes join, and leave silently or fail; the storage and lookup protocols rely on the ring structure.
     [Figure: a ring of nodes 1, 8, 14, 21, 32, 38, 42, 48, 51; the key-value pairs for keys 22-32 are stored at node 32.]
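The key placement in the figure can be sketched in Python. This is a minimal sketch, assuming that each key is stored at the first node whose identifier is at or after the key, with wraparound at 0 (which matches keys 22-32 living at node 32). `successor_of_id` and `stored_keys` are hypothetical names, and a real Chord node finds this node by routing a lookup through the ring rather than by consulting a global sorted list.

```python
from bisect import bisect_left

M = 6
RING_SIZE = 2 ** M  # identifiers 0..63

# Node identifiers read from the figure (hypothetical global view of the ring).
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51])


def successor_of_id(ident: int, nodes: list[int] = NODES) -> int:
    """Assumed storage rule: first node at or after ident, wrapping at 0."""
    ident %= RING_SIZE
    i = bisect_left(nodes, ident)
    return nodes[i] if i < len(nodes) else nodes[0]  # wrap past the top


def stored_keys(node: int, nodes: list[int] = NODES) -> list[int]:
    """All keys in 0..RING_SIZE-1 that this node would store."""
    return [k for k in range(RING_SIZE) if successor_of_id(k, nodes) == node]


if __name__ == "__main__":
    print(stored_keys(32))      # the keys 22 through 32
    print(successor_of_id(60))  # wraps around to node 1
```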

  3. WHY IS CHORD IMPORTANT?
     The SIGCOMM paper introducing Chord is the 4th-most-referenced paper in computer science, . . . and won SIGCOMM's 2011 Test of Time Award.
     APPLICATIONS OF DISTRIBUTED HASH TABLES: they allow millions of peers to cooperate in implementing a data store, and are used as a building block in fault-tolerant applications; the best-known application is BitTorrent.
     OTHER DISTRIBUTED HASH TABLES: Pastry, Tapestry, CAN, Kademlia, and others.

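  4. AN IDEAL NETWORK
     . . . when all pointers are present: every node has a successor, a predecessor, and a second successor (successor2).
     [Figure: a ring of nodes 9, 15, 21, 30, 35, 39, 48, 61 showing successor, predecessor, and successor2 pointers.]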
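For comparison with the defective states discussed later, the pointers of an ideal network can be computed directly from the sorted node identifiers. A sketch, with the hypothetical helper `ideal_pointers`; the field names `succ`, `succ2`, and `prdc` are chosen to echo the Alloy model, and the node identifiers are read from the figure.

```python
def ideal_pointers(nodes: list[int]) -> dict[int, dict[str, int]]:
    """Map each node to its successor, second successor, and predecessor in ring order."""
    ring = sorted(nodes)
    k = len(ring)
    pointers = {}
    for i, n in enumerate(ring):
        pointers[n] = {
            "succ": ring[(i + 1) % k],   # next node, wrapping at 0
            "succ2": ring[(i + 2) % k],  # successor's successor
            "prdc": ring[(i - 1) % k],   # previous node, wrapping at 0
        }
    return pointers


if __name__ == "__main__":
    p = ideal_pointers([61, 9, 15, 48, 21, 39, 35, 30])
    print(p[61])  # {'succ': 9, 'succ2': 15, 'prdc': 48}
```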

  5. OPERATIONS OF THE RING-MAINTENANCE PROTOCOL
     An operation changes the state of one node; most operations are scheduled, asynchronously and autonomously, by their own nodes.
     [Figure: node 10 joins between nodes 7 and 16, in four steps: 10 JOINS, 16 NOTIFIED, 7 STABILIZES, 10 NOTIFIED.]
     Just as Stabilize and Notified operations repair the disruption caused by Joins, Update, Reconcile, and Flush operations repair the disruption caused by Failures (using redundant successors).
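The join sequence in the figure can be replayed with a small Python sketch. It assumes the operation semantics of the original Chord paper (stabilize adopts the successor's predecessor when it lies between the node and its successor; a notified node adopts the sender as predecessor when it has none or the sender is closer), a starting ring of just 7 and 16, and that 16 is notified by 10 right after 10 joins. Messages are modeled as direct calls; `Net`, `slide_example`, and the method names are hypothetical, and the sketch ignores the atomicity questions the talk raises.

```python
from dataclasses import dataclass, field
from typing import Optional


def between(n1: int, n2: int, n3: int) -> bool:
    """True if n2 lies strictly inside the ring arc from n1 to n3."""
    if n1 < n3:
        return n1 < n2 < n3
    return n1 < n2 or n2 < n3  # wraparound at zero


@dataclass
class Net:
    succ: dict[int, int] = field(default_factory=dict)
    prdc: dict[int, Optional[int]] = field(default_factory=dict)

    def join(self, n: int, found_succ: int) -> None:
        """New node n adopts a successor found via some member; no predecessor yet."""
        self.succ[n] = found_succ
        self.prdc[n] = None

    def notified(self, n: int, sender: int) -> None:
        """Node n adopts sender as predecessor if it has none or sender is closer."""
        p = self.prdc[n]
        if p is None or between(p, sender, n):
            self.prdc[n] = sender

    def stabilize(self, n: int) -> int:
        """Node n adopts its successor's predecessor if it lies between them.

        Returns the (possibly new) successor, which n then notifies.
        """
        s = self.succ[n]
        p = self.prdc[s]
        if p is not None and between(n, p, s):
            self.succ[n] = p
        return self.succ[n]


def slide_example() -> Net:
    # Assumed starting point: a two-node ring 7 <-> 16.
    net = Net(succ={7: 16, 16: 7}, prdc={7: 16, 16: 7})
    net.join(10, 16)             # 10 JOINS
    net.notified(16, 10)         # 16 NOTIFIED (by 10)
    new_succ = net.stabilize(7)  # 7 STABILIZES and adopts 10
    net.notified(new_succ, 7)    # 10 NOTIFIED (by 7)
    return net


if __name__ == "__main__":
    net = slide_example()
    print(net.succ, net.prdc)
```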

  6. A FAILURE . . . AND ITS REPAIR
     [Figure: before and after views of a ring segment 9, 16, 22, 35 in which node 22 fails.]
     update: a node replaces its dead successor by its live succ2.
     flush: a node removes its dead predecessor.
     reconcile: a node improves its succ2 by replacing it with its successor's successor.
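The failure repair in the figure can be replayed the same way. This sketch assumes the segment 9, 16, 22, 35 is closed into a ring (35's successors wrap to 9 and 16) so that every node has complete state; `NodeState`, `update`, `reconcile`, `flush`, and `repair_example` are hypothetical names, and each function changes the state of one node only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeState:
    succ: int            # first successor
    succ2: int           # second (redundant) successor
    prdc: Optional[int]  # predecessor, or None after a flush


def update(n: int, nodes: dict[int, NodeState], live: set[int]) -> None:
    """Replace a dead successor by the live second successor."""
    s = nodes[n]
    if s.succ not in live and s.succ2 in live:
        s.succ = s.succ2


def reconcile(n: int, nodes: dict[int, NodeState], live: set[int]) -> None:
    """Improve succ2 by replacing it with the (live) successor's successor."""
    s = nodes[n]
    if s.succ in live:
        s.succ2 = nodes[s.succ].succ


def flush(n: int, nodes: dict[int, NodeState], live: set[int]) -> None:
    """Remove a dead predecessor."""
    s = nodes[n]
    if s.prdc is not None and s.prdc not in live:
        s.prdc = None


def repair_example() -> dict[int, NodeState]:
    # Ideal ring 9 -> 16 -> 22 -> 35 -> 9 before the failure (assumed closure).
    nodes = {
        9: NodeState(succ=16, succ2=22, prdc=35),
        16: NodeState(succ=22, succ2=35, prdc=9),
        22: NodeState(succ=35, succ2=9, prdc=16),
        35: NodeState(succ=9, succ2=16, prdc=22),
    }
    live = {9, 16, 35}         # node 22 fails
    update(16, nodes, live)    # 16: dead successor 22 replaced by succ2 = 35
    reconcile(9, nodes, live)  # 9: succ2 becomes 16's successor, now 35
    flush(35, nodes, live)     # 35: dead predecessor 22 removed
    return nodes
```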

  7. A VALID NETWORK
     Define a node's best successor as its first successor pointing to a live node (member). A network is valid when: there is a cycle of best successors; there is no more than one cycle; on the cycle of best successors, the nodes are in identifier order; and from each member not in the cycle, the cycle is reachable through best successors.
     [Figure: an example network with nodes 2, 6, 7, 13, 16, 19, 29, 40, 55, 63.]
     WHAT THE PROTOCOL CAN DO (allegedly): keep the network valid at all times, and repair any other defect (appendages, missing pointers, etc.) so that eventually, if there are no new joins or failures, the network becomes ideal.
     WHAT THE PROTOCOL CANNOT DO: if the network becomes invalid, the protocol cannot repair it.
     There are no intervals in which sets of nodes are "locked" to implement multi-node atomic operations: great performance, and fast and easy to analyze!
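The definition of validity is mechanical enough to check with a short program. A sketch in Python, assuming each node's successor list is ordered (first successor, then succ2, and so on) and that the live nodes are exactly the members; `best_successor` and `is_valid` are hypothetical names. Because every member's best successor is itself a member, following best successors from any member must end in some cycle, so "exactly one cycle" also gives the reachability condition.

```python
from typing import Optional


def best_successor(n: int, succ_lists: dict[int, list[int]], live: set[int]) -> Optional[int]:
    """A node's first successor pointing to a live node (member), if any."""
    for s in succ_lists[n]:
        if s in live:
            return s
    return None


def is_valid(succ_lists: dict[int, list[int]], live: set[int]) -> bool:
    best = {n: best_successor(n, succ_lists, live) for n in live}
    if any(b is None for b in best.values()):
        return False  # a member with no live successor cannot reach any cycle
    cycles = set()
    for start in live:
        seen = set()
        n = start
        while n not in seen:  # follow best successors until a node repeats
            seen.add(n)
            n = best[n]
        cycle = [n]  # n is on a cycle; collect all of it
        m = best[n]
        while m != n:
            cycle.append(m)
            m = best[m]
        cycles.add(frozenset(cycle))
    if len(cycles) != 1:
        return False  # need exactly one cycle of best successors
    (cycle,) = cycles
    # Identifier order: walking the cycle from its smallest node must visit ascending ids.
    walk = [min(cycle)]
    while best[walk[-1]] != walk[0]:
        walk.append(best[walk[-1]])
    return walk == sorted(walk)


if __name__ == "__main__":
    live = {2, 6, 7, 13}
    succs = {2: [7, 13], 6: [7, 13], 7: [13, 2], 13: [2, 7]}
    print(is_valid(succs, live))  # True: 6 is an appendage of the cycle 2 -> 7 -> 13
```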

  8. THE CLAIMS
     "Three features that distinguish Chord from many peer-to-peer lookup protocols are its simplicity, provable correctness, and provable performance."
     THE REALITY: even with simple bugs fixed and optimistic assumptions about atomicity, the original protocol is not correct. Of the seven properties claimed invariant of the original version, not one is actually an invariant. Some (or maybe all) of the many papers analyzing Chord performance are based on false assumptions about how the protocol works.
     DO REAL IMPLEMENTATIONS HAVE THESE FLAWS? Some implementations have even the easiest-to-fix flaws, and it is almost certain that all implementations have some flaws. We cannot tell for sure without reading the code, as implementors do not document what they have actually implemented.
     THE GOAL: find a specification that is actually correct, and persuade people to take the specification seriously.

  9. LIGHTWEIGHT MODELING
     DEFINITION: constructing a small, abstract logical model of the key concepts of a system, and analyzing the properties of the model with a tool that performs exhaustive enumeration over a bounded domain.
     WHY IS IT "LIGHTWEIGHT"? Because the model is very abstract in comparison to a real implementation, it is small and can be constructed quickly. Because the analysis tool is "push-button", it yields results with relatively little effort; in contrast, theorem proving is not "push-button".
     WHY IS IT INTERESTING? It is a proven tool for revealing conceptual errors and improving software quality, in a cost-effective manner; with Chord, you will see how little work it takes to find problems. It is a formal method that can be used and appreciated by very practical people, and protocol designers should model as they design. It is easy (at least to get started) and fun!
     "If you like surprises, you will love lightweight modeling." (Pamela Zave)

  10. MY FAVORITE TOOLS
     Promela (language) / Spin: Promela is a simple programming language with concurrent processes, messages, bounded message queues, and fixed-size arrays. Spin is a model checker: the program specifies a large finite-state machine, which the checker explores exhaustively.
     Alloy (language) / Alloy Analyzer: Alloy combines relational algebra, first-order predicate calculus, transitive closure, and objects. The Analyzer compiles a model into a set of Boolean constraints, and uses SAT solvers to decide whether the set of constraints is satisfiable.
     The style of modeling in these two languages is radically different, and the analysis capabilities are also radically different. Both are applicable to Chord (see "A practical comparison of Alloy and Spin"), but this talk uses Alloy.

  11. A PROPERTY CLAIMED INVARIANT: OrderedMerges
     OrderedMerges . . . means that appendages are in the correct places, as they are here. This property is easily violated, as shown here.
     The good news: violations are repaired by stabilization.
     The bad news: the violation causes some lookups to fail, and it invalidates some assumptions used in performance analysis.
     The main point: how could this go unknown for ten years? The behavior appears in networks with 3 nodes, and it takes an 88-line model and 0.3 seconds of analysis to find it with Alloy.
     [Figure: the counterexample network, before and after "6 stabilizes" and a "notified" step.]

  12. RELATIONAL JOIN: THE KEY TO UNDERSTANDING RELATIONAL ALGEBRA (AND ALLOY)
     RELATIONS: P is of type A and contains A$1 and A$2. Q is of type A -> B -> C and contains A$0 -> B$0 -> C$0, A$1 -> B$1 -> C$1, and A$2 -> B$2 -> C$2. R is of type C -> D and contains C$0 -> D$0 and C$1 -> D$1.
     JOIN EXPRESSION: P . Q . R; the columns on either side of each dot must have the same type.
     COMPUTATION OF JOIN: the values in each "shared column" must match (A$0 is not in P, and C$2 has no match in R, so those rows drop out), and in the resulting relation the shared columns are removed.
     VALUE OF JOIN EXPRESSION: B$1 -> D$1. In general, the result is a relation with any number of tuples, including zero or many.
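The join computation on this slide can be reproduced with relations represented as Python sets of tuples. A minimal sketch, with the contents of P, Q, and R as reconstructed from the slide; `join` is a hypothetical helper implementing the dot-join rule (the last column of the left relation must equal the first column of the right relation, and both of those columns are removed).

```python
Relation = set[tuple[str, ...]]


def join(left: Relation, right: Relation) -> Relation:
    """Alloy-style dot join: match left's last column to right's first, drop both."""
    return {l[:-1] + r[1:] for l in left for r in right if l[-1] == r[0]}


P: Relation = {("A$1",), ("A$2",)}
Q: Relation = {("A$0", "B$0", "C$0"), ("A$1", "B$1", "C$1"), ("A$2", "B$2", "C$2")}
R: Relation = {("C$0", "D$0"), ("C$1", "D$1")}

if __name__ == "__main__":
    print(join(P, Q))           # B$1 -> C$1 and B$2 -> C$2 survive the first join
    print(join(join(P, Q), R))  # {('B$1', 'D$1')}
```

The same helper gives the field access on the next slides: joining the singleton `{("Event$1",)}` with a `pre` relation containing `("Event$1", "Time$1")` yields `{("Time$1",)}`.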

  13. TIME IN ALLOY: PART OF THE MODEL YOU WRITE, NOT PART OF THE LANGUAGE YOU WRITE IN
       sig Time { }
     declares a basic type, Time, which is declared to be totally ordered; its atoms are the individuals of type Time.
       sig Event {
         pre: Time,
         post: Time
       }
     declares an object type, Event, with two fields; its atoms are the individuals of type Event.
     [Figure: individuals of type Event linked by pre and post arrows to individuals of type Time. Alloy "facts" produce these relationships.]

  14. TIME IN ALLOY: PART OF THE MODEL YOU WRITE, NOT PART OF THE LANGUAGE YOU WRITE IN
     OBJECTS IN ALLOY HAVE A FUNDAMENTALLY SIMPLE RELATIONAL SEMANTICS. With Time and Event declared as on the previous slide, pre is a relation from Event to Time, containing tuples such as Event$0 -> Time$0 and Event$1 -> Time$1. So if e stands for Event$1, then e . pre is Time$1.

  15. TEMPORAL STATE IN ALLOY
       sig Node {
         succ: Node lone -> Time,
         prdc: Node lone -> Time
       }
     succ is a ternary relation from Node to Node to Time: for each Node, each Time corresponds to one or zero successor Nodes. Likewise, prdc gives each Node, at each Time, one or zero predecessor Nodes.

  16. TEMPORAL STATE IN ALLOY
       sig Node {
         succ: Node lone -> Time,
         prdc: Node lone -> Time
       }
       { all t: Time | no succ.t => no prdc.t }
     The appended fact says: if a Node is not a member of the network, it has no successor, in which case it cannot have a predecessor, either. Stated separately from the signature, it would look like this:
       fact { all n: Node, t: Time | no n.succ.t => no n.prdc.t }
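The ternary relations and the signature fact can be mimicked with sets of (node, node, time) triples. This sketch assumes small integer identifiers for nodes and times; `at_time` plays the role of the join `succ.t`, `image` plays the role of `n.(...)`, and `lone_ok` and `membership_fact_ok` are hypothetical checkers for the `lone` multiplicity and for the fact.

```python
# Node -> Node -> Time as a set of (node, other_node, time) triples.
Triples = set[tuple[int, int, int]]


def at_time(rel: Triples, t: int) -> set[tuple[int, int]]:
    """Like Alloy's rel.t: keep tuples whose last column is t, then drop that column."""
    return {(a, b) for (a, b, tt) in rel if tt == t}


def image(n: int, pairs: set[tuple[int, int]]) -> set[int]:
    """Like Alloy's n.pairs: everything n maps to."""
    return {b for (a, b) in pairs if a == n}


def lone_ok(rel: Triples, nodes: set[int], times: set[int]) -> bool:
    """Multiplicity 'Node lone -> Time': at most one target per node per time."""
    return all(len(image(n, at_time(rel, t))) <= 1 for n in nodes for t in times)


def membership_fact_ok(succ: Triples, prdc: Triples, nodes: set[int], times: set[int]) -> bool:
    """all n: Node, t: Time | no n.succ.t => no n.prdc.t"""
    return all(
        not image(n, at_time(prdc, t))
        for n in nodes
        for t in times
        if not image(n, at_time(succ, t))  # only non-members are constrained
    )


if __name__ == "__main__":
    succ = {(1, 2, 0), (2, 1, 0)}
    prdc = {(1, 2, 0), (2, 1, 0), (3, 2, 0)}  # node 3 has a predecessor but no successor
    print(lone_ok(succ, {1, 2, 3}, {0}), membership_fact_ok(succ, prdc, {1, 2, 3}, {0}))
```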

  17. TEMPORAL STATE IN ALLOY
     With Node declared as on the previous slides, Nodes are also declared to be totally ordered, so we can use library predicates to define cycle ordering:
       pred Between [n1, n2, n3: Node] {
         lt [n1, n3] => ( lt [n1, n2] && lt [n2, n3] )
         else ( lt [n1, n2] || lt [n2, n3] )   -- special case for wraparound at zero
       }
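In the spirit of lightweight modeling, Between can be checked exhaustively over a small bounded domain. This Python sketch assumes integer identifiers, with `<` standing in for the `lt` predicate of Alloy's ordering library; it checks that for any three distinct nodes, n2 lies between n1 and n3 exactly when it does not lie between n3 and n1. `check_exactly_one_side` is a hypothetical name, and unlike the Alloy Analyzer it simply enumerates tuples rather than using a SAT solver.

```python
from itertools import permutations


def lt(a: int, b: int) -> bool:
    # Stands in for lt[a, b] from Alloy's ordering library.
    return a < b


def between(n1: int, n2: int, n3: int) -> bool:
    """Mirror of the Between predicate: is n2 strictly inside the arc from n1 to n3?"""
    if lt(n1, n3):
        return lt(n1, n2) and lt(n2, n3)
    return lt(n1, n2) or lt(n2, n3)  # special case for wraparound at zero


def check_exactly_one_side(domain_size: int):
    """Enumerate all distinct triples; return a counterexample or None."""
    for n1, n2, n3 in permutations(range(domain_size), 3):
        # n2 should be on exactly one of the two arcs between n1 and n3
        if between(n1, n2, n3) == between(n3, n2, n1):
            return (n1, n2, n3)
    return None


if __name__ == "__main__":
    print(check_exactly_one_side(8))  # None: no counterexample among 8 nodes
```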

  18. GRAPH PROPERTIES IN ALLOY
       pred OneOrderedRing [t: Time] {
         let ringMembers = { n: Node | n in n.(^(succ.t)) } |
           ...
       }
     ^ is transitive closure. ringMembers is the set of all nodes that are members (because they have successors) and that are reachable from themselves by following successor pointers.
     [Figure: example network with nodes including 2, 7, 16, 19, 29, 40, 55, and 63.]

  19. GRAPH PROPERTIES IN ALLOY
       pred OneOrderedRing [t: Time] {
         let ringMembers = { n: Node | n in n.(^(succ.t)) } |
           some ringMembers   -- there is at least one ring
       }

  20. GRAPH PROPERTIES IN ALLOY
       pred OneOrderedRing [t: Time] {
         let ringMembers = { n: Node | n in n.(^(succ.t)) } |
           some ringMembers   -- there is at least one ring
           && (all disj n1, n2: ringMembers | n1 in n2.(^(succ.t)) )   -- there is at most one ring
       }
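The two conjuncts of OneOrderedRing shown on these slides can be evaluated on a concrete snapshot of `succ.t`. A sketch in Python, assuming the snapshot is given as a set of (node, successor) pairs; `transitive_closure`, `ring_members`, and `one_ring` are hypothetical names, and the identifier-ordering part of the full predicate, which these slides do not show, is deliberately left out.

```python
Pairs = set[tuple[int, int]]


def transitive_closure(edges: Pairs) -> Pairs:
    """Like Alloy's ^r: keep composing r with itself until no new pairs appear."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def ring_members(succ_t: Pairs) -> set[int]:
    """{ n: Node | n in n.(^(succ.t)) }: nodes that can reach themselves."""
    closure = transitive_closure(succ_t)
    return {a for (a, _) in succ_t if (a, a) in closure}


def one_ring(succ_t: Pairs) -> bool:
    """At least one ring, and every ring member reaches every other one."""
    closure = transitive_closure(succ_t)
    members = ring_members(succ_t)
    return bool(members) and all(
        (n2, n1) in closure  # n1 in n2.(^(succ.t))
        for n1 in members
        for n2 in members
        if n1 != n2
    )


if __name__ == "__main__":
    snapshot = {(2, 7), (7, 13), (13, 2), (6, 7)}  # 6 is an appendage
    print(sorted(ring_members(snapshot)), one_ring(snapshot))  # [2, 7, 13] True
```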
