Locality Aware Mechanisms for Large-scale Networks: The Tapestry Infrastructure
John D. Kubiatowicz
UC Berkeley
Global Scale Applications
• Clear demand for global scale applications
  – Exploiting collective resources
  – File sharing, data dissemination, shared computation
• Wide-area issues
  – Scalability, fault-handling, adaptability, manageability
• Decentralized Object Location and Routing (DOLR)
  – Provides scalable message routing to objects
Tahoe Retreat 06/02 John Kubiatowicz
Utility-based Storage: OceanStore
[Figure: OceanStore spanning multiple providers; labels: Canadian OceanStore, Sprint, AT&T, IBM, Pac Bell]
Locality, Locality, Locality
• “The ability to exploit local resources over remote ones whenever possible”
• “-Centric” approach
  – Client-centric, server-centric, data source-centric
• Requirements:
  – Find data quickly, wherever it might reside
    • Locate nearby object without global communication
    • Permit rapid object migration
  – Verifiable: can’t be sidetracked
    • Data name cryptographically related to data
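One way to read "data name cryptographically related to data" is that an object's name is a secure hash of its contents, so a reader can check any reply without trusting whoever served it. The following is a minimal sketch under that reading; the choice of SHA-256 and the helper names `name_of` and `verify` are assumptions for illustration and do not come from the slides.

```python
import hashlib

def name_of(data: bytes) -> str:
    """Derive an object name from its contents (assumed: SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()

def verify(name: str, data: bytes) -> bool:
    """Return True only if `data` hashes to the name that was requested."""
    return name_of(data) == name

block = b"hello, tapestry"
guid = name_of(block)                     # name is a function of the data
ok = verify(guid, block)                  # an honest reply passes the check
tampered = verify(guid, block + b"!")     # a modified reply fails it
```

Because the check needs only the name and the returned bytes, a nearby replica can be used safely even if it is run by an untrusted party.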
Basic Tapestry Mesh
[Figure: routing mesh of nodes labeled with 4-digit hexadecimal NodeIDs (0xEF97, 0xEF32, 0xE399, 0xEF34, 0xEF37, 0xEF44, 0x099F, 0xE530, 0xEF40, 0xEF31, 0xE555, 0xEFBA, 0x0999, 0xE932, 0xFF37, 0x0921, 0xE324); links are numbered 1 to 4]
Randomization and Locality
Parallel Insert Algorithms (SPAA ’02)
• Massive parallel insert is important
  – We now have algorithms that handle “arbitrary simultaneous inserts”
  – Construction of nearest-neighbor mesh links
    • log² n message complexity ⇒ fully operational routing mesh
  – Objects kept available during this process
    • Incremental movement of pointers
• Interesting issue: introduction service
  – How does a new node find a gateway into the Tapestry?
Highly Dynamic Systems
• Instability is the common case!
  – Small half-life for P2P apps (1 hour?)
  – Congestion, flash crowds, misconfiguration, faults
    • BGP convergence takes 3-30 minutes!
  – Mobile clients in semi-connected mode
• Must use overlay under instability!
• Must somehow be:
  – Insensitive to faults and denial-of-service attacks
    • Route around bad servers and ignore bad data
  – Repairable infrastructure
    • Easy to reconstruct routing and location information
  – Without care, worst case: sub-optimal paths, network partitions, broken invariants ⇒ loss of availability
Continuous Self-Repair
• All state is soft state
  – Continuous probing, selection of neighbors
  – Periodic restoration of state
• Stability through statistics:
  – Redundancy at many levels: neighbor links, object roots, etc.
• Dynamic stabilization:
  – Nodes integrate/remove themselves automatically
  – Pointer state routed around faulty node
• Markov models:
  – What is a misbehaving router? Communication link?
  – What level of redundancy is necessary?
  – Are we under attack?
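The soft-state idea (state that disappears unless it is periodically restored) can be sketched as a table whose entries carry an expiry time. This is an illustration only: the class `SoftStateTable`, its `ttl` parameter, and the explicit `now` argument are hypothetical and are not taken from Tapestry.

```python
class SoftStateTable:
    """Entries vanish unless refreshed within `ttl` seconds (hypothetical helper)."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._expires = {}  # key -> absolute expiry time

    def refresh(self, key: str, now: float) -> None:
        """Insert or re-announce `key`; meant to be called on each periodic beacon."""
        self._expires[key] = now + self.ttl

    def live(self, now: float) -> set:
        """Drop expired entries and return the keys still considered alive."""
        self._expires = {k: t for k, t in self._expires.items() if t > now}
        return set(self._expires)

table = SoftStateTable(ttl=30.0)
table.refresh("0xEF34", now=0.0)
table.refresh("0xE399", now=0.0)
table.refresh("0xEF34", now=20.0)    # only this neighbor keeps beaconing
alive_at_25 = table.live(now=25.0)   # both entries are still within their TTL
alive_at_40 = table.live(now=40.0)   # the silent neighbor has timed out
```

No explicit "remove" message is needed: a failed node simply stops refreshing and its state ages out, which is what makes reconstruction after faults straightforward.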
State Maintenance
• Maintenance of link conditions, routing state
• Example: soft-state beacons measure link conditions between neighbors (Tapestry, Scribe)
• Locality awareness
  – Naïve: messages sent across overlay hops at same rate regardless of actual network distance
  – Traffic can scale with length of overlay hop, possibly causing congestion
• Alternatives:
  – Scale soft-state frequency with length of overlay hop
  – External fault-detection / measurement platform
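The first alternative, scaling soft-state frequency with the length of the overlay hop, might look like the sketch below: nearby neighbors are probed often, distant ones less often. Using round-trip time as the measure of hop length, the linear scaling rule, and every parameter value are assumptions made for illustration.

```python
def beacon_interval(rtt_ms: float,
                    base_interval_s: float = 1.0,
                    ref_rtt_ms: float = 10.0,
                    max_interval_s: float = 60.0) -> float:
    """Seconds between beacons on one overlay hop.

    Hops at or below `ref_rtt_ms` are probed every `base_interval_s`;
    longer hops are probed proportionally less often, up to a cap.
    """
    scale = max(1.0, rtt_ms / ref_rtt_ms)
    return min(base_interval_s * scale, max_interval_s)

for rtt in (2.0, 10.0, 80.0, 5000.0):
    print(f"rtt={rtt:>7.1f} ms -> beacon every {beacon_interval(rtt):.1f} s")
```

The cap keeps very long hops from going unmonitored indefinitely; where to set it is a trade-off between wide-area beacon traffic and failure-detection latency.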
Locality-Based Heartbeats
Redundant Routing
• Keep multiple routes for every link
  – Pre-compute alternatives
  – Quick adaptation (using the pre-computed alternatives)
• Destination-rooted spanning tree
  – For each node N, reachable by all other nodes
  – All paths union to form spanning tree rooted at N
• Convergence
  – Number of candidates for the next hop decreases by a factor of b every hop, where b = base of Tapestry ID
  – Nodes routing to same ID converge as a function of the network density between them
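Keeping pre-computed alternatives for each link allows a node to adapt without recomputing routes: it walks an ordered list and takes the first next hop that is still reachable. The sketch below illustrates that selection step only; the function `first_reachable`, the list layout, and the liveness predicate are hypothetical and are not the authors' implementation.

```python
from typing import Callable, List, Optional

def first_reachable(candidates: List[str],
                    is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate next hop that passes the liveness check."""
    for node in candidates:
        if is_up(node):
            return node
    return None  # every pre-computed route for this slot is currently down

# Hypothetical routing-table slot: primary link first, then pre-computed backups.
slot = ["0xEF34", "0xEF37", "0xEF31"]
failed = {"0xEF34"}
next_hop = first_reachable(slot, lambda n: n not in failed)
```

With the primary marked as failed, the lookup falls through to the first backup. Ordering the list by measured network distance would keep the fallback as local as possible.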
Convergence
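The convergence property can be checked numerically: if each hop fixes one more digit of the destination ID, the set of nodes that remain eligible shrinks by roughly a factor of b per hop, where b is the ID base. The sketch below counts eligible nodes by shared prefix over randomly generated IDs. The base of 16, the 4-digit ID length, the population size, and the function name are assumptions chosen for illustration.

```python
import random

def candidates_per_hop(node_ids, target, digits):
    """Count nodes matching the first k digits of `target`, for k = 0..digits."""
    return [sum(1 for n in node_ids if n[:k] == target[:k])
            for k in range(digits + 1)]

rng = random.Random(0)           # fixed seed, illustration only
b, digits = 16, 4                # assumed: base-16 IDs with 4 digits
alphabet = "0123456789ABCDEF"
ids = {"".join(rng.choice(alphabet) for _ in range(digits))
       for _ in range(20000)}    # set comprehension removes duplicate IDs
target = sorted(ids)[0]          # any existing node works as the destination
counts = candidates_per_hop(ids, target, digits)

for k, c in enumerate(counts):
    print(f"after {k} matched digits: {c} candidate nodes")
```

The count starts at the whole population and ends at the single destination node, so routes from different sources toward the same ID are squeezed onto a shrinking set of nodes as they get closer.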
First Reachable Link Selection (FRLS)
FRLS Reachability
[Figure: “FRLS Packet Delivery Rate vs. Link Failure”. x-axis: Fraction of Failed Links (0 to 0.2); y-axis: Packet Delivery Rate (0 to 1). Regions: A: FRLS = yes, IP = yes; B: FRLS = no, IP = yes; C: FRLS = yes, IP = no; D: route exists, FRLS = no, IP = no; E: no route connecting endpoints]
Proactive Multicast
Self-Organizing Soft-State Replication
• Simple algorithms for placing replicas on nodes in the interior
  – Intuition: locality properties of Tapestry help select positions for replicas
  – Tapestry helps associate parents and children to build multicast tree
• Preliminary results show that this is effective
The Living Network Model
• Gaia: a living network
  – James Lovelock [1979]
• Large scale self-management
  – Locally constrained interactions
    • Scalability, performance
  – Layered control structure
  – Upward propagation of aggregate data
• Survival via active redundancy & self-repair
  – Catastrophic failures handled by top level control (human interaction)
Conclusion
• Decentralized Object Location and Routing (DOLR)
  – Important to be able to route to objects
• ... with locality
  – Use of local resources whenever possible!
• Continuous adaptation, repair
  – System never quite fully stable
  – Continuous convergence
  – Keep objects as available as possible