Distributed Hash Tables
What is a DHT?
• Hash Table
  • data structure that maps “keys” to “values”
  • essential building block in software systems
• Distributed Hash Table (DHT)
  • similar, but spread across many hosts
• Interface
  • insert(key, value)
  • lookup(key)
How do DHTs work?
• Every DHT node supports a single operation:
  • given a key as input, route messages to the node holding that key
• DHTs are content-addressable
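A minimal sketch of that interface, assuming a hypothetical route(key) primitive that returns the node currently responsible for the key; insert and lookup are just thin wrappers over that single routing operation.

```python
class NodeStore:
    """What each DHT node keeps locally: an ordinary hash table."""
    def __init__(self):
        self.table = {}

    def put(self, key, value):
        self.table[key] = value

    def get(self, key):
        return self.table.get(key)

def insert(route, key, value):
    route(key).put(key, value)        # route(key) is the single DHT primitive

def lookup(route, key):
    return route(key).get(key)

# toy usage with one "node" standing in for the whole network
node = NodeStore()
insert(lambda k: node, "K1", "V1")
print(lookup(lambda k: node, "K1"))   # -> "V1"
```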
DHT: basic idea
[figure: nodes arranged in a ring, each holding (key, value) pairs; neighboring nodes are “connected” at the application level]
• Operation: take key as input; route messages to node holding key
• insert(K1, V1) routes to the responsible node, which stores (K1, V1)
• retrieve(K1) routes to that same node
• For what settings do DHTs make sense?
• Why would you want DHTs?
Fundamental Design Idea I
• Consistent Hashing
  • Map keys and nodes to an identifier space; implicit assignment of responsibility
  • Mapping performed using hash functions (e.g., SHA-1)
  [figure: identifier space from 0000000000 to 1111111111 with nodes A, B, C, D and a key placed on it]
• What is the advantage of consistent hashing?
Consistent Hashing
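A minimal sketch of the idea, assuming SHA-1 truncated to a small identifier space; ConsistentHashRing, h, and owner are illustrative names rather than any particular system's API.

```python
import hashlib
from bisect import bisect_left

BITS = 32                                     # size of the identifier space

def h(x: str) -> int:
    """Map a key or node name onto the identifier space with SHA-1."""
    return int(hashlib.sha1(x.encode()).hexdigest(), 16) % (2 ** BITS)

class ConsistentHashRing:
    def __init__(self, nodes):
        # keys and nodes share one identifier space
        self.ring = sorted((h(n), n) for n in nodes)

    def owner(self, key: str) -> str:
        """A key is owned by the first node clockwise from its identifier."""
        ids = [nid for nid, _ in self.ring]
        i = bisect_left(ids, h(key)) % len(self.ring)   # wrap past the top
        return self.ring[i][1]

ring = ConsistentHashRing(["A", "B", "C", "D"])
print(ring.owner("some-key"))
# advantage: when a node joins or leaves, only the keys between it and its
# predecessor change owners; all other keys stay where they are
```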
Fundamental Design Idea II
• Prefix / Hypercube routing
[figure: route from Source to Destination]
State Assignment in Chord
[figure: identifiers 000–111 arranged clockwise on a ring; d(100, 111) = 3]
• Nodes are randomly chosen points on a clockwise ring of values
• Each node stores the id space (values) between itself and its predecessor
Chord Topology and Route Selection
[figure: ring of identifiers 000–111; from node 000, d(000, 001) = 1, d(000, 010) = 2, d(000, 100) = 4]
• Neighbor selection: i-th neighbor at 2^i distance
• Route selection: pick neighbor closest to destination
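A small sketch of Chord-style neighbor and route selection, using the 3-bit (000..111) identifier space from the slide; finger_nodes stands in for the nodes the fingers actually resolve to, so it is an assumed input here.

```python
M = 3                                    # identifier bits; ring size 2**M

def finger_targets(n: int):
    """Neighbor selection: the i-th finger points toward n + 2**i."""
    return [(n + 2 ** i) % (2 ** M) for i in range(M)]

def clockwise(a: int, b: int) -> int:
    """Clockwise distance from a to b on the ring."""
    return (b - a) % (2 ** M)

def next_hop(current: int, key: int, finger_nodes):
    """Route selection: forward to the neighbor closest to the key
    without passing it; stay put if no neighbor is closer."""
    best = current
    for f in finger_nodes:
        if clockwise(current, f) <= clockwise(current, key) and \
           clockwise(f, key) < clockwise(best, key):
            best = f
    return best

print(finger_targets(0))                  # [1, 2, 4]
print(next_hop(0, 6, finger_targets(0)))  # 4: the neighbor closest to key 6
```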
Joining Node
• Assume the system starts out with correct routing tables.
• Use the routing tables to help the new node find information.
• New node m sends a lookup for its own key
  • this yields m.successor
• m asks its successor for its entire finger table.
• m tweaks its own finger table in the background
  • by looking up each m + 2^i
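A sketch of those join steps, assuming a hypothetical lookup(id) helper that routes through the existing nodes' tables; M is the number of identifier bits.

```python
M = 32                                   # identifier bits; ring size 2**M

def join(m, lookup):
    """New node m bootstraps from the existing routing tables via lookup."""
    m.successor = lookup(m.id)                       # lookup for m's own key
    m.fingers = list(m.successor.fingers)            # start from successor's finger table
    for i in range(M):                               # then fix entries in the background
        m.fingers[i] = lookup((m.id + 2 ** i) % (2 ** M))
```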
Routing to the new node
• Initially, lookups will go to where the key would have gone before m joined
• m's predecessor needs to set its successor to m. Steps:
  • Each node keeps track of its current predecessor
  • When m joins, it tells its successor that its predecessor has changed
  • Periodically ask your successor who its predecessor is:
    • if that node is closer to you, switch to it
  • This is called "stabilization" (sketched below)
• Correct successors are sufficient for correct lookups!
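A minimal sketch of that stabilization loop with a hypothetical Node class; in a real deployment these calls are RPCs, they run periodically, and fingers are fixed separately.

```python
RING = 2 ** 32

def between(x, a, b):
    """True if x lies strictly on the clockwise arc from a to b;
    a == b is treated as the whole ring (the single-node case)."""
    if a == b:
        return x != a
    return x != a and (x - a) % RING < (b - a) % RING

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self          # a lone node is its own successor
        self.predecessor = None

    def stabilize(self):
        # ask our successor who its predecessor is; adopt it if it is closer
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)    # we may be the successor's predecessor

    def notify(self, n):
        # n believes it might be our predecessor
        if self.predecessor is None or between(n.id, self.predecessor.id, self.id):
            self.predecessor = n
```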
Concurrent Joins
• Two new nodes with very close ids might have the same successor.
• Example:
  • initially: 40, 70
  • 50 and 60 join concurrently
  • at first, 40, 50, and 60 all think their successor is 70!
    • which means lookups for 45 will yield 70, not 50
  • after one stabilization, 40 and 50 will learn about 60
  • then 40 will learn about 50
Node Failures
• Assume nodes fail without warning (the harder issue)
• Other nodes' routing tables refer to the dead node
• The dead node's predecessor has no successor
• If you try to route via a dead node, detect the timeout and route to the numerically closer entry instead
• Maintain a list of r successors
  • the lookup answer is the first live successor >= key
  • or forward to *any* live successor < key (sketch below)
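A sketch of that failover rule; successor_list, is_alive, and owns are hypothetical stand-ins for the r-entry successor list, a liveness check, and a key-range ownership check.

```python
def lookup_step(key, successor_list, is_alive, owns):
    """Skip dead successors; answer if a live successor owns the key,
    otherwise forward the lookup to that live successor."""
    for s in successor_list:
        if not is_alive(s):
            continue                      # dead entry: fall back to the next successor
        if owns(s, key):
            return ("answer", s)          # first live successor >= key
        return ("forward", s)             # any live successor < key
    raise RuntimeError("all r successors appear dead")
```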
Issues
• How do you characterize the performance of DHTs?
• How do you improve the performance of DHTs?
Security
• Self-authenticating data, e.g. key = SHA1(value)
  • so a DHT node can't forge data, but the data is immutable
• Can someone cause millions of made-up hosts to join? Sybil attack!
  • can disrupt routing, eavesdrop on all requests, etc.
  • maybe you can require (and check) that node ID = SHA1(IP address)
• How to deal with route disruptions, storage corruption?
  • do parallel lookups, replicated storage, etc.
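A sketch of self-authenticating (content-addressed) storage: the key is the SHA-1 of the value, so the client can verify whatever a node returns; the dict here stands in for a remote DHT node.

```python
import hashlib

def put(store, value: bytes) -> str:
    """Store content-addressed data: the key is SHA-1 of the value."""
    key = hashlib.sha1(value).hexdigest()
    store[key] = value
    return key

def get(store, key: str) -> bytes:
    """Fetch and verify: a node cannot forge data without breaking the hash."""
    value = store[key]
    if hashlib.sha1(value).hexdigest() != key:
        raise ValueError("node returned forged or corrupted data")
    return value

store = {}
k = put(store, b"hello")
assert get(store, k) == b"hello"   # note the trade-off: the data is immutable
```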
CAP Theorem
• Can't have all three of: consistency, availability, tolerance to partitions
• Proposed by Eric Brewer in a keynote in 2000
• Later proven by Gilbert & Lynch [2002]
  • but with a specific set of definitions that don't necessarily match what you'd assume (or what Brewer meant!)
• Really influential on the design of NoSQL systems
  • and really controversial; “the CAP theorem encourages engineers to make awful decisions.” (Stonebraker)
• Usually misinterpreted!
Misinterpretations
• "Pick any two: consistency, availability, partition tolerance"
• "I want my system to be available, so consistency has to go"
• or "I need my system to be consistent, so it's not going to be available"
• Three possibilities: CP, AP, CA systems
Issues with CAP
• What does it mean to choose or not choose partition tolerance?
  • it's a property of the environment; the other two are goals
  • in other words, what's the difference between a "CA" and a "CP" system? Both give up availability on a partition!
• Better phrasing: if the network can have partitions, do we give up on consistency or availability?
Another "P": performance
• Providing strong consistency means coordinating across replicas
• Besides partitions, this also means an expensive latency cost
  • at least some operations must incur the cost of a wide-area RTT
• Can do better with weak consistency: only apply writes locally
  • then propagate them asynchronously
CAP Implications
• Can't have consistency when you:
  • want the system to be always online
  • need to support disconnected operation
  • need faster replies than a majority RTT
• In practice: can have consistency and availability together under realistic failure conditions
  • a majority of nodes are up and can communicate
  • can redirect clients to that majority
Dynamo
• Real DHT (1-hop) used inside datacenters
  • e.g., the shopping cart at Amazon
• More available than Spanner etc.
• Less consistent than Spanner
• Influential: inspired Cassandra
Context
• SLA: 99.9th percentile latency < 300ms
• Constant failures
• Always writeable
Quorums
• Sloppy quorum: the first N reachable nodes after the home node on a DHT
• Quorum rule: R + W > N
  • allows you to optimize for the common case
  • but can still return inconsistencies in the presence of failures (unlike Paxos); see the sketch below
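A toy check of the quorum rule: with R + W > N, every read set and write set over the same N replicas must intersect, so a read contacts at least one replica that saw the latest write. This is the failure-free argument; Dynamo's sloppy quorums can still miss it when the membership shifts.

```python
from itertools import combinations

# With N replicas, a write of size W and a read of size R must overlap
# whenever R + W > N, so the read sees at least one up-to-date copy.
N, W, R = 3, 2, 2
assert R + W > N

replicas = set(range(N))
for write_set in combinations(replicas, W):
    for read_set in combinations(replicas, R):
        assert set(write_set) & set(read_set), "read/write quorums intersect"
print("every read quorum overlaps every write quorum")
```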
Eventual Consistency
• Accept writes at any replica
• Allow divergent replicas
• Allow reads to see stale or conflicting data
• Resolve multiple versions when failures go away
  • latest version if no conflicting updates
  • if there are conflicts, the reader must merge and then write
More Details
• Coordinator: the successor of the key on the ring
• The coordinator forwards ops to N other nodes on the ring
• Each operation is tagged with the coordinator's timestamp
• Values have an associated “vector clock” of coordinator timestamps
• Gets return multiple values along with the vector clocks of those values
• The client resolves conflicts and stores the resolved value (sketch below)
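A minimal sketch of the vector-clock comparison a Dynamo-style client needs: a version is obsolete if its clock is dominated by another version's clock; otherwise the versions are concurrent and the client must merge them. The names dominates and reconcile are illustrative, not Dynamo's API.

```python
def dominates(a: dict, b: dict) -> bool:
    """True if clock a is >= clock b on every coordinator entry."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def reconcile(versions):
    """Drop versions whose clocks are dominated; what remains is either a
    single latest version or a set of conflicts for the client to merge."""
    survivors = []
    for i, (clock_i, _) in enumerate(versions):
        obsolete = any(i != j and clock_j != clock_i and dominates(clock_j, clock_i)
                       for j, (clock_j, _) in enumerate(versions))
        if not obsolete:
            survivors.append(versions[i])
    return survivors

v1 = ({"A": 2, "B": 1}, "cart-v1")
v2 = ({"A": 1, "B": 1}, "cart-v0")   # dominated by v1: obsolete
v3 = ({"A": 1, "B": 2}, "cart-v2")   # concurrent with v1: conflict
print(reconcile([v1, v2, v3]))       # -> [v1, v3]; the client merges them
```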