
Distributed, Secure Load Balancing with Skew, Heterogeneity, and Churn
Jonathan Ledlie and Margo Seltzer
INFOCOM 2005 - March 16, 2005

Motivation - Why balance DHTs?
• Distributed hash tables (DHTs):
  – Becoming “off-the-shelf” distributed data structures
  – Was: backup storage; now: ALM, resource discovery
• DHTs must be versatile:
  – Handle variety of loads - low msg loss
    • Allocate network capacity
  – Realistic network conditions
  – Reasonably secure
• Numerous load balancing proposals in literature
  – Unrealistic assumptions
  – Poor performance
3/16/2005 Jonathan Ledlie - INFOCOM 2005

Problematic Assumptions
                 Assumption     Reality
Physical Nodes   Uniform        Broad Capacity Heterogeneity
Workload         Uniform        Hotspots (Skew)
Membership       Stable         Lots of Churn
Security         Pick any ID    Malicious participants
Current load balancing algorithms are insufficient

k-Choices Algorithm
• Support variation in skew, node heterogeneity, and churn
• Make IDs verifiable
1. Sample
2. Cost fn
3. Join

Talk Outline
• Overview
• Preliminaries
  – DHTs
  – Security
  – Network Characteristics
• k-Choices
• Prior Techniques
• Evaluation
• Conclusion

DHTs - Refresher
• Each node has one or more virtual servers (VSs).
• Each virtual server has an ID in a namespace (e.g., (0,1], (0,2^160]).
• Msgs travel via O(log(N)) hops between any two VSs.
[Figure: Chord-like routing among virtual servers on the ID ring]
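The ownership rule behind the ring picture can be sketched in a few lines. This is a minimal illustration, assuming a (0,1] namespace and the usual Chord-style convention that a key belongs to the first VS ID at or after it, wrapping around the ring. The node names and IDs are made up for the example, and the sketch only shows who owns a key, not the O(log(N)) routing used to reach that VS.

```python
import bisect

def successor(ring_ids, key):
    """Return the VS ID that owns `key`: the first ID at or after it, clockwise."""
    i = bisect.bisect_left(ring_ids, key)
    return ring_ids[i % len(ring_ids)]  # wrap around past the largest ID

# Hypothetical physical nodes, each hosting one or more VS IDs in (0, 1].
node_vs = {"n1": [0.10, 0.55, 0.90], "n2": [0.30], "n3": [0.70, 0.80]}
ring = sorted(v for ids in node_vs.values() for v in ids)
owner_of = {v: n for n, ids in node_vs.items() for v in ids}

vs = successor(ring, 0.60)
print(vs, owner_of[vs])
```

A node hosting several VSs (n1 here) owns several disjoint arcs of the ring, which is what the later virtual-server techniques manipulate.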

DHTs - Load
[Figure: load on the virtual servers around the ID ring]

Sybil Attacks
[Figure: ID ring including an inserted ID m]
Unsecured IDs -> Take over portions of ring

Sybil Attack - Solution
• Central authority certifies each ID [Castro02]
• k-Choices uses similar scheme to generate limited set of certified IDs.
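One way a limited, checkable set of IDs could be derived is sketched below. The slide does not spell out the scheme, so everything here is an assumption for illustration: the `certified_id` bytes stand in for whatever the central authority issues, and hashing it together with an index 1..k using SHA-1 is just one plausible construction that lands in a (0,2^160] style namespace.

```python
import hashlib

def candidate_ids(certified_id, k):
    """Derive k candidate ring IDs from one certified identity (hypothetical scheme)."""
    ids = []
    for i in range(1, k + 1):
        digest = hashlib.sha1(certified_id + i.to_bytes(4, "big")).digest()
        ids.append(int.from_bytes(digest, "big"))  # 160-bit integer
    return ids

def is_valid_id(certified_id, claimed_id, k):
    """Any peer can recompute the candidate set and reject IDs outside it."""
    return claimed_id in candidate_ids(certified_id, k)

ids = candidate_ids(b"example-node-certificate", 8)
print(is_valid_id(b"example-node-certificate", ids[3], 8))
```

The point of such a construction is that a participant can still choose among k positions for load balancing, but cannot place itself at an arbitrary point on the ring.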


Characteristics - Skew
• Skew: hotspots caused by popular content
• Typically Zipf popularity
• E.g., Gnutella queries
[Figure: Gnutella query popularity, log-log scale]
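To give a feel for how concentrated Zipf popularity is, here is a small sketch that computes normalized Zipf weights. The exponent `s=1.0` and the catalog size of 1000 items are assumptions chosen for illustration, not values taken from the Gnutella trace.

```python
def zipf_weights(n_items, s=1.0):
    """Popularity of the item at each rank, proportional to 1 / rank**s."""
    raw = [1.0 / (rank ** s) for rank in range(1, n_items + 1)]
    total = sum(raw)
    return [w / total for w in raw]

weights = zipf_weights(1000)
top10_share = sum(weights[:10])
print(round(top10_share, 3))
```

Under these assumed parameters the ten most popular items out of 1000 draw roughly 39% of all requests, so whichever VSs happen to own them become hotspots.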

Characteristics - Churn
• Churn: pattern of participant join and departure.
• Lifetimes follow a Pareto (memory-full) distribution (60 minute avg).
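The "memory-full" property can be made concrete with a short sketch. The shape parameter `ALPHA = 2.0` is an assumption (the slide gives only the 60 minute average); the scale is then derived from the standard Pareto mean, scale * alpha / (alpha - 1).

```python
import random

def pareto_scale(mean, alpha):
    """Scale x_m giving a Pareto(alpha, x_m) lifetime the requested mean (alpha > 1)."""
    return mean * (alpha - 1) / alpha

def sample_lifetime(mean, alpha, rng):
    # paretovariate draws from a Pareto with scale 1; multiply to rescale.
    return pareto_scale(mean, alpha) * rng.paretovariate(alpha)

def expected_remaining(age, alpha):
    """E[lifetime - age | lifetime > age], valid for age at or above the scale x_m."""
    return age / (alpha - 1)

ALPHA = 2.0  # assumed shape, not from the slides
rng = random.Random(0)
lifetimes = [sample_lifetime(60.0, ALPHA, rng) for _ in range(5)]
print(expected_remaining(60.0, ALPHA), expected_remaining(120.0, ALPHA))
```

With this assumed shape, a node that has already been up for 120 minutes is expected to stay another 120, while one up for 60 minutes is expected to stay another 60: the longer a node has lived, the longer it is likely to remain, unlike a memoryless exponential model.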

Characteristics - Heterogeneity
• Network bandwidths vary by five orders of magnitude.
• Routing capacity varies widely.


k-Choices - Steps
1. Probe
2. Evaluate Cost Function
3. Join

k-Choices - Sample
k=3
• Sample ID a: Learn succ(a) actual load, target load, and node capacity.
• Sample ID b: Learn succ(b) actual load, target load, and node capacity.
[Figure: load vs. capacity at candidate IDs a, b, and c, with load over target marked]
Discover load and capacity at each ID

k-Choices - Cost Function
[Figure: current and future load vs. capacity for each candidate; ID a = …, ID b = …]
Choose ID that minimizes mismatch between target and load normalized by capacity.

k-Choices - Properties
• Incorporates workload skew and node heterogeneity.
• Proactive load balancing - join time
• Reactive load balancing - reselect ID
• Verifiable IDs

Outline
• Overview
• Preliminaries
• k-Choices
• Prior Techniques
  – log(N) virtual servers
  – Transfer
  – Proportion
  – Threshold
• Evaluation
• Conclusion

Prior Work - log(N) VS
• Namespace balancing (e.g. [Karger97])
• Central Limit Theorem
  – Total namespace for each node approximately equal
Namespace balancing does not equal load balancing.
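A tiny worked example shows why equal namespace does not imply equal load. The two nodes, their arcs, and the key request rates below are all invented for illustration; the only assumption carried over from the talk is that the workload is skewed, so one key is far hotter than the rest.

```python
# Each node's arc of a (0, 1] ring as (start, end]; both own exactly half.
arcs = {"A": (0.0, 0.5), "B": (0.5, 1.0)}
# Hypothetical keys as (ring position, request rate); the first key is a hotspot.
keys = [(0.12, 900.0), (0.31, 20.0), (0.58, 30.0), (0.77, 25.0), (0.93, 25.0)]

def namespace_share(arcs):
    return {node: end - start for node, (start, end) in arcs.items()}

def load_share(arcs, keys):
    total = sum(rate for _, rate in keys)
    return {
        node: sum(rate for pos, rate in keys if start < pos <= end) / total
        for node, (start, end) in arcs.items()
    }

print(namespace_share(arcs))
print(load_share(arcs, keys))
```

Both nodes hold 50% of the namespace, yet node A carries 92% of the request load because the hot key falls in its arc.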

Prior Work - Transfer
• Overload:
  a) >1 VS: attempt to transfer
  b) 1 VS: split first, then transfer
• Pros: Simple, Good Performance
• Cons: Insecure
  – Split to arbitrary ID (cut in half)
  – Transfer to anyone
[Rao03,Godfrey04]
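The two overload cases can be sketched as a single local step. This is an illustrative simplification rather than the algorithm from the cited papers: the `VS` record, the choice to hand off the lightest VS, and the assumption that a split divides load evenly are all hypothetical, and finding a recipient node is left out.

```python
from dataclasses import dataclass

@dataclass
class VS:
    start: float  # arc (start, end] on the ring
    end: float
    load: float

def split(vs):
    """Cut a VS in half at the midpoint of its arc."""
    mid = (vs.start + vs.end) / 2
    half = vs.load / 2  # assumption: load is spread evenly over the arc
    return VS(vs.start, mid, half), VS(mid, vs.end, half)

def relieve(vss, capacity):
    """One Transfer step: return (kept, to_transfer); to_transfer is None if not overloaded."""
    if sum(v.load for v in vss) <= capacity:
        return list(vss), None
    if len(vss) == 1:
        vss = list(split(vss[0]))  # case b: split first
    victim = min(vss, key=lambda v: v.load)  # assumed policy: move the lightest VS
    kept = [v for v in vss if v is not victim]
    return kept, victim

kept, moved = relieve([VS(0.0, 0.4, 120.0)], capacity=100.0)
print(kept, moved)
```

The midpoint chosen in `split` is exactly the kind of arbitrary, uncertified ID that the slide lists as a security drawback.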

Evaluation
• Trace Driven Simulation
• Results
  – Determining k
  – Vary applied load
  – Vary churn
  – Vary skew
• Pastry Implementation
  – Throughput
  – Heterogeneous real node bandwidths (Emulab)

Results - Choosing k
4k nodes, avg capacity=100 msgs/sec, 60 min avg lifetime
k=8 sufficiently reduced utilization.

Results - Trace
5508 nodes; median capacity: 191 msgs/sec
k-Choices and Transfer performed equally well with skewed workloads.

Results - Implementation
Pastry; “lookup+download”; 64x4 nodes - last mile limited
k-Choices: 20% throughput improvement

Conclusion
• k-Choices:
  – Approx. same performance as Transfer
  – Doesn’t change security properties
  – Not the final word - range queries
• Design for empirical system
  – Namespace balancing?
  – Skew, wide capacity distribution, churn
  – Security: Sybil attacks

Questions?
• Thanks!
• Contact:
  – Jonathan Ledlie
  – jonathan@eecs.harvard.edu

Prior Work - Threshold
• If our utilization has increased beyond threshold
  – Compare utilization to neighbors
  – Shift their IDs?
• Else
  – Compare to set of log(N) random VSs
  – Move best to be our new predecessor
[Ganesan04]

Prior Work - Proportion
• Overload: shed VSs
• Underload: create them
• Pros: No communication
• Cons:
  – Large number of VSs created
  – New lowest common denominator
  – Cascading deletes
[Dabek01]
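The purely local nature of this approach can be sketched as a rule that only looks at a node's own utilization. The `low` and `high` thresholds and the one-VS-at-a-time adjustment are assumptions made for illustration; the slide gives only the shed/create behavior.

```python
def adjust_vs_count(n_vs, load, capacity, low=0.5, high=1.0):
    """Return the new number of VSs for a node, using only local information."""
    utilization = load / capacity
    if utilization > high and n_vs > 1:
        return n_vs - 1  # overloaded: shed a VS
    if utilization < low:
        return n_vs + 1  # underloaded: create a VS
    return n_vs

print(adjust_vs_count(4, load=130.0, capacity=100.0))
print(adjust_vs_count(4, load=20.0, capacity=100.0))
```

Because no node coordinates with any other, lightly loaded nodes keep creating VSs, which is one way to read the "large number of VSs created" drawback listed above.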
