Today
Load balancing.
Balls in bins.
Power of two choices.
Cuckoo hashing.

Simplest..
Load balance: m balls in n bins.
For simplicity: n balls in n bins.
Round robin: load 1! Centralized! Not so good.
Uniformly at random? Average load 1.
Max load? Could be n. Uh oh!
Max load with probability ≥ 1 − δ?
δ = 1/n^c for today. c is 1 or 2.

Binomial coefficient bounds.
$(n/k)^k \le \binom{n}{k} \le \frac{n^k}{k!} \le (ne/k)^k$
Lower bound (each factor is at least n/k since k ≤ n):
$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1} = \frac{n}{k}\cdot\frac{n-1}{k-1}\cdots\frac{n-k+1}{1} \ge \frac{n}{k}\cdot\frac{n}{k}\cdots\frac{n}{k} = (n/k)^k$
Upper bounds:
$n(n-1)\cdots(n-k+1) \le n^k$ and $k! \ge (k/e)^k$.

Balls in bins.
For each of n balls, choose a random bin: $X_i$ balls in bin i.
Union bound: $\Pr[\cup_i A_i] \le \sum_i \Pr[A_i]$.
$\Pr[X_i \ge k] \le \sum_{S \subseteq [n], |S| = k} \Pr[\text{all balls in } S \text{ choose bin } i]$
$\Pr[\text{all balls in } S \text{ choose bin } i] = (1/n)^k$, and there are $\binom{n}{k}$ subsets S.
$\Pr[X_i \ge k] \le \binom{n}{k}(1/n)^k \le \frac{n^k}{k!}\cdot\frac{1}{n^k} = \frac{1}{k!}$
Choose k so that $\Pr[X_i \ge k] \le 1/n^2$.
$\Pr[\text{any } X_i \ge k] \le n \times 1/n^2 = 1/n$ → max load ≤ k w.p. ≥ 1 − 1/n.
$k! \ge n^2$ for $k = 2e \log n$. (Recall $k! \ge (k/e)^k$.)
Lemma: Max load is O(log n) with probability ≥ 1 − 1/n.
Much better than n.
Actually max load is Θ(log n / loglog n) w.h.p.
(W.h.p. means with probability at least 1 − O(1/n^c) for today.)

Power of two..
n balls in n bins.
Choose two bins, pick least loaded.
Still distributed, but a bit less so than not looking.
Is max load lower? Yes? No? Yes.
How much lower? (log n)/2? √(log n)? O(loglog n)?
O(loglog n)!!!!
Exponentially better! Old bound is exponential of new bound.

Analysis.
n/8 balls in n bins.
Each ball chooses two bins at random, picks the least loaded.
View as graph.
Bin is vertex.
Each ball is edge.
Add edge, add one to lower endpoint's "count."
Max load is max vertex count.

Analysis Intuition:
If max count is k: neighbors with counts ≥ k − 1, k − 2, k − 3, ..., and so on!
No cycles and max load k → ≥ $2^{k/2}$ nodes in tree.
No connected component larger than X and no cycles ⟹ max load O(log X).
Will show:
Max conn. comp. is O(log n) w.h.p.
Average induced degree is small. (E.g.: cycle has degree 2.)
Extend tree intuition.
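The gap between one random choice and two choices is easy to observe empirically. The following is a minimal simulation sketch, not part of the lecture: the function names and the fixed seed are assumptions made for illustration, and it uses n balls in n bins as on the "Power of two.." slide.

```python
import random

def max_load_one_choice(n, rng):
    """Throw n balls into n bins; each ball picks one bin uniformly at random."""
    load = [0] * n
    for _ in range(n):
        load[rng.randrange(n)] += 1
    return max(load)

def max_load_two_choices(n, rng):
    """Each ball samples two bins uniformly at random and joins the lighter one."""
    load = [0] * n
    for _ in range(n):
        a, b = rng.randrange(n), rng.randrange(n)
        # Ties go to the first sampled bin (an arbitrary choice for this sketch).
        load[a if load[a] <= load[b] else b] += 1
    return max(load)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for n in (1_000, 10_000, 100_000):
        print(n, max_load_one_choice(n, rng), max_load_two_choices(n, rng))
```

As n grows, the first column of max loads should creep up slowly (the Θ(log n / loglog n) behavior) while the second barely moves, in line with the O(loglog n) claim.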
Connected Component.
Random graph:
Claim: Component size in an n vertex, n/8 edge random graph is O(log n) w/ prob. ≥ 1 − 1/n^c.
Proof: A size k component, C, contains ≥ k − 1 edges.
$\Pr[|C| \ge k] \le \binom{n}{k}\binom{n/8}{k-1}\left(\frac{k}{n}\right)^{2(k-1)}$ (1)
The three factors: possible C; which edges; prob. both endpoints of each edge are inside C.
Using $\binom{n}{k} \le (ne/k)^k$ and $(k/(k-1))^{k-1} \le e$:
$\Pr[|C| \ge k] \le \left(\frac{ne}{k}\right)^{k}\left(\frac{ne}{8(k-1)}\right)^{k-1}\left(\frac{k}{n}\right)^{2(k-1)} \le \frac{8n}{k}\left(\frac{e^2}{8}\right)^{k} \le n\,(0.93)^k$ for k ≥ 8 (2)
Choose $k = -(c+1)\log_{0.93} n$ to make the probability ≤ 1/n^c.

Not dense.
Induced degree of a node on a subset, S, is its degree counting only edges internal to S.
Induced degree of nodes in blue subset is 2, not 5!
Claim: Average induced degree on any subset of nodes is ≤ 8 with probability ≥ 1 − O(1/n^2).
Proof: Average induced degree ≥ 8 → 4k internal edges for a subset of size k.
$\Pr[\text{dense } S] \le \binom{n}{k}\binom{n/8}{4k}\left(\frac{k}{n}\right)^{8k} \le \left(\frac{ne}{k}\right)^{k}\left(\frac{ne}{32k}\right)^{4k}\left(\frac{k}{n}\right)^{8k} = \left(\frac{e^{1.25}}{32}\right)^{4k}\left(\frac{k}{n}\right)^{3k} \le \left(\frac{k}{n}\right)^{3k}$
Starts at 1/n^3 for k = 1, decreasing till k ≤ n/8 (at least).
→ Total O(1/n^2).

Removal Process!
Component size is ≤ c log n and max induced degree is ≤ 8 w.h.p.
Process: Remove nodes of degree ≤ 16 and their incident edges. Repeat.
Claim: O(log X) iterations where X is max component size.
For any connected component:
Average induced degree ≤ 8 → at least half the nodes have degree ≤ 16.
→ half the nodes removed in each iteration.
→ log X iterations to remove all nodes.
Claim: Max load is O(loglog n) w.h.p.
Recall edge corresponds to ball.
Height of ball i, $h_i$, is the load of its bin when the ball is placed in it.
Corresponding edge removed in iteration $r_i$.
Property: $h_i \le 16 r_i$.
Case $r_i = 1$: only 16 balls incident to bin → $h_i \le 16$.
Induction: Previously removed edges (balls) induce load ≤ $16(r_i - 1)$.
+ 16 edges/balls this iteration. → $h_i \le 16 r_i$.

Power of two choices.
Max load: O(log X) where X is max component size.
X is O(log n) with high probability.
Max load is O(loglog n).

Cuckoo hashing.
Hashing with two choices: max load O(loglog n).
Cuckoo hashing:
Array. Two hash functions $h_1$, $h_2$.
Insert x: place in $h_1(x)$ or $h_2(x)$ if space.
Else bump elt y in $h_i(x)$, with i chosen u.a.r. from {1, 2}.
Bump y, x: place y in its other location $h_j(y)$, where j ≠ i, if space.
Else bump y′ in $h_j(y)$.
If this goes on too long: fail. Rehash entire hash table.
Fails if the insert hits a cycle.
$C_\ell$: event of a cycle of length ℓ at a vertex. With m ≤ n/8 items:
$\Pr[C_\ell] \le \binom{m}{\ell}\binom{n}{\ell}\left(\frac{\ell}{n}\right)^{2\ell} \le \left(\frac{e^2}{8}\right)^{\ell}$ (3)
Probability that an insert hits a cycle of length ℓ is ≤ $\frac{\ell}{n}\left(\frac{e^2}{8}\right)^{\ell}$.
Rehash every Ω(n) inserts (if ≤ n/8 items in table).
O(1) time on average.

Sum up
Balls in bins: Θ(log n / loglog n) load.
Power of two: Θ(loglog n).
Cuckoo hashing.
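The cuckoo insert procedure from the Cuckoo hashing slide can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the lecture's implementation: the class name `CuckooTable`, the `max_kicks` cutoff standing in for "goes on too long", and the salted use of Python's built-in `hash` as $h_1$, $h_2$ are all hypothetical choices. It assumes the table is kept at most about 1/8 full and that `None` is never used as a key.

```python
import random

class CuckooTable:
    def __init__(self, capacity, max_kicks=32, seed=0):
        self.capacity = capacity
        self.max_kicks = max_kicks          # assumed cutoff before giving up
        self.rng = random.Random(seed)
        self.slots = [None] * capacity      # one array, one item per slot
        self._new_hashes()

    def _new_hashes(self):
        # Two salts stand in for two independent hash functions h1, h2.
        self.salts = (self.rng.getrandbits(64), self.rng.getrandbits(64))

    def _pos(self, i, key):
        return hash((self.salts[i], key)) % self.capacity

    def contains(self, key):
        # Lookup probes at most two slots.
        return any(self.slots[self._pos(i, key)] == key for i in (0, 1))

    def insert(self, key):
        if self.contains(key):
            return
        homeless = self._place(key)
        if homeless is not None:
            self._rehash(homeless)

    def _place(self, key):
        """Return None on success, else the key left without a slot."""
        for i in (0, 1):
            p = self._pos(i, key)
            if self.slots[p] is None:
                self.slots[p] = key
                return None
        p = self._pos(self.rng.randrange(2), key)    # bump location, u.a.r.
        for _ in range(self.max_kicks):
            key, self.slots[p] = self.slots[p], key  # bump the occupant
            p0, p1 = self._pos(0, key), self._pos(1, key)
            p = p1 if p == p0 else p0                # bumped key's other slot
            if self.slots[p] is None:
                self.slots[p] = key
                return None
        return key                                   # went on too long: fail

    def _rehash(self, extra):
        # Pick fresh hash functions and reinsert everything; retry on failure.
        keys = [k for k in self.slots if k is not None] + [extra]
        while True:
            self.slots = [None] * self.capacity
            self._new_hashes()
            if all(self._place(k) is None for k in keys):
                return
```

For example, `t = CuckooTable(800)` followed by inserting 100 keys keeps the table at the n/8 load used in the analysis, and `t.contains(k)` then inspects only the two slots $h_1(k)$ and $h_2(k)$.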
See you on Thursday...