Advanced Algorithms (III)
Shanghai Jiao Tong University
Chihao Zhang
March 16th, 2020
Balls-into-Bins

Throw $m$ balls into $n$ bins uniformly at random.

• What is the chance that some bin contains more than one ball? (Birthday paradox)
• How many balls are in the fullest bin? (Max load)
• How large must $m$ be to hit all bins? (Coupon collector)
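A minimal simulation of the model may help fix ideas (my own sketch, not from the slides; the function name and parameter choices are illustrative):

```python
import random

def throw_balls(m, n):
    """Throw m balls into n bins uniformly at random; return the bin loads."""
    loads = [0] * n
    for _ in range(m):
        loads[random.randrange(n)] += 1
    return loads

# Birthday paradox: with 30 "balls" and 365 "bins", a collision is likely.
print("collision with m=30, n=365:", max(throw_balls(30, 365)) >= 2)

# Max load: with m = n, the fullest bin typically holds Theta(log n / log log n) balls.
print("max load with m = n = 1000:", max(throw_balls(1000, 1000)))
```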
Birthday Paradox

In a group of more than 30 people, there is a very high chance that two of them have the same birthday.

$$\Pr[\text{no same birthday}] = 1 \cdot \frac{n-1}{n} \cdot \frac{n-2}{n} \cdots \frac{n-m+1}{n} = \prod_{i=1}^{m-1}\left(1 - \frac{i}{n}\right) \le \exp\left(-\sum_{i=1}^{m-1}\frac{i}{n}\right) = \exp\left(-\frac{m(m-1)}{2n}\right)$$

For $m = 30$, $n = 365$, the probability is less than 0.304.

For $m = c\sqrt{n}$ with a large constant $c$, the probability can be made arbitrarily close to 0.
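As a sanity check on the bound (a sketch, not part of the slides), one can compare the exact product with the exponential upper bound for $m = 30$, $n = 365$:

```python
import math

def p_no_collision(m, n):
    """Exact Pr[no same birthday]: product of (1 - i/n) for i = 1..m-1."""
    p = 1.0
    for i in range(1, m):
        p *= 1 - i / n
    return p

m, n = 30, 365
print(p_no_collision(m, n))                 # ~0.294 (exact)
print(math.exp(-m * (m - 1) / (2 * n)))     # ~0.304 (upper bound)
```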
Max Load

Let $X_i$ be the number of balls in the $i$-th bin.

What is $X = \max_{i \in [n]} X_i$? We analyze this when $m = n$.

If we can argue that $X_1$ is less than $k$ with probability $1 - O(1/n^2)$, then by the union bound, $\Pr[X \ge k] = O(1/n)$.

Again by the union bound (over all sets of $k$ balls that could land in the first bin),

$$\Pr[X_1 \ge k] \le \binom{n}{k} n^{-k} \le \frac{1}{k!}.$$

We apply Stirling's formula $k! \approx \sqrt{2\pi k}\left(\frac{k}{e}\right)^k$, so $k! \ge \left(\frac{k}{e}\right)^k$ and

$$\Pr[X_1 \ge k] \le \left(\frac{e}{k}\right)^k.$$

We want $\left(\frac{e}{k}\right)^k = O\left(\frac{1}{n^2}\right)$. Choose $k = O\left(\frac{\log n}{\log \log n}\right)$.
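A quick empirical check (my own sketch; $n$ values and trial counts are illustrative) that the max load for $m = n$ indeed tracks $\log n / \log \log n$ up to constants:

```python
import math, random

def avg_max_load(n, trials=100):
    """Average max load over several runs of n balls into n bins."""
    total = 0
    for _ in range(trials):
        loads = [0] * n
        for _ in range(n):
            loads[random.randrange(n)] += 1
        total += max(loads)
    return total / trials

for n in [100, 1000, 10000]:
    print(n, avg_max_load(n), math.log(n) / math.log(math.log(n)))
```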
Concentration Bounds

We shall develop general tools to obtain "with high probability" results…

These results are critical for analyzing randomized algorithms.

This is the main topic of the coming 4-5 weeks.
Markov Inequality

For any nonnegative random variable $X$ and $a > 0$,

$$\Pr[X > a] \le \frac{\mathbb{E}[X]}{a}.$$

Proof. $\mathbb{E}[X] = \mathbb{E}[X \mid X > a] \cdot \Pr[X > a] + \mathbb{E}[X \mid X \le a] \cdot \Pr[X \le a] \ge a \cdot \Pr[X > a].$
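A numeric illustration (a sketch; the distribution, the sum of two fair dice, is my own arbitrary choice):

```python
import random

# X = sum of two fair dice, a nonnegative variable with E[X] = 7.
samples = [random.randint(1, 6) + random.randint(1, 6) for _ in range(100000)]
a = 10
empirical = sum(x > a for x in samples) / len(samples)
print(empirical, "<=", 7 / a)  # ~0.083 <= 0.7: the bound holds but is loose
```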
Applications

• A Las Vegas randomized algorithm with expected running time $O(n)$ terminates in $O(n^2)$ time with probability $1 - O(1/n)$.

• In the $n$-balls-into-$n$-bins problem, $\mathbb{E}[X_i] = 1$. So $\Pr\left[X_1 > \frac{\log n}{\log \log n}\right] \le \frac{\log \log n}{\log n}$.

This is far from the truth…
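The gap can be seen empirically (my own sketch; parameters are illustrative). For $n = 1000$, the true tail probability is far below the Markov bound:

```python
import math, random

def bin1_exceeds(n, k, trials=20000):
    """Empirical Pr[X_1 > k] when n balls are thrown into n bins."""
    hits = 0
    for _ in range(trials):
        load = sum(random.randrange(n) == 0 for _ in range(n))  # Binomial(n, 1/n)
        hits += load > k
    return hits / trials

n = 1000
k = math.log(n) / math.log(math.log(n))  # ~3.57
print(bin1_exceeds(n, k), "vs Markov bound", 1 / k)  # ~0.02 vs ~0.28
```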
Chebyshev's Inequality

A common trick to improve concentration is to consider $\mathbb{E}[f(X)]$ instead of $\mathbb{E}[X]$ for some non-decreasing $f : \mathbb{R} \to \mathbb{R}$:

$$\Pr[X \ge a] = \Pr[f(X) \ge f(a)] \le \frac{\mathbb{E}[f(X)]}{f(a)}.$$

$f(x) = x^2$ gives Chebyshev's inequality:

$$\Pr[X \ge a] \le \frac{\mathbb{E}[X^2]}{a^2} \quad \text{or} \quad \Pr\left[\,|X - \mathbb{E}[X]| \ge a\,\right] \le \frac{\mathrm{Var}[X]}{a^2}.$$
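A numeric illustration of Chebyshev's inequality (a sketch; the binomial example is my own choice):

```python
import random

# X = number of heads in 100 fair coin flips: E[X] = 50, Var[X] = 25.
samples = [sum(random.randint(0, 1) for _ in range(100)) for _ in range(50000)]
a = 10
empirical = sum(abs(x - 50) >= a for x in samples) / len(samples)
print(empirical, "<=", 25 / a**2)  # ~0.06 <= 0.25
```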
Coupon Collector

Recall that the coupon collector problem asks:

"How many balls does one need to throw so that none of the $n$ bins is empty?"

We already established that $\mathbb{E}[X] = nH_n \approx n(\log n + \gamma)$.

The Markov inequality only provides very weak concentration…
In order to apply Chebyshev's inequality, we need to compute $\mathrm{Var}[X] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$.

Recall that $X = \sum_{i=0}^{n-1} X_i$, where each $X_i$ follows the geometric distribution with parameter $\frac{n-i}{n}$.

$X_0, \dots, X_{n-1}$ are independent, so $\mathrm{Var}[X] = \sum_{i=0}^{n-1} \mathrm{Var}[X_i]$.
Variance of Geometric Variables

Assume $Y$ follows the geometric distribution with parameter $p$. Then

$$\mathbb{E}[Y^2] = \sum_{i=1}^{\infty} i^2 (1-p)^{i-1} p = \frac{2-p}{p^2},$$

$$\mathrm{Var}[Y] = \mathbb{E}[Y^2] - (\mathbb{E}[Y])^2 = \frac{1-p}{p^2}.$$
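A quick numeric check of this formula (a sketch; sampling by repeated Bernoulli trials):

```python
import random

def geometric(p):
    """Number of Bernoulli(p) trials up to and including the first success."""
    t = 1
    while random.random() >= p:
        t += 1
    return t

p, trials = 0.3, 200000
ys = [geometric(p) for _ in range(trials)]
mean = sum(ys) / trials
var = sum((y - mean) ** 2 for y in ys) / trials
print(var, "vs", (1 - p) / p**2)  # both ~7.78
```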
$$\mathrm{Var}[X] = \sum_{i=0}^{n-1} \mathrm{Var}[X_i] = \sum_{i=0}^{n-1} \frac{n \cdot i}{(n-i)^2} \le n^2 \sum_{i=0}^{n-1} \frac{1}{(n-i)^2} = n^2\left(\frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots + \frac{1}{n^2}\right) \le \frac{\pi^2 n^2}{6}.$$

By Chebyshev's inequality, $\Pr[X \ge nH_n + cn] \le \frac{\pi^2}{6c^2}$.

The use of Chebyshev's inequality is often referred to as the "second-moment method".
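The bound can be checked by simulation (my own sketch; $n$, $c$, and the trial count are illustrative):

```python
import math, random

def coupon_collector(n):
    """Number of uniform throws until all n bins are non-empty."""
    seen, t = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        t += 1
    return t

n, c, trials = 500, 3, 2000
threshold = n * sum(1 / j for j in range(1, n + 1)) + c * n  # nH_n + cn
empirical = sum(coupon_collector(n) >= threshold for _ in range(trials)) / trials
print(empirical, "<=", math.pi**2 / (6 * c**2))  # ~0.03 <= ~0.18
```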
Random Graph

Erdős–Rényi random graph $G(n, p)$:

$n$ vertices; each possible edge appears with probability $p$, independently.

Given a graph property $P$, define its threshold function $r(n)$ as:

• if $p \ll r(n)$, then $G \sim G(n, p)$ does not satisfy $P$ whp;
• if $p \gg r(n)$, then $G \sim G(n, p)$ satisfies $P$ whp.
We will show that the property $P$ = “$G$ contains a $4$-clique” has threshold function $n^{-2/3}$.

For every $S \in \binom{[n]}{4}$, let $X_S$ be the indicator that “$G[S]$ is a clique”.
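A brute-force simulation (my own sketch; $n$ and the scaling factors are illustrative) shows the 4-clique probability jumping as $p$ crosses $n^{-2/3}$:

```python
import itertools, random

def gnp(n, p):
    """Sample G(n, p): each of the C(n,2) edges appears independently."""
    return {e for e in itertools.combinations(range(n), 2) if random.random() < p}

def has_4clique(n, edges):
    """Brute-force check for 4 pairwise-adjacent vertices."""
    return any(all(e in edges for e in itertools.combinations(S, 2))
               for S in itertools.combinations(range(n), 4))

n = 30
r = n ** (-2 / 3)
for factor in [0.2, 1.0, 5.0]:
    hits = sum(has_4clique(n, gnp(n, factor * r)) for _ in range(20))
    print(f"p = {factor} * n^(-2/3): 4-clique in {hits}/20 samples")
```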