Sublinear Algorithms
LECTURE 3
Last time
• Properties of lists and functions.
• Testing if a list is sorted/Lipschitz and if a function is monotone.
Today
• Testing if a graph is connected.
• Estimating the number of connected components.
• Estimating the weight of an MST.
9/10/2020
Sofya Raskhodnikova; Boston University
Graph Properties
Testing if a Graph is Connected [Goldreich Ron]
Input: a graph $G = (V, E)$ on $n$ vertices
• in adjacency lists representation (a list of neighbors for each vertex)
• maximum degree $d$, i.e., adjacency lists of length $d$ with some empty entries
Query $(v, i)$, where $v \in V$ and $i \in [d]$: entry $i$ of the adjacency list of vertex $v$
Exact answer: $\Omega(dn)$ time
• Approximate version: Is the graph connected or $\varepsilon$-far from connected?
$\mathrm{dist}(G_1, G_2) = \frac{\#\text{ of entries in adjacency lists on which } G_1 \text{ and } G_2 \text{ differ}}{dn}$
Time: $O\left(\frac{1}{\varepsilon^2 d}\right)$ today, plus an improvement on HW 3. No dependence on $n$!
Testing Connectedness: Algorithm
Connectedness Tester($n$, $d$, $\varepsilon$, query access to $G$)
1. Repeat $s = 8/(\varepsilon d)$ times:
2. pick a random vertex $u$
3. determine if the connected component of $u$ is small: perform BFS from $u$, stopping after at most $4/(\varepsilon d)$ new nodes
4. Reject if a small connected component was found, otherwise accept.
Run time: $O\left(\frac{d}{\varepsilon^2 d^2}\right) = O\left(\frac{1}{\varepsilon^2 d}\right)$
Analysis:
• Connected graphs are always accepted.
• Remains to show: If a graph is $\varepsilon$-far from connected, it is rejected with probability $\ge \frac{2}{3}$.
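The tester above can be sketched in Python. This is a minimal illustration, not the lecture's code: the graph is assumed to be a list of neighbor lists `adj` (empty entries omitted), and the names `explore_component` and `connectedness_tester` are hypothetical. One added assumption: a fully explored component counts as "small" only if it has fewer than $n$ nodes, so that a connected graph with $n \le 4/(\varepsilon d)$ is still accepted.

```python
import math
import random
from collections import deque


def explore_component(adj, start, limit):
    """BFS from `start`, stopping once more than `limit` nodes have been seen.

    Returns (nodes_seen, exhausted); `exhausted` is True when the BFS ran out
    of new nodes, i.e. the whole component was explored within the cutoff.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:  # each read of adj[v][i] models one query (v, i)
            if w not in seen:
                seen.add(w)
                if len(seen) > limit:
                    return len(seen), False  # component exceeds the cutoff
                queue.append(w)
    return len(seen), True


def connectedness_tester(adj, d, eps, rng=random):
    """Return False (reject) if a small component is found, else True (accept)."""
    n = len(adj)
    s = math.ceil(8 / (eps * d))      # number of sampled vertices
    limit = math.ceil(4 / (eps * d))  # size cutoff for a "small" component
    for _ in range(s):
        u = rng.randrange(n)
        size, exhausted = explore_component(adj, u, limit)
        # Assumption: a component covering all n nodes is not a witness.
        if exhausted and size < n:
            return False
    return True
```

Each of the $s$ iterations visits at most $4/(\varepsilon d) + 1$ nodes and reads at most $d$ list entries per node, which matches the stated $O(1/(\varepsilon^2 d))$ bound up to constants.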
Testing Connectedness: Analysis
Claim 1 If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon dn}{2}$ connected components.
Claim 2 If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon dn}{4}$ connected components of size at most $4/(\varepsilon d)$.
• By Claim 2, at least $\frac{\varepsilon dn}{4}$ nodes are in small connected components.
• By the Witness lemma, it suffices to sample $\frac{2 \cdot 4}{\varepsilon dn / n} = \frac{8}{\varepsilon d}$ nodes to detect one from a small connected component.
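One standard way to see why $8/(\varepsilon d)$ samples suffice is the calculation below. It assumes the samples are independent and uniform over the $n$ vertices, and it uses the inequality $1 - x \le e^{-x}$; the fraction $\varepsilon d / 4$ of nodes in small components comes from Claim 2.

```latex
\Pr[\text{no sampled node lies in a small component}]
  \le \left(1 - \frac{\varepsilon d}{4}\right)^{8/(\varepsilon d)}
  \le e^{-\frac{\varepsilon d}{4} \cdot \frac{8}{\varepsilon d}}
  = e^{-2} < \frac{1}{3}.
```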
Testing Connectedness: Proof of Claim 1
Claim 1 If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon dn}{2}$ connected components.
We prove the contrapositive: If $G$ has $< \frac{\varepsilon dn}{2}$ connected components, one can make $G$ connected by modifying $< \varepsilon$ fraction of its representation, i.e., $< \varepsilon dn$ entries.
• If there are no degree restrictions, $k$ components can be connected by adding $k - 1$ edges, each affecting 2 nodes. Here, $k < \frac{\varepsilon dn}{2}$, so $2k - 2 < \varepsilon dn$.
• What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are $d$?
Freeing up an Adjacency List Entry
What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are $d$?
• Consider an MST (here, any spanning tree) of this component. Let $v$ be a leaf of the MST.
• Disconnect $v$ from a node other than its parent in the MST.
• Two entries are changed (one in each endpoint's list) while keeping the same number of components: the removed edge is not in the MST, so the component stays connected.
• Apply this to each component with fewer than 2 free spots in its adjacency lists.
• Now we can connect all the components using the freed-up spots while ensuring that we never change more than 2 spots per component.
• Thus, $k$ components can be connected by changing $2k$ spots. Here, $k < \frac{\varepsilon dn}{2}$, so $2k < \varepsilon dn$.
Testing Connectedness: Proof of Claim 2
Claim 2 If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon dn}{4}$ connected components of size at most $4/(\varepsilon d)$.
• By Claim 1, there are at least $\frac{\varepsilon dn}{2}$ connected components.
• Their average size is at most $\frac{n}{\varepsilon dn/2} = \frac{2}{\varepsilon d}$.
• By an averaging argument (or Markov inequality), at least half of the components are of size at most twice the average.
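The averaging step can be spelled out as a counting argument. The sketch below uses only the facts that components are disjoint and that there are $n$ nodes in total; "large" here means more than $4/(\varepsilon d)$ nodes.

```latex
\#\{\text{large components}\} < \frac{n}{4/(\varepsilon d)} = \frac{\varepsilon dn}{4},
\qquad\text{so}\qquad
\#\{\text{small components}\} > \frac{\varepsilon dn}{2} - \frac{\varepsilon dn}{4} = \frac{\varepsilon dn}{4}.
```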
Testing if a Graph is Connected [Goldreich Ron]
Input: a graph $G = (V, E)$ on $n$ vertices
• in adjacency lists representation (a list of neighbors for each vertex)
• maximum degree $d$
Connected or $\varepsilon$-far from connected?
$O\left(\frac{1}{\varepsilon^2 d}\right)$ time (no dependence on $n$)
Randomized Approximation in Sublinear Time: A Simple Example
Randomized Approximation: a Toy Example
Input: a string $w \in \{0,1\}^n$, e.g., 0 0 0 1 … 0 1 0 0
Goal: Estimate the fraction of 1's in $w$ (like in polls)
It suffices to sample $s = 1/\varepsilon^2$ positions and output the average to get the fraction of 1's $\pm\varepsilon$ (i.e., additive error $\varepsilon$) with probability $\ge 2/3$.
Hoeffding Bound Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$. Let $Y = \frac{1}{s} \sum_{i=1}^{s} Y_i$ (called the sample mean). Then $\Pr\left[|Y - E[Y]| \ge \varepsilon\right] \le 2e^{-2s\varepsilon^2}$.
Let $Y_i$ = value of sample $i$. Then $E[Y] = \frac{1}{s} \sum_{i=1}^{s} E[Y_i]$ = (fraction of 1's in $w$).
Apply the Hoeffding Bound and substitute $s = 1/\varepsilon^2$:
$\Pr\left[|(\text{sample mean}) - (\text{fraction of 1's in } w)| \ge \varepsilon\right] \le 2e^{-2s\varepsilon^2} = 2e^{-2} < 1/3$
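The sampling estimator from this slide can be sketched in a few lines of Python. This is an illustration under assumptions: the string is given as a list `w` of 0/1 integers, positions are sampled independently with replacement (so the Hoeffding Bound applies), and the function name `estimate_fraction_of_ones` is hypothetical.

```python
import math
import random


def estimate_fraction_of_ones(w, eps, rng=random):
    """Estimate the fraction of 1's in `w` from s = ceil(1/eps^2) samples."""
    n = len(w)
    s = math.ceil(1 / eps**2)
    # Each sample Y_i is the value at an independent, uniformly random position.
    total = sum(w[rng.randrange(n)] for _ in range(s))
    return total / s  # the sample mean Y
```

The number of positions read depends only on $\varepsilon$, not on the length $n$ of the string.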
Approximating # of Connected Components [Chazelle Rubinfeld Trevisan]
Input: a graph $G = (V, E)$ on $n$ vertices
• in adjacency lists representation (a list of neighbors for each vertex)
• maximum degree $d$
Exact answer: $\Omega(dn)$ time
Additive approximation: # of CC $\pm \varepsilon n$ with probability $\ge 2/3$
Time:
• Known: $O\left(\frac{d}{\varepsilon^2} \log \frac{1}{\varepsilon}\right)$, $\Omega\left(\frac{d}{\varepsilon^2}\right)$
• Today: $O\left(\frac{d}{\varepsilon^3}\right)$. No dependence on $n$!
Partially based on slides by Ronitt Rubinfeld: http://stellar.mit.edu/S/course/6/fa10/6.896/courseMaterial/topics/topic3/lectureNotes/lecst11/lecst11.pdf
Approximating # of CCs: Main Idea
• Let $C$ = number of components
• For every vertex $u$, define $n_u$ = number of nodes in $u$'s component
– for each component $A$: $\sum_{u \in A} \frac{1}{n_u} = 1$
– $\sum_{u \in V} \frac{1}{n_u} = C$ (breaks $C$ up into contributions of different nodes)
• Estimate this sum by estimating $n_u$'s for a few random nodes
– If $u$'s component is small, its size can be computed by BFS.
– If $u$'s component is big, then $1/n_u$ is small, so it does not contribute much to the sum
– Can stop BFS after a few steps
Similar to property tester for connectedness [Goldreich Ron]
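The identity $\sum_{u \in V} 1/n_u = C$ can be checked on a small example. The sketch below is illustrative only: the graph `adj` (a list of neighbor lists) and the helper name `component_sizes` are made up for this check, and exact fractions are used to avoid rounding.

```python
from collections import deque
from fractions import Fraction


def component_sizes(adj):
    """Return a list whose entry u is n_u, the size of u's connected component."""
    n = len(adj)
    size = [0] * n
    for start in range(n):
        if size[start]:
            continue  # component of `start` was already explored
        comp = [start]
        seen = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    comp.append(w)
                    queue.append(w)
        for v in comp:
            size[v] = len(comp)
    return size


# Hypothetical example: components {0, 1, 2}, {3, 4}, and {5}.
adj = [[1], [0, 2], [1], [4], [3], []]
sizes = component_sizes(adj)
total = sum(Fraction(1, n_u) for n_u in sizes)  # 3*(1/3) + 2*(1/2) + 1 = 3
```

Each component contributes exactly 1 to the sum regardless of its size, which is why sampling the terms $1/n_u$ estimates the number of components.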
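Approximating # of CCs: Algorithm
Estimating $n_u$ = the number of nodes in $u$'s component:
• Let estimate $\hat{n}_u = \min\left\{n_u, \frac{2}{\varepsilon}\right\}$
– When $u$'s component has $\le 2/\varepsilon$ nodes, $\hat{n}_u = n_u$
– Else $\hat{n}_u = 2/\varepsilon$, and so $0 < \frac{1}{\hat{n}_u} - \frac{1}{n_u} < \frac{1}{\hat{n}_u} = \frac{\varepsilon}{2}$
In both cases, $\left|\frac{1}{\hat{n}_u} - \frac{1}{n_u}\right| \le \frac{\varepsilon}{2}$.
• Corresponding estimate for $C$ is $\hat{C} = \sum_{u \in V} \frac{1}{\hat{n}_u}$. It is a good estimate:
$|\hat{C} - C| = \left|\sum_{u \in V} \frac{1}{\hat{n}_u} - \sum_{u \in V} \frac{1}{n_u}\right| \le \sum_{u \in V} \left|\frac{1}{\hat{n}_u} - \frac{1}{n_u}\right| \le \frac{\varepsilon n}{2}$
APPROX_#_CCs($n$, $d$, $\varepsilon$, query access to $G$)
1. Repeat $s = \Theta(1/\varepsilon^2)$ times:
2. pick a random vertex $u$
3. compute $\hat{n}_u$ via BFS from $u$, stopping after at most $2/\varepsilon$ new nodes
4. Return $\tilde{C}$ = (average of the values $1/\hat{n}_u$) $\cdot\, n$
Run time: $O(d/\varepsilon^3)$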
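A minimal Python sketch of APPROX_#_CCs follows. It rests on several assumptions that are not in the slides: the graph is a list of neighbor lists `adj`, the cutoff $2/\varepsilon$ is rounded up to an integer (which keeps the per-node error at most $\varepsilon/2$), and the constant in $s = \Theta(1/\varepsilon^2)$ is taken to be 4. The function names are hypothetical.

```python
import math
import random
from collections import deque


def capped_component_size(adj, start, cap):
    """BFS from `start`, stopping once `cap` nodes are seen; returns min(n_u, cap)."""
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < cap:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                if len(seen) >= cap:
                    break  # cutoff reached; the exact size no longer matters
                queue.append(w)
    return len(seen)


def approx_num_ccs(adj, eps, num_samples=None, rng=random):
    """Estimate the number of connected components within +-eps*n (w.p. >= 2/3)."""
    n = len(adj)
    cap = math.ceil(2 / eps)  # assumption: round the cutoff 2/eps up
    # Assumption: s = ceil(4/eps^2) makes 2*exp(-eps^2 * s / 2) <= 2*exp(-2) < 1/3.
    s = num_samples if num_samples is not None else math.ceil(4 / eps**2)
    total = 0.0
    for _ in range(s):
        u = rng.randrange(n)
        total += 1.0 / capped_component_size(adj, u, cap)
    return n * total / s  # (average of 1/n_hat_u) * n
```

Each BFS touches at most about $2/\varepsilon$ nodes with up to $d$ list entries each, and there are $\Theta(1/\varepsilon^2)$ iterations, consistent with the $O(d/\varepsilon^3)$ run time.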
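Approximating # of CCs: Analysis
Want to show: $\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] \le \frac{1}{3}$
Hoeffding Bound Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$. Let $Y = \frac{1}{s} \sum_{i=1}^{s} Y_i$ (called the sample mean). Then $\Pr\left[|Y - E[Y]| \ge \varepsilon\right] \le 2e^{-2s\varepsilon^2}$.
Let $Y_i = 1/\hat{n}_u$ for the $i$th vertex $u$ in the sample.
• $Y = \frac{1}{s} \sum_{i=1}^{s} Y_i = \frac{\tilde{C}}{n}$
• $E[Y] = \frac{1}{s} \sum_{i=1}^{s} E[Y_i] = E[Y_1] = \frac{1}{n} \sum_{u \in V} \frac{1}{\hat{n}_u} = \frac{\hat{C}}{n}$
$\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] = \Pr\left[|nY - nE[Y]| > \frac{\varepsilon n}{2}\right] = \Pr\left[|Y - E[Y]| > \frac{\varepsilon}{2}\right] \le 2e^{-\varepsilon^2 s/2}$
• Need $s = \Theta\left(\frac{1}{\varepsilon^2}\right)$ samples to get probability $\le \frac{1}{3}$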
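To make the $\Theta(1/\varepsilon^2)$ concrete, one can solve the Hoeffding tail for $s$. The worked step below is a sketch; the numeric constant is rounded.

```latex
2e^{-\varepsilon^2 s/2} \le \frac{1}{3}
\iff \frac{\varepsilon^2 s}{2} \ge \ln 6
\iff s \ge \frac{2\ln 6}{\varepsilon^2} \approx \frac{3.58}{\varepsilon^2}.
```

So, for instance, $s = \lceil 4/\varepsilon^2 \rceil$ samples are enough.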
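Approximating # of CCs: Analysis
So far:
$|\hat{C} - C| \le \frac{\varepsilon n}{2}$
$\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] \le \frac{1}{3}$
• With probability $\ge \frac{2}{3}$,
$|\tilde{C} - C| \le |\tilde{C} - \hat{C}| + |\hat{C} - C| \le \frac{\varepsilon n}{2} + \frac{\varepsilon n}{2} \le \varepsilon n$
Summary: The number of connected components in $n$-vertex graphs of degree at most $d$ can be estimated within $\pm\varepsilon n$ in time $O\left(\frac{d}{\varepsilon^3}\right)$.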
Minimum spanning tree (MST)
• What is the cheapest way to connect all the dots?
Input: a weighted graph with $n$ vertices and $m$ edges
[Figure: example weighted graph with edge weights 1, 2, 3, 4, 5, 7]
• Exact computation:
– Deterministic $O(m \cdot \text{inverse-Ackermann}(m))$ time [Chazelle]
– Randomized $O(m)$ time [Karger Klein Tarjan]
Partially based on slides by Ronitt Rubinfeld: http://stellar.mit.edu/S/course/6/fa10/6.896/courseMaterial/topics/topic3/lectureNotes/lecst11/lecst11.pdf