
Sublinear Algorithms, Lecture 3. Sofya Raskhodnikova, Penn State



  1. Sublinear Algorithms, Lecture 3. Sofya Raskhodnikova, Penn State University

  2. Graph Properties

  3. Testing if a Graph is Connected [Goldreich Ron]
     Input: a graph $G = (V, E)$ on $n$ vertices
     • in adjacency lists representation (a list of neighbors for each vertex)
     • maximum degree $d$, i.e., adjacency lists of length $d$ with some empty entries
     Query $(v, i)$, where $v \in V$ and $i \in [d]$: entry $i$ of the adjacency list of vertex $v$
     Exact answer: $\Omega(dn)$ time
     • Approximate version: Is the graph connected or $\varepsilon$-far from connected?
     $\mathrm{dist}(G_1, G_2) = \dfrac{\#\text{ of entries in adjacency lists on which } G_1 \text{ and } G_2 \text{ differ}}{dn}$
     Time: $O\!\left(\frac{1}{\varepsilon^2 d}\right)$ today (+ improvement on HW 3). No dependence on $n$!

  4. Testing Connectedness: Algorithm
     Connectedness Tester($G$, $d$, $\varepsilon$)
     1. Repeat $s = 16/(\varepsilon d)$ times:
     2.    pick a random vertex $u$
     3.    determine if the connected component of $u$ is small: perform BFS from $u$, stopping after at most $8/(\varepsilon d)$ new nodes
     4. Reject if a small connected component was found, otherwise accept.
     Run time: $O(d/(\varepsilon^2 d^2)) = O(1/(\varepsilon^2 d))$
     Analysis:
     • Connected graphs are always accepted.
     • Remains to show: if a graph is $\varepsilon$-far from connected, it is rejected with probability $\ge 2/3$.
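The tester on slide 4 can be sketched in Python roughly as follows. This is a minimal illustration, not the lecture's own code: the names `explored_component_size` and `connectedness_tester` are hypothetical, the graph is assumed to be given as a list of neighbor lists with empty entries omitted, and a component counts as a witness only if it is fully explored within the cutoff and has fewer than $n$ vertices (an assumption needed so that tiny connected graphs are still accepted).

```python
import math
import random
from collections import deque


def explored_component_size(adj, start, cap):
    """BFS from `start`. Return the component size if it is at most `cap`,
    or None if more than `cap` vertices were reached (search cut off)."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                if len(seen) > cap:
                    return None  # component is not small; stop early
                queue.append(w)
    return len(seen)


def connectedness_tester(adj, d, eps, rng=random):
    """Return True (accept) or False (reject), following slide 4."""
    n = len(adj)
    s = math.ceil(16 / (eps * d))    # number of sampled vertices
    cap = math.ceil(8 / (eps * d))   # size cutoff for a "small" component
    for _ in range(s):
        u = rng.randrange(n)
        size = explored_component_size(adj, u, cap)
        # Assumption: a fully explored component with fewer than n vertices
        # is a witness that the graph is disconnected.
        if size is not None and size < n:
            return False
    return True
```

Each BFS touches at most about $8/(\varepsilon d)$ vertices and reads at most $d$ entries per vertex, which matches the run-time bound stated on the slide.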

  5. Testing Connectedness: Analysis
     Claim 1: If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{4}$ connected components.
     Claim 2: If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{8}$ connected components of size at most $8/(\varepsilon d)$.
     • If Claim 2 holds, at least $\frac{\varepsilon d n}{8}$ nodes are in small connected components.
     • By the Witness lemma, it suffices to sample $\frac{2 \cdot 8}{\varepsilon d n / n} = \frac{16}{\varepsilon d}$ nodes to detect one from a small connected component.

  6. Testing Connectedness: Proof of Claim 1
     Claim 1: If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{4}$ connected components.
     We prove the contrapositive: if $G$ has $< \frac{\varepsilon d n}{4}$ connected components, one can make $G$ connected by modifying $< \varepsilon$ fraction of its representation, i.e., $< \varepsilon d n$ entries.
     • If there are no degree restrictions, $k$ components can be connected by adding $k-1$ edges, each affecting 2 nodes. Here, $k < \frac{\varepsilon d n}{4}$, so $2k - 2 < \varepsilon d n$.
     • What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are $d$?

  7. Freeing up an Adjacency List Entry
     Claim 1: If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{4}$ connected components.
     What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are $d$?
     • Consider an MST of this component. Let $v$ be a leaf of the MST.
     • Disconnect $v$ from a node other than its parent in the MST.
     • Two entries are changed while keeping the same number of components.
     • Thus, $k$ components can be connected by modifying at most $2k-1$ edges (at most $k$ removed to free up entries, $k-1$ added), each affecting 2 nodes. Here, $k < \frac{\varepsilon d n}{4}$, so $4k - 2 < \varepsilon d n$.

  8. Testing Connectedness: Proof of Claim 2
     Claim 1: If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{4}$ connected components.
     Claim 2: If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{8}$ connected components of size at most $8/(\varepsilon d)$.
     • If Claim 1 holds, there are at least $\frac{\varepsilon d n}{4}$ connected components.
     • Their average size is $\le \frac{n}{\varepsilon d n / 4} = \frac{4}{\varepsilon d}$.
     • By an averaging argument (or Markov inequality), at least half of the components are of size at most twice the average, i.e., at least $\frac{\varepsilon d n}{8}$ components have size at most $\frac{8}{\varepsilon d}$.

  9. Testing if a Graph is Connected [Goldreich Ron]
     Input: a graph $G = (V, E)$ on $n$ vertices
     • in adjacency lists representation (a list of neighbors for each vertex)
     • maximum degree $d$
     Connected or $\varepsilon$-far from connected?
     $O\!\left(\frac{1}{\varepsilon^2 d}\right)$ time (no dependence on $n$)

  10. Randomized Approximation in Sublinear Time: Simple Examples

  11. Randomized Approximation: a Toy Example
      Input: a string $w \in \{0,1\}^n$, e.g., 0 0 0 1 … 0 1 0 0
      Goal: Estimate the fraction of 1's in $w$ (like in polls)
      It suffices to sample $s = 1/\varepsilon^2$ positions and output the average to get the fraction of 1's $\pm \varepsilon$ (i.e., additive error $\varepsilon$) with probability $\ge 2/3$.
      Hoeffding Bound: Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$ and let $Y = \sum_{i=1}^{s} Y_i$ (sample sum). Then $\Pr[\,|Y - \mathrm{E}[Y]| \ge \delta\,] \le 2e^{-2\delta^2/s}$.
      Let $Y_i$ = value of sample $i$. Then $\mathrm{E}[Y] = \sum_{i=1}^{s} \mathrm{E}[Y_i] = s \cdot (\text{fraction of 1's in } w)$.
      Apply the Hoeffding Bound with $\delta = \varepsilon s$ and substitute $s = 1/\varepsilon^2$:
      $\Pr[\,|(\text{sample average}) - (\text{fraction of 1's in } w)| \ge \varepsilon\,] = \Pr[\,|Y - \mathrm{E}[Y]| \ge \varepsilon s\,] \le 2e^{-2\delta^2/s} = 2e^{-2} < 1/3$
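As a small illustration of the toy example on slide 11, the sampling estimator might look like this in Python. The function name `estimate_fraction_of_ones` is hypothetical, the input is assumed to be a list of 0/1 integers, and positions are sampled independently with replacement so that the Hoeffding bound applies as stated.

```python
import math
import random


def estimate_fraction_of_ones(x, eps, rng=random):
    """Estimate the fraction of 1's in the 0/1 list `x` to within +/- eps
    (with probability at least 2/3) from s = ceil(1/eps^2) random positions."""
    s = math.ceil(1 / eps ** 2)
    # Sample s positions independently and uniformly, with replacement.
    total = sum(x[rng.randrange(len(x))] for _ in range(s))
    return total / s
```

The number of samples depends only on `eps`, not on `len(x)`, which is the point of the slide.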

  12. Approximating # of Connected Components [Chazelle Rubinfeld Trevisan]
      Input: a graph $G = (V, E)$ on $n$ vertices
      • in adjacency lists representation (a list of neighbors for each vertex)
      • maximum degree $d$
      Exact answer: $\Omega(dn)$ time
      Additive approximation: # of CC $\pm\, \varepsilon n$ with probability $\ge 2/3$
      Time:
      • Known: $O\!\left(\frac{d}{\varepsilon^2} \log \frac{1}{\varepsilon}\right)$, $\Omega\!\left(\frac{d}{\varepsilon^2}\right)$
      • Today: $O\!\left(\frac{d}{\varepsilon^3}\right)$. No dependence on $n$!
      Partially based on slides by Ronitt Rubinfeld: http://stellar.mit.edu/S/course/6/fa10/6.896/courseMaterial/topics/topic3/lectureNotes/lecst11/lecst11.pdf

  13. Approximating # of CCs: Main Idea
      • Let $C$ = number of components
      • For every vertex $u$, define $n_u$ = number of nodes in $u$'s component
        – for each component $A$: $\sum_{u \in A} \frac{1}{n_u} = 1$
        – hence $\sum_{u \in V} \frac{1}{n_u} = C$ (this breaks $C$ up into contributions of different nodes)
      • Estimate this sum by estimating $n_u$'s for a few random nodes
        – If $u$'s component is small, its size can be computed by BFS.
        – If $u$'s component is big, then $1/n_u$ is small, so it does not contribute much to the sum.
        – Can stop BFS after a few steps.
      Similar to property tester for connectedness [Goldreich Ron]
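The identity from slide 13, that the values $1/n_u$ sum to the number of components, can be checked exactly on a small graph. The sketch below is only an illustration: `component_sizes` and the six-vertex example graph are made up for this purpose, and exact rational arithmetic is used to avoid rounding.

```python
from collections import deque
from fractions import Fraction


def component_sizes(adj):
    """Return a list where entry u is n_u, the size of u's component."""
    n = len(adj)
    size = [0] * n
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        # Collect the whole component of `start` by BFS.
        component = [start]
        seen[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    component.append(w)
                    queue.append(w)
        for v in component:
            size[v] = len(component)
    return size


# Hypothetical example: components {0, 1, 2}, {3, 4}, {5}.
adj = [[1], [0, 2], [1], [4], [3], []]
sizes = component_sizes(adj)
total = sum(Fraction(1, n_u) for n_u in sizes)  # 3*(1/3) + 2*(1/2) + 1 = 3
```

Here each component contributes exactly 1 to `total`, so `total` equals the number of components.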

  14. Approximating # of CCs: Algorithm
      Estimating $n_u$ = the number of nodes in $u$'s component:
      • Let the estimate be $\hat{n}_u = \min\{n_u, \frac{2}{\varepsilon}\}$
        – When $u$'s component has $\le 2/\varepsilon$ nodes, $\hat{n}_u = n_u$
        – Else $\hat{n}_u = 2/\varepsilon$, and so $0 < \frac{1}{\hat{n}_u} - \frac{1}{n_u} < \frac{1}{\hat{n}_u} = \frac{\varepsilon}{2}$
      • The corresponding estimate for $C$ is $\hat{C} = \sum_{u \in V} \frac{1}{\hat{n}_u}$. It is a good estimate:
        $|\hat{C} - C| = \left|\sum_{u \in V} \frac{1}{\hat{n}_u} - \sum_{u \in V} \frac{1}{n_u}\right| \le \sum_{u \in V} \left|\frac{1}{\hat{n}_u} - \frac{1}{n_u}\right| \le \frac{\varepsilon n}{2}$
      APPROX_#_CCs($G$, $d$, $\varepsilon$)
      1. Repeat $s = \Theta(1/\varepsilon^2)$ times:
      2.    pick a random vertex $u$
      3.    compute $\hat{n}_u$ via BFS from $u$, stopping after at most $2/\varepsilon$ new nodes
      4. Return $\tilde{C}$ = (average of the values $1/\hat{n}_u$) $\cdot\, n$
      Run time: $O(d/\varepsilon^3)$
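A possible Python rendering of APPROX_#_CCs from slide 14 is sketched below. The names `capped_component_size` and `approx_num_ccs` are hypothetical, the graph is assumed to be a list of neighbor lists, the cutoff is rounded up to `ceil(2/eps)` so that it is an integer, and the constant 4 in the sample count is one assumed choice for the slide's $\Theta(1/\varepsilon^2)$.

```python
import math
import random
from collections import deque


def capped_component_size(adj, start, cap):
    """Return min(n_u, cap) for u = start, using a BFS that stops
    as soon as `cap` vertices have been seen."""
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < cap:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                if len(seen) == cap:
                    return cap  # component has at least `cap` vertices
                queue.append(w)
    return len(seen)


def approx_num_ccs(adj, eps, rng=random):
    """Estimate the number of connected components within +/- eps * n
    (with probability at least 2/3)."""
    n = len(adj)
    cap = math.ceil(2 / eps)        # assumed integer version of 2/eps
    s = math.ceil(4 / eps ** 2)     # assumed constant for Theta(1/eps^2)
    total = 0.0
    for _ in range(s):
        u = rng.randrange(n)
        total += 1.0 / capped_component_size(adj, u, cap)
    return n * total / s            # (average of 1/n_hat_u) * n
```

Each BFS visits at most about $2/\varepsilon$ vertices and scans at most $d$ entries per vertex, so $s = \Theta(1/\varepsilon^2)$ repetitions give the $O(d/\varepsilon^3)$ bound from the slide.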

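  15. Approximating # of CCs: Analysis
      Want to show: $\Pr\!\left[\,|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\,\right] \le \frac{1}{3}$
      Hoeffding Bound: Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$ and let $Y = \sum_{i=1}^{s} Y_i$ (sample sum). Then $\Pr[\,|Y - \mathrm{E}[Y]| \ge \delta\,] \le 2e^{-2\delta^2/s}$.
      Let $Y_i = 1/\hat{n}_u$ for the $i$-th vertex $u$ in the sample.
      • $Y = \sum_{i=1}^{s} Y_i = \frac{s\tilde{C}}{n}$ and $\mathrm{E}[Y] = \sum_{i=1}^{s} \mathrm{E}[Y_i] = s \cdot \mathrm{E}[Y_1] = s \cdot \frac{1}{n} \sum_{u \in V} \frac{1}{\hat{n}_u} = \frac{s\hat{C}}{n}$
      $\Pr\!\left[\,|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\,\right] = \Pr\!\left[\,\left|\frac{n}{s} Y - \frac{n}{s} \mathrm{E}[Y]\right| > \frac{\varepsilon n}{2}\,\right] = \Pr\!\left[\,|Y - \mathrm{E}[Y]| > \frac{\varepsilon s}{2}\,\right] \le 2e^{-\varepsilon^2 s/2}$
      • Need $s = \Theta\!\left(\frac{1}{\varepsilon^2}\right)$ samples to get probability $\le \frac{1}{3}$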
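To make the $\Theta(1/\varepsilon^2)$ on slide 15 concrete, one can solve $2e^{-\varepsilon^2 s/2} \le 1/3$ for $s$, which gives $s \ge 2\ln 6/\varepsilon^2$. The helper below is a hypothetical illustration of that arithmetic; the name `samples_needed` and the `failure` parameter are not from the lecture.

```python
import math


def samples_needed(eps, failure=1 / 3):
    """Smallest integer s with 2 * exp(-eps^2 * s / 2) <= failure,
    i.e. s >= 2 * ln(2 / failure) / eps^2."""
    return math.ceil(2 * math.log(2 / failure) / eps ** 2)
```

For example, `samples_needed(0.1)` evaluates to 359, since $2\ln 6 / 0.01 \approx 358.4$.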

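  16. Approximating # of CCs: Analysis
      So far: $|\hat{C} - C| \le \frac{\varepsilon n}{2}$ and $\Pr\!\left[\,|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\,\right] \le \frac{1}{3}$
      • With probability $\ge \frac{2}{3}$,
        $|\tilde{C} - C| \le |\tilde{C} - \hat{C}| + |\hat{C} - C| \le \frac{\varepsilon n}{2} + \frac{\varepsilon n}{2} \le \varepsilon n$
      Summary: The number of connected components in $n$-vertex graphs of degree at most $d$ can be estimated within $\pm \varepsilon n$ in time $O\!\left(\frac{d}{\varepsilon^3}\right)$.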

  17. Minimum spanning tree (MST)
      • What is the cheapest way to connect all the dots?
      Input: a weighted graph with $n$ vertices and $m$ edges
      [Figure: example weighted graph with edge weights 1, 2, 3, 4, 5, 7]
      • Exact computation:
        – Deterministic $O(m \cdot \text{inverse-Ackermann}(m))$ time [Chazelle]
        – Randomized $O(m)$ time [Karger Klein Tarjan]

  18. Approximating MST Weight in Sublinear Time [Chazelle Rubinfeld Trevisan]
      Input: a graph $G = (V, E)$ on $n$ vertices
      • in adjacency lists representation
      • maximum degree $d$ and maximum allowed weight $w$
      • weights in $\{1, 2, \dots, w\}$
      Output: $(1+\varepsilon)$-approximation to MST weight, $w_{MST}$
      Time:
      • Known: $O\!\left(\frac{dw}{\varepsilon^3} \log \frac{dw}{\varepsilon}\right)$, $\Omega\!\left(\frac{dw}{\varepsilon^2}\right)$
      • Today: $O\!\left(\frac{dw^3 \log w}{\varepsilon^3}\right)$
      No dependence on $n$!

  19. Idea Behind Algorithm
      • Characterize MST weight in terms of the number of connected components in certain subgraphs of $G$
      • Already know that the number of connected components can be estimated quickly

  20. MST and Connected Components: Warm-up
      • Recall Kruskal's algorithm for computing MST exactly.
      Suppose all weights are 1 or 2. Then
      MST weight = (# weight-1 edges in MST) + 2 ⋅ (# weight-2 edges in MST)
                 = $n - 1$ + (# of weight-2 edges in MST)   (MST has $n - 1$ edges)
                 = $n - 1$ + (# of CCs induced by weight-1 edges) $-\, 1$   (by Kruskal)
      [Figure: an MST with weight-1 and weight-2 edges, showing the connected components induced by weight-1 edges]
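The warm-up identity on slide 20 can be sanity-checked on a small example. In the sketch below, the helper names `find`, `kruskal_weight`, `count_components` and the five-vertex graph are all hypothetical; the code runs Kruskal's algorithm with a simple union-find and compares the MST weight with $n - 2$ plus the number of components induced by weight-1 edges.

```python
def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def kruskal_weight(n, edges):
    """MST weight of a connected graph; edges are (weight, u, v) tuples."""
    parent = list(range(n))
    total = 0
    for weight, u, v in sorted(edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:            # edge joins two different components
            parent[ru] = rv
            total += weight
    return total


def count_components(n, edges):
    """Number of connected components of the graph (n vertices, given edges)."""
    parent = list(range(n))
    components = n
    for _, u, v in edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components


# Hypothetical example with weights in {1, 2}.
n = 5
edges = [(1, 0, 1), (1, 1, 2), (2, 2, 3), (1, 3, 4), (2, 4, 0), (2, 1, 4)]
weight_one_edges = [e for e in edges if e[0] == 1]
c1 = count_components(n, weight_one_edges)   # components {0,1,2} and {3,4}
mst = kruskal_weight(n, edges)
```

In this example the weight-1 edges induce 2 components, and the MST weight is $5 - 2 + 2 = 5$, in agreement with the formula.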
