Mapreduce With Parallelizable Reduce. S. Muthu Muthukrishnan. Presentation slides.


  1. Mapreduce With Parallelizable Reduce S. Muthu Muthukrishnan

  7. Some Premises
     ■ At a deliberately high level, we know the MapReduce system.
     ■ Parallel. Map and Reduce functions. Used when data is large. Changing system.
     ■ There is a nice PRAM theory of parallel algorithms.
     ■ NC, prefix sums, list ranking, and more.
     ■ Goal: Develop a useful theory of MapReduce algorithms.
     ■ An algorithmist's role. Interesting problems, algorithms. Bridge from the other side.

  13. Thoughts Circa 2006
      ■ Prefix sum in $O(1)$ rounds.
      ■ Problem: $A[1, \ldots, n] \Rightarrow PA[1, \ldots, n]$ where $PA[i] = \sum_{j \le i} A[j]$.
      ■ Solution:
        ■ Assign $A[i\sqrt{n}+1, \ldots, (i+1)\sqrt{n}]$ to key $i$.
        ■ Solve the problem on $B[1, \ldots, \sqrt{n}]$ with one processor, where $B[i] = \sum_{j=i\sqrt{n}+1}^{(i+1)\sqrt{n}} A[j]$. Doable?
        ■ Solve the problem for key $i$ with $PB[i-1]$. Doable?
      ■ List ranking in $O(1)$ rounds?
      ■ Some graph algorithms in $O(1)$ rounds recently.
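The block scheme on this slide can be simulated on a single machine to check the arithmetic. The following is a minimal sketch, not the author's implementation: the function name `prefix_sums_two_rounds` is hypothetical, keys are 0-based, and the block size is taken as the integer square root of $n$, so the last block may be shorter.

```python
import math


def prefix_sums_two_rounds(a):
    """Simulate the sqrt(n)-block prefix-sum scheme on a plain list."""
    n = len(a)
    if n == 0:
        return []
    block = max(1, math.isqrt(n))  # block size, roughly sqrt(n)

    # Round 1: element j is sent to key j // block; each key sums its block.
    blocks = [a[start:start + block] for start in range(0, n, block)]
    b = [sum(blk) for blk in blocks]  # B[i] = total of block i

    # One processor: prefix sums over the (roughly sqrt(n)) block totals.
    pb = []
    running = 0
    for total in b:
        running += total
        pb.append(running)

    # Round 2: key i finishes its own block, starting from the offset PB[i-1].
    out = []
    for i, blk in enumerate(blocks):
        running = pb[i - 1] if i > 0 else 0
        for value in blk:
            running += value
            out.append(running)
    return out


print(prefix_sums_two_rounds([3, 1, 4, 1, 5, 9, 2, 6]))
```

Each key touches about $\sqrt{n}$ values and the single processor handles about $\sqrt{n}$ block totals, which is what the two "Doable?" questions on the slide are asking about.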

  16. SIROCCO Challenge
      ■ Problem: Given graph $G = (V, E)$, count the number of triangles. [1]
      ■ Solution:
        ■ For each edge $(u, v)$, generate a tuple $(u, v, 0)$.
        ■ For each vertex $v$ and for each pair of neighbors $x, z$ of $v$, generate a tuple $(x, z, 1)$.
        ■ Presence of both a 0 tuple and a 1 tuple for an edge is a triangle.
      ■ Solution: The number of triangles is $\frac{1}{6} \sum_i \lambda_i^3$, where $\lambda_i$ are the eigenvalues of the adjacency matrix $A$ of $G$ in sorted order.
        ■ $(A^3)_{ii}$ counts the closed walks of length 3 through $i$, which is twice the number of triangles involving $i$.
        ■ The trace of $A^3$ is 6 times the number of triangles.
        ■ If $\lambda$ is an eigenvalue of $A$, i.e., $Ax = \lambda x$, then $\lambda^3$ is an eigenvalue of $A^3$.
        ■ In practice, computing the top few eigenvalues suffices.
      [1] For example, see "Fast Counting of Triangles in Large Real Networks without counting: Algorithms and Laws", ICDM 08, by C. Tsourakakis.
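Both counting approaches on this slide can be cross-checked on a small graph. The sketch below is illustrative only and assumes a simple undirected graph with integer vertex labels `0..n-1`; the function names are hypothetical. The tuple method groups the 0 and 1 tuples by their vertex pair, and since each triangle is found once from each of its three vertices, the total is divided by 3 (a detail the slide leaves implicit). The second function computes the trace of $A^3$ directly rather than through eigenvalues.

```python
from collections import defaultdict
from itertools import combinations


def count_triangles_tuples(edges):
    """Tuple method: match (x, z, 1) wedge tuples against (u, v, 0) edge tuples."""
    edges = {tuple(sorted(e)) for e in edges}
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    groups = defaultdict(list)  # "shuffle": vertex pair -> list of flags
    for u, v in edges:
        groups[(u, v)].append(0)
    for v, nbrs in adj.items():
        for x, z in combinations(sorted(nbrs), 2):
            groups[(x, z)].append(1)

    # A pair holding a 0 flag is a real edge; each 1 flag on it closes a triangle.
    closed = sum(flags.count(1) for flags in groups.values() if 0 in flags)
    return closed // 3  # every triangle is seen from each of its 3 vertices


def count_triangles_trace(edges, n):
    """Trace method: trace(A^3) / 6 for the n-by-n adjacency matrix A."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    a2 = [[sum(a[i][k] * a[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    trace = sum(a2[i][k] * a[k][i] for i in range(n) for k in range(n))
    return trace // 6


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_triangles_tuples(k4), count_triangles_trace(k4, 4))
```

The number of wedge tuples grows with the square of a vertex's degree, which is one reason the eigenvalue route on the slide is attractive for large graphs.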

  18. Eigenvalue Estimation
      $A$ is an $n \times n$ real-valued matrix.
      ■ Lanczos method.
      ■ Sketches. $Ar$ for a pseudorandom $n \times d$ matrix $r$, $d \ll n$. Will the $O(nd)$ sketch fit into one machine?

  19. Special Case
      Motivation: Logs processing.
        x = inputrecord;
        x-squared = x * x;
        aggregator: table sum;
        emit aggregator <- x-squared;
      MUD Algorithm $m = (\Phi, \oplus, \eta)$.
      ■ Local function $\Phi : \Sigma \to Q$ maps an input item to a message.
      ■ Aggregator $\oplus : Q \times Q \to Q$ maps two messages to a single message.
      ■ Post-processing operator $\eta : Q \to \Sigma$ produces the final output when applied to $m_{\mathcal{T}}(x)$, the message obtained by aggregating the $\Phi(x_i)$ along a tree $\mathcal{T}$.
      ■ $m$ computes a function $f$ if $\eta(m_{\mathcal{T}}(\cdot)) = f$ for all trees $\mathcal{T}$.
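The "for all trees" condition can be exercised by aggregating messages in a randomly chosen tree shape. The following is a minimal sketch under stated assumptions: `run_mud` and `sum_of_squares` are hypothetical names, the tree is built by repeatedly merging two adjacent messages, and the sum-of-squares instance mirrors the logs-processing snippet on the slide.

```python
import random


def run_mud(phi, combine, eta, items, rng):
    """Evaluate a MUD algorithm over a randomly shaped aggregation tree.

    phi, combine and eta play the roles of the local function, the
    aggregator and the post-processing operator.
    """
    messages = [phi(x) for x in items]  # one message per input item
    if not messages:
        raise ValueError("need at least one input item")
    # Merging a randomly chosen adjacent pair at each step corresponds to
    # picking a random binary tree over the items in their given order.
    while len(messages) > 1:
        i = rng.randrange(len(messages) - 1)
        messages[i:i + 2] = [combine(messages[i], messages[i + 1])]
    return eta(messages[0])


def sum_of_squares(items, seed=0):
    """Logs-processing example: square each record, then sum the squares."""
    return run_mud(
        phi=lambda x: x * x,
        combine=lambda a, b: a + b,
        eta=lambda q: q,
        items=items,
        rng=random.Random(seed),
    )


print([sum_of_squares([1, 2, 3, 4], seed=s) for s in range(5)])
```

Because addition is associative, every tree shape yields the same total, which is the property the definition asks of the aggregator.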

  20. MUD Examples
      $\Phi(x) = \langle x, x \rangle$
      $\oplus(\langle a_1, b_1 \rangle, \langle a_2, b_2 \rangle) = \langle \min(a_1, a_2), \max(b_1, b_2) \rangle$
      $\eta(\langle a, b \rangle) = b - a$
      Figure: mud algorithm for computing the total span (left).
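The three functions of the total-span example translate almost literally into code. This is a small sketch with hypothetical function names; it evaluates the same messages along a left-deep tree and along a balanced tree to illustrate that the tree shape does not affect the result.

```python
from functools import reduce


def phi(x):
    """Local function: an item becomes the message <x, x>."""
    return (x, x)


def combine(m1, m2):
    """Aggregator: keep the smaller minimum and the larger maximum."""
    return (min(m1[0], m2[0]), max(m1[1], m2[1]))


def eta(m):
    """Post-processing: span = max - min."""
    return m[1] - m[0]


def span_left_deep(items):
    """Aggregate one item at a time, like a sequential scan."""
    return eta(reduce(combine, map(phi, items)))


def span_balanced(items):
    """Aggregate level by level, like a balanced binary tree."""
    msgs = [phi(x) for x in items]
    while len(msgs) > 1:
        msgs = [
            combine(msgs[i], msgs[i + 1]) if i + 1 < len(msgs) else msgs[i]
            for i in range(0, len(msgs), 2)
        ]
    return eta(msgs[0])


print(span_left_deep([7, 2, 9, 4, 11]), span_balanced([7, 2, 9, 4, 11]))
```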

  21. MUD Examples
      $\Phi(x) = \langle x, h(x), 1 \rangle$
      $\oplus(\langle a_1, h(a_1), c_1 \rangle, \langle a_2, h(a_2), c_2 \rangle) = \begin{cases} \langle a_i, h(a_i), c_i \rangle & \text{if } h(a_i) < h(a_j) \\ \langle a_1, h(a_1), c_1 + c_2 \rangle & \text{otherwise} \end{cases}$
      $\eta(\langle a, b, c \rangle) = a$ if $c = 1$
      Figure: mud algorithms for computing a uniform random sample of the unique items in a set (right). Here $h$ is an approximate minwise hash function.
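A rough sketch of this example follows. Several pieces are assumptions made for illustration: the hash `h` is a salted SHA-256 stand-in for the approximate minwise hash function, equal hash values are treated as meaning equal items (collisions are ignored), and `eta` returns `None` when $c \neq 1$, a case the slide leaves unspecified.

```python
import hashlib
from functools import reduce


def h(x, salt=b"demo-salt"):
    """Stand-in hash (an assumption, not a true minwise hash family)."""
    digest = hashlib.sha256(salt + repr(x).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def phi(x):
    """Local function: item, its hash, and an occurrence count of 1."""
    return (x, h(x), 1)


def combine(m1, m2):
    """Aggregator: keep the message with the smaller hash; add counts on a tie."""
    if m1[1] < m2[1]:
        return m1
    if m2[1] < m1[1]:
        return m2
    return (m1[0], m1[1], m1[2] + m2[2])


def eta(m):
    """Post-processing: report the item only if it was seen exactly once."""
    item, _, count = m
    return item if count == 1 else None


items = ["a", "b", "a", "c", "b"]
merged = reduce(combine, map(phi, items))
print(merged[0], merged[2], eta(merged))
```

The merged message carries the item with the smallest hash together with how many times it occurred, regardless of the order in which messages are combined.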

  22. Streaming
      ■ A streaming algorithm $s = (\sigma, \eta)$.
      ■ Operator $\sigma : Q \times \Sigma \to Q$.
      ■ $\eta : Q \to \Sigma$ converts the final state to the output.
      ■ On input $x \in \Sigma^n$, the streaming algorithm computes $f = \eta(s^0(x))$, where $0$ is the starting state, and $s^q(x) = \sigma(\sigma(\ldots \sigma(\sigma(q, x_1), x_2), \ldots, x_{k-1}), x_k)$.
      ■ Communication complexity is $\log |Q|$.
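The nested application of $\sigma$ is a left fold over the stream, which Python's `functools.reduce` expresses directly. The sketch below uses hypothetical names and a running-mean example chosen only for illustration; the state is a `(count, total)` pair.

```python
from functools import reduce


def run_streaming(sigma, eta, start, stream):
    """Compute eta(sigma(... sigma(sigma(start, x1), x2) ..., xk))."""
    return eta(reduce(sigma, stream, start))


def mean_sigma(q, x):
    """State update: q is a (count, total) pair."""
    count, total = q
    return (count + 1, total + x)


def mean_eta(q):
    """Convert the final state to the output (None for an empty stream)."""
    count, total = q
    return total / count if count else None


print(run_streaming(mean_sigma, mean_eta, (0, 0), [2, 4, 6, 8]))
```

The state is all that is carried from one item to the next, so its size in bits corresponds to the $\log |Q|$ quantity on the slide.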

  25. MUD vs Streaming
      ■ For a mud algorithm $m = (\Phi, \oplus, \eta)$, there is a streaming algorithm $s = (\sigma, \eta)$ of the same complexity with the same output, by setting $\sigma(q, x) = \oplus(q, \Phi(x))$.
      ■ Central question: Can MUD simulate streaming?
      ■ Count the occurrences of the first odd number on the stream.
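The construction $\sigma(q, x) = \oplus(q, \Phi(x))$ and the first-odd-number example can both be tried out in a few lines. This is a sketch under assumptions: all function names are hypothetical, `None` is used as the empty starting state (the slide does not say how the first item is handled), and the total-span functions serve as the mud algorithm being converted.

```python
from functools import reduce


def mud_to_streaming(phi, combine):
    """Build sigma(q, x) = combine(q, phi(x)); None marks the empty state."""
    def sigma(q, x):
        return phi(x) if q is None else combine(q, phi(x))
    return sigma


def span_streaming(stream):
    """Total span computed by the streaming algorithm derived from mud."""
    sigma = mud_to_streaming(
        phi=lambda x: (x, x),
        combine=lambda m1, m2: (min(m1[0], m2[0]), max(m1[1], m2[1])),
    )
    low, high = reduce(sigma, stream, None)
    return high - low


def first_odd_sigma(q, x):
    """State is (first odd number seen, its count so far)."""
    first, count = q
    if first is None:
        return (x, 1) if x % 2 == 1 else q
    return (first, count + 1) if x == first else q


def count_first_odd(stream):
    """Count the occurrences of the first odd number on the stream."""
    _, count = reduce(first_odd_sigma, stream, (None, 0))
    return count


print(span_streaming([7, 2, 9]))
print(count_first_odd([2, 3, 4, 3, 5, 3]), count_first_odd([2, 5, 3, 4, 3, 3]))
```

The two calls to `count_first_odd` receive the same multiset of items in different orders and return different counts, so this streaming computation depends on the order of the stream, unlike the span example.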
