Algorithmic Frontiers of Modern Massively Parallel Computation Introduction Ashish Goel, Sergei Vassilvitskii, Grigory Yaroslavtsev June 14, 2015
Schedule
9:00 - 9:30 Introduction
9:30 - 10:15 Distributed Machine Learning (Nina Balcan)
10:15 - 11:00 Randomized Composable Coresets (Vahab Mirrokni)
11:00 - 11:30 Coffee Break
11:30 - 12:15 Algorithms for Graphs on V. Large Number of Nodes (Krzysztof Onak)
12:15 - 2:15 Lunch (on your own)
2:15 - 3:00 Massively Parallel Communication and Query Evaluation (Paul Beame)
3:00 - 3:30 Graph Clustering in a Few Rounds (Ravi Kumar)
3:30 - 4:00 Coffee Break
4:00 - 4:45 Sample & Prune: For Submodular Optimization (Ben Moseley)
4:45 - 5:00 Conclusion & Discussion
Modern Parallelism (Practice)
[Timeline figure spanning `91 to `14, all dates approximate: MPI, MapReduce, Hadoop, Pregel, Pig, Hive, Mahout, S4, Storm, Giraph, GraphLab, Spark, Naiad, BigQuery, Azure, EC2, GCE.]
Modern Parallelism (Theory)
[Timeline figure spanning `90 to 2015: PRAM, BSP, LOCAL, Congested Clique, Coordinator, MUD, MRC, IO-MR, Key-Complexity, MR, MPC(1), MPC(2), Big Data.]
* Plus Streaming, External Memory, and others
Bird’s Eye View
– 0. Input is partitioned across many machines
Computation proceeds in synchronous rounds. In every round, every machine:
– 1. Receives data
– 2. Does local computation on the data it has
– 3. Sends data out to others (a toy sketch of this loop follows below)
Success Measures:
– Number of Rounds
– Total work, speedup
– Communication
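A minimal sketch of this round structure, for illustration only (not from the tutorial): the driver, the random partitioning, and the hypothetical local_compute callback are all assumptions made here.

```python
import random
from collections import defaultdict

def run_rounds(records, num_machines, local_compute, num_rounds):
    """Toy driver for the synchronous-round model described above.

    Step 0: partition the input across machines (randomly here).
    Each round: every machine takes the messages it received, does
    local computation, and emits (destination, message) pairs.
    """
    # Step 0: random partition of the input across machines.
    machines = defaultdict(list)
    for rec in records:
        machines[random.randrange(num_machines)].append(rec)

    for _ in range(num_rounds):
        outbox = defaultdict(list)
        for mid in range(num_machines):
            # Steps 1-3: receive, compute locally, send.
            for dest, msg in local_compute(mid, machines[mid]):
                outbox[dest].append(msg)
        machines = outbox
    return machines

# Example: group-by-key in one round by routing each (key, value)
# pair to the machine hash(key) % p.
def group_by_key(machine_id, data):
    p = 4
    return [(hash(k) % p, (k, v)) for (k, v) in data]

if __name__ == "__main__":
    pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
    print(dict(run_rounds(pairs, 4, group_by_key, num_rounds=1)))
```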
Devil in the Details
0. Data partitioned across machines
– Either randomly or arbitrarily
– How many machines?
– How much slack in the system?
Devil in the Details
0. Data partitioned across machines
1. Receive Data
– How much data can be received?
– Bounds on data received per link (from each machine) or in total.
– Often called ‘memory’ or ‘space.’
– Denoted by M, m, µ, s, n/p^{1-ε}
– Has emerged as an important parameter.
– Lower and upper bounds with this as a parameter
Devil in the Details
0. Data partitioned across machines
1. Receive Data
2. Do local processing
– Relatively uncontroversial
Devil in the Details
0. Data partitioned across machines
1. Receive Data
2. Do local processing
3. Send data to others
– How much data to send? Limitations per link? Per machine? For the whole system?
– Which machines to send it to? Any? Limited topology?
Devil in the Details
0. Data partitioned across machines
1. Receive Data
2. Do local processing
3. Send data to others
Different parameter settings lead to different models.
– Receive Õ(1), poly machines, all connected: PRAM
– Receive, send unbounded, specific network topology: LOCAL
– Receive Õ(1), send Õ(1), n machines, specific topology: CONGEST
– Receive s = n/p^{1-ε}, p machines, all connected: MPC(1)
– Receive s = n^{1-ε}, n^{1-ε} machines, all connected: MRC
– ...
Details: Success Metrics
Number of Rounds:
– Well established
– Few (if any?) trade-offs on number of rounds vs. computation per round
Work Efficiency:
– Important!
– See "Scalability! But at what COST?" [McSherry, Isard, Murray `15]
Communication:
– Matrix transpose -- linear communication yet very efficient
– Care more about skew, limited by input size
Consensus Emerging:
Parameters:
– Problem size: n
– Per machine, per round input size: s
Metric:
– Number of rounds: r(s, n)
– Ideal: O(1), e.g. group by key
– Sometimes: Θ(log_s n), e.g. sorting, dense connectivity (see the arithmetic below)
– Less ideal: O(poly log n), e.g. sparse connectivity
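For a sense of scale, a quick back-of-the-envelope on log_s n. The parameter values are assumed here for illustration, not taken from the slides:

```python
import math

def rounds_log_s_n(n, s):
    """Number of rounds of the form Theta(log_s n) = log n / log s."""
    return math.log(n) / math.log(s)

# Illustrative (assumed) parameter settings: for realistic n and s,
# log_s n is a small constant.
for n, s in [(10**12, 10**6), (10**15, 10**8), (10**18, 10**9)]:
    print(f"n = {n:.0e}, s = {s:.0e}: log_s n ~ {rounds_log_s_n(n, s):.2f}")
```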
Simulations
Theorem: Every round of an EREW PRAM Algorithm can be simulated with two rounds.
– Direct extensions to CREW, CRCW Algorithms
Proof Idea:
– Divide the shared memory of the PRAM among the machines, and simulate updates.
Simulations (cont)
Proof Idea:
– Divide the shared memory of the PRAM among the machines. Perform computation in one round, update memory in next.
[figure: "Memory:" bit array split into blocks, one block per machine]
Simulations (cont)
Proof Idea:
– Have "memory" machines and "compute" machines.
– Memory machines simulate the PRAM's shared memory
– Compute machines update the state (toy simulation below)
– EREW PRAM: every cell has at most two outputs & inputs (one for memory, one for compute)
[figure: memory machines' bit array before and after one simulated PRAM step]
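Purely to make the proof idea concrete, here is a toy sequential simulation of one EREW PRAM step in this memory-machine / compute-machine style; the two commented phases correspond to the two rounds of the theorem, and all names are hypothetical.

```python
def simulate_erew_step(memory, programs):
    """Toy, sequential simulation of one EREW PRAM step.

    'memory' stands in for the shared memory (in the real simulation it
    would be split into blocks across memory machines).  Each program is
    a (read_addr, update_fn, write_addr) triple for one processor; EREW
    means all read addresses are distinct and all write addresses are
    distinct.
    """
    # Round 1 (memory -> compute): collect the value each processor reads.
    reads = {pid: memory[r] for pid, (r, _, _) in enumerate(programs)}

    # Local computation on the compute machines.
    writes = {w: fn(reads[pid]) for pid, (_, fn, w) in enumerate(programs)}

    # Round 2 (compute -> memory): apply the exclusive writes.
    for addr, value in writes.items():
        memory[addr] = value
    return memory

# Example: processor i reads cell i and writes back its complement.
mem = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1]
progs = [(i, lambda x: 1 - x, i) for i in range(len(mem))]
print(simulate_erew_step(mem, progs))
```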
Simulations
Theorem: Every round of an EREW PRAM Algorithm can be simulated with two rounds.
– Direct extensions to CREW, CRCW Algorithms
But, stronger than PRAMs:
– Prefix sums: given an array A, compute B[i] = Σ_{j=0}^{i} A[j] for all i.
– Takes Θ(log n) rounds on an EREW PRAM
– Can be done in O(log_s n) rounds with space s (sketch below)
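One way this can be done (a sketch under my own assumptions, not necessarily the intended algorithm): each machine computes prefix sums over its block of s elements, the block totals are combined to get per-block offsets, and repeating the combine step over an s-ary tree of machines gives the O(log_s n) round bound. The toy below runs the whole thing sequentially with a single combine step.

```python
from itertools import accumulate

def prefix_sums_blocked(A, s):
    """Toy prefix sums in the blocked style sketched above.

    Split A into blocks of size s (one block per machine), compute
    local prefix sums, then combine the block totals to get each
    block's offset.  A real implementation would repeat the combine
    step over an s-ary tree, giving O(log_s n) rounds.
    """
    blocks = [A[i:i + s] for i in range(0, len(A), s)]
    local = [list(accumulate(b)) for b in blocks]          # per-machine work
    block_totals = [b[-1] for b in local]
    offsets = [0] + list(accumulate(block_totals))[:-1]    # combine step
    return [off + x for off, b in zip(offsets, local) for x in b]

A = list(range(1, 11))
assert prefix_sums_blocked(A, s=3) == list(accumulate(A))
print(prefix_sums_blocked(A, s=3))
```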
Algorithms
One Technique: Coresets!
– Reduce input size from n to s in parallel
– Solve the problem in a single round on one machine (see the sketch below)
Very Practical!
– n: peta/terabytes
– s ≈ √n: giga/megabytes
Talks today about coresets for:
– Clustering: k-means, k-median, k-center, correlation clustering
– Graph Problems: connectivity, matchings
– Submodular Maximization
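As an assumed, highly simplified instance of this pattern for k-center on 1-D points: each machine summarizes its share with the standard Gonzalez farthest-point heuristic, and one machine then solves the problem on the union of the summaries. Function names and parameters here are illustrative, not from the talks.

```python
import random

def greedy_k_centers(points, k):
    """Gonzalez-style farthest-point heuristic (placeholder solver)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(abs(p - c) for c in centers)))
    return centers

def kcenter_via_coresets(points, k, num_machines):
    """Coreset pattern: summarize each partition, solve on the union."""
    random.shuffle(points)                                   # random partition
    parts = [points[i::num_machines] for i in range(num_machines)]
    # Each machine keeps only k representative points: size num_machines*k << n.
    coreset = [c for part in parts for c in greedy_k_centers(part, k)]
    return greedy_k_centers(coreset, k)                      # single-machine solve

pts = [random.uniform(0, 1000) for _ in range(100_000)]
print(sorted(kcenter_via_coresets(pts, k=3, num_machines=10)))
```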
Lower Bounds
Some progress!
– Good bounds on what is computable in one round
– Multi-round lower bounds for restricted models (talks today)
Canonical problem:
– Given a two-regular graph, decide if it is connected or not (instance generator sketched below).
– Best upper bounds: O(log n) rounds for s = o(n)
– Best lower bounds: Ω(log_s n), by circuit complexity reductions.
• To improve, must take number of machines into consideration
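To make the canonical problem concrete, here is a tiny (hypothetical) generator for the standard hard instance family: a single cycle of length n versus two disjoint cycles of length n/2. Both are 2-regular, and telling them apart is exactly the connectivity question above.

```python
def one_cycle(n):
    """Edge list of a single cycle on vertices 0..n-1 (connected)."""
    return [(i, (i + 1) % n) for i in range(n)]

def two_cycles(n):
    """Edge list of two disjoint cycles of length n // 2 (disconnected)."""
    h = n // 2
    return ([(i, (i + 1) % h) for i in range(h)] +
            [(h + i, h + (i + 1) % h) for i in range(h)])

# Both instances are 2-regular on n vertices.
print(one_cycle(8))
print(two_cycles(8))
```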
References: Models
BSP: Valiant. A Bridging Model for Parallel Computation. Communications of the ACM 1990.
MUD: Feldman, Muthukrishnan, Sidiropoulos, Stein, Svitkina. On Distributing Symmetric Streaming Computations. ACM TALG 2010.
MRC: Karloff, Suri, Vassilvitskii. A Model of Computation for MapReduce. SODA 2010.
IO-MR: Goodrich, Sitchinava, Zhang. Sorting, Searching, and Simulation in the MapReduce Framework. ISAAC 2011.
Key-Complexity: Goel, Munagala. Complexity Measures for MapReduce, and Comparison to Parallel Sorting. arXiv 2012.
MR: Pietracaprina, Pucci, Riondato, Silvestri, Upfal. Space-Round Tradeoffs for MapReduce Computations. ICS 2012.
MPC(1): Beame, Koutris, Suciu. Communication Steps for Parallel Query Processing. PODS 2013.
MPC(2): Andoni, Nikolov, Onak, Yaroslavtsev. Parallel Algorithms for Geometric Graph Problems. STOC 2014.
Big Data: Klauck, Nanongkai, Pandurangan, Robinson. Distributed Computation of Large-Scale Graph Problems. SODA 2015.