Big Data I: Graph Processing, Distributed Machine Learning


  1. Big Data I: Graph Processing, Distributed Machine Learning. CS 240: Computing Systems and Concurrency, Lecture 21. Marco Canini. Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Selected content adapted from J. Gonzalez.

  2. [Motivating figure: a graphical model for medical diagnosis. A patient presents with abdominal pain; inferring the diagnosis (an E. coli infection) requires reasoning over a graph of dependencies: what the patient ate, what it contains, where it was purchased, and what else was sold there.]

  3. Big Data is Everywhere: 6 billion Flickr photos, 900 million Facebook users, 72 hours of YouTube video uploaded per minute, 28 million Wikipedia pages. Machine learning is a reality. How will we design and implement “Big Learning” systems?

  4. We could use… Threads, Locks, & Messages: “low-level parallel primitives”.

  5. Shift Towards Use of Parallelism in ML: GPUs, multicore, clusters, clouds, supercomputers. Programmers repeatedly solve the same parallel design challenges: race conditions, distributed state, communication… The resulting code is very specialized: difficult to maintain, extend, debug… Idea: avoid these problems by using high-level abstractions.

  6. … a better answer: MapReduce / Hadoop. Build learning algorithms on top of high-level parallel abstractions.

  7. MapReduce – Map Phase. [Figure: CPUs 1–4 each process an independent slice of the data.] Embarrassingly parallel, independent computation; no communication needed.

  8. MapReduce – Map Phase. [Figure: each CPU extracts image features from its own slice of the data.]

  9. MapReduce – Map Phase. [Figure: feature extraction proceeds on all CPUs.] Embarrassingly parallel, independent computation.

  10. MapReduce – Reduce Phase. [Figure: image features labeled indoor (I) or outdoor (O) are grouped by label; CPU 1 aggregates statistics over the outdoor pictures, CPU 2 over the indoor pictures.]
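  To make the pattern in slides 7–10 concrete, here is a minimal Python sketch of the map/reduce structure (not from the slides): map_image is a hypothetical stand-in for per-image feature extraction and classification, and the reduce step aggregates statistics per label, mirroring the outdoor/indoor grouping in the figure.

      from collections import defaultdict

      # Hypothetical stand-in for per-image classification / feature extraction.
      def map_image(image):
          label = "outdoor" if sum(image) % 2 == 0 else "indoor"  # toy classifier
          return label, len(image)                                # (key, feature)

      # Aggregate per-label statistics from the grouped features.
      def reduce_label(features):
          return {"count": len(features), "total": sum(features)}

      def map_reduce(images):
          # Map phase: embarrassingly parallel, no communication needed.
          pairs = [map_image(img) for img in images]
          # Shuffle: group intermediate values by key.
          groups = defaultdict(list)
          for label, feature in pairs:
              groups[label].append(feature)
          # Reduce phase: one aggregation per label.
          return {label: reduce_label(vals) for label, vals in groups.items()}

      print(map_reduce([[1, 2], [3, 4, 5], [6], [7, 8, 9]]))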

  11. Map-Reduce for Data-Parallel ML. Excellent for large data-parallel tasks (feature extraction, algorithm tuning, basic data processing: MapReduce). But is there more to machine learning? Graph-parallel tasks: label propagation, Lasso, belief propagation, kernel methods, tensor factorization, PageRank, neural networks, deep belief networks.

  12. Exploiting Dependencies

  13. Graphs are Everywhere: collaborative filtering (Netflix users–movies), social networks, probabilistic analysis, text analysis (Wikipedia docs–words).

  14. Concrete Example: Label Propagation

  15. Label Propagation Algorithm. Social arithmetic: I like 50% of what I list on my profile (50% cameras, 50% biking), 40% of what Sue Ann likes (80% cameras, 20% biking), and 10% of what Carlos likes (30% cameras, 70% biking), so I like 60% cameras, 40% biking. Recurrence algorithm: Likes[i] = Σ_{j ∈ Friends[i]} W_ij × Likes[j]; iterate until convergence. Parallelism: compute all Likes[i] in parallel.
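  A minimal Python sketch of the recurrence Likes[i] = Σ_{j ∈ Friends[i]} W_ij × Likes[j], using the numbers from the slide; the graph layout and the choice to keep the friends' distributions fixed via self-loops are simplifying assumptions for illustration.

      # Edge weights from the slide: I weight my own profile 50%,
      # Sue Ann 40%, and Carlos 10%. Self-loops keep their labels fixed.
      W = {
          "me":      {"profile": 0.5, "sue_ann": 0.4, "carlos": 0.1},
          "profile": {"profile": 1.0},
          "sue_ann": {"sue_ann": 1.0},
          "carlos":  {"carlos": 1.0},
      }
      likes = {
          "profile": {"cameras": 0.5, "biking": 0.5},
          "sue_ann": {"cameras": 0.8, "biking": 0.2},
          "carlos":  {"cameras": 0.3, "biking": 0.7},
          "me":      {"cameras": 0.0, "biking": 0.0},
      }

      def label_propagation(W, likes, max_iters=100, tol=1e-9):
          for _ in range(max_iters):
              # Likes[i] = sum over j in Friends[i] of W_ij * Likes[j]
              # (all Likes[i] can be computed in parallel)
              new = {
                  i: {label: sum(w * likes[j][label] for j, w in nbrs.items())
                      for label in ("cameras", "biking")}
                  for i, nbrs in W.items()
              }
              converged = all(abs(new[i][l] - likes[i][l]) < tol
                              for i in new for l in new[i])
              likes = new
              if converged:       # iterate until convergence
                  break
          return likes

      print(label_propagation(W, likes)["me"])   # ~{'cameras': 0.6, 'biking': 0.4}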

  16. Properties of Graph-Parallel Algorithms: a dependency graph, factored computation (what I like depends on what my friends like), and iterative computation.

  17. Map-Reduce for Data-Parallel ML. Excellent for large data-parallel tasks (feature extraction, algorithm tuning, basic data processing: MapReduce). Graph-parallel tasks (label propagation, Lasso, belief propagation, kernel methods, tensor factorization, PageRank, neural networks, deep belief networks): MapReduce?

  18. Problem: Data Dependencies. MapReduce doesn’t efficiently express data dependencies: the user must code substantial data transformations, and data replication is costly. [Figure: MapReduce assumes independent data rows.]

  19. Iterative Algorithms. MapReduce doesn’t efficiently express iterative algorithms: each iteration is a separate pass over the data with a barrier at the end, so a single slow processor stalls every iteration. [Figure: CPUs 1–3 processing data blocks across iterations, separated by barriers; one slow processor delays each barrier.]

  20. MapAbuse: Iterative MapReduce. Only a subset of the data needs computation in each iteration, yet every iteration reprocesses all of it. [Figure: repeated passes over all data blocks, separated by barriers.]

  21. MapAbuse: Iterative MapReduce. The system is not optimized for iteration: every iteration pays a startup penalty and a disk penalty, since state is written to and reloaded from disk between passes. [Figure: startup and disk penalties repeated at each iteration.]

  22. ML Tasks Beyond Data-Parallelism. Data-parallel (MapReduce): feature extraction, cross validation, computing sufficient statistics. Graph-parallel (?): graphical models (Gibbs sampling, belief propagation, variational optimization), semi-supervised learning (label propagation, CoEM), collaborative filtering (tensor factorization), graph analysis (PageRank, triangle counting).

  23. ML Tasks Beyond Data-Parallelism. Data-parallel (MapReduce): feature extraction, cross validation, computing sufficient statistics. Graph-parallel: Pregel.

  24. Limited CPU power, limited memory, limited scalability.

  25. Distributed Cloud: scale up computational resources! Challenges: distribute state, keep data consistent, provide fault tolerance.

  26. The GraphLab Framework: graph-based data representation, update functions (user computation), consistency model.

  27. Data Graph. Data is associated with both vertices and edges. Graph: a social network. Vertex data: user profile, current interest estimates. Edge data: relationship (friend, classmate, relative).
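  A minimal sketch of what the data graph might hold for the social-network example; the field names are illustrative, not GraphLab's actual API.

      from dataclasses import dataclass, field

      @dataclass
      class VertexData:
          profile: dict                                   # user profile
          interests: dict = field(default_factory=dict)   # current interest estimates

      @dataclass
      class EdgeData:
          relationship: str           # "friend", "classmate", "relative", ...

      # Data is associated with both vertices and edges.
      vertex_data = {
          "alice": VertexData(profile={"cameras": 0.5, "biking": 0.5}),
          "bob":   VertexData(profile={"cameras": 0.8, "biking": 0.2}),
      }
      edge_data = {
          ("alice", "bob"): EdgeData(relationship="friend"),
      }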

  28. Distributed Data Graph. Partition the graph across multiple machines.

  29. Distributed Data Graph. “Ghost” vertices maintain the adjacency structure and replicate remote data.

  30. Distributed Data Graph. Cut efficiently using HPC graph-partitioning tools (ParMetis / Scotch / …).
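  A minimal sketch of partitioning with ghost vertices, assuming a toy deterministic partitioner instead of ParMetis/Scotch; each machine stores the vertices it owns plus ghost copies of remote neighbors so its local adjacency structure stays complete.

      def owner(v, num_machines):
          # Toy deterministic placement; real systems use ParMetis/Scotch-style cuts.
          return sum(map(ord, v)) % num_machines

      def partition_with_ghosts(edges, num_machines):
          parts = [{"owned": set(), "ghosts": set()} for _ in range(num_machines)]
          for u, v in edges:
              mu, mv = owner(u, num_machines), owner(v, num_machines)
              parts[mu]["owned"].add(u)
              parts[mv]["owned"].add(v)
              if mu != mv:                  # cut edge: replicate the remote endpoint
                  parts[mu]["ghosts"].add(v)
                  parts[mv]["ghosts"].add(u)
          return parts

      edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
      for i, p in enumerate(partition_with_ghosts(edges, 2)):
          print(f"machine {i}: owned={sorted(p['owned'])} ghosts={sorted(p['ghosts'])}")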

  31. The GraphLab Framework: graph-based data representation, update functions (user computation), consistency model.

  32. Update Function. A user-defined program, applied to a vertex; it transforms the data in the scope of that vertex. The update function is applied (asynchronously) in parallel until convergence; many schedulers are available to prioritize computation; rescheduling selectively triggers computation at neighbors.

      Pagerank(scope) {
        // Update the current vertex data
        vertex.PageRank = α
        ForEach inPage:
          vertex.PageRank += (1 − α) × inPage.PageRank
        // Reschedule neighbors if needed
        if vertex.PageRank changes then
          reschedule_all_neighbors
      }
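  A minimal Python sketch of this vertex-update model (illustrative, not GraphLab's actual API): the update function recomputes one vertex's PageRank from its in-neighbors and reschedules its out-neighbors only when the value changes. A simple FIFO queue stands in for GraphLab's pluggable schedulers, and, like the slide's pseudocode, the in-neighbor contribution is not normalized by out-degree.

      from collections import deque

      ALPHA, TOL = 0.15, 1e-6

      def pagerank_update(v, graph, rank, schedule):
          """User-defined update: recompute v from its in-neighbors' ranks."""
          new_rank = ALPHA + (1 - ALPHA) * sum(rank[u] for u in graph["in"][v])
          if abs(new_rank - rank[v]) > TOL:
              rank[v] = new_rank
              for w in graph["out"][v]:       # selectively trigger neighbors
                  if w not in schedule:
                      schedule.append(w)

      def run(graph):
          rank = {v: 0.0 for v in graph["out"]}
          schedule = deque(graph["out"])       # initially schedule every vertex
          while schedule:                      # run until nothing is scheduled
              pagerank_update(schedule.popleft(), graph, rank, schedule)
          return rank

      graph = {"out": {"a": ["b"], "b": ["c"], "c": ["a"]},
               "in":  {"a": ["c"], "b": ["a"], "c": ["b"]}}
      print(run(graph))   # each rank converges toward 1.0 on this 3-cycle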

  33. Distributed Scheduling. Each machine maintains a schedule over the vertices it owns. [Figure: per-machine vertex queues.] Distributed consensus is used to identify completion.

  34. Ensuring Race-Free Code. How much can computation overlap?

  35. The GraphLab Framework: graph-based data representation, update functions (user computation), consistency model.

  36. PageRank Revisited.

      Pagerank(scope) {
        vertex.PageRank = α
        ForEach inPage:
          vertex.PageRank += (1 − α) × inPage.PageRank
        …
      }

  37. PageRank data races confound convergence. [Figure: convergence plot.]

  38. Racing PageRank: Bug. The vertex’s own PageRank is used as the accumulator, so neighbors reading it concurrently can observe a partially summed value.

      Pagerank(scope) {
        vertex.PageRank = α
        ForEach inPage:
          vertex.PageRank += (1 − α) × inPage.PageRank
        …
      }

  39. Racing PageRank: Bug Fix. Accumulate into a local tmp and write vertex.PageRank once at the end.

      Pagerank(scope) {
        tmp = α
        ForEach inPage:
          tmp += (1 − α) × inPage.PageRank
        vertex.PageRank = tmp
        …
      }
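  A minimal Python sketch of the same bug and fix, simulating the interleaving (with a generator) rather than running real threads: in the racy pattern the shared entry doubles as the loop accumulator, so a concurrently executing neighbor could read a partial sum; the fix accumulates locally and publishes the result with a single assignment. Names and data are illustrative.

      ALPHA = 0.15

      # Racy pattern: rank[v] itself is the accumulator, so a neighbor reading
      # rank[v] between loop iterations observes a partially summed value.
      def pagerank_racy_steps(v, in_pages, rank):
          rank[v] = ALPHA
          for u in in_pages:
              rank[v] += (1 - ALPHA) * rank[u]
              yield rank[v]          # value a concurrent reader could see here

      # Fixed pattern: accumulate locally, publish once at the end.
      def pagerank_fixed(v, in_pages, rank):
          tmp = ALPHA
          for u in in_pages:
              tmp += (1 - ALPHA) * rank[u]
          rank[v] = tmp

      rank = {"a": 1.0, "b": 1.0, "c": 0.0}
      print(list(pagerank_racy_steps("c", ["a", "b"], rank)))  # partial sums exposed
      pagerank_fixed("c", ["a", "b"], rank)
      print(rank["c"])                                         # only the final value exposed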

  40. Throughput != Performance. No consistency gives higher throughput (#updates/sec), but potentially slower convergence of the ML algorithm.

  41. Serializability. For every parallel execution, there exists a sequential execution of update functions which produces the same result. [Figure: parallel execution on CPUs 1 and 2 vs. an equivalent execution on a single sequential CPU.]

  42. Serializability Example: edge consistency. Overlapping regions of neighboring scopes are only read, so update functions one vertex apart can run in parallel. Stronger / weaker consistency levels are available; user-tunable consistency levels trade off parallelism and consistency.
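  A minimal sketch of the parallelism rule this implies: under edge consistency, two vertex updates may run concurrently only if their scopes overlap in regions that both merely read, which means the two vertices are not adjacent (at least one vertex apart). The adjacency structure below is hypothetical.

      def can_update_in_parallel(u, v, adj):
          """Edge consistency: non-adjacent vertices have scopes that overlap
          only in read-only regions, so their updates may run concurrently."""
          return u != v and v not in adj[u]

      adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
      print(can_update_in_parallel("a", "b", adj))   # False: they share edge (a, b)
      print(can_update_in_parallel("a", "c", adj))   # True: one vertex apart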

  43. Distributed Consistency • Solution 1: Chromatic Engine – Edge Consistency via Graph Coloring • Solution 2: Distributed Locking

  44. Chromatic Distributed Engine. Each machine executes tasks on all of its vertices of color 0, followed by ghost synchronization (completion + barrier); then on all vertices of color 1, followed by another ghost synchronization and barrier; and so on through the colors.
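  A minimal Python sketch of the chromatic engine idea: greedily color the graph so neighbors get different colors, then sweep color by color. All vertices of one color are mutually non-adjacent, so their updates satisfy edge consistency and could run in parallel; the ghost synchronization and barrier are placeholders here.

      def greedy_coloring(adj):
          """Give each vertex the smallest color unused by already-colored neighbors."""
          color = {}
          for v in adj:
              used = {color[u] for u in adj[v] if u in color}
              color[v] = next(c for c in range(len(adj) + 1) if c not in used)
          return color

      def chromatic_engine(adj, update, num_sweeps=1):
          color = greedy_coloring(adj)
          for _ in range(num_sweeps):
              for c in sorted(set(color.values())):
                  batch = [v for v in adj if color[v] == c]
                  # Vertices in `batch` are mutually non-adjacent, so these
                  # updates could execute in parallel under edge consistency.
                  for v in batch:
                      update(v)
                  # ... ghost synchronization + barrier would go here ...

      adj = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
      chromatic_engine(adj, update=lambda v: print("update", v))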

  45. Matrix Factorization. Netflix collaborative filtering via alternating least squares (ALS) matrix factorization. Model: 0.5 million nodes, 99 million edges. [Figure: bipartite users–movies graph; the Netflix ratings matrix is approximated by the product of rank-D user and movie factor matrices.]
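  A minimal NumPy sketch of alternating least squares for the ratings matrix, using a tiny dense toy matrix rather than the 0.5M-node / 99M-edge Netflix graph (and treating every entry as observed, which the real system does not): U and M are rank-D user and movie factors, and each half-step solves a regularized least-squares problem with the other factor held fixed.

      import numpy as np

      def als(R, D=2, reg=0.1, iters=20):
          """Approximate R (users x movies) by U @ M.T with rank-D factors."""
          rng = np.random.default_rng(0)
          n_users, n_movies = R.shape
          U = rng.standard_normal((n_users, D))
          M = rng.standard_normal((n_movies, D))
          I = reg * np.eye(D)
          for _ in range(iters):
              # Solve for user factors with movie factors fixed, then the reverse.
              U = np.linalg.solve(M.T @ M + I, M.T @ R.T).T
              M = np.linalg.solve(U.T @ U + I, U.T @ R).T
          return U, M

      R = np.array([[5.0, 4.0, 1.0],
                    [4.0, 5.0, 1.0],
                    [1.0, 1.0, 5.0]])
      U, M = als(R)
      print(np.round(U @ M.T, 1))   # close to R when the rank-D factorization fits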
