Parallelizing Machine Learning - Functionally: A Framework and Abstractions for Parallel Graph Processing

  1. Parallelizing Machine Learning - Functionally: A Framework and Abstractions for Parallel Graph Processing. Philipp Haller | Heather Miller. Friday, June 3, 2011

  2. Data is growing. At the same time, there is a growing desire to do MORE with that data.

  4. As an example, Machine Learning (ML) has provided elegant and sophisticated solutions to many complex problems on a small scale, and could open up new applications and new avenues of research if ported to a larger scale.

  7. As an example, Machine Learning (ML) has provided elegant and sophisticated solutions to many complex problems on a small scale, but efforts are routinely limited by the complexity and running time of SEQUENTIAL algorithms. The field has been described as a community full of “entrenched procedural programmers” who typically focus on optimizing sequential algorithms when faced with scaling problems. We need to make it easier to experiment with parallelism.

  9. What about MapReduce?
     - Poor support for iteration. MapReduce instances must be chained together in order to achieve iteration.
     - Not always straightforward. Even building non-cyclic pipelines is hard (e.g., FlumeJava, PLDI’10).
     - Overhead is significant: communication, serialization (e.g., Phoenix, IISWC’09).

  14. Menthor ...
      - is a framework for parallel graph processing. (But it is not limited to graphs.)
      - is inspired by BSP, with functional reduction/aggregation mechanisms.
      - avoids the inversion of control of other BSP-inspired graph-processing frameworks.
      - is implemented in Scala, and there is a preliminary experimental evaluation.

  15. Menthor’s Model of Computation.

  17. Data. Split into data items managed by vertices. Data items range in size from primitives to large matrices.

  18. Data. Split into data items managed by vertices. Relationships expressed using edges between vertices.

  21. Algorithms. Data items stored inside of vertices are iteratively updated. Iterations happen as SYNCHRONIZED SUPERSTEPS (inspired by the BSP model).

  25. Algorithms. Data items stored inside of vertices are iteratively updated. Iterations happen as SYNCHRONIZED SUPERSTEPS.
      1. update each vertex in parallel.
      2. update produces outgoing messages to other vertices.
      3. incoming messages are available at the beginning of the next SUPERSTEP.

      [Figure: timeline of supersteps along a time axis, showing superstep #2.]
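
The three steps above can be sketched without the framework. This is a minimal, sequential sketch under stated assumptions, not Menthor's implementation: `Msg`, `superstep`, and the integer vertex ids are hypothetical names chosen for illustration. Each vertex update reads only its own value and the inbox filled during the previous superstep, so the `map` over vertices is where an implementation could run updates in parallel.

```scala
object SuperstepSketch {
  // Hypothetical, simplified stand-in for a message between two vertices.
  final case class Msg(source: Int, dest: Int, value: Double)

  // Runs one superstep: every vertex sees only the messages sent during the
  // previous superstep, computes a new value, and emits messages that become
  // visible at the beginning of the next superstep.
  def superstep(
      values: Map[Int, Double],
      inbox: Map[Int, List[Msg]],
      update: (Int, Double, List[Msg]) => (Double, List[Msg])
  ): (Map[Int, Double], Map[Int, List[Msg]]) = {
    // Step 1: update each vertex (independent per vertex, hence parallelizable).
    val results = values.toList.map { case (id, v) =>
      id -> update(id, v, inbox.getOrElse(id, Nil))
    }
    val newValues = results.map { case (id, (v, _)) => id -> v }.toMap
    // Steps 2 and 3: collect outgoing messages and group them by destination,
    // forming the inboxes for the next superstep.
    val newInbox = results.flatMap { case (_, (_, out)) => out }.groupBy(_.dest)
    (newValues, newInbox)
  }
}
```

Calling `superstep` repeatedly, feeding each call the values and inbox returned by the previous one, gives the synchronized iteration: no vertex observes a message before the superstep after the one in which it was sent.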

  31. Substeps (and Messages). SUBSTEPS are computations that
      1. update the value of this Vertex
      2. return a list of messages:

          case class Message[Data](source: Vertex[Data], dest: Vertex[Data], value: Data)

      Examples (each is implicitly converted to a Substep[Data]):

          {
            value = ...
            List()
          }

          {
            ...
            for (nb <- neighbors)
              yield Message(this, nb, value)
          }

  32. Some Examples...

  33. PageRank.

          class PageRankVertex extends Vertex[Double](0.0d) {
            def update() = {
              // Sum the rank shares carried by the incoming messages.
              val sum = incoming.foldLeft(0.0)(_ + _.value)
              value = (0.15 / numVertices) + 0.85 * sum
              if (superstep < 30) {
                // Split this vertex's rank evenly among its neighbors.
                for (nb <- neighbors) yield Message(this, nb, value / neighbors.size)
              } else List()
            }
          }
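
To see what this update rule computes, the same arithmetic can be run on a tiny graph with plain collections. This is a sequential sketch, not the Menthor API: `pageRank`, the adjacency-map encoding, and the `supersteps` parameter are assumptions made for illustration. On a directed 3-cycle every vertex has one in-edge and one out-edge, so each rank follows v = 0.05 + 0.85 * v from one superstep to the next and approaches 1/3.

```scala
object PageRankSketch {
  // neighbors: outgoing adjacency lists keyed by vertex id (hypothetical encoding).
  def pageRank(neighbors: Map[Int, List[Int]], supersteps: Int): Map[Int, Double] = {
    val n = neighbors.size
    var values = neighbors.keys.map(_ -> 0.0).toMap
    // inbox(v) holds the rank shares sent to v during the previous superstep.
    var inbox = Map.empty[Int, List[Double]]
    for (_ <- 0 until supersteps) {
      // Same rule as the slide: damping term plus 0.85 times the incoming sum.
      values = values.map { case (v, _) =>
        v -> (0.15 / n + 0.85 * inbox.getOrElse(v, Nil).sum)
      }
      // Each vertex splits its new rank evenly among its out-neighbors.
      val sent = for {
        (v, nbs) <- neighbors.toList
        nb <- nbs
      } yield nb -> (values(v) / nbs.size)
      inbox = sent.groupBy(_._1).map { case (dest, ms) => dest -> ms.map(_._2) }
    }
    values
  }
}
```

For example, `PageRankSketch.pageRank(Map(0 -> List(1), 1 -> List(2), 2 -> List(0)), 50)` returns three equal ranks close to 1/3.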

  35. Another Example. INVERSION OF CONTROL!! Thus, manual stack management...

          class PhasedVertex extends Vertex[MyData] {
            var phase = 1
            def update() = {
              if (phase == 1) {
                ...
                if (condition) phase = 2
              } else if (phase == 2) {
                ...
              }
            }
          }

  38. Inverting the Inversion.
      - Use high-level combinators to build expressions of type Substep[Data].
      - Thus avoiding manual stack management.

          class PhasedVertex extends Vertex[MyData] {
            def update() = {
              thenUntil(condition) {
                ...
              } then {
                ...
              }
            }
          }
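
One way such combinators could remove the hand-written `phase` variable is to let a runner, rather than the vertex, remember which step is current. The sketch below is an illustration under stated assumptions, not Menthor's `Substep[Data]` machinery: `Step`, `Program`, `Runner`, `andThen`, and `andThenUntil` are hypothetical names (`then` is a keyword in Scala 3, so it is avoided here), and the sketch assumes that a repeated step runs once per superstep and is re-run until its condition holds after the body.

```scala
object CombinatorSketch {
  sealed trait Step
  final case class Once(body: () => Unit) extends Step
  final case class Until(done: () => Boolean, body: () => Unit) extends Step

  // A program is an ordered sequence of steps; combinators append to it.
  final case class Program(steps: Vector[Step]) {
    def andThen(body: => Unit): Program =
      Program(steps :+ Once(() => body))
    def andThenUntil(done: => Boolean)(body: => Unit): Program =
      Program(steps :+ Until(() => done, () => body))
  }
  val start: Program = Program(Vector.empty)

  // The runner tracks the current position, so user code needs no phase flag.
  final class Runner(program: Program) {
    private var pos = 0
    def finished: Boolean = pos >= program.steps.length

    // Executes one superstep's worth of work for this program.
    def superstep(): Unit =
      if (!finished) {
        program.steps(pos) match {
          case Once(body) =>
            body()
            pos += 1
          case Until(done, body) =>
            body()
            if (done()) pos += 1
        }
      }
  }
}
```

With this shape, a two-phase computation reads top to bottom, for instance `start.andThenUntil(converged) { refine() }.andThen { report() }` where `converged`, `refine`, and `report` are placeholders, and the state machine lives in the runner instead of in every vertex class.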

  42. Reduction Combinators: crunch steps.
      - Reduction operations are important: a replacement for shared data, and a basis for global decisions.
      - Provided as just another kind of Substep[Data].

          def update() = {
            then {
              value = ...
            } crunch ((v1: Double, v2: Double) => v1 + v2) then {
              incoming match {
                case List(reduced) => ...
              }
            } ...
          }
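
The pattern `case List(reduced)` in the slide suggests that after a crunch step each vertex finds exactly one incoming message carrying the reduced value. A minimal sketch of that behavior follows, assuming plain maps in place of the framework's graph; `crunch` here is a hypothetical free function, and `Msg` and the integer vertex ids are illustrative, not Menthor's types.

```scala
object CrunchSketch {
  // Hypothetical message type: the reduced value addressed to one vertex.
  final case class Msg(dest: Int, value: Double)

  // Reduces all vertex values with `f` and delivers the single result to
  // every vertex as its only incoming message for the next step.
  // `f` should be associative and commutative, since the order in which
  // values are combined is not specified.
  def crunch(values: Map[Int, Double])(f: (Double, Double) => Double): Map[Int, List[Msg]] =
    if (values.isEmpty) Map.empty
    else {
      val reduced = values.values.reduce(f)
      values.keys.map(id => id -> List(Msg(id, reduced))).toMap
    }
}
```

A vertex can then use the reduced value for a global decision, such as normalizing its own value by the total, without any shared mutable state.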

  43. Menthor’s Implementation

  44. Actors. Implementation based upon actors. The central Graph instance is an actor, which manages a set of Worker actors.

      [Figure: actor hierarchy with labels Graph, Foremen, Workers.]
