Parallelizing Machine Learning, Functionally
A FRAMEWORK and ABSTRACTIONS for Parallel Graph Processing
Philipp HALLER | Heather MILLER
Friday, June 3, 2011
Data is growing. At the same time, there is a growing desire to do MORE with that data.
As an example, MACHINE LEARNING (ML)
- has provided elegant and sophisticated solutions to many complex problems on a small scale,
- could open up NEW APPLICATIONS + NEW AVENUES OF RESEARCH if ported to a larger scale,
- but efforts are routinely limited by complexity and running time of SEQUENTIAL algorithms.

Described as a community full of "ENTRENCHED PROCEDURAL PROGRAMMERS" who typically focus on optimizing sequential algorithms when faced with scaling problems.

We need to make it easier to experiment with parallelism.
What about MapReduce?
- Poor support for iteration. MapReduce instances must be chained together in order to achieve iteration. Not always straightforward: even building non-cyclic pipelines is hard (e.g., FlumeJava, PLDI'10).
- Overhead is significant: communication, serialization (e.g., Phoenix, IISWC'09).
Menthor ...
- is a framework for parallel graph processing. (But it is not limited to graphs.)
- is inspired by BSP, with functional reduction/aggregation mechanisms.
- avoids the inversion of control of other BSP-inspired graph-processing frameworks.
- is implemented in Scala, and there is a preliminary experimental evaluation.
Menthor’s Model of Computation.
Data.
- Split into data items managed by vertices. Sizes range from primitives to large matrices.
- Relationships expressed using edges between vertices.
Algorithms.
- Data items stored inside of vertices are iteratively updated.
- Iterations happen as SYNCHRONIZED SUPERSTEPS (inspired by the BSP model).

In each superstep:
1. update each vertex in parallel (def update);
2. each update produces outgoing messages to other vertices;
3. incoming messages are available at the beginning of the next SUPERSTEP.
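The superstep cycle above can be sketched in a few lines. This is a simplified, sequential model of the execution schedule only (types and names here are illustrative, not Menthor's API): vertices are updated against the mailbox of the previous superstep, and the messages they emit are delivered at the start of the next one.

```scala
// One message, addressed by vertex id (simplified stand-in for Menthor's Message).
case class Msg(dest: Int, value: Double)

class SimpleVertex(val id: Int, var value: Double, val neighbors: List[Int]) {
  // update: read messages from the previous superstep, write value, emit messages
  def update(incoming: List[Msg]): List[Msg] = {
    value = value + incoming.map(_.value).sum
    neighbors.map(nb => Msg(nb, value))
  }
}

def runSupersteps(vertices: Map[Int, SimpleVertex], steps: Int): Unit = {
  var mailbox: Map[Int, List[Msg]] = Map.empty[Int, List[Msg]].withDefaultValue(Nil)
  for (_ <- 1 to steps) {
    // 1.+2. update every vertex (conceptually in parallel), collecting outgoing messages
    val outgoing = vertices.values.toList.flatMap(v => v.update(mailbox(v.id)))
    // 3. deliver the messages for the next superstep
    mailbox = outgoing.groupBy(_.dest).withDefaultValue(Nil)
  }
}
```

Note that within a superstep the update order does not matter: every vertex reads only the frozen mailbox of the previous step, which is what makes the parallel update safe.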
Substeps. (and Messages)

SUBSTEPS are computations that
1. update the value of this Vertex,
2. return a list of messages:

  case class Message[Data](source: Vertex[Data], dest: Vertex[Data], value: Data)

EXAMPLES ...

  {
    value = ...
    List()
  }

  {
    ...
    for (nb <- neighbors)
      yield Message(this, nb, value)
  }

Each is implicitly converted to a Substep[Data].
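The two substep bodies above can be made concrete in a self-contained sketch. The types here are simplified (vertices are identified by an Int id rather than the real Vertex[Data], and there is no implicit conversion to Substep[Data]); only the shape of the two examples is preserved.

```scala
// Simplified stand-in for Menthor's Message, using Int vertex ids.
case class Message[Data](source: Int, dest: Int, value: Data)

class V(val id: Int, var value: Double, val neighbors: List[Int]) {
  // First example: update the value, send no messages.
  def silentStep(): List[Message[Double]] = {
    value = value * 2
    List()
  }
  // Second example: broadcast the current value to all neighbors.
  def broadcastStep(): List[Message[Double]] =
    for (nb <- neighbors) yield Message(id, nb, value)
}
```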
Some Examples...
PageRank.

  class PageRankVertex extends Vertex[Double](0.0d) {
    def update() = {
      val sum = incoming.foldLeft(0.0)(_ + _.value)
      value = (0.15 / numVertices) + 0.85 * sum
      if (superstep < 30) {
        for (nb <- neighbors)
          yield Message(this, nb, value / neighbors.size)
      } else
        List()
    }
  }
Another Example.

  class PhasedVertex extends Vertex[MyData] {
    var phase = 1

    def update() = {
      if (phase == 1) {
        ...
        if (condition) phase = 2
      } else if (phase == 2) {
        ...
      }
    }
  }

INVERSION OF CONTROL!! Thus, manual stack management...
Inverting the Inversion.
- Use high-level combinators to build expressions of type Substep[Data].
- Thus avoiding manual stack management.

  class PhasedVertex extends Vertex[MyData] {
    def update() = {
      thenUntil(condition) {
        ...
      } then {
        ...
      }
    }
  }
Reduction Combinators: crunch steps.
- Reduction operations are important: a replacement for shared data, and a basis for global decisions.
- Provided as just another kind of Substep[Data].

  def update() = {
    then {
      value = ...
    } crunch ((v1: Double, v2: Double) => v1 + v2) then {
      incoming match {
        case List(reduced) => ...
      }
    }
    ...
  }
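The effect of a crunch step can be sketched independently of the combinator machinery. This is a simplified model, not Menthor's implementation: all vertex values are reduced with the given binary operation, and every vertex then sees the reduced result as its single incoming message in the following substep (which is why the pattern match above expects List(reduced)).

```scala
// Reduce all vertex values with `op`, then broadcast the result so that each
// vertex receives exactly one incoming value in the next substep.
def crunchStep(values: List[Double])(op: (Double, Double) => Double): List[Double] = {
  val reduced = values.reduce(op)   // one global reduction over all vertices
  List.fill(values.size)(reduced)   // every vertex gets the same result
}
```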
Menthor’s Implementation
Actors.

Implementation based upon Actors. The central GRAPH instance is an actor, which manages a set of WORKER actors (grouped under FOREMEN).
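How the graph's vertices might be divided among the workers can be sketched as a simple partitioning function. The round-robin strategy and the names here are assumptions for illustration, not Menthor's actual internals.

```scala
// Assign vertices to numWorkers partitions round-robin; partition w gets every
// numWorkers-th vertex starting at index w.
def partitionVertices[A](vertices: List[A], numWorkers: Int): List[List[A]] =
  (0 until numWorkers).toList.map { w =>
    vertices.zipWithIndex.collect { case (v, i) if i % numWorkers == w => v }
  }
```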