Parallelizing Machine Learning, Functionally: A FRAMEWORK and ABSTRACTIONS for Parallel Graph Processing


SLIDE 1

Parallelizing Machine Learning, Functionally

A FRAMEWORK and ABSTRACTIONS for Parallel Graph Processing

Philipp HALLER | Heather MILLER

Friday, June 3, 2011

SLIDE 2

Data is growing. At the same time, there is a growing desire to do MORE with that data.

SLIDE 4

As an example,

MACHINE LEARNING (ML)

has provided elegant and sophisticated solutions to many complex problems on a small scale,

could open up NEW APPLICATIONS + NEW AVENUES OF RESEARCH if ported to a larger scale.

SLIDE 7

As an example,

MACHINE LEARNING (ML)

has provided elegant and sophisticated solutions to many complex problems on a small scale,

but efforts are routinely limited by the complexity and running time of SEQUENTIAL algorithms.

Described as a community full of “ENTRENCHED PROCEDURAL PROGRAMMERS”, ML researchers typically focus on optimizing sequential algorithms when faced with scaling problems.

We need to make it easier to experiment with parallelism.

SLIDE 9

What about MapReduce?

Poor support for iteration: MapReduce instances must be chained together to achieve iteration, which is not always straightforward, and the overhead is significant.

Even building non-cyclic pipelines is hard (e.g., FlumeJava, PLDI ’10). Communication and serialization costs are high (e.g., Phoenix, IISWC ’09).

SLIDE 14

Menthor...

is a framework for parallel graph processing (but it is not limited to graphs).

is inspired by BSP, with functional reduction/aggregation mechanisms.

avoids the inversion of control of other BSP-inspired graph-processing frameworks.

is implemented in Scala, and there is a preliminary experimental evaluation.

SLIDE 15

Menthor’s Model of Computation.

SLIDE 17

Data.

Split into data items managed by vertices.

Data item sizes range from primitives to large matrices.

SLIDE 18

Data.

Split into data items managed by vertices. Relationships are expressed using edges between vertices.

SLIDE 21

Algorithms.

Data items stored inside vertices are iteratively updated. Iterations happen as SYNCHRONIZED SUPERSTEPS (inspired by the BSP model).

SLIDE 25

Algorithms.

Data items stored inside vertices are iteratively updated. Iterations happen as SYNCHRONIZED SUPERSTEPS.

1. Update each vertex in parallel.
2. Each update produces outgoing messages to other vertices.
3. Incoming messages are available at the beginning of the next superstep.
SLIDE 31
Substeps. (and Messages)

SUBSTEPS are computations that
1. update the value of this Vertex
2. return a list of messages:

case class Message[Data](source: Vertex[Data], dest: Vertex[Data], value: Data)

EXAMPLES...

{ value = ...
  List() }

{ ...
  for (nb <- neighbors) yield Message(this, nb, value) }

Each is implicitly converted to a Substep[Data].
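That conversion can be pictured as follows (a hypothetical sketch with simplified types; Menthor's actual Message and Substep definitions differ):

```scala
object ImplicitSketch {
  import scala.language.implicitConversions

  case class Message[Data](value: Data)             // simplified stand-in
  case class Substep[Data](body: () => List[Message[Data]]) {
    def run(): List[Message[Data]] = body()
  }

  // In Menthor this wrapping is implicit: a plain update body that
  // returns a message list becomes a one-phase substep.
  implicit def blockToSubstep[Data](b: => List[Message[Data]]): Substep[Data] =
    Substep(() => b)
}
```

The block is captured by name, so it is re-evaluated each time the substep runs.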

SLIDE 32

Some Examples...

SLIDE 33

PageRank.

class PageRankVertex extends Vertex[Double](0.0d) {
  def update() = {
    val sum = incoming.foldLeft(0.0)(_ + _.value)
    value = (0.15 / numVertices) + 0.85 * sum
    if (superstep < 30) {
      for (nb <- neighbors) yield Message(this, nb, value / neighbors.size)
    } else
      List()
  }
}
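For illustration, the same update rule can be run outside the framework on a tiny three-vertex cycle (a standalone sketch; the neighbor map and driver loop are assumptions, not the Menthor API):

```scala
object PageRankSketch {
  val numVertices = 3
  // 1 -> 2 -> 3 -> 1: a symmetric cycle
  val neighbors: Map[Int, List[Int]] = Map(1 -> List(2), 2 -> List(3), 3 -> List(1))

  def run(supersteps: Int): Map[Int, Double] = {
    var rank = Map(1 -> 1.0 / 3, 2 -> 1.0 / 3, 3 -> 1.0 / 3)
    for (_ <- 1 to supersteps) {
      // each vertex sends value / neighbors.size along its edges
      val incoming = rank.toList
        .flatMap { case (v, r) => neighbors(v).map(nb => nb -> r / neighbors(v).size) }
        .groupBy(_._1).map { case (v, xs) => v -> xs.map(_._2).sum }
      // damped update, as on the slide
      rank = rank.map { case (v, _) =>
        v -> (0.15 / numVertices + 0.85 * incoming.getOrElse(v, 0.0))
      }
    }
    rank
  }
}
```

On this symmetric cycle the ranks stay uniform, which makes the sketch easy to sanity-check.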

SLIDE 35

Another Example.

class PhasedVertex extends Vertex[MyData] {
  var phase = 1
  def update() = {
    if (phase == 1) {
      ...
      if (condition) phase = 2
    } else if (phase == 2) {
      ...
    }
  }
}

INVERSION OF CONTROL!!

Thus, manual stack management...

SLIDE 38

Inverting the Inversion.

class PhasedVertex extends Vertex[MyData] {
  def update() = {
    thenUntil(condition) {
      ...
    } then {
      ...
    }
  }
}

Use high-level combinators to build expressions of type Substep[Data], thus avoiding manual stack management.
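One plausible encoding of such combinators (a hypothetical sketch; Menthor's real Substep[Data] differs, and the slide's `then` is written `andThen` here because `then` is reserved in Scala 3): a substep chain is a list of closures executed in order, with a repeat-until helper standing in for `thenUntil`:

```scala
object CombinatorSketch {
  type Step = () => Unit

  // a substep chain: closures run in order, replacing the manual `phase` var
  case class Substeps(steps: List[Step]) {
    def andThen(next: Step): Substeps = Substeps(steps :+ next)
    def runAll(): Unit = steps.foreach(s => s())
  }
  def first(s: Step): Substeps = Substeps(List(s))

  // stand-in for thenUntil: repeat a step until the condition holds
  def repeatUntil(cond: () => Boolean)(s: Step): Step =
    () => while (!cond()) s()
}
```

The framework, not the vertex, keeps track of which phase comes next, which is exactly what removes the manual stack management.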

SLIDE 42

Reduction Combinators: crunch steps.

Reduction operations are important: they replace shared data and enable global decisions.

Provided as just another kind of Substep[Data]:

def update() = {
  then {
    value = ...
  } crunch ((v1: Double, v2: Double) => v1 + v2) then {
    incoming match {
      case List(reduced) => ...
    }
  }
  ...
}
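The semantics of a crunch step can be sketched outside the framework (an illustrative model, not the real API): fold all vertex values with the supplied closure, then deliver the single result to every vertex as its incoming message list:

```scala
object CrunchSketch {
  // fold every vertex value with the closure, then hand the single result
  // back to each vertex as its incoming list for the next substep
  def crunch(values: Map[Int, Double])(f: (Double, Double) => Double): Map[Int, List[Double]] = {
    val reduced = values.values.reduce(f)
    values.map { case (v, _) => v -> List(reduced) }
  }
}
```

This is why the slide's next substep can simply pattern-match on `case List(reduced)`: after a crunch, each vertex has exactly one incoming value.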

SLIDE 43

Menthor’s Implementation.

SLIDE 46

Actors.

Implementation based upon Actors.

[diagram: GRAPH actor with WORKER and FOREMAN sub-actors]

Central GRAPH instance is an actor, which manages a set of WORKER actors. GRAPH synchronizes workers using supersteps.

SLIDE 47

Actors.

Implementation based upon Actors.

[diagram: GRAPH actor with WORKER and FOREMAN sub-actors]

Each WORKER manages a partition of the graph’s vertices, and in each superstep it:
1. delivers incoming messages that were sent in the previous superstep;
2. selects and executes the update step on each vertex in its partition;
3. forwards outgoing messages generated by its vertices in the current superstep.

SLIDE 51

Implementing Reduction.

[diagram: GRAPH actor with WORKER and FOREMAN sub-actors]

1. Each WORKER reduces the values of all vertices in its partition.
2. The result, together with the closure used to compute it, is sent to the GRAPH actor, which computes the final reduced value.
3. The final result is passed to all WORKERS, which make it available to their vertices as incoming messages (at the beginning of the next superstep).
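Steps 1 and 2 amount to a two-level fold; a minimal sketch under the assumption of list-based partitions (Menthor does this with actor messages rather than plain lists):

```scala
object ReductionSketch {
  def reduce(partitions: List[List[Double]])(f: (Double, Double) => Double): Double = {
    // 1. each WORKER folds the vertices of its own partition
    val partials = partitions.map(_.reduce(f))
    // 2. the GRAPH actor folds the per-worker partial results with the same closure
    partials.reduce(f)
    // 3. (in Menthor, this value is then broadcast back to the workers)
  }
}
```

Note this two-level scheme assumes the closure is associative, so the partition boundaries do not affect the result.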

SLIDE 55

Implementation Principles.

A pure Scala library
- No staging and code generation.
- No dependency on language virtualization.

Benefits
- Compatible with the mainline Scala compiler.
- Fast compilation.
- Simple debugging and troubleshooting.
- Framework developer-friendly.

Drawbacks
- No aggressive optimizations.
- No support for heterogeneous hardware platforms.

SLIDE 58

Related Work.

GOOGLE’S PREGEL: the main inspiration (graphs/BSP); inverted control.
GRAPHLAB: asynchronous execution; non-determinism.
SPARK: no graph support; designed for iteration; cluster support. Be sure to see their talk!
SIGNAL/COLLECT: asynchronous execution; non-determinism.
OPTIML: requires staging; aggressive optimizations; debugging not optimal, yet.

(Many more discussed in the paper.)

SLIDE 64

Questions?

Conclusions

- Can avoid inversion of control in vertex-based BSP using closures.
- Higher-order functions are useful for reductions, even in an imperative model.
- Explicit parallelism is feasible if the computational model is simple (cf. MapReduce).
- The puzzle pieces are there to make analyzing bigger data easier.

http://lamp.epfl.ch/~phaller/menthor/

SLIDE 65

Experimental Results.

Applications
- PageRank on a (subset of) Wikipedia
- Hierarchical clustering

Very preliminary results
- Implementation details are changing
- Parallel collections (extensions)
- Evaluating the BSP-based model
- Loopy belief propagation
