  1. 1/18 Straightforward parallelization of polynomial multiplication using parallel collections in Scala. Raphaël Jolly, Databeans. EOOPS 2013, Barcelona

  2. 2/18 Parallelization of symbolic computations
     * Numeric computations
       - Several arithmetic operations executed in parallel
       - Linear algebra
       - CPU-intensive
     * Symbolic computations: polynomials
       - Same as above, and: the arithmetic operation itself is parallelized
       - Multiplication, division, gcd
       - Reduction, Gröbner bases (multivariate)
       - CPU- and memory-intensive (cache issues)

  3. 3/18 Polynomial multiplication: multivariate polynomials, distributive representation, product [formula shown as an image on the slide]
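The product formula on this slide was an image and did not survive extraction; a sketch of the distributive representation and the product, in notation consistent with the rest of the talk, might be:

```latex
p = \sum_{\alpha} c_{\alpha}\, x^{\alpha}, \qquad
q = \sum_{\beta} d_{\beta}\, x^{\beta}, \qquad
x^{\alpha} = x_1^{\alpha_1} \cdots x_k^{\alpha_k}

p \cdot q = \sum_{\alpha} \sum_{\beta} \left( c_{\alpha} d_{\beta} \right) x^{\alpha + \beta}
```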

  4. 4/18 Polynomial multiplication: sequential [diagram: each term y_0, y_1, …, y_m of y multiplies the whole of x = x_0 + x_1 + … + x_n, and the partial products are added into the result one after another]

  5. 5/18 Polynomial multiplication: parallel [diagram: the partial products y_j * x are computed independently, yielding partial sums that are then combined]

  6. 6/18 Polynomial multiplication: sequential

     type T = List[(Array[N], C)]

     def times(x: T, y: T) = (zero /: y) { (l, r) =>
       val (a, b) = r
       l + multiply(x, a, b)
     }

  7. 6/18 Polynomial multiplication: sequential

     type T = List[(Array[N], C)]

     def times(x: T, y: T) = y.foldLeft(zero)({ (l, r) =>
       val (a, b) = r
       l + multiply(x, a, b)
     })

  8. 6/18 Polynomial multiplication: sequential

     type T = List[(Array[N], C)]

     def times(x: T, y: T) = y.foldLeft(zero)({ (l, r) =>
       val (a, b) = r
       l + multiply(x, a, b)
     })

     def multiply(x: T, m: Array[N], c: C) = x.map { r =>
       val (s, a) = r
       (s * m, a * c)
     } filter { r =>
       val (_, a) = r
       !a.isZero
     }
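On the slides, N (the exponent type) and C (the coefficient ring) are abstract, and +, * and isZero come from the library's type classes. A minimal self-contained sketch of the same sequential fold, assuming Int exponents and BigInt coefficients and spelling out term-wise addition as a helper plus (a name not on the slides), might be:

```scala
// Minimal sequential sketch of the slides' times/multiply, assuming
// exponents are Array[Int] and coefficients are BigInt.
object SeqPoly {
  type T = List[(Array[Int], BigInt)]
  val zero: T = Nil

  // Term-wise sum of two polynomials in distributive representation.
  // Arrays compare by reference, so we group by their List view.
  def plus(x: T, y: T): T =
    (x ++ y).groupBy(_._1.toList).toList.map { case (m, ts) =>
      (m.toArray, ts.map(_._2).sum)
    }.filter(_._2 != BigInt(0))

  // Multiply every term of x by the monomial m with coefficient c:
  // exponent vectors add, coefficients multiply.
  def multiply(x: T, m: Array[Int], c: BigInt): T =
    x.map { case (s, a) => (s.zip(m).map { case (i, j) => i + j }, a * c) }
      .filter { case (_, a) => a != BigInt(0) }

  // Fold the terms of y over x, as on the slide.
  def times(x: T, y: T): T =
    y.foldLeft(zero) { case (l, (a, b)) => plus(l, multiply(x, a, b)) }
}
```

For example, with exponent vectors of length one, (1 + x) * (1 + x) yields 1 + 2x + x².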

  9. 7/18 Polynomial multiplication: parallel

     type T = List[(Array[N], C)]

     def times(x: T, y: T) = y.par.aggregate(zero)({ (l, r) =>
       val (a, b) = r
       l + multiply(x, a, b)
     }, _ + _)

     def multiply(x: T, m: Array[N], c: C) = x.map { r =>
       val (s, a) = r
       (s * m, a * c)
     } filter { r =>
       val (_, a) = r
       !a.isZero
     }

  10. 7/18 Polynomial multiplication: parallel

     type T = List[(Array[N], C)]

     def times(x: T, y: T) = y.par.aggregate(zero)({ (l, r) =>
       val (a, b) = r
       l + multiply(x, a, b)
     }, _ + _)

     def multiply(x: T, m: Array[N], c: C) = x.par.map { r =>
       val (s, a) = r
       (s * m, a * c)
     } filter { r =>
       val (_, a) = r
       !a.isZero
     }
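In aggregate, the second argument (the `_ + _` combiner) merges partial results produced by different workers. That split/fold/combine structure can be illustrated without the parallel runtime (note that on Scala 2.13+, .par lives in the separate scala-parallel-collections module). Everything below, the AggregateSketch name and the fixed chunk count, is illustrative, not library code:

```scala
// Hand-rolled illustration of what y.par.aggregate(z)(seqop, combop) computes:
// split the collection into chunks, fold each chunk sequentially with seqop,
// then merge the per-chunk partial results with combop. In the real parallel
// collections the chunk folds run concurrently; here they run in order.
object AggregateSketch {
  def aggregate[A, B](xs: List[A], z: B)(seqop: (B, A) => B, combop: (B, B) => B): B =
    xs.grouped(math.max(1, xs.size / 4)).toList // pretend we have 4 workers
      .map(chunk => chunk.foldLeft(z)(seqop))   // each "worker" folds its chunk
      .reduceOption(combop)                     // the combiner merges partials
      .getOrElse(z)
}
```

For example, AggregateSketch.aggregate((1 to 10).toList, 0)(_ + _, _ + _) gives 55, the same result as a plain foldLeft; a parallel version differs only in that the chunk folds may run concurrently, which is why combop must be able to merge any two partial results.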

  11. 8/18 Experimental setup
     * Intel Atom D410 at 1.66 GHz with ((32K, 24K), 512K) cache
     * Single core, hyper-threading
     * Parallel timings should not be worse than sequential, and could eventually be better (about 20%)
     * Further experiments need to be done on multicore hardware

  12. 9/18 Experimental setup [diagram comparing hyper-threading with a dual-processor system: with hyper-threading, two logical processors each keep their own architectural state (registers) but share the ALUs, cache(s) and system bus; in a dual-processor system each physical processor has its own ALUs and caches, sharing only the system bus and main memory] (Chen et al. Media Applications on Hyper-Threading Technology - Intel Technology Journal, Q1, 2002)

  13. 10/18 Test case: squaring a sparse polynomial q = (1 + x + y + z)^n (multiplied by q + 1), with n sufficiently large (Fateman, R. J. DRAFT: Comparing the speed of programs for sparse polynomial multiplication, 2002)

  14. 11/18 Test case: implementation

     import scas._
     import Implicits.ZZ

     implicit val r = Polynomial(ZZ, 'x, 'y, 'z)
     val Array(x, y, z) = r.generators
     val p = 1 + x + y + z
     val q = pow(p, 20)
     val q1 = 1 + q
     val q2 = q * q1
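The slide uses the author's scas library. For readers without it, the same Fateman benchmark can be sketched over a plain Map-based polynomial; the Vector[Int] exponent keys and all names below are assumptions of this sketch, not scas API, and n is shrunk from 20 to 5 so it runs in negligible time:

```scala
// Plain-Scala sketch of the Fateman benchmark (not the scas API).
// A polynomial maps an exponent vector (a, b, c), for x^a y^b z^c,
// to its coefficient.
object Fateman {
  type Poly = Map[Vector[Int], BigInt]

  val one: Poly = Map(Vector(0, 0, 0) -> BigInt(1))

  // Coefficient-wise sum, dropping zero terms.
  def plus(p: Poly, q: Poly): Poly =
    (p.keySet ++ q.keySet).map { m =>
      m -> (p.getOrElse(m, BigInt(0)) + q.getOrElse(m, BigInt(0)))
    }.toMap.filter(_._2 != BigInt(0))

  // Distribute each term (m, c) of q over p, as in the talk's times().
  def times(p: Poly, q: Poly): Poly =
    q.foldLeft(Map.empty[Vector[Int], BigInt]) { case (acc, (m, c)) =>
      plus(acc, p.map { case (s, a) => (s.zip(m).map { case (i, j) => i + j }, a * c) })
    }

  def pow(p: Poly, n: Int): Poly = (1 until n).foldLeft(p)((acc, _) => times(acc, p))

  // p = 1 + x + y + z ; q = p^5 ; q2 = q * (1 + q)
  val p: Poly = Map(Vector(0, 0, 0) -> BigInt(1), Vector(1, 0, 0) -> BigInt(1),
                    Vector(0, 1, 0) -> BigInt(1), Vector(0, 0, 1) -> BigInt(1))
  val q: Poly = pow(p, 5)
  val q2: Poly = times(q, plus(q, one))
}
```

Since every monomial of total degree at most 10 appears in q2 with a positive coefficient, it has C(13, 3) = 286 terms; the benchmark's cost grows quickly with n, which is what makes it a good stress test for the parallel version.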

  15. 12/18 Timings (seconds)

     n    seq   par(2)   speedup
     20    10        7      1.38
     24    27       19      1.37
     28    63       48      1.32
     32   139      109      1.27

     [Chart: seconds vs n, seq and par(2)]

  16. 13/18 Fine-grained and exponential task splitting: "stolen tasks are divided into exponentially smaller tasks until a threshold is reached and then handled sequentially starting from the smallest one, while tasks that came from the processor's own queue are handled sequentially straight away" (Prokopec, A.; Bagwell, P.; Rompf, T. & Odersky, M. On a Generic Parallel Collection Framework, 2011)

  17. 14/18 Collection base classes hierarchy [diagram: Traversable at the root, Iterable below it, then Set, Map and Seq]

  18. 14/18 Collection base classes hierarchy [diagram, second build: the Scala hierarchy (Traversable, Iterable, then Set, Map, Seq) shown alongside the Java one (Collection with Set and List below it, and Map separate)]

  19. 15/18 Traversable[A]

     def map[B, That](f: A => B): That
     def flatMap[B, That](f: A => GenTraversableOnce[B]): That
     def filter(p: A => Boolean): Traversable[A]
     def foreach[U](f: A => U): Unit
     def forall(p: A => Boolean): Boolean
     def exists(p: A => Boolean): Boolean
     def count(p: A => Boolean): Int
     def reduce[A1 >: A](op: (A1, A1) => A1): A1
     def aggregate[B](z: B)(seqop: (B, A) => B, combop: (B, B) => B): B
     def sum[B >: A](implicit num: Numeric[B]): B
     def product[B >: A](implicit num: Numeric[B]): B
     def min[B >: A](implicit cmp: Ordering[B]): A
     def max[B >: A](implicit cmp: Ordering[B]): A

  20. 16/18 Other data structures (n = 20), timings in seconds

     structure      seq   par(2)   par(1)
     tree            17
     tree.mutable     9        8
     list            10        7
     array           17       12
     stream          19       40       48

     [Chart: seq, par(2) and par(1) by structure]

  21. 17/18 Data parallelism [diagram as on slide 5: the terms of y are distributed, each worker computing partial products y_j * x, and the partial sums are then combined]

  22. 18/18 Task parallelism [diagram: x is multiplied by each term y_0, y_1, …, y_m in an independent task, and the results are then summed]

  23. 18/18 Task parallelism [diagram repeated from the previous slide] Thank you! http://github.com/rjolly/scas
