CS240A: Parallelism in CSE Applications
Tao Yang
Slides revised from James Demmel and Kathy Yelick, www.cs.berkeley.edu/~demmel/cs267_Spr11
Category of CSE Simulation Applications (ordered from discrete to continuous)
• Discrete event systems
  • Time and space are discrete
• Particle systems
  • Important special case of lumped systems
• Ordinary Differential Equations (ODEs)
  • Location/entities are discrete, time is continuous
• Partial Differential Equations (PDEs)
  • Time and space are continuous
CS267 Lecture 4
Basic Kinds of CSE Simulation
• Discrete event systems:
  • “Game of Life,” Manufacturing systems, Finance, Circuits, Pacman
• Particle systems:
  • Billiard balls, Galaxies, Atoms, Circuits, Pinball …
• Ordinary Differential Equations (ODEs):
  • Lumped variables depending on continuous parameters
  • The system is “lumped” because we are not computing the voltage/current at every point in space along a wire, just at the endpoints
  • Structural mechanics, Chemical kinetics, Circuits, Star Wars: The Force Unleashed
• Partial Differential Equations (PDEs):
  • Continuous variables depending on continuous parameters
  • Heat, Elasticity, Electrostatics, Finance, Circuits, Medical Image Analysis, Terminator 3: Rise of the Machines
• For more on simulation in games, see
  • www.cs.berkeley.edu/b-cam/Papers/Parker-2009-RTD
Table of Contents
• ODE
• PDE
• Discrete Events and Particle Systems
Finite-Difference Method for ODE/PDE
• Discretize the domain of the function.
• For each point in the discretized domain, introduce a variable for the unknown function value there and set up an equation.
• The unknown values at those points form a system of equations. Then solve that system.
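Euler’s method for ODE Initial-Value Problems
$$\frac{dy}{dx} = f(x, y); \qquad y(x_0) = y_0$$
Straight line approximation
[Figure: straight-line approximation starting from $y_0$, with grid points $x_0, x_1, x_2, x_3$ spaced $h$ apart]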
Euler Method
Approximate:
$$y'(x_0) \approx \frac{y(x_0 + h) - y(x_0)}{h}$$
Then:
$$y_{n+1} = y_n + h\,y'_n + O(h^2)$$
$$y_{n+1} = y_n + h\,f(x_n, y_n) + O(h^2)$$
Thus, starting from an initial value $y_0$:
$$y_{n+1} = y_n + h\,f(x_n, y_n) \quad \text{with error } O(h^2)$$
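One way to see where the $O(h^2)$ term comes from is a Taylor expansion about $x_n$. This is the standard argument sketched here, not taken from the slide; it assumes $y$ is twice continuously differentiable, and $\xi_n$ denotes some point in $[x_n, x_n + h]$:

$$y(x_n + h) = y(x_n) + h\,y'(x_n) + \frac{h^2}{2}\,y''(\xi_n) = y(x_n) + h\,f(x_n, y(x_n)) + O(h^2)$$

The $O(h^2)$ term is the error of a single step. Reaching a fixed $x$ takes $(x - x_0)/h$ steps, so the accumulated error is generally $O(h)$.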
Example
$$\frac{dy}{dx} = x + y, \qquad y(0) = 1$$
$$y_{n+1} = y_n + h\,f(x_n, y_n) = y_n + h\,(x_n + y_n), \qquad h = 0.02$$

| $x_n$ | $y_n$ | $y'_n$ | $h\,y'_n$ | Exact solution | Error |
|---|---|---|---|---|---|
| 0 | 1.00000 | 1.00000 | 0.02000 | 1.00000 | 0.00000 |
| 0.02 | 1.02000 | 1.04000 | 0.02080 | 1.02040 | -0.00040 |
| 0.04 | 1.04080 | 1.08080 | 0.02162 | 1.04162 | -0.00082 |
| 0.06 | 1.06242 | 1.12242 | 0.02245 | 1.06367 | -0.00126 |
| 0.08 | 1.08486 | 1.16486 | 0.02330 | 1.08657 | -0.00171 |
| 0.1 | 1.10816 | 1.20816 | 0.02416 | 1.11034 | -0.00218 |
| 0.12 | 1.13232 | 1.25232 | 0.02505 | 1.13499 | -0.00267 |
| 0.14 | 1.15737 | 1.29737 | 0.02595 | 1.16055 | -0.00318 |
| 0.16 | 1.18332 | 1.34332 | 0.02687 | 1.18702 | -0.00370 |
| 0.18 | 1.21019 | 1.39019 | 0.02780 | 1.21443 | -0.00425 |
| 0.2 | 1.23799 | 1.43799 | 0.02876 | 1.24281 | -0.00482 |
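The table can be reproduced with a few lines of Python. This is a minimal sketch, not code from the slides: the names `euler` and `exact` are chosen here for illustration, and the closed form $y = 2e^x - x - 1$ is an assumption derived here by substituting it into the ODE and the initial condition, not quoted from the source.

```python
import math

def euler(f, x0, y0, h, steps):
    """Forward Euler with fixed step h; returns the lists of x_n and y_n."""
    xs, ys = [x0], [y0]
    for n in range(steps):
        # y_{n+1} = y_n + h * f(x_n, y_n)
        ys.append(ys[-1] + h * f(xs[-1], ys[-1]))
        xs.append(x0 + (n + 1) * h)
    return xs, ys

def f(x, y):
    # Right-hand side of the example: dy/dx = x + y
    return x + y

def exact(x):
    # Assumed closed form for y' = x + y, y(0) = 1 (derived here, not in the slides)
    return 2.0 * math.exp(x) - x - 1.0

xs, ys = euler(f, 0.0, 1.0, 0.02, 10)
for x, y in zip(xs, ys):
    # Columns: x_n, y_n, exact solution, error = y_n - exact
    print(f"{x:4.2f}  {y:.5f}  {exact(x):.5f}  {y - exact(x):+.5f}")
```

Halving `h` (and doubling `steps`) should roughly halve the error at $x = 0.2$, which matches the first-order accumulated error of the method.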
ODE with boundary value
$$\frac{d^2u}{dr^2} + \frac{1}{r}\frac{du}{dr} - \frac{u}{r^2} = 0, \qquad 5 \le r \le 8$$
$$u(5) = 0.0038731'', \qquad u(8) = 0.0030769''$$
http://numericalmethods.eng.usf.edu
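Solution
Using the approximations
$$\frac{dy}{dx} \approx \frac{y_{i+1} - y_{i-1}}{2\,\Delta x} \quad \text{and} \quad \frac{d^2y}{dx^2} \approx \frac{y_{i+1} - 2y_i + y_{i-1}}{(\Delta x)^2}$$
gives
$$\frac{u_{i+1} - 2u_i + u_{i-1}}{(\Delta r)^2} + \frac{1}{r_i}\,\frac{u_{i+1} - u_{i-1}}{2\,\Delta r} - \frac{u_i}{r_i^2} = 0$$
$$\left(\frac{1}{(\Delta r)^2} - \frac{1}{2\,r_i\,\Delta r}\right) u_{i-1} + \left(-\frac{2}{(\Delta r)^2} - \frac{1}{r_i^2}\right) u_i + \left(\frac{1}{(\Delta r)^2} + \frac{1}{2\,r_i\,\Delta r}\right) u_{i+1} = 0$$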
Solution Cont
Step 1: At node $i = 0$, $r_0 = a = 5$:
$$u_0 = 0.0038731$$
Step 2: At node $i = 1$, $r_1 = r_0 + \Delta r = 5 + 0.6 = 5.6''$:
$$\left(\frac{1}{0.6^2} - \frac{1}{2(5.6)(0.6)}\right) u_0 + \left(-\frac{2}{0.6^2} - \frac{1}{5.6^2}\right) u_1 + \left(\frac{1}{0.6^2} + \frac{1}{2(5.6)(0.6)}\right) u_2 = 0$$
$$2.6290\,u_0 - 5.5874\,u_1 + 2.9266\,u_2 = 0$$
Step 3: At node $i = 2$, $r_2 = r_1 + \Delta r = 5.6 + 0.6 = 6.2$:
$$\left(\frac{1}{0.6^2} - \frac{1}{2(6.2)(0.6)}\right) u_1 + \left(-\frac{2}{0.6^2} - \frac{1}{6.2^2}\right) u_2 + \left(\frac{1}{0.6^2} + \frac{1}{2(6.2)(0.6)}\right) u_3 = 0$$
$$2.6434\,u_1 - 5.5816\,u_2 + 2.9122\,u_3 = 0$$
Solution Cont
Step 4: At node $i = 3$, $r_3 = r_2 + \Delta r = 6.2 + 0.6 = 6.8$:
$$\left(\frac{1}{0.6^2} - \frac{1}{2(6.8)(0.6)}\right) u_2 + \left(-\frac{2}{0.6^2} - \frac{1}{6.8^2}\right) u_3 + \left(\frac{1}{0.6^2} + \frac{1}{2(6.8)(0.6)}\right) u_4 = 0$$
$$2.6552\,u_2 - 5.5772\,u_3 + 2.9003\,u_4 = 0$$
Step 5: At node $i = 4$, $r_4 = r_3 + \Delta r = 6.8 + 0.6 = 7.4$:
$$\left(\frac{1}{0.6^2} - \frac{1}{2(7.4)(0.6)}\right) u_3 + \left(-\frac{2}{0.6^2} - \frac{1}{7.4^2}\right) u_4 + \left(\frac{1}{0.6^2} + \frac{1}{2(7.4)(0.6)}\right) u_5 = 0$$
$$2.6651\,u_3 - 5.5738\,u_4 + 2.8903\,u_5 = 0$$
Step 6: At node $i = 5$, $r_5 = r_4 + \Delta r = 7.4 + 0.6 = 8$:
$$u_5 = u\big|_{r=b} = 0.0030769$$
Solving system of equations
$$\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
2.6290 & -5.5874 & 2.9266 & 0 & 0 & 0 \\
0 & 2.6434 & -5.5816 & 2.9122 & 0 & 0 \\
0 & 0 & 2.6552 & -5.5772 & 2.9003 & 0 \\
0 & 0 & 0 & 2.6651 & -5.5738 & 2.8903 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} u_0 \\ u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix}
=
\begin{bmatrix} 0.0038731 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0.0030769 \end{bmatrix}$$
Solution:
$$u_0 = 0.0038731, \quad u_1 = 0.0036115, \quad u_2 = 0.0034159, \quad u_3 = 0.0032689, \quad u_4 = 0.0031586, \quad u_5 = 0.0030769$$
[Figure: graph and “stencil”]
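The whole boundary-value example, from building the rows to solving the tridiagonal system, can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the function names `solve_tridiagonal` and `radial_displacement` are hypothetical, the solver is the Thomas algorithm without pivoting (a choice made here, the slides do not say how the system was solved), and the coefficients are computed in full precision instead of being rounded to four decimals.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm (no pivoting) for a tridiagonal system.

    lower[i] multiplies x[i-1], diag[i] multiplies x[i], upper[i] multiplies x[i+1]
    in row i; lower[0] and upper[-1] are unused.
    """
    n = len(diag)
    d, b = diag[:], rhs[:]
    for i in range(1, n):
        # Eliminate the sub-diagonal entry of row i using row i-1.
        m = lower[i] / d[i - 1]
        d[i] -= m * upper[i - 1]
        b[i] -= m * b[i - 1]
    x = [0.0] * n
    x[-1] = b[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        # Back substitution.
        x[i] = (b[i] - upper[i] * x[i + 1]) / d[i]
    return x

def radial_displacement(a, b, ua, ub, n):
    """Finite-difference solution of u'' + u'/r - u/r^2 = 0 with u(a)=ua, u(b)=ub."""
    dr = (b - a) / n
    lower = [0.0] * (n + 1)
    diag = [1.0] * (n + 1)   # rows 0 and n stay as the boundary equations u = value
    upper = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    rhs[0], rhs[n] = ua, ub
    for i in range(1, n):
        r = a + i * dr
        # Centered differences for both derivatives at interior node i.
        lower[i] = 1 / dr**2 - 1 / (2 * r * dr)
        diag[i] = -2 / dr**2 - 1 / r**2
        upper[i] = 1 / dr**2 + 1 / (2 * r * dr)
    return solve_tridiagonal(lower, diag, upper, rhs)

# Same data as the example: r from 5 to 8, five intervals of width 0.6.
u = radial_displacement(5.0, 8.0, 0.0038731, 0.0030769, 5)
for i, ui in enumerate(u):
    print(f"u_{i} = {ui:.7f}")
```

Because each interior row touches only three neighboring unknowns, the solve costs O(n) work and storage, which is the reason the sparse structure is worth exploiting.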
Compressed Sparse Row (CSR) Format
• SpMV: y = y + A*x; store and do arithmetic only on the nonzero entries
• Representation of A: arrays ptr, ind, and val
• Matrix-vector multiply kernel: y(i) ← y(i) + A(i,j) × x(j)
    for each row i
      for k = ptr[i] to ptr[i+1]-1 do
        y[i] = y[i] + val[k] * x[ind[k]]
[Figure: y = y + A*x with the CSR representation of A]
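The kernel above translates almost line for line into Python. The sketch below is illustrative only: it uses 0-based indexing, the helper `dense_to_csr` and the 3 x 3 matrix are made up for the example, and the array names `ptr`, `ind`, and `val` follow the pseudocode.

```python
def dense_to_csr(A):
    """Build 0-based CSR arrays (ptr, ind, val) from a dense list of rows."""
    ptr, ind, val = [0], [], []
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                ind.append(j)      # column index of this nonzero
                val.append(a)      # value of this nonzero
        ptr.append(len(val))       # ptr[i+1] marks the end of row i
    return ptr, ind, val

def spmv(ptr, ind, val, x, y):
    """In-place y = y + A*x, touching only the stored nonzeros."""
    for i in range(len(ptr) - 1):
        for k in range(ptr[i], ptr[i + 1]):
            y[i] += val[k] * x[ind[k]]
    return y

# Hypothetical 3 x 3 example matrix (not from the slides).
A = [[4, 0, 1],
     [0, 3, 0],
     [2, 0, 5]]
ptr, ind, val = dense_to_csr(A)
y = spmv(ptr, ind, val, [1, 2, 3], [0, 0, 0])
print(ptr, ind, val, y)
```

Note the indirect access `x[ind[k]]`: the row data is read contiguously, but the reads of `x` follow the sparsity pattern, which is what makes SpMV performance depend on the matrix ordering.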
Parallel Sparse Matrix-vector multiplication
• y = A*x, where A is a sparse n x n matrix
• Questions
  • Which processors store y[i], x[i], and A[i,j]?
  • Which processors compute y[i] = sum (over j from 1 to n) of A[i,j] * x[j] = (row i of A) * x … a sparse dot product?
• Partitioning
  • Partition index set {1,…,n} = N1 ∪ N2 ∪ … ∪ Np.
  • For all i in Nk, Processor k stores y[i], x[i], and row i of A
  • For all i in Nk, Processor k computes y[i] = (row i of A) * x (may require communication)
    • “owner computes” rule: Processor k computes the y[i]s it owns.
[Figure: x, y, and the rows of A divided among processors P1 to P4]
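The partitioning rules can be simulated sequentially to see where communication arises. The sketch below is an assumption-laden illustration, not the slides' implementation: it uses 0-based indices, a contiguous block partition (one of many possible choices of N1 … Np), hypothetical function names, and a small tridiagonal matrix chosen here as test data. For each simulated processor it records which entries of x are owned by another processor and would have to be communicated.

```python
def block_partition(n, p):
    """Split indices 0..n-1 into p contiguous blocks of nearly equal size."""
    bounds = [k * n // p for k in range(p + 1)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(p)]

def owner_computes_spmv(ptr, ind, val, x, parts):
    """Simulate y = A*x under the owner-computes rule.

    Returns y and, for each processor, the set of x indices it reads
    but does not own (the data that would be communicated).
    """
    owner = {i: k for k, part in enumerate(parts) for i in part}
    y = [0.0] * len(x)
    remote = [set() for _ in parts]
    for k, part in enumerate(parts):
        for i in part:                       # processor k owns row i and y[i]
            for q in range(ptr[i], ptr[i + 1]):
                j = ind[q]
                if owner[j] != k:
                    remote[k].add(j)         # x[j] is stored on another processor
                y[i] += val[q] * x[j]
    return y, remote

# Hypothetical 4 x 4 tridiagonal matrix (2 on the diagonal, -1 beside it) in CSR form.
ptr = [0, 2, 5, 8, 10]
ind = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
val = [2, -1, -1, 2, -1, -1, 2, -1, -1, 2]

parts = block_partition(4, 2)
y, remote = owner_computes_spmv(ptr, ind, val, [1, 1, 1, 1], parts)
print(parts, y, remote)
```

In this example each processor needs exactly one remote entry of x, the one just across the block boundary; a matrix with nonzeros far from the diagonal would need many more.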
Matrix-processor mapping vs graph partitioning
• Relationship between matrix and graph: each off-diagonal nonzero A[i,j] corresponds to an edge between nodes i and j
  [Figure: 6 x 6 sparse matrix with rows and columns labeled 1 to 6, next to its graph on nodes 1 to 6]
• A “good” partition of the graph has
  • an equal (weighted) number of nodes in each part (load and storage balance).
  • a minimum number of edges crossing between parts (minimizes communication).
• Reorder the rows/columns by putting all nodes in one partition together.
02/09/2010 CS267 Lecture 7
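The two criteria for a “good” partition can be checked with a small script. This is a sketch on a hypothetical 6-node graph (two triangles joined by one edge), not the matrix drawn on the slide; the function names and the two candidate partitions are also made up for illustration.

```python
def matrix_to_edges(pattern):
    """Undirected edges (i, j), i < j, one per off-diagonal nonzero pair."""
    return {(min(i, j), max(i, j))
            for i, cols in pattern.items() for j in cols if i != j}

def edge_cut(edges, part_of):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for i, j in edges if part_of[i] != part_of[j])

def part_sizes(part_of):
    """Number of nodes assigned to each part (the load-balance criterion)."""
    sizes = {}
    for p in part_of.values():
        sizes[p] = sizes.get(p, 0) + 1
    return sizes

# Hypothetical symmetric sparsity pattern: row i -> columns holding nonzeros.
pattern = {1: [1, 2, 3], 2: [1, 2, 3], 3: [1, 2, 3, 4],
           4: [3, 4, 5, 6], 5: [4, 5, 6], 6: [4, 5, 6]}
edges = matrix_to_edges(pattern)

good = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}   # keeps each triangle together
bad = {1: 0, 4: 0, 5: 0, 2: 1, 3: 1, 6: 1}    # equally balanced, but splits both

print(part_sizes(good), edge_cut(edges, good))
print(part_sizes(bad), edge_cut(edges, bad))
```

Both partitions are perfectly balanced, so the edge cut is what separates them: every cut edge corresponds to a pair of nonzeros that forces an entry of x to cross between processors.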
Matrix Reordering via Graph Partitioning
• “Ideal” matrix structure for parallelism: block diagonal
  • p (number of processors) blocks, can all be computed locally.
  • If no non-zeros outside these blocks, no communication needed
• Can we reorder the rows/columns to get close to this?
  • Most nonzeros in diagonal blocks, few outside
[Figure: y = A * x with y, A, and x partitioned by rows across P0 to P4]
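The effect of reordering can be measured by counting nonzeros that fall outside the diagonal blocks before and after the rows and columns are grouped by partition. The sketch below is illustrative: the 6 x 6 pattern, the partition (even nodes versus odd nodes), and the function names are all assumptions made for this example, and the partition is given by hand instead of being computed by a graph partitioner.

```python
def reorder_by_partition(nonzeros, part_of):
    """Renumber indices so each part is contiguous.

    Returns (perm, new_nonzeros), where perm maps old index -> new index and
    the same permutation is applied to rows and columns.
    """
    order = sorted(part_of, key=lambda i: (part_of[i], i))
    perm = {old: new for new, old in enumerate(order)}
    return perm, {(perm[i], perm[j]) for i, j in nonzeros}

def off_block_count(nonzeros, block_size):
    """Count nonzeros outside the diagonal blocks of a uniform block layout."""
    return sum(1 for i, j in nonzeros if i // block_size != j // block_size)

# Hypothetical symmetric pattern: even nodes form one cluster, odd nodes another,
# with a single edge (4, 5) connecting the two clusters.
edges = [(0, 2), (2, 4), (0, 4), (1, 3), (3, 5), (1, 5), (4, 5)]
nonzeros = {(i, i) for i in range(6)}
for i, j in edges:
    nonzeros.add((i, j))
    nonzeros.add((j, i))

part_of = {i: i % 2 for i in range(6)}          # assumed partition, chosen by hand
perm, reordered = reorder_by_partition(nonzeros, part_of)

print(off_block_count(nonzeros, 3), off_block_count(reordered, 3))
```

With two processors owning three rows each, the original ordering leaves most off-diagonal nonzeros outside the diagonal blocks, while the reordered matrix keeps only the pair coming from the single cross-cluster edge outside them.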