Parallel Jacobian Accumulation
Ebadollah Varnik, Uwe Naumann
RWTH Aachen University
Contents
Introduction
  ◮ Definitions
  ◮ Jacobian Accumulation
Parallel Approach
  ◮ General Idea
  ◮ Data Race Problem
  ◮ Atomic Sub-Graphs
Implementation
  ◮ Extended Jacobian
  ◮ Compressed Row Storage
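Definitions
Consider the vector function $f : \mathbb{R}^{n=2} \to \mathbb{R}^{m=2}$ with
$$\begin{pmatrix} y_0 \\ y_1 \end{pmatrix} = f(v) = \begin{pmatrix} \exp\big((v_0 \cdot v_1) + \sin(v_0 \cdot v_1)\big) \\ \cos\big((v_0 \cdot v_1) + \sin(v_0 \cdot v_1)\big) \end{pmatrix}$$
The code list of f is the following:
$$\begin{aligned}
v_2 &:= v_0 \cdot v_1; \\
v_3 &:= \sin(v_2); \\
v_4 &:= v_2 + v_3; \\
v_5 &:= \exp(v_4); \\
v_6 &:= \cos(v_4); \\
y_0 &:= v_5; \\
y_1 &:= v_6;
\end{aligned}$$
For each assignment, the local partial derivative of v_j with respect to an argument v_i is denoted c_{j,i}; these values label the edges of the linearized computational graph (DAG) of f.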
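Jacobian Accumulation
f′ by elimination of the intermediate vertices v_4, v_3, v_2:
[Figure: linearized DAG of f with edge labels c_{j,i} (e.g. c_{2,1} = v_0), shown after each of the steps elim(v_4), elim(v_3), elim(v_2).]
$$\begin{aligned}
\operatorname{elim}(v_4):&\quad c_{5,2} = c_{5,4} \cdot c_{4,2},\; c_{5,3} = c_{5,4} \cdot c_{4,3},\; c_{6,2} = c_{6,4} \cdot c_{4,2},\; c_{6,3} = c_{6,4} \cdot c_{4,3} \\
\operatorname{elim}(v_3):&\quad c_{5,2} \leftarrow c_{5,2} + c_{5,3} \cdot c_{3,2},\; c_{6,2} \leftarrow c_{6,2} + c_{6,3} \cdot c_{3,2} \\
\operatorname{elim}(v_2):&\quad c_{5,i} = c_{5,2} \cdot c_{2,i},\; c_{6,i} = c_{6,2} \cdot c_{2,i} \quad (i = 0, 1)
\end{aligned}$$
The remaining edges c_{5,0}, c_{5,1}, c_{6,0}, c_{6,1} are the entries of f′.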
General Idea (1)
1. Graph Decomposition into Sub-graphs G_i
   ◮ local independent and dependent vertices
2. Parallel Vertex Elimination on Sub-graphs
   ◮ back-elimination of out-edges of local intermediate vertices
3. Main Focus is on
   ◮ Correctness → Data Race caused by out-of-range edges
   ◮ Load Balancing
General Idea (2)
[Figure: example computational graph with vertices 0 to 13, decomposed into sub-graphs (labeled G_2 and G′_3, among others), with edge labels such as c_{9,7}, c_{10,8} and c_{7,4}.]
General Idea (3)
[Figure: master/slave scheme. Slave 1 and Slave 2 each perform elimination on their sub-graphs; the master performs the reduction.]
Data Race Problem
[Figure: linearized DAG of f, with threads t_1, t_2, t_3 performing eliminations such as elim(v_4) and elim(v_2) concurrently.]
Both elim(v_4) and elim(v_2) involve the edge c_{4,2}, which is an in-edge of v_4 and an out-edge of v_2. Executing the two eliminations concurrently without synchronization therefore leads to a data race on c_{4,2}.
Atomic Sub-Graphs (1)
[Figure: left, a generic sub-graph i with vertices v^i_0, ..., v^i_6 and edges c^i_{j,k}, in which thread t_i performs elim(v^i_4); right, the same step elim(v_4) performed by thread t_1 on the graph of f.]
Atomic Sub-Graphs (2)
[Figure: the same graphs after elim(v^i_3) and elim(v^i_2) by thread t_i (elim(v_3) and elim(v_2) by t_1 for f); only the edges c_{5,0}, c_{5,1}, c_{6,0}, c_{6,1} from v_0, v_1 to v_5, v_6 remain.]
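Atomic Code Example
Overloaded function with atomic call:

    void foo(int n, active x[2]) {
      for (int i = 0; i < n; i++) {
        atomic();  // marks the start of a new atomic sub-graph (one per iteration)
        // v4 is computed once and shared by both outputs, as in the code list of f;
        // overwriting x[0] first would otherwise feed the new x[0] into x[1].
        active v4 = (x[0] * x[1]) + sin(x[0] * x[1]);
        x[0] = exp(v4);
        x[1] = cos(v4);
      }
    }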
Implementation
1. Pattern Detection Mode
   ◮ Generation of the binary pattern of C′ by overloading
   ◮ Symbolic elimination on the binary pattern for fill-in detection
   ◮ Allocation of the Compressed Row Storage (CRS)
2. Accumulation Mode
   ◮ Initialization of the CRS by overloading
   ◮ Row elimination on the CRS
   ◮ Jacobian extraction
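Extended Jacobian
The extended Jacobian C′ of f is the following (rows and columns are indexed by v_0, ..., v_6; entry (v_j, v_i) holds c_{j,i}):
$$
C' = \begin{array}{c|ccccccc}
 & v_0 & v_1 & v_2 & v_3 & v_4 & v_5 & v_6 \\ \hline
v_0 & & & & & & & \\
v_1 & 0 & & & & & & \\
v_2 & c_{2,0} & c_{2,1} & & & & & \\
v_3 & 0 & 0 & c_{3,2} & & & & \\
v_4 & 0 & 0 & c_{4,2} & c_{4,3} & & & \\
v_5 & 0 & 0 & 0 & 0 & c_{5,4} & & \\
v_6 & 0 & 0 & 0 & 0 & c_{6,4} & 0 &
\end{array}
$$
f′ by elimination of the intermediate rows v_4, v_3, v_2; the first step is
$$
\xrightarrow{\operatorname{elim}(v_4)}
\begin{array}{c|ccccccc}
 & v_0 & v_1 & v_2 & v_3 & v_4 & v_5 & v_6 \\ \hline
v_0 & & & & & & & \\
v_1 & 0 & & & & & & \\
v_2 & c_{2,0} & c_{2,1} & & & & & \\
v_3 & 0 & 0 & c_{3,2} & & & & \\
v_4 & 0 & 0 & 0 & 0 & & & \\
v_5 & 0 & 0 & c_{5,4} \cdot c_{4,2} & c_{5,4} \cdot c_{4,3} & 0 & & \\
v_6 & 0 & 0 & c_{6,4} \cdot c_{4,2} & c_{6,4} \cdot c_{4,3} & 0 & 0 &
\end{array}
$$
After v_3 and v_2 have also been eliminated, f′ is read off rows v_5, v_6 and columns v_0, v_1.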
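Compressed Row Storage
$$
C' = \begin{array}{c|ccccccc}
 & v_0 & v_1 & v_2 & v_3 & v_4 & v_5 & v_6 \\ \hline
v_0 & & & & & & & \\
v_1 & 0 & & & & & & \\
v_2 & c_{2,0} & c_{2,1} & & & & & \\
v_3 & 0 & 0 & c_{3,2} & & & & \\
v_4 & 0 & 0 & c_{4,2} & c_{4,3} & & & \\
v_5 & 0 & 0 & 0 & 0 & c_{5,4} & & \\
v_6 & 0 & 0 & 0 & 0 & c_{6,4} & 0 &
\end{array}
$$
CRS scheme for C′ with fill-in:
$$\begin{aligned}
\alpha &= [c_{2,0}, c_{2,1}, c_{3,2}, c_{4,2}, c_{4,3}, 0, 0, 0, 0, c_{5,4}, 0, 0, 0, 0, c_{6,4}] \\
\kappa &= [0, 1, 2, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4] \\
\rho &= [0, 0, 0, 2, 3, 5, 10, 15]
\end{aligned}$$
α holds the values (the zeros are the preallocated fill-in positions), κ their column indices, and row v_r occupies positions ρ_r, ..., ρ_{r+1} − 1 of α: rows v_0 and v_1 are empty, v_2 has two entries, v_3 one, v_4 two, and v_5 and v_6 five each.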