Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Systems

Till Rohrmann (1), Sebastian Schelter (2), Tilmann Rabl (2), Volker Markl (2)
(1) Apache Software Foundation   (2) Technische Universität Berlin

March 8, 2017
Motivation
Information Age

- Collected data grows exponentially
- Valuable information is stored in this data
- Need for scalable analytical methods
Distributed Computing and Data Analytics

- Writing parallel algorithms is tedious and error-prone
- Huge existing code base in the form of libraries
- Need for a parallelization tool
Requirements

- Linear algebra is the lingua franca of analytics
- Parallelize programs automatically to simplify development
- Sparse operations to support sparse problems efficiently

Goal: development of a distributed sparse linear algebra system
Gilbert
Gilbert in a Nutshell
System Architecture
Gilbert Language

- Subset of the MATLAB® language
- Support of basic linear algebra operations
- Fixpoint operator serves as a side-effect free loop abstraction (illustrated below)
- Expressive enough to implement a wide variety of machine learning algorithms

A = rand(10, 2);
B = eye(10);
A' * B;
f = @(x) x .^ 2.0;
eps = 0.1;
c = @(p, c) norm(p - c, 2) < eps;
fixpoint(1/2, f, 10, c);
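As a rough illustration of the fixpoint semantics, the following plain-MATLAB expansion (our sketch, not Gilbert's implementation) shows what fixpoint(x0, f, maxIter, c) computes; we assume the convergence function c receives the previous and the current iterate, as in the example above:

% Illustrative expansion of fixpoint(x0, f, maxIter, c) -- a sketch, not Gilbert's code
x = x0;
for i = 1:maxIter
    xNew = f(x);          % apply the step function
    if c(x, xNew)         % user-supplied convergence check on (previous, current)
        x = xNew;
        break
    end
    x = xNew;
end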
Gilbert Typer

- MATLAB is dynamically typed
- Dataflow systems require type knowledge at compile time
- Automatic type inference using the Hindley-Milner type inference algorithm
- Matrix dimensions are also inferred and used for optimizations

A = rand(10, 2):                    Matrix(Double, 10, 2)
B = eye(10):                        Matrix(Double, 10, 10)
A' * B:                             Matrix(Double, 2, 10)
f = @(x) x .^ 2.0:                  N -> N
eps = 0.1:                          Double
c = @(p, c) norm(p - c, 2) < eps:   (N, N) -> Boolean
fixpoint(1/2, f, 10, c):            Double
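A small example of our own showing how the inferred dimensions propagate through expressions (the type annotations follow the notation from the slide above):

A = rand(10, 2);     % Matrix(Double, 10, 2)
B = rand(2, 5);      % Matrix(Double, 2, 5)
C = A * B;           % inner dimensions agree (2 == 2), hence Matrix(Double, 10, 5)
D = C';              % transposition swaps the dimensions: Matrix(Double, 5, 10)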
Intermediate Representation & Gilbert Optimizer

- Language-independent representation of linear algebra programs
- Abstraction layer facilitates easy extension with new programming languages (such as R)
- Enables language-independent optimizations:
  - Transpose push-down
  - Matrix multiplication re-ordering
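Both optimizations can be illustrated on small Gilbert expressions; the rewrites below are sketches of the general idea, not the optimizer's literal output:

% Transpose push-down (sketch): push the transpose onto the operands
% instead of materializing the product and then transposing it
(A * B)'             % original expression
B' * A'              % rewritten: same result, no transposed intermediate

% Matrix multiplication re-ordering (sketch): pick the cheaper parenthesization,
% e.g. for A in R^(1000 x 10), B in R^(10 x 1000), C in R^(1000 x 1)
A * (B * C)          % ~2 * 10^4 scalar multiplications
(A * B) * C          % ~1.1 * 10^7 scalar multiplications -- avoided by the optimizer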
Distributed Matrices

[Figure: (a) Row partitioning, (b) Quadratic block partitioning]

Which partitioning is better suited for matrix multiplications?

io_cost_row = O(n^3)        io_cost_block = O(n^2 · √n)
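A quick plug-in (our illustration, taking the two asymptotic terms at face value): for n = 10^4,

  io_cost_row ∝ n^3 = 10^12    vs.    io_cost_block ∝ n^2 · √n = 10^10,

so quadratic block partitioning moves roughly two orders of magnitude less data, which is why it is the better fit for matrix multiplication.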
Distributed Operations: Addition

- Apache Flink and Apache Spark offer a MapReduce-like API with additional operators: join, coGroup, cross
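The dataflow figure is omitted here; as a sketch of the operator mapping (our reading): with quadratic block partitioning, both operands are sets of (rowIndex, colIndex, block) records, and the element-wise addition C = A + B becomes a join of the two sets on the block index followed by a local block addition,

  C_{i,j} = A_{i,j} + B_{i,j}   for every block coordinate (i, j),

i.e. blocks with matching (i, j) meet on one worker and are added locally, so no block needs to be replicated.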
Evaluation
Gaussian Non-Negative Matrix Factorization

- Given V ∈ R^{d×w}, find W ∈ R^{d×t} and H ∈ R^{t×w} such that V ≈ WH
- Used in many fields: computer vision, document clustering, topic modeling
- Efficient distributed implementation exists for MapReduce systems

Algorithm:
  H ← randomMatrix(t, w)
  W ← randomMatrix(d, t)
  while ‖V − WH‖_2 > eps do
    H ← H · (W^T V / W^T W H)
    W ← W · (V H^T / W H H^T)
  end while
Testing Setup

- Set t = 10 and w = 100000
- V ∈ R^{d×100000} with sparsity 0.001
- Block size 500 × 500
- Number of cores: 64
- Flink 1.1.2 & Spark 2.0.0
- Gilbert implementation: 5 lines; distributed GNMF on Flink: 70 lines

V = rand($rows, 100000, 0, 1, 0.001);
H = rand(10, 100000, 0, 1);
W = rand($rows, 10, 0, 1);
nH = H .* ((W' * V) ./ (W' * W * H))
nW = W .* (V * nH') ./ (W * nH * nH')
Gilbert Optimizations

[Figure: Execution time t in s vs. number of rows d of V, comparing optimized and non-optimized Gilbert plans on Spark and Flink]
Optimizations Explained

Matrix updates:
  H ← H · (W^T V / W^T W H)
  W ← W · (V H^T / W H H^T)

Optimized matrix multiplications (small intermediates):
  (W^T W) H     with W^T W ∈ R^{10×10} and H ∈ R^{10×100000}
  W (H H^T)     with H H^T ∈ R^{10×10} and W ∈ R^{d×10}

Non-optimized matrix multiplications (large intermediate W H ∈ R^{d×100000}):
  W^T (W H)
  (W H) H^T
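A rough operation count (ours, using the dense cost m · k · n for an (m×k) · (k×n) product and d = 10^4) shows the effect for the H update:

  Optimized:      W^T W costs 10 · d · 10 = 10^6,  (W^T W) H costs 10 · 10 · 10^5 = 10^7   →  ≈ 1.1 · 10^7 operations
  Non-optimized:  W H costs d · 10 · 10^5 = 10^10,  W^T (W H) costs 10 · d · 10^5 = 10^10  →  ≈ 2 · 10^10 operations

Roughly three orders of magnitude fewer operations, and no d × 100000 intermediate has to be materialized and shipped across the cluster.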
GNMF Step: Scaling Problem Size

[Figure: Execution time t in s vs. number of rows of matrix V, comparing Gilbert on Flink and Spark with specialized (SP) Flink and Spark implementations and local execution]

- Distributed Gilbert execution handles much larger problem sizes than local execution
- Specialized implementation is slightly faster than Gilbert
GNMF Step: Weak Scaling

[Figure: Execution time t in s vs. number of cores for Flink and Spark]

- Both distributed backends show good weak scaling behaviour
PageRank

- Ranking of entities with reciprocal quotations and references

  PR(p_i) = d · Σ_{p_j ∈ L(p_i)} PR(p_j) / D(p_j) + (1 − d) / N

  N       - number of pages
  d       - damping factor
  L(p_i)  - set of pages that link to p_i
  D(p_j)  - number of pages linked by p_j

- Vectorized formulation with transition matrix M derived from the adjacency matrix:

  R = d · M R + (1 − d) / N · 𝟙
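To connect the two formulations (our note; the slide only states that M is derived from the adjacency matrix): with adjacency matrix A, where A_{ij} = 1 if page p_i links to page p_j, set

  M_{ij} = A_{ji} / D(p_j),

i.e. column j of M distributes the rank of p_j evenly over the pages it links to. Then (M R)_i = Σ_{p_j ∈ L(p_i)} PR(p_j) / D(p_j), so the per-page equation and R = d · M R + (1 − d)/N · 𝟙 coincide. The implementation on the next slide builds exactly this matrix via M = (diag(1 ./ d) * A)'.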
PageRank Implementation

MATLAB®:
  it = 10;
  d = sum(A, 2);
  M = (diag(1 ./ d) * A)';
  r_0 = ones(n, 1) / n;
  e = ones(n, 1) / n;
  r = r_0;              % initialize the rank vector before iterating
  for i = 1:it
    r = .85 * M * r + .15 * e
  end

Gilbert:
  it = 10;
  d = sum(A, 2);
  M = (diag(1 ./ d) * A)';
  r_0 = ones(n, 1) / n;
  e = ones(n, 1) / n;
  fixpoint(r_0, @(r) .85 * M * r + .15 * e, it)
PageRank: 10 Iterations

[Figure: Execution time t in s vs. number of vertices n, comparing Gilbert on Spark and Flink with specialized (SP) Flink and Spark implementations]

- Gilbert backends show similar performance
- Specialized implementation is faster because it can fuse operations
Conclusion
Conclusion

- Easy-to-use sparse linear algebra environment for people familiar with MATLAB®
- Scales to data sizes exceeding a single computer
- High-level linear algebra optimizations improve runtime
- Slower than specialized implementations due to abstraction overhead