Matrix Multiply in Hadoop
Botong Huang and You Wu (Will)
Contents
• Dense Matrix Multiplication: Previous Work; Our Approach and Strategies; Analysis; Experiments
• Sparse Matrix Multiplication: Our Approach; Experiments
Previous Work in Dense Matrix Multiplication
• The Hama project: a distributed scientific package based on Hadoop for massive matrix and graph data. "HAMA: An Efficient Matrix Computation with the MapReduce Framework", IEEE CloudCom Workshop, 2010
• "A MapReduce Algorithm for Matrix Multiplication", John Norstad, Northwestern University
Our Approach
• Push the computation forward into the map phase, without preprocessing the data: finish the task in a single map/reduce job
• Provide each Mapper with information from both matrix files when generating the splits
• Modified classes: FileSplit, FileInputFormat, and RecordReader
Strategy 1
Notation: M = matrix size; n = number of blocks per row/column; N = number of physical map slots
Strategy 2
Strategy 3
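All three strategies presumably build on the same blocked-multiplication primitive: output block C(i,j) is the sum over k of block products A(i,k)·B(k,j), with M and n as defined above, and they differ in how the (i,j,k) block-product triples are grouped onto mappers. A minimal plain-Python sketch of that primitive (illustrative only, not the Hadoop implementation):

```python
# Blocked dense matrix multiplication: each (i, j, k) triple below is the
# unit of work a logical mapper would compute; the strategies differ only
# in how these triples are grouped onto physical mappers.

def block_multiply(A, B, n):
    """Multiply square matrices A and B, split into an n x n grid of blocks."""
    M = len(A)
    assert M % n == 0
    b = M // n  # block side length
    C = [[0] * M for _ in range(M)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Block update: C(i,j) += A(i,k) * B(k,j)
                for r in range(i * b, (i + 1) * b):
                    for c in range(j * b, (j + 1) * b):
                        s = 0
                        for t in range(k * b, (k + 1) * b):
                            s += A[r][t] * B[t][c]
                        C[r][c] += s
    return C
```

Since the block loops only regroup the scalar multiplications, the result equals the ordinary matrix product for any n dividing M.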
ANALYSIS
Comparing the Three Strategies

                                  Strategy 1   Strategy 2   Strategy 3
Mapper input traffic (total)      2M^2 n       2M^2 n       M^2 n
Mapper input traffic (average)    2M^2 / n^2   2M^2 / n     M^2 / n
Shuffle traffic                   M^2 n        M^2          M^2 n
Computation per mapper            1            n            n
Memory per mapper                 3M^2 / n^2   2M^2 / n     2M^2 / n
Number of (logical) mappers       n^3          n^2          n^2
Comparing the Three Strategies
• Fix the number of physical map slots = N, and choose n so that the number of logical mappers equals N: n = N^(1/3) for Strategy 1 and n = N^(1/2) for Strategies 2 and 3

                                  Strategy 1      Strategy 2      Strategy 3
n                                 N^(1/3)         N^(1/2)         N^(1/2)
Mapper input traffic (total)      2M^2 N^(1/3)    2M^2 N^(1/2)    M^2 N^(1/2)
Mapper input traffic (average)    2M^2 N^(-2/3)   2M^2 N^(-1/2)   M^2 N^(-1/2)
Shuffle traffic                   M^2 N^(1/3)     M^2             M^2 N^(1/2)
Computation per mapper            1               N^(1/2)         N^(1/2)
Memory per mapper                 3M^2 N^(-2/3)   2M^2 N^(-1/2)   2M^2 N^(-1/2)
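Plugging concrete numbers into the cost formulas above makes the trade-offs visible. A sketch (M and N are free parameters; the formulas follow the analysis, with n = N^(1/3) for Strategy 1 and n = N^(1/2) for Strategies 2 and 3):

```python
# Evaluate the per-strategy costs from the analysis table for a given
# matrix size M and number of physical map slots N.

def costs(M, N):
    n1 = N ** (1 / 3)  # blocks per dimension, Strategy 1
    n2 = N ** 0.5      # blocks per dimension, Strategies 2 and 3
    return {
        "strategy1": {"input_total": 2 * M**2 * n1,
                      "shuffle":     M**2 * n1,
                      "memory":      3 * M**2 / n1**2},
        "strategy2": {"input_total": 2 * M**2 * n2,
                      "shuffle":     M**2,
                      "memory":      2 * M**2 / n2},
        "strategy3": {"input_total": M**2 * n2,
                      "shuffle":     M**2 * n2,
                      "memory":      2 * M**2 / n2},
    }
```

For example, with M = 5000 and N = 48 (the largest experimental setting), Strategy 2 generates the least shuffle traffic while Strategy 3 reads the least total input, matching the table's ordering.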
EXPERIMENTS
Impact of block size on running time
• 4 nodes – 12 map slots
[Figure: running time (sec, 0–1600) vs. number of blocks (2–10), one curve per matrix size M = 1000, 2000, 3000, 4000, 5000]
Impact of block size on running time
• 8 nodes – 24 map slots
[Figure: running time (sec, 0–1600) vs. number of blocks (2–10), one curve per matrix size M = 1000, 2000, 3000, 4000, 5000]
Impact of block size on running time
• 16 nodes – 48 map slots
[Figure: running time (sec, 0–1600) vs. number of blocks (2–10), one curve per matrix size M = 1000, 2000, 3000, 4000, 5000]
Impact of map slots on running time
[Figure: running time (sec, 0–1600) vs. number of nodes (4, 8, 16), one curve per matrix size M = 1000, 2000, 3000, 4000, 5000]
Comparing the three strategies
[Figure: running time (sec, 0–600) vs. matrix size M (1000–5000), one curve per strategy (Strategy 1, 2, 3)]
Others
• Comparison with existing work
  – Northwestern's 2-job algorithm is analogous to our Strategy 1
  – It took their program 2365 s (~40 min) to multiply two 5000-by-5000 matrices with 48 map/reduce slots
  – Our program took only 485 s (~8 min)
• Scalability
  – It took us 3916 s to multiply two 10k-by-10k matrices with 48 map/reduce slots
SPARSE MATRIX
An Example
Saved in file:
0 1 2 20
1 2 0 18 3 25
2 1 1 28
3 1 3 30
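One reading of the file format that is consistent with the example (an assumption, since the slide does not spell the layout out): each line holds the row index, the count of non-zero values in that row, and then that many (column, value) pairs. A minimal parser sketch under that assumption:

```python
# Parse one line of the sparse-matrix file. Assumed layout per line
# (consistent with the example above, but an assumption):
#   <row> <nnz> <col1> <val1> <col2> <val2> ...

def parse_row(line):
    nums = [int(x) for x in line.split()]
    row, nnz, rest = nums[0], nums[1], nums[2:]
    entries = [(rest[2 * i], rest[2 * i + 1]) for i in range(nnz)]
    return row, entries
```

For instance, the line "1 2 0 18 3 25" would decode as row 1 holding value 18 in column 0 and value 25 in column 3.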
Our Approach
Each Mapper is assigned some number of rows of A, chosen so that the total number of non-zero values per Mapper is about the same across Mappers.
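The row-to-Mapper balancing described above can be sketched as a greedy pass: accumulate consecutive rows until a Mapper holds roughly its fair share of the non-zero values (a sketch of the balancing idea only, not the actual split-generation code; the function name and signature are illustrative):

```python
def assign_rows(nnz_per_row, num_mappers):
    """Split consecutive rows into num_mappers groups with roughly equal
    total non-zero counts. Returns a list of (start, end) row ranges."""
    total = sum(nnz_per_row)
    target = total / num_mappers  # fair share of non-zeros per mapper
    splits, start, acc = [], 0, 0
    for i, nnz in enumerate(nnz_per_row):
        acc += nnz
        if acc >= target and len(splits) < num_mappers - 1:
            splits.append((start, i + 1))  # close this mapper's range
            start, acc = i + 1, 0
    splits.append((start, len(nnz_per_row)))  # remaining rows to last mapper
    return splits
```

Contiguous ranges keep each split a sequential scan of the file, which suits HDFS reads; the trade-off is that a single very dense row can still exceed the target.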
Experiment
• We set the number of Mappers slightly smaller than the number of physical map slots, so the map phase finishes in one wave, minimizing overhead
• Each test matrix has M log₂ M non-zero values
Impact of Matrix Size on Running Time
[Figure: running time (sec, 0–1600) vs. matrix size (100,000–400,000), one curve per cluster size (4, 8, 16 nodes)]
Future Work
• Use more nodes in the experiments to better differentiate the performance of the three strategies
• In dense matrix multiplication, take the number of physical map slots into account when generating splits, so the map phase finishes in one wave and overhead is minimized
• Run experiments on real-world data, both dense and sparse
• More work could be done on the sparse case