matrix multiply in hadoop

Matrix Multiply in Hadoop, by Botong Huang and You Wu (Will)

  1. Matrix Multiply in Hadoop Botong Huang and You Wu (Will)

  2. Content
     • Dense Matrix Multiplication
        – Previous Work
        – Our Approach and Strategy
        – Analysis
        – Experiment
     • Sparse Matrix Multiplication
        – Our Approach
        – Experiment

  3. Previous Work in Dense Matrix Multiplication
     • Hama project: a distributed scientific package based on Hadoop for massive matrix and graph data. “HAMA: An Efficient Matrix Computation with the MapReduce Framework”, IEEE 2010 CloudCom Workshop
     • “A MapReduce Algorithm for Matrix Multiplication”, John Norstad, Northwestern University

  4. Our Approach
     • Push the computation forward into the map phase, with no data preprocessing, so that the task finishes in a single map/reduce job
     • Give each Mapper information from both matrix files when generating the splits
     • Modified classes: FileSplit, FileInputFormat and RecordReader

  5. Strategy 1
     M: matrix size (matrices are M-by-M)
     n: number of blocks per line/column
     N: number of physical map slots
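The diagram for this slide is not preserved, so the following is only a rough illustration of how block-wise multiplication can fit into a single map/reduce job. It simulates, in plain Python, one logical mapper per block triple (i, k, j), which matches the n^3 mapper count listed for Strategy 1 in the comparison table. All function names here are hypothetical and the code is not the authors' Hadoop implementation.

```python
def split_blocks(mat, n):
    """Cut a square M x M matrix (list of lists) into an n x n grid of blocks."""
    size = len(mat) // n  # assumes n divides M evenly
    return {
        (bi, bj): [row[bj * size:(bj + 1) * size]
                   for row in mat[bi * size:(bi + 1) * size]]
        for bi in range(n) for bj in range(n)
    }

def block_product(a, b):
    """Plain dense product of two square blocks of equal size."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def map_phase(a_blocks, b_blocks, n):
    """One logical mapper per (i, k, j): emit a partial block keyed by (i, j)."""
    for i in range(n):
        for k in range(n):
            for j in range(n):
                yield (i, j), block_product(a_blocks[i, k], b_blocks[k, j])

def reduce_phase(pairs):
    """Sum all partial blocks that share the same output key (i, j)."""
    out = {}
    for key, block in pairs:
        if key not in out:
            out[key] = [row[:] for row in block]
        else:
            acc = out[key]
            for r, row in enumerate(block):
                for c, value in enumerate(row):
                    acc[r][c] += value
    return out

def multiply(a, b, n):
    """Multiply two M x M matrices using an n x n block grid."""
    size = len(a) // n
    blocks = reduce_phase(map_phase(split_blocks(a, n), split_blocks(b, n), n))
    c = [[0] * len(a) for _ in a]
    for (bi, bj), block in blocks.items():
        for r, row in enumerate(block):
            c[bi * size + r][bj * size:(bj + 1) * size] = row
    return c
```

With n = 2 and a 4-by-4 input, `multiply` runs eight simulated mappers and four reducer keys; the result should equal an ordinary dense product.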

  6. Strategy 2

  7. Strategy 3


  9. Comparing the Three Strategies

                                       Strategy 1   Strategy 2   Strategy 3
     Mapper input traffic (total)      2M^2 n       2M^2 n       M^2 n
     Mapper input traffic (average)    2M^2/n^2     2M^2/n       M^2/n
     Shuffle traffic                   M^2 n        M^2          M^2 n
     Computation per mapper            1            n            n
     Memory per mapper                 3M^2/n^2     2M^2/n       2M^2/n
     Number of (logical) mappers       n^3          n^2          n^2
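To make the comparison concrete, the formulas in this table can be evaluated for specific values of M and n. The sketch below simply transcribes the table; the function name and dictionary keys are hypothetical, traffic and memory are counted in matrix elements, and "computation per mapper" is assumed to mean the number of block multiplications.

```python
def strategy_costs(M, n):
    """Evaluate the comparison table for matrix size M and n blocks per line."""
    sq = M * M  # number of elements in one M x M matrix
    return {
        "Strategy 1": {
            "input_total": 2 * sq * n,
            "input_avg": 2 * sq / n**2,
            "shuffle": sq * n,
            "block_mults_per_mapper": 1,
            "memory_per_mapper": 3 * sq / n**2,
            "mappers": n**3,
        },
        "Strategy 2": {
            "input_total": 2 * sq * n,
            "input_avg": 2 * sq / n,
            "shuffle": sq,
            "block_mults_per_mapper": n,
            "memory_per_mapper": 2 * sq / n,
            "mappers": n**2,
        },
        "Strategy 3": {
            "input_total": sq * n,
            "input_avg": sq / n,
            "shuffle": sq * n,
            "block_mults_per_mapper": n,
            "memory_per_mapper": 2 * sq / n,
            "mappers": n**2,
        },
    }

if __name__ == "__main__":
    for name, row in strategy_costs(M=5000, n=10).items():
        print(name, row)
```

A quick sanity check on the table is that, for every strategy, the total input traffic equals the average input traffic times the number of logical mappers, and the total number of block multiplications is n^3.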

  10. Comparing the Three Strategies
     • Fix number of physical map slots = N

                                       Strategy 1       Strategy 2       Strategy 3
     n                                 N^(1/3)          N^(1/2)          N^(1/2)
     Mapper input traffic (total)      2M^2 N^(1/3)     2M^2 N^(1/2)     M^2 N^(1/2)
     Mapper input traffic (average)    2M^2 N^(-2/3)    2M^2 N^(-1/2)    M^2 N^(-1/2)
     Shuffle traffic                   M^2 N^(1/3)      M^2              M^2 N^(1/2)
     Computation per mapper            1                N^(1/2)          N^(1/2)
     Memory per mapper                 3M^2 N^(-2/3)    2M^2 N^(-1/2)    2M^2 N^(-1/2)
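The entries on this slide appear to follow from the previous table by choosing n so that the number of logical mappers equals the N physical map slots. That reading is an assumption, but it reproduces the listed values of n. A short worked substitution for the total input traffic, shuffle traffic, and memory per mapper:

```latex
\begin{gather*}
\text{Strategy 1: } n^3 = N \;\Rightarrow\; n = N^{1/3}, \qquad
2M^2 n = 2M^2 N^{1/3}, \quad
M^2 n = M^2 N^{1/3}, \quad
\frac{3M^2}{n^2} = 3M^2 N^{-2/3} \\
\text{Strategy 2: } n^2 = N \;\Rightarrow\; n = N^{1/2}, \qquad
2M^2 n = 2M^2 N^{1/2}, \quad
M^2 \text{ (independent of } N\text{)}, \quad
\frac{2M^2}{n} = 2M^2 N^{-1/2} \\
\text{Strategy 3: } n^2 = N \;\Rightarrow\; n = N^{1/2}, \qquad
M^2 n = M^2 N^{1/2}, \quad
M^2 n = M^2 N^{1/2}, \quad
\frac{2M^2}{n} = 2M^2 N^{-1/2}
\end{gather*}
```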


  12. Impact of block size on running time
     • 4 nodes – 12 map slots
     [Figure: running time (sec) versus number of blocks (2 to 10), one curve per matrix size M = 1000, 2000, 3000, 4000, 5000]

  13. Impact of block size on running time
     • 8 nodes – 24 map slots
     [Figure: running time (sec) versus number of blocks (2 to 10), one curve per matrix size M = 1000, 2000, 3000, 4000, 5000]

  14. Impact of block size on running time
     • 16 nodes – 48 map slots
     [Figure: running time (sec) versus number of blocks (2 to 10), one curve per matrix size M = 1000, 2000, 3000, 4000, 5000]

  15. Impact of map slots on running time
     [Figure: running time (sec) versus number of nodes (4, 8, 16), one curve per matrix size 1000, 2000, 3000, 4000, 5000]

  16. Comparing the three strategies
     [Figure: running time (sec) versus matrix size M (1000 to 5000), one curve each for Strategy 1, Strategy 2 and Strategy 3]

  17. Others
     • Comparison with existing work
        – Northwestern’s 2-job algorithm is analogous to Strategy 1
        – It took 2365 s (about 40 min) to multiply two 5000-by-5000 matrices with 48 map/reduce slots
        – Our program took only 485 s (about 8 min)
     • Scalability
        – Our program took 3916 s to multiply two 10k-by-10k matrices with 48 map/reduce slots
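The reported timings can be turned into two ratios: the speedup over the 2-job algorithm, and the growth in running time when the matrix size doubles. The short calculation below uses only the numbers on this slide; the comparison against cubic growth assumes the work is dominated by the usual M^3 multiply-add count, which the slides do not state explicitly.

```python
baseline_5k_s = 2365   # Northwestern 2-job algorithm, 5000-by-5000
ours_5k_s = 485        # this work, 5000-by-5000
ours_10k_s = 3916      # this work, 10k-by-10k

# How many times faster the one-job program is on the 5000-by-5000 case.
speedup = baseline_5k_s / ours_5k_s

# Observed growth in running time when M doubles from 5000 to 10000.
observed_growth = ours_10k_s / ours_5k_s

# Growth expected if running time scaled with M^3 (an assumption).
expected_growth = (10_000 / 5_000) ** 3

print(f"speedup over 2-job algorithm: {speedup:.2f}x")
print(f"observed growth 5k -> 10k:    {observed_growth:.2f}x")
print(f"cubic-scaling growth:         {expected_growth:.0f}x")
```

Under that assumption the observed growth is close to the cubic prediction, which suggests the fixed job overhead is small relative to the computation at these sizes.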


  19. An Example
     Saved in file:
       0 1 2 20
       1 2 0 18 3 25
       2 1 1 28
       3 1 3 30
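The matrix pictured on this slide is not preserved, so the file layout has to be inferred. One reading that is consistent with the numbers is that each record holds a row index, the count of non-zero entries in that row, and then that many (column, value) pairs. The parser below is a minimal sketch under that assumption; the function name is hypothetical and integer values are assumed.

```python
def parse_sparse_rows(text):
    """Parse 'row count col val col val ...' records into {row: [(col, val), ...]}.

    Assumed layout (inferred from the example, not stated in the slides):
    row index, number of non-zeros in the row, then (column, value) pairs.
    """
    tokens = [int(t) for t in text.split()]
    rows = {}
    pos = 0
    while pos < len(tokens):
        row, count = tokens[pos], tokens[pos + 1]
        pos += 2
        entries = []
        for _ in range(count):
            entries.append((tokens[pos], tokens[pos + 1]))
            pos += 2
        rows[row] = entries
    return rows

example = "0 1 2 20 1 2 0 18 3 25 2 1 1 28 3 1 3 30"
rows = parse_sparse_rows(example)
for row, entries in rows.items():
    print(row, entries)
```

Under this reading the example describes a matrix with five non-zero values spread over four rows, and the token stream is consumed exactly with nothing left over.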

  20. Our Approach
     [Figure: matrices A and B]
     • Each Mapper is assigned a number of lines of A, chosen so that the total number of non-zero values in those lines is about the same across Mappers
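One simple way to realize this kind of balancing is a greedy pass over the per-line non-zero counts, closing a range each time the running total crosses the next equal-share boundary. The sketch below assumes each Mapper receives a contiguous range of lines, which the slide does not specify; `partition_rows` is a hypothetical helper and not the authors' split-generation code.

```python
def partition_rows(nnz_per_row, num_mappers):
    """Split rows into contiguous ranges with roughly equal non-zero counts.

    Returns a list of (start, end) half-open row ranges. It may return fewer
    ranges than num_mappers when a few rows hold most of the non-zeros.
    """
    total = sum(nnz_per_row)
    ranges = []
    start = 0
    running = 0
    for row, nnz in enumerate(nnz_per_row):
        running += nnz
        # Boundary for the range currently being filled: its equal share
        # of the cumulative non-zero count.
        boundary = total * (len(ranges) + 1) / num_mappers
        if running >= boundary and len(ranges) < num_mappers - 1:
            ranges.append((start, row + 1))
            start = row + 1
    if start < len(nnz_per_row):
        # The last range absorbs whatever rows remain.
        ranges.append((start, len(nnz_per_row)))
    return ranges

if __name__ == "__main__":
    # Non-zero counts 1, 2, 1, 1 per row, split across two mappers.
    print(partition_rows([1, 2, 1, 1], 2))
```

For the four-row input above, two mappers would receive rows 0 to 1 (three non-zeros) and rows 2 to 3 (two non-zeros). A heavily skewed input can leave some mappers without work, which is the price of keeping ranges contiguous.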

  21. Experiment
     • We set the number of Mappers slightly below the number of physical map slots, so that the map phase completes in one wave, minimizing overhead
     • M log_2 M non-zero values

  22. Impact of Matrix Size on Running Time
     [Figure: running time (sec) versus matrix size (0 to 400000), one curve each for 4, 8 and 16 nodes]

  23. Future Work
     • Run the experiments on more nodes to differentiate the performance of the three strategies
     • In dense matrix multiplication, take the number of physical slots into account when generating splits, so that the map phase finishes in one wave and overhead is minimized
     • Run experiments on real-world data, both dense and sparse
     • More work could be done on the sparse case

