HTAs: Programming for Parallelism and Locality with Hierarchically Tiled Arrays

Paper published at PPoPP, March 2006. Presentation by Roman Frigg. Written at UIUC (1), Universidade da Coruña (2) and IBM T.J. Watson Research Center (3) by Ganesh Bikshandi (1), Jia Guo, Daniel …


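  1. CANNON’S ALGORITHM (3 HTA OPERATIONS & APPLICATIONS)

```matlab
function C = cannon(A, B, C)
  % A, B and C are HTAs split into an m x m grid of tiles
  % (m is not defined on the slide).

  % Initialization: skew the tiles
  for i = 2:m
    A{i,:} = circshift(A{i,:}, [0, -(i-1)]);  % rotate tile row i of A left by i-1
    B{:,i} = circshift(B{:,i}, [-(i-1), 0]);  % rotate tile column i of B up by i-1
  end

  % Iteration: multiply co-located tiles, then rotate A left and B up by one tile
  for k = 1:m
    C = C + A * B;  % tile by tile: C{i,j} = C{i,j} + A{i,j} * B{i,j}
    A = circshift(A, [0, -1]);
    B = circshift(B, [-1, 0]);
  end
end
```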
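The HTA version depends on the HTA toolbox for MATLAB, which is not reproduced here. As an illustration only, the following plain-Python sketch performs the same tile movements on a grid of dense tiles held in lists of lists, then checks the result against an ordinary matrix product. All helper names (`to_tiles`, `from_tiles`, `shift_row_left`, `shift_col_up`, `cannon`) are hypothetical and are not HTA API. Indices are 0-based, so the MATLAB shift of `i-1` for tile row `i` becomes a shift of `i` here. Everything runs sequentially; the parallelism the HTA version gets from distributed tiles is not modeled.

```python
# Sketch: Cannon's algorithm on an m x m grid of tiles, in plain Python.
# Hypothetical helpers; this is not the HTA toolbox API.

def matmul(X, Y):
    """Dense matrix product of two lists of lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def to_tiles(M, m):
    """Split an n x n matrix into an m x m grid of (n/m) x (n/m) tiles."""
    b = len(M) // m
    return [[[row[J * b:(J + 1) * b] for row in M[I * b:(I + 1) * b]]
             for J in range(m)] for I in range(m)]

def from_tiles(T):
    """Reassemble a grid of square tiles into one matrix."""
    m, b = len(T), len(T[0][0])
    return [[T[I][J][r][c] for J in range(m) for c in range(b)]
            for I in range(m) for r in range(b)]

def shift_row_left(grid, i, s):
    """Like circshift(X{i,:}, [0, -s]): rotate tile row i left by s."""
    s %= len(grid[i])
    grid[i] = grid[i][s:] + grid[i][:s]

def shift_col_up(grid, j, s):
    """Like circshift(X{:,j}, [-s, 0]): rotate tile column j up by s."""
    col = [row[j] for row in grid]
    s %= len(col)
    col = col[s:] + col[:s]
    for r, tile in enumerate(col):
        grid[r][j] = tile

def cannon(A, B, C):
    m = len(A)
    A = [row[:] for row in A]  # copy the grids so the caller's are untouched
    B = [row[:] for row in B]
    # Initialization: skew row i of A and column i of B by i tiles (0-based).
    for i in range(1, m):
        shift_row_left(A, i, i)
        shift_col_up(B, i, i)
    # Iteration: multiply co-located tiles, then rotate A left and B up.
    for _ in range(m):
        C = [[matadd(C[i][j], matmul(A[i][j], B[i][j])) for j in range(m)]
             for i in range(m)]
        for i in range(m):
            shift_row_left(A, i, 1)
        for j in range(m):
            shift_col_up(B, j, 1)
    return C

# Usage: a 6 x 6 product computed on a 3 x 3 grid of 2 x 2 tiles.
n, m = 6, 3
M1 = [[i * n + j for j in range(n)] for i in range(n)]
M2 = [[(i + 2 * j) % 5 - 2 for j in range(n)] for i in range(n)]
Z = [[0] * n for _ in range(n)]
result = from_tiles(cannon(to_tiles(M1, m), to_tiles(M2, m), to_tiles(Z, m)))
print(result == matmul(M1, M2))  # prints True
```

Any n divisible by m should behave the same way; with m = n every tile is a single element.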

  4–14. Initialization, step by step (m = 3). The tiles of A and B start in their natural positions:

     A11 A12 A13      B11 B12 B13
     A21 A22 A23      B21 B22 B23
     A31 A32 A33      B31 B32 B33

     i = 2: row 2 of A is rotated left by one tile (A22 A23 A21); then column 2 of B is rotated up by one tile (B22, B32, B12 from top to bottom).
     i = 3: row 3 of A is rotated left by two tiles (A33 A31 A32); then column 3 of B is rotated up by two tiles (B33, B13, B23). The slides animate each two-tile shift one tile at a time.

     Layout after initialization:

     A11 A12 A13      B11 B22 B33
     A22 A23 A21      B21 B32 B13
     A33 A31 A32      B31 B12 B23
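The frame-by-frame layouts can be reproduced mechanically. The minimal sketch below replays only the initialization loop on string labels for m = 3, using plain Python lists in place of HTAs (an assumption made for illustration; `skew` is a hypothetical name). Python indices are 0-based, so MATLAB's tile row `i` shifted by `i-1` becomes row `i` shifted by `i`.

```python
# Sketch: replay Cannon's initialization on tile labels (0-based indices).
def skew(m):
    """Return the A and B tile layouts after the initialization loop."""
    A = [[f"A{i + 1}{j + 1}" for j in range(m)] for i in range(m)]
    B = [[f"B{i + 1}{j + 1}" for j in range(m)] for i in range(m)]
    for i in range(1, m):  # MATLAB: for i=2:m, shift by i-1
        # A{i,:} = circshift(A{i,:}, [0, -(i-1)]): rotate row i left by i
        A[i] = A[i][i:] + A[i][:i]
        # B{:,i} = circshift(B{:,i}, [-(i-1), 0]): rotate column i up by i
        col = [B[r][i] for r in range(m)]
        col = col[i:] + col[:i]
        for r in range(m):
            B[r][i] = col[r]
    return A, B

A, B = skew(3)
for row_a, row_b in zip(A, B):
    print(" ".join(row_a), "   ", " ".join(row_b))
```

Printing the two grids side by side gives the post-initialization layout, with each row of A next to the same row of B.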

  15–23. Iteration, step by step (m = 3). At each step every C tile is updated with the product of the A and B tiles at its position, C{i,j} = C{i,j} + A{i,j} * B{i,j}; then every row of A is rotated left by one tile and every column of B up by one tile.

     k = 1, starting from the layout after initialization:

     A11 A12 A13      C11 C12 C13      B11 B22 B33
     A22 A23 A21      C21 C22 C23      B21 B32 B13
     A33 A31 A32      C31 C32 C33      B31 B12 B23

     After A = circshift(A, [0, -1]) and B = circshift(B, [-1, 0]):

     A12 A13 A11      B21 B32 B13
     A23 A21 A22      B31 B12 B23
     A31 A32 A33      B11 B22 B33

     k = 2: C = C + A * B is applied to this new layout, and the loop continues until k = m.
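To see why m steps are enough, the sketch below tracks which original tiles meet at each position during the iteration loop, starting from the skewed layout, and checks that C(i, j) receives A(i, l) * B(l, j) exactly once for every l. It uses plain Python tuples of original (row, column) indices as stand-ins for tiles; `schedule`, `covers_full_product` and `label` are hypothetical names.

```python
# Sketch: which original tiles meet at each position during Cannon's iteration.
# Tiles are represented by their original (row, col) indices, 0-based.

def schedule(m):
    """Return steps[k][i][j] = (a_tile, b_tile): the original A and B tiles
    multiplied into C(i, j) at iteration k."""
    # Layout right after initialization: A row i skewed left by i,
    # B column j skewed up by j.
    A = [[(i, (i + j) % m) for j in range(m)] for i in range(m)]
    B = [[((i + j) % m, j) for j in range(m)] for i in range(m)]
    steps = []
    for _ in range(m):
        steps.append([[(A[i][j], B[i][j]) for j in range(m)] for i in range(m)])
        A = [row[1:] + row[:1] for row in A]  # circshift(A, [0, -1])
        B = B[1:] + B[:1]                     # circshift(B, [-1, 0])
    return steps

def covers_full_product(m):
    """True if every C(i, j) receives A(i, l) * B(l, j) exactly once per l."""
    steps = schedule(m)
    for i in range(m):
        for j in range(m):
            inner = []
            for step in steps:
                (ai, al), (bl, bj) = step[i][j]
                if (ai, bj) != (i, j) or al != bl:
                    return False
                inner.append(al)
            if sorted(inner) != list(range(m)):
                return False
    return True

def label(tile, name):
    """Turn a 0-based (row, col) pair into a slide-style label such as A12."""
    return f"{name}{tile[0] + 1}{tile[1] + 1}"

for k, step in enumerate(schedule(3), start=1):
    pairs = [f"{label(a, 'A')}*{label(b, 'B')}" for row in step for a, b in row]
    print(f"k={k}:", " ".join(pairs))
print(covers_full_product(3))
```

For m = 3 the first printed line lists the products formed at k = 1, row by row: A11*B11, A12*B22, A13*B33, A22*B21, and so on.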

  24. TALK OVERVIEW: 1 INTRO, 2 HOW HTAs WORK, 3 HTA OPERATIONS & APPLICATIONS, 4 EVALUATION, 5 CONCLUSIONS

  25. NASA ADVANCED SUPERCOMPUTING (NAS) BENCHMARKS (4 EVALUATION)

     Table 1. Execution times in seconds for some of the applications in the NAS benchmarks for Fortran+MPI versus MATLAB+HTA. (Image source: paper.) Letters in parentheses give the benchmark problem class.

     | Nprocs | EP (C) Fortran+MPI | EP (C) MATLAB+HTA | FT (B) Fortran+MPI | FT (B) MATLAB+HTA | CG (C) Fortran+MPI | CG (C) MATLAB+HTA | MG (B) Fortran+MPI | MG (B) MATLAB+HTA | LU (B) Fortran+MPI | LU (B) MATLAB+HTA |
     |---|---|---|---|---|---|---|---|---|---|---|
     | 1 | 901.6 | 3556.9 | 136.8 | 657.4 | 3606.9 | 3812.0 | 26.9 | 828.0 | 15.7 | 245.1 |
     | 4 | 273.1 | 888.8 | 109.1 | 274.0 | 362.0 | 1750.9 | 17.0 | 273.8 | 6.3 | 60.5 |
     | 8 | 136.3 | 447.0 | 65.5 | 159.3 | 123.4 | 823.6 | 9.6 | 151.3 | 2.9 | 29.9 |
     | 16 | 68.6 | 224.8 | 37.2 | 87.2 | 89.5 | 375.2 | 4.8 | 87.0 | 1.2 | 16.0 |
     | 32 | 34.7 | 112.0 | 20.7 | 42.9 | 48.4 | 250.3 | 3.3 | 54.9 | 1.1 | 9.8 |
     | 64 | 17.1 | 56.7 | 10.4 | 24.0 | 44.5 | 148.0 | 1.6 | 50.4 | 1.3 | 7.1 |
     | 128 | 8.5 | 29.1 | 5.9 | 15.6 | 30.8 | 123.0 | 1.4 | 38.5 | 1.6 | N/A |
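For readers who want the trend behind the numbers, the sketch below derives self-relative speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p from the EP and FT columns of Table 1. Whether the speedup plots on the following slides use exactly this definition is not stated on the slides, so treat it as an assumption; the names `TIMES`, `PROCS`, `speedup` and `efficiency` are hypothetical.

```python
# Execution times in seconds, copied from Table 1 (EP class C, FT class B).
PROCS = [1, 4, 8, 16, 32, 64, 128]
TIMES = {
    ("EP", "Fortran+MPI"): [901.6, 273.1, 136.3, 68.6, 34.7, 17.1, 8.5],
    ("EP", "MATLAB+HTA"): [3556.9, 888.8, 447.0, 224.8, 112.0, 56.7, 29.1],
    ("FT", "Fortran+MPI"): [136.8, 109.1, 65.5, 37.2, 20.7, 10.4, 5.9],
    ("FT", "MATLAB+HTA"): [657.4, 274.0, 159.3, 87.2, 42.9, 24.0, 15.6],
}

def speedup(times):
    """Self-relative speedup S(p) = T(1) / T(p) for every processor count."""
    return [times[0] / t for t in times]

def efficiency(times, procs):
    """Parallel efficiency E(p) = S(p) / p."""
    return [s / p for s, p in zip(speedup(times), procs)]

for (bench, version), times in TIMES.items():
    s = speedup(times)[-1]
    e = efficiency(times, PROCS)[-1]
    print(f"{bench} {version:<12} S(128) = {s:6.1f}  E(128) = {e:.2f}")
```

Under this definition both MATLAB+HTA columns reach a higher S(128) than their Fortran+MPI counterparts, while Fortran+MPI keeps the lower absolute time at every processor count shown.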

  26. Same table as slide 25, annotated: "Too many numbers!"

  27. [Figure: EP (embarrassingly parallel). Speedup factor vs. number of processors (0 to 128), MATLAB+HTA vs. Fortran+MPI, on 128 3.2 GHz Intel Xeons with Gigabit Ethernet. Sequential speed: Fortran+MPI 100 %, MATLAB+HTA 25 %.]

  28. Same figure, annotated "LINEAR SPEEDUP".

  29. [Figure: FFT (fast Fourier transform). Speedup factor vs. number of processors (0 to 128), MATLAB+HTA vs. Fortran+MPI, on 128 3.2 GHz Intel Xeons with Gigabit Ethernet. Sequential speed: Fortran+MPI 100 %, MATLAB+HTA 21 %.]
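The percentages in the two speedup figures can be checked against Table 1 if "sequential speed" is read as the one-processor speed of MATLAB+HTA relative to Fortran+MPI, with speed taken as the inverse of execution time. That reading is an assumption, since the slides do not define it; `T1` and `relative_sequential_speed` are hypothetical names. A minimal sketch:

```python
# One-processor execution times in seconds from Table 1.
T1 = {"EP": (901.6, 3556.9), "FT": (136.8, 657.4)}  # (Fortran+MPI, MATLAB+HTA)

def relative_sequential_speed(fortran, hta):
    """Speed of MATLAB+HTA as a fraction of Fortran+MPI (speed = 1 / time)."""
    return fortran / hta

for bench, (fortran, hta) in T1.items():
    print(f"{bench}: {relative_sequential_speed(fortran, hta):.0%}")
# prints "EP: 25%" and then "FT: 21%"
```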
