
Parameter Tuning of a Hybrid Treecode-FMM on GPUs



  1. Parameter Tuning of a Hybrid Treecode-FMM on GPUs. Rio Yokota, Lorena Barba, Department of Mechanical Engineering, Boston University. Saturday, June 4, 2011

  2. Previous Calculations
     - FMM, turbulence (Yokota & Barba): N = 3x10^9 in 20 sec, 40 TFlops
     - Treecode, astrophysics (Nitadori & Hamada): N = 3x10^9 in 6 sec, 100 TFlops
     - DEGIMA cluster, Nagasaki University: GTX295 x 380 = 760 GPUs, theoretical peak 700 TFlops, power consumption 100 kW

  3. Treecode & FMM

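  4. Multipole expansion

     Factoring the kernel about an expansion center $x_*$,

     $$\frac{1}{x_i - x_j} = \frac{1}{x_i - x_*} \cdot \frac{1}{1 - \frac{x_j - x_*}{x_i - x_*}}$$

     and the Taylor expansion, truncated after $p$ terms,

     $$\frac{1}{1 - t} \approx \sum_{k=0}^{p-1} t^k$$

     gives

     $$\sum_{j=1}^{N} \frac{1}{x_i - x_j} \approx \sum_{j=1}^{N} \frac{1}{x_i - x_*} \sum_{k=0}^{p-1} \left( \frac{x_j - x_*}{x_i - x_*} \right)^k = \sum_{k=0}^{p-1} (x_i - x_*)^{-k-1} \sum_{j=1}^{N} (x_j - x_*)^k$$

     Direct summation (left-hand side):

         ! O(N^2); the targets x(i) are assumed distinct from the sources x(j)
         do i = 1,N
           ff = 0
           do j = 1,N
             ff = ff + 1/( x(i) - x(j) )
           end do
           f(i) = ff
         end do

     Multipole form (right-hand side), with xs standing for $x_*$:

         ! Moments about xs: g(k) holds the sum over j of (x(j)-xs)**(k-1)
         do k = 1,p
           gg = 0
           do j = 1,N
             gg = gg + ( x(j) - xs )**( k-1 )
           end do
           g(k) = gg
         end do
         ! Evaluation at each target, O(pN)
         do i = 1,N
           ff = 0
           do k = 1,p
             ff = ff + ( x(i) - xs )**( -k )*g(k)
           end do
           f(i) = ff
         end do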
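The expansion on this slide can be checked numerically. The following is a minimal Python sketch, not the authors' code: the function names, the source cluster, the target positions and the order p = 10 are all assumptions chosen for illustration. It compares the direct sum against the truncated multipole form for targets well outside the source cluster.

```python
def direct_sum(targets, sources):
    """O(N^2) evaluation of sum_j 1 / (x_i - x_j) for every target x_i."""
    return [sum(1.0 / (xi - xj) for xj in sources) for xi in targets]


def multipole_moments(sources, center, p):
    """Moments M_k = sum_j (x_j - center)^k for k = 0 .. p-1."""
    return [sum((xj - center) ** k for xj in sources) for k in range(p)]


def evaluate_multipole(targets, moments, center):
    """Evaluate sum_k (x_i - center)^(-k-1) * M_k for every target x_i."""
    return [
        sum(m * (xi - center) ** (-k - 1) for k, m in enumerate(moments))
        for xi in targets
    ]


# Illustrative data (assumed): sources clustered in [-0.5, 0.5], targets far away.
sources = [0.1 * j for j in range(-5, 6)]
targets = [4.0, 5.0, 6.0]

exact = direct_sum(targets, sources)
approx = evaluate_multipole(targets, multipole_moments(sources, 0.0, 10), 0.0)
max_rel_err = max(abs(a - e) / abs(e) for a, e in zip(approx, exact))
print(max_rel_err)
```

The moments are computed once and reused for every target, which is where the saving over the direct double loop comes from.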

  5. Error Control (Treecode, FMM). Complexity: $O(p^3 \theta^{-3})$. Error: $O(\theta^p)$
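The $O(\theta^p)$ error estimate can be illustrated with the one-dimensional kernel from the multipole expansion slide. This is a sketch under assumptions: `truncated_kernel` and `relative_error` are hypothetical helper names, and a single source is placed so that the ratio $|x_j - x_*| / |x_i - x_*|$ equals $\theta$. For that single source, the relative error of the $p$-term truncated geometric series is exactly $\theta^p$.

```python
def truncated_kernel(xi, xj, center, p):
    """First p terms of the expansion of 1 / (xi - xj) about center."""
    t = (xj - center) / (xi - center)
    return sum(t ** k for k in range(p)) / (xi - center)


def relative_error(theta, p):
    """Relative truncation error for one source with |xj - x*| / |xi - x*| = theta."""
    center, xj = 0.0, 1.0      # source at distance 1 from the center
    xi = xj / theta            # target placed so that the ratio equals theta
    exact = 1.0 / (xi - xj)
    return abs(truncated_kernel(xi, xj, center, p) - exact) / abs(exact)


# Compare the measured error with theta**p for a few expansion orders.
for p in (2, 4, 8):
    print(p, relative_error(0.5, p), 0.5 ** p)
```

Smaller $\theta$ or larger $p$ both reduce the error, which is why the two parameters can be traded against each other.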

  6. Error Optimization. Optimize p. Better on GPUs? Optimize θ

  7. Stack based hybrid treewalk [Figure: target and source trees, traversed with a sequence of stack push and pop operations]
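The slide shows only the push/pop picture, so the following Python sketch is one possible reading of a stack based dual treewalk, not the authors' implementation. The `Cell` class, the acceptance test with `theta`, the rule for which cell to split, and the `m2l_min_targets` threshold that chooses between a cell-cell (FMM-style) and a cell-particle (treecode-style) interaction are all assumptions made for illustration.

```python
class Cell:
    """Node of a 1D binary tree; stores every point it contains."""

    def __init__(self, points, leaf_size=4):
        self.points = points
        lo, hi = min(points), max(points)
        self.center = 0.5 * (lo + hi)
        self.radius = 0.5 * (hi - lo)
        self.children = []
        if len(points) > leaf_size:
            left = [x for x in points if x < self.center]
            right = [x for x in points if x >= self.center]
            if left and right:  # guard against coincident points
                self.children = [Cell(left, leaf_size), Cell(right, leaf_size)]


def hybrid_treewalk(target_root, source_root, theta, m2l_min_targets=8):
    """Return (m2l, m2p, p2p) lists of (target cell, source cell) pairs."""
    m2l, m2p, p2p = [], [], []
    stack = [(target_root, source_root)]          # push the root pair
    while stack:
        tc, sc = stack.pop()                      # pop the next pair
        dist = abs(tc.center - sc.center)
        if tc.radius + sc.radius < theta * dist:  # well separated
            # Hypothetical selection rule: cell-cell (FMM) for populous
            # target cells, cell-particle (treecode) otherwise.
            far = m2l if len(tc.points) >= m2l_min_targets else m2p
            far.append((tc, sc))
        elif not tc.children and not sc.children:
            p2p.append((tc, sc))                  # two nearby leaves: direct sum
        elif sc.children and (sc.radius >= tc.radius or not tc.children):
            stack.extend((tc, child) for child in sc.children)  # push
        else:
            stack.extend((child, sc) for child in tc.children)  # push
    return m2l, m2p, p2p


# Illustrative point sets (assumed): 64 targets and 64 interleaved sources.
targets = [i / 64 for i in range(64)]
sources = [(i + 0.5) / 64 for i in range(64)]
m2l, m2p, p2p = hybrid_treewalk(Cell(targets), Cell(sources), theta=0.5)
pairs_covered = sum(len(t.points) * len(s.points) for t, s in m2l + m2p + p2p)
print(len(m2l), len(m2p), len(p2p), pairs_covered)
```

Because every split partitions a cell's points among its children, each target-source pair ends up in exactly one interaction list, so `pairs_covered` should equal the number of targets times the number of sources.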

  8. Parameter study. Complexity: $O(p^3 \theta^{-3})$. Error: $O(\theta^p)$
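A parameter study of this kind can be mimicked with the two scaling laws alone. The sketch below assumes unit constants in both models (cost = p^3 / θ^3, error = θ^p), which real hardware will not satisfy; the function names, the θ grid and the tolerance of 1e-6 are illustrative assumptions. For each θ it finds the smallest p that meets the tolerance, then picks the cheapest combination.

```python
def error_model(p, theta):
    """Assumed error model with unit constant: theta^p."""
    return theta ** p


def cost_model(p, theta):
    """Assumed cost model with unit constant: p^3 / theta^3."""
    return p ** 3 / theta ** 3


def min_order(theta, tol, p_max=200):
    """Smallest p whose modeled error meets the tolerance."""
    for p in range(1, p_max + 1):
        if error_model(p, theta) <= tol:
            return p
    raise ValueError("tolerance not reachable within p_max")


def best_parameters(tol, thetas):
    """Grid search over theta; returns (p, theta, cost) with the lowest modeled cost."""
    candidates = []
    for theta in thetas:
        p = min_order(theta, tol)
        candidates.append((cost_model(p, theta), p, theta))
    cost, p, theta = min(candidates)
    return p, theta, cost


thetas = [0.05 * k for k in range(2, 19)]  # 0.10, 0.15, ..., 0.90
p_opt, theta_opt, cost_opt = best_parameters(1e-6, thetas)
print(p_opt, theta_opt, cost_opt)
```

If p is treated as continuous, this simplified model has its minimum at θ = 1/e for any tolerance. A measured optimum depends on the actual constants for each kernel and on the hardware, which is what a parameter study on CPUs and GPUs is meant to find.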

  9. Optimum parameters

  10. Parallel calculation
      - Nagasaki University, DEGIMA: 760 GTX295 GPUs, peak 0.7 PFlops
      - Tokyo Institute of Technology, TSUBAME 2.0: 4224 M2050 GPUs, peak 2.4 PFlops

  11. Strong Scaling, N = 10^8 [Figure: two panels, DEGIMA and TSUBAME 2.0; time x N procs [s], 0 to 400, versus N procs, 1 to 512; legend: tree construction, mpisendp2p, mpisendm2l, P2Pkernel, P2Mkernel, M2Mkernel, M2Lkernel, L2Lkernel, L2Pkernel]

  12. Weak Scaling. 0.5 PFlops on TSUBAME 2.0 [Figure: time (sec), 0 to 30, versus number of processes (GPUs), 4 to 2048; legend: tree construction, GPU communication, MPI communication, FMM evaluation, local evaluation]

  13. Summary & Outlook
      1. The stack based treewalk enables a simple but effective hybridization of treecodes and FMMs.
      2. The optimum p and θ are different on CPUs and GPUs, but this difference is small.
      3. More tests need to be performed in the higher accuracy range, and the overall performance must be compared to other treecodes and FMMs.
