
Speeding up a Finite Element Computation on GPU Nelson Inoue - PowerPoint PPT Presentation



  1. Speeding up a Finite Element Computation on GPU Nelson Inoue

  2. Summary • Introduction • Finite element implementation on GPU • Results • Conclusions

  3. University and Researchers • Pontifical Catholic University of Rio de Janeiro (PUC-Rio) • Group of Technology in Petroleum Engineering (GTEP) • Research Team: PhD Sergio Fontoura (Leader Researcher), PhD Nelson Inoue (Senior Researcher), PhD Carlos Emmanuel (Researcher), MSc Guilherme Righetto (Researcher), MSc Rafael Albuquerque (Researcher)

  4. Introduction • Research & Development (R&D) project with Petrobras • The project began in 2010 • The subject of the project is Reservoir Geomechanics • There is great interest from the oil and gas industry in this subject • The subject is still little researched

  5. Introduction • What is Reservoir Geomechanics? – The branch of petroleum engineering that studies the coupling between fluid flow and rock deformation (stress analysis) • Hydromechanical Coupling – Oil production causes rock deformation – Rock deformation contributes to oil production

  6. Motivation • Geomechanical effects during reservoir production: 1. Surface subsidence 2. Bedding-parallel slip 3. Fault reactivation 4. Caprock integrity 5. Reservoir compaction

  7. Challenge • Evaluate geomechanical effects in a real reservoir • Overcome two major challenges: 1. Use a reliable coupling scheme between fluid flow and stress analysis 2. Speed up the stress analysis (Finite Element Method) • The finite element analysis accounts for most of the simulation time

  8. Hydromechanical coupling • Theoretical Approach [Figure: coupling program flowchart]

  9. Finite Element Method • Partial differential equations arise in the mathematical modelling of many engineering problems • An analytical (exact) solution is often very complicated to obtain • Alternative: numerical solution – Finite element method, finite difference method, finite volume method, boundary element method, discrete element method, etc.

  10. Finite Element Method • The finite element method (FEM) is widely applied in stress analysis • The domain is an assembly of finite elements (FEs) [Figure: finite element domain; source: http://www.mscsoftware.com/product/dytran]

  11. CHRONOS: FE Program • Chronos has been implemented on GPU – Motivation: to reduce the simulation time in the hydromechanical analysis – Why use a GPU? Much more processing power: a CPU has 4 - 8 cores, while a GPU (GeForce GTX Titan) has 2880 cores [Figure: CETUS computer with 4 GPUs]

  12. Motivation • GPU features (CUDA C Programming Guide): – Highly parallel, multithreaded, manycore processor – Tremendous computational horsepower and very high memory bandwidth [Figures: number of floating-point operations per second; memory bandwidth]

  13. Our Implementation • GPUs have good performance • We have developed and implemented an optimized, parallel finite element program on GPU • The finite element code is written in CUDA • We have implemented on GPU: – Assembly of the stiffness matrix – Solution of the system of linear equations – Evaluation of the strain state – Evaluation of the stress state

  14. Global Memory Access on GPU • Getting maximum performance on GPU: coalesced access – Sequential/aligned access: good – Strided access: not so good – Random access: bad – Memory accesses are fully coalesced as long as all threads in a warp access the same relative address
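
The three access patterns on this slide can be compared with a small counting model. This is only a rough sketch: the warp size of 32 threads, the 128-byte memory segment, the 4-byte element size and the stride of 8 are assumptions made for illustration, and the name `segments_touched` is hypothetical. It counts how many distinct segments a single warp touches, not actual hardware timings.

```python
import random

WARP_SIZE = 32        # threads per warp (assumed)
SEGMENT_BYTES = 128   # size of one memory segment (assumed)
ELEM_BYTES = 4        # single-precision value (assumed)


def segments_touched(indices):
    """Count the distinct memory segments hit by one warp's element indices."""
    return len({(i * ELEM_BYTES) // SEGMENT_BYTES for i in indices})


# Sequential/aligned: thread t reads element t.
sequential = list(range(WARP_SIZE))
# Strided: thread t reads element 8 * t.
strided = [8 * t for t in range(WARP_SIZE)]
# Random: each thread reads an arbitrary element of a large array.
rng = random.Random(0)
scattered = [rng.randrange(1 << 20) for _ in range(WARP_SIZE)]

print("sequential:", segments_touched(sequential))
print("strided:   ", segments_touched(strided))
print("random:    ", segments_touched(scattered))
```

Under these assumptions the sequential pattern fits in a single segment, the strided pattern spreads over several, and the random pattern can need up to one segment per thread.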

  15. Development on CPU • The assembly of the global stiffness matrix in the conventional FEM – Simple 1D problem [Figure: a) real model, b) model discretization with nodes 1 to 4, c) three finite elements 1 to 3] – Element stiffness matrices:

      $$k^{1} = \begin{bmatrix} k^{1}_{11} & k^{1}_{12} \\ k^{1}_{21} & k^{1}_{22} \end{bmatrix}, \quad k^{2} = \begin{bmatrix} k^{2}_{11} & k^{2}_{12} \\ k^{2}_{21} & k^{2}_{22} \end{bmatrix}, \quad k^{3} = \begin{bmatrix} k^{3}_{11} & k^{3}_{12} \\ k^{3}_{21} & k^{3}_{22} \end{bmatrix}$$

      • The continuous model is discretized by elements

  16. Development on CPU • In terms of CPU implementation: for i = 1, ..., numel = 3 – Evaluate the element stiffness matrix $k^{i}$ – Assemble it into the global stiffness matrix $k_{global}$

      After i = 1:

      $$k_{global} = \begin{bmatrix} k^{1}_{11} & k^{1}_{12} & 0 & 0 \\ k^{1}_{21} & k^{1}_{22} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

      After i = 2:

      $$k_{global} = \begin{bmatrix} k^{1}_{11} & k^{1}_{12} & 0 & 0 \\ k^{1}_{21} & k^{1}_{22} + k^{2}_{11} & k^{2}_{12} & 0 \\ 0 & k^{2}_{21} & k^{2}_{22} & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

      After i = 3:

      $$k_{global} = \begin{bmatrix} k^{1}_{11} & k^{1}_{12} & 0 & 0 \\ k^{1}_{21} & k^{1}_{22} + k^{2}_{11} & k^{2}_{12} & 0 \\ 0 & k^{2}_{21} & k^{2}_{22} + k^{3}_{11} & k^{3}_{12} \\ 0 & 0 & k^{3}_{21} & k^{3}_{22} \end{bmatrix}$$

      – The storage in memory (row by row) after each element:

      $$i = 1: \; [\, k^{1}_{11},\; k^{1}_{12},\; 0,\; 0,\; k^{1}_{21},\; k^{1}_{22},\; 0,\; 0,\; 0,\; 0,\; 0,\; 0,\; 0,\; 0,\; 0,\; 0 \,]$$

      $$i = 2: \; [\, k^{1}_{11},\; k^{1}_{12},\; 0,\; 0,\; k^{1}_{21},\; k^{1}_{22} + k^{2}_{11},\; k^{2}_{12},\; 0,\; 0,\; k^{2}_{21},\; k^{2}_{22},\; 0,\; 0,\; 0,\; 0,\; 0 \,]$$

      $$i = 3: \; [\, k^{1}_{11},\; k^{1}_{12},\; 0,\; 0,\; k^{1}_{21},\; k^{1}_{22} + k^{2}_{11},\; k^{2}_{12},\; 0,\; 0,\; k^{2}_{21},\; k^{2}_{22} + k^{3}_{11},\; k^{3}_{12},\; 0,\; 0,\; k^{3}_{21},\; k^{3}_{22} \,]$$

      • Memory access is not coalesced
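
The element loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Chronos code: the function names `spring` and `assemble_by_element` are hypothetical, and the element matrices are assumed to be simple 1D springs with stiffness values 1, 2 and 3 chosen only to make the result easy to check.

```python
def spring(k):
    """2x2 stiffness matrix of a 1D spring element (illustrative assumption)."""
    return [[k, -k], [-k, k]]


def assemble_by_element(element_matrices):
    """Assemble a dense global matrix for a 1D chain of 2-node elements.

    Element e (0-based) connects node e and node e + 1, so its 2x2 matrix
    is added into rows and columns e and e + 1 of the global matrix.
    """
    num_nodes = len(element_matrices) + 1
    k_global = [[0.0] * num_nodes for _ in range(num_nodes)]
    for e, k_e in enumerate(element_matrices):   # loop i = 1, ..., numel
        for a in range(2):
            for b in range(2):
                k_global[e + a][e + b] += k_e[a][b]
    return k_global


K = assemble_by_element([spring(1.0), spring(2.0), spring(3.0)])
for row in K:
    print(row)
```

Each pass of the outer loop touches two entries in one row and then two entries in the next row, which is the scattered write pattern the slide describes as not coalesced.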

  17. Development on GPU • The assembly of the global stiffness matrix on GPU – Simple 1D problem [Figure: real model; nodes 1 to 4 connected by elements 1 to 3] – Each node gives one row of the global stiffness matrix:

      $$\text{Node 1:} \quad [k]_{row\,1} = \begin{bmatrix} k^{1}_{11} & k^{1}_{12} \end{bmatrix}$$

      $$\text{Node 2:} \quad [k]_{row\,2} = \begin{bmatrix} k^{1}_{21} & k^{1}_{22} + k^{2}_{11} & k^{2}_{12} \end{bmatrix}$$

      $$\text{Node 3:} \quad [k]_{row\,3} = \begin{bmatrix} k^{2}_{21} & k^{2}_{22} + k^{3}_{11} & k^{3}_{12} \end{bmatrix}$$

      $$\text{Node 4:} \quad [k]_{row\,4} = \begin{bmatrix} k^{3}_{21} & k^{3}_{22} \end{bmatrix}$$

      • The continuous model is discretized by nodes (four finite element nodes)

  18. Development on GPU • In terms of GPU implementation: Thread = 1 handles row 1, Thread = 2 handles row 2, Thread = 3 handles row 3 • All the threads do the same calculation

      $$[k]_{row\,1} = \begin{bmatrix} 0 & k^{1}_{11} & k^{1}_{12} \end{bmatrix}$$

      $$[k]_{row\,2} = \begin{bmatrix} k^{1}_{21} & k^{1}_{22} + k^{2}_{11} & k^{2}_{12} \end{bmatrix}$$

      $$[k]_{row\,3} = \begin{bmatrix} k^{2}_{21} & k^{2}_{22} + k^{3}_{11} & k^{3}_{12} \end{bmatrix}$$

      – Column = 1:

      $$k_{global} = \begin{bmatrix} 0 \\ k^{1}_{21} \\ k^{2}_{21} \\ k^{3}_{21} \end{bmatrix}$$

      – The storage in memory after Column = 1, with Thread = 1, Thread = 2 and Thread = 3 writing consecutive entries:

      $$k_{global} = [\, 0,\; k^{1}_{21},\; k^{2}_{21},\; k^{3}_{21} \,]$$

      • The memory access is sequential and aligned

  19. Development on GPU • In terms of GPU implementation: Thread = 1 handles row 1, Thread = 2 handles row 2, Thread = 3 handles row 3

      $$[k]_{row\,1} = \begin{bmatrix} 0 & k^{1}_{11} & k^{1}_{12} \end{bmatrix}$$

      $$[k]_{row\,2} = \begin{bmatrix} k^{1}_{21} & k^{1}_{22} + k^{2}_{11} & k^{2}_{12} \end{bmatrix}$$

      $$[k]_{row\,3} = \begin{bmatrix} k^{2}_{21} & k^{2}_{22} + k^{3}_{11} & k^{3}_{12} \end{bmatrix}$$

      – Column = 2:

      $$k_{global} = \begin{bmatrix} 0 & k^{1}_{11} \\ k^{1}_{21} & k^{1}_{22} + k^{2}_{11} \\ k^{2}_{21} & k^{2}_{22} + k^{3}_{11} \\ k^{3}_{21} & k^{3}_{22} \end{bmatrix}$$

      – The storage in memory after Column = 2 (column by column), with Thread = 1, Thread = 2 and Thread = 3 writing consecutive entries of each column:

      $$k_{global} = [\, 0,\; k^{1}_{21},\; k^{2}_{21},\; k^{3}_{21},\; k^{1}_{11},\; k^{1}_{22} + k^{2}_{11},\; k^{2}_{22} + k^{3}_{11},\; k^{3}_{22} \,]$$

      • Memory access is coalesced
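
The row-per-thread layout of slides 17 to 19 can be emulated serially. The sketch below is an illustration under stated assumptions, not the authors' kernel: the inner loop over rows stands in for GPU threads, the names `spring` and `assemble_by_node` are hypothetical, the element matrices are assumed to be 1D springs with stiffness 1, 2 and 3, and the storage is assumed to hold three columns (left neighbour, diagonal, right neighbour) one after another.

```python
def spring(k):
    """2x2 stiffness matrix of a 1D spring element (illustrative assumption)."""
    return [[k, -k], [-k, k]]


def assemble_by_node(element_matrices):
    """Fill the three bands of the global matrix, stored column by column.

    The flat list is laid out as [column 1 | column 2 | column 3], where
    column c holds entry c of every row. Row r therefore writes to index
    c * num_nodes + r, so neighbouring rows write neighbouring locations.
    """
    num_el = len(element_matrices)
    num_nodes = num_el + 1
    storage = [0.0] * (3 * num_nodes)
    for col in range(3):
        for row in range(num_nodes):   # on a GPU: one thread per row
            left = element_matrices[row - 1] if row > 0 else None
            right = element_matrices[row] if row < num_el else None
            if col == 0:      # coupling with the previous node
                value = left[1][0] if left is not None else 0.0
            elif col == 1:    # diagonal: contributions of both neighbours
                value = (left[1][1] if left is not None else 0.0) + (
                    right[0][0] if right is not None else 0.0
                )
            else:             # coupling with the next node
                value = right[0][1] if right is not None else 0.0
            storage[col * num_nodes + row] = value
    return storage


bands = assemble_by_node([spring(1.0), spring(2.0), spring(3.0)])
print(bands[0:4])    # column 1
print(bands[4:8])    # column 2
print(bands[8:12])   # column 3
```

Because every row computes its own entry and no two rows write the same location, this formulation needs no accumulation into shared entries, which is one reason it maps naturally onto one thread per row.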

  20. Development on GPU • Solution of the system of linear equations Ax = b – A = stiffness matrix, x = nodal displacement vector (unknown values) and b = nodal force vector – A is symmetric and positive-definite – Two options: direct solver or iterative solver • The Conjugate Gradient Method was chosen – Iterative algorithm – Parallelizable algorithm on GPU – The operations of the conjugate gradient algorithm are well suited to implementation on GPU [Figure: conjugate gradient algorithm]
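
For reference, a textbook conjugate gradient iteration is sketched below in plain Python. It is not the Chronos solver: the helper names `matvec`, `dot` and `conjugate_gradient` are hypothetical, the matrix is dense for brevity, and the test system is an assumed chain of three springs (stiffness 1, 2, 3) with node 1 fixed and a unit force on node 4. The loop consists of one matrix-vector product, two dot products and three vector updates per iteration, which are the operations that parallelize well.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A."""
    x = [0.0] * len(b)
    r = list(b)               # residual b - A x for the initial guess x = 0
    p = list(r)               # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Assumed example: three springs in series, node 1 fixed, unit force on node 4.
A = [[3.0, -2.0, 0.0],
     [-2.0, 5.0, -3.0],
     [0.0, -3.0, 3.0]]
b = [0.0, 0.0, 1.0]
u = conjugate_gradient(A, b)
print(u)
```

For this assumed system the exact displacements are 1, 3/2 and 11/6, which can be checked by summing the elongations 1/1, 1/2 and 1/3 of the springs along the chain.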
