
AUTOMATIC PARALLELISATION OF SOFTWARE USING GENETIC IMPROVEMENT - PowerPoint PPT Presentation



  1. AUTOMATIC PARALLELISATION OF SOFTWARE USING GENETIC IMPROVEMENT Bobby R. Bruce

  2. INSPIRATION • Samsung Galaxy S7

  3. INSPIRATION • Samsung Galaxy S7 (Mali-T880 MP12)

  4. INSPIRATION • Intel i7-2500K (overclocked to 5 GHz) • Samsung Galaxy S7 (Mali-T880 MP12)

  6. INSPIRATION • Intel i7-2500K (overclocked to 5 GHz): 70 GFLOPs • Samsung Galaxy S7 (Mali-T880 MP12)

  7. INSPIRATION • Intel i7-2500K (overclocked to 5 GHz): 70 GFLOPs • Samsung Galaxy S7 (Mali-T880 MP12): 265.2 GFLOPs

  8. INSPIRATION • Intel i7-2500K (overclocked to 5 GHz): 70 GFLOPs • Mali-T880 MP12: 265.2 GFLOPs • nVidia GTX 1060: 4327 GFLOPs

  9. WHY DON’T WE UTILISE THIS POWERFUL HARDWARE? • Developers lack the skills • Hardware specialisation • Developers’ time is expensive; translating code to run on the GPU is expensive • Getting decent optimisation requires manual trial and error

  10. WHY DON’T WE UTILISE THIS POWERFUL HARDWARE? • Developers lack the skills • Hardware specialisation • Developers’ time is expensive; translating code to run on the GPU is expensive • Getting decent optimisation requires manual trial and error. An automated approach would be ideal.

  11. BACKGROUND: WHAT’S CURRENTLY AVAILABLE?

  12. BACKGROUND: WHAT’S CURRENTLY AVAILABLE?
      Domain | Pros | Cons

  13. BACKGROUND: WHAT’S CURRENTLY AVAILABLE?
      Domain | Pros | Cons
      Automatic Parallelisation Compilers | Does not require any skill in, or knowledge of, parallelisation | Only targets very specific loops where dependencies are fully understood

  14. BACKGROUND: WHAT’S CURRENTLY AVAILABLE?
      Domain | Pros | Cons
      Automatic Parallelisation Compilers | Does not require any skill in, or knowledge of, parallelisation | Only targets very specific loops where dependencies are fully understood
      CUDA/OpenCL | When implemented well, offers the best performance | Difficult to learn, harder to master; very manual

  15. BACKGROUND: WHAT’S CURRENTLY AVAILABLE?
      Domain | Pros | Cons
      Automatic Parallelisation Compilers | Does not require any skill in, or knowledge of, parallelisation | Only targets very specific loops where dependencies are fully understood
      CUDA/OpenCL | When implemented well, offers the best performance | Difficult to learn, harder to master; very manual
      Directive-based | Considerably easier to implement | Still requires some skill, practice, and trial and error

  19. BACKGROUND: OPENACC

  21. BACKGROUND: OPENACC • x20 Speed Up

  22. OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES [Diagram labels: OPENACC_GI]

  23. OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES [Diagram labels: OPENACC_GI; Creates Patch]

  24. OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES [Diagram labels: OPENACC_GI; CFG-GP; Creates Patch]

  25. OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES [Diagram labels: OPENACC_GI; CFG-GP; FITNESS FUNCTION; Patch; Creates Patch]

  26. OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES [Diagram labels: OPENACC_GI; GRAMMAR; CFG-GP; FITNESS FUNCTION; Patch; Creates Patch]

  27. OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES [Diagram labels: OPENACC_GI; OPENACC GRAMMAR; GRAMMAR; CFG-GP; FITNESS FUNCTION; Patch; Creates Patch]

  28. OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES [Diagram labels: OPENACC_GI; PROGRAM DATA; OPENACC GRAMMAR; GRAMMAR; CFG-GP; FITNESS FUNCTION; Patch; Creates Patch]

  29. OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES [Diagram labels: OPENACC_GI; SOURCE CODE; LEXICAL ANALYSER; PROGRAM DATA; OPENACC GRAMMAR; GRAMMAR; CFG-GP; FITNESS FUNCTION; Patch; Creates Patch]
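
The diagram names a fitness function but the slides do not define it; the results slides report execution time in milliseconds, so one plausible reading is that a candidate patch is scored by how fast the patched program runs. The sketch below is a minimal illustration under that assumption. The function name `execution_time_ms`, its parameters, and the choice to rank failing candidates as infinitely slow are all hypothetical and not taken from OPENACC_GI.

```python
import math
import subprocess
import sys
import time


def execution_time_ms(command, repeats=3, timeout_s=60.0):
    """Return the best wall-clock time (ms) of `command` over `repeats` runs.

    Returns math.inf if any run fails, cannot start, or times out, so that
    broken candidates rank worst in a search that minimises this value.
    """
    best = math.inf
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            result = subprocess.run(command, capture_output=True, timeout=timeout_s)
        except (subprocess.TimeoutExpired, OSError):
            return math.inf
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if result.returncode != 0:
            return math.inf
        best = min(best, elapsed_ms)
    return best


if __name__ == "__main__":
    # Stand-in for a patched, recompiled binary: a trivial Python process.
    print(execution_time_ms([sys.executable, "-c", "pass"]))
```

In a real setting the command would be the recompiled program under test, and a correctness check on its output would probably be needed alongside the timing.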

  30. GRAMMAR
      <start> ::= <base> | <base> <start>
      <base> ::= "#pragma acc " <choice>
      <choice> ::= "loop " <private> <loop_line_number>
      <private> ::= "private(" <variables> ") " | " "
      <variables> ::= <variable> | <variable> "," <variables>
      <variable> ::= <variable_placeholder>
      <variable_placeholder> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" …

  31. GRAMMAR
      <start> ::= <base> | <base> <start>
      <base> ::= "#pragma acc " <choice>
      <choice> ::= "loop " <private> <loop_line_number>
      <private> ::= "private(" <variables> ") " | " "
      <variables> ::= <variable> | <variable> "," <variables>
      <variable> ::= <variable_placeholder>
      <variable_placeholder> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" …
      <loop_line_number> ::= "15@example1.c" | "145@example2.c"

  32. GRAMMAR
      <start> ::= <base> | <base> <start>
      <base> ::= "#pragma acc " <choice>
      <choice> ::= "loop " <private> <loop_line_number>
      <private> ::= "private(" <variables> ") " | " "
      <variables> ::= <variable> | <variable> "," <variables>
      <variable> ::= <variable_placeholder>
      <variable_placeholder> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" …
      <loop_line_number> ::= "15@example1.c" | "145@example2.c"
      Example sentence: #pragma acc loop private(1,2) 15@example1.c
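
To make the grammar concrete, the sketch below encodes the rules from the slide as a Python dictionary and expands them with random choices, producing sentences such as `#pragma acc loop private(1,2) 15@example1.c`. This is only an illustration of how a sentence can be derived from the grammar. The `derive` function, the depth limit, and the decision to stop the placeholder list at "7" (the slide elides the rest) are assumptions, and it is not the CFG-GP implementation used in the talk.

```python
import random

# The grammar from the slide: non-terminal -> list of alternatives.
# Each alternative is a list of symbols; names in <...> are non-terminals.
GRAMMAR = {
    "<start>": [["<base>"], ["<base>", "<start>"]],
    "<base>": [["#pragma acc ", "<choice>"]],
    "<choice>": [["loop ", "<private>", "<loop_line_number>"]],
    "<private>": [["private(", "<variables>", ") "], [" "]],
    "<variables>": [["<variable>"], ["<variable>", ",", "<variables>"]],
    "<variable>": [["<variable_placeholder>"]],
    # The slide elides the values after "7"; only the visible ones are used.
    "<variable_placeholder>": [[str(n)] for n in range(1, 8)],
    "<loop_line_number>": [["15@example1.c"], ["145@example2.c"]],
}


def derive(symbol, rng, depth=0, max_depth=10):
    """Randomly expand `symbol` into a string of terminals."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emitted as-is
    alternatives = GRAMMAR[symbol]
    # Past max_depth always take the first alternative. In this grammar the
    # first alternative of each recursive rule is the non-recursive one,
    # so the expansion is guaranteed to terminate.
    chosen = alternatives[0] if depth >= max_depth else rng.choice(alternatives)
    return "".join(derive(s, rng, depth + 1, max_depth) for s in chosen)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(derive("<base>", rng))
```

Expanding from `<base>` yields a single directive; expanding from `<start>` yields one or more directives concatenated, as the first rule allows.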

  34. GRAMMAR
      <start> ::= <base> | <base> <start>
      <base> ::= "#pragma acc " <choice>
      <choice> ::= "loop " <private> <loop_line_number>
      <private> ::= "private(" <variables> ") " | " "
      <variables> ::= <variable> | <variable> "," <variables>
      <variable> ::= <variable_placeholder>
      <variable_placeholder> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" …
      <loop_line_number> ::= "15@example1.c" | "145@example2.c"
      Resulting patch:
      --- example1.c
      +++ example1.c
      @@ -15,0 +15,1 @@
      + #pragma acc loop private(x,y)
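
The slides show a grammar sentence with numeric placeholders, `private(1,2)`, turning into a patch that names real variables, `private(x,y)`. The sketch below illustrates one way that translation might work. The function `directive_to_patch` and the placeholder table (`"1"` to `x`, `"2"` to `y`) are hypothetical; the slides do not say how placeholders are resolved, though the program data gathered by the lexical analyser is a likely source. The hunk header is copied in the layout shown on the slide and has not been checked against a real `patch` tool.

```python
def directive_to_patch(sentence, placeholder_names):
    """Turn one grammar sentence into patch text in the slide's layout.

    `sentence` looks like '#pragma acc loop private(1,2) 15@example1.c'.
    `placeholder_names` maps placeholder strings to variable names; this
    mapping is an assumption made for illustration.
    """
    # The location token '<line>@<file>' is the last space-separated field.
    directive, _, location = sentence.rstrip().rpartition(" ")
    line_number, _, filename = location.partition("@")
    directive = directive.rstrip()

    # Replace numeric placeholders inside private(...) with variable names.
    if "private(" in directive:
        head, _, rest = directive.partition("private(")
        inside, _, tail = rest.partition(")")
        names = [placeholder_names[p] for p in inside.split(",")]
        directive = f"{head}private({','.join(names)}){tail}"

    return (
        f"--- {filename}\n"
        f"+++ {filename}\n"
        f"@@ -{line_number},0 +{line_number},1 @@\n"
        f"+ {directive}\n"
    )


if __name__ == "__main__":
    print(directive_to_patch(
        "#pragma acc loop private(1,2) 15@example1.c",
        {"1": "x", "2": "y"},  # hypothetical placeholder table
    ))
```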

  35. INITIAL INVESTIGATION • Chose to run a very small example as a sanity check • nVidia provides an n-body simulation example that already contains OpenACC directives • These directives were stripped so that OPENACC_GI could try to replicate them • Ran for 100 generations with a population of 100
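
The run described above (100 generations, population of 100) can be pictured with a generic generational search loop. The sketch below is a simplified stand-in, not CFG-GP: it uses tournament selection, mutation only, and elitism, and it minimises a made-up cost rather than a measured execution time. Every name in it (`evolve`, `CANDIDATE_LINES`, `toy_cost`, the tournament size) is hypothetical; only the population size and generation count come from the slide.

```python
import random


def evolve(new_individual, mutate, fitness, rng,
           population_size=100, generations=100, tournament_size=3):
    """Minimise `fitness` using tournament selection, mutation and elitism."""
    population = [new_individual(rng) for _ in range(population_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        offspring = [best]  # elitism: the best individual always survives
        while len(offspring) < population_size:
            # Tournament: the fittest of a few random individuals is the parent.
            parent = min(rng.sample(population, tournament_size), key=fitness)
            offspring.append(mutate(parent, rng))
        population = offspring
        best = min(population, key=fitness)
    return best


# Toy stand-in problem: an individual is the set of loop lines that receive
# a directive. The made-up cost is lowest when exactly lines 15 and 145 do.
CANDIDATE_LINES = [15, 40, 88, 145]
GOOD = frozenset({15, 145})


def new_individual(rng):
    return frozenset(line for line in CANDIDATE_LINES if rng.random() < 0.5)


def mutate(individual, rng):
    # Toggle one candidate line in or out of the set.
    return individual ^ {rng.choice(CANDIDATE_LINES)}


def toy_cost(individual):
    return len(individual ^ GOOD)


if __name__ == "__main__":
    best = evolve(new_individual, mutate, toy_cost, random.Random(1))
    print(sorted(best), toy_cost(best))
```

With a real, expensive fitness function such as compiling and timing a program, the scores would need to be cached rather than recomputed on every comparison as this sketch does.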

  36. RESULTS [Bar chart: Execution Time (ms), axis from 0 to 350, for sequential, original, and gi_best]

  37. RESULTS [Chart: Execution Time (ms), axis from 11.6 to 13.0, for original and gi_best]
