AUTOMATIC PARALLELISATION OF SOFTWARE USING GENETIC IMPROVEMENT Bobby R. Bruce
INSPIRATION
• Intel i7-2500K (overclocked to 5GHz): 70 GFLOPs
• Mali-T880 MP12 (Samsung Galaxy S7): 265.2 GFLOPs
• nVidia GTX 1060: 4327 GFLOPs
WHY DON’T WE UTILISE THIS POWERFUL HARDWARE?
• Developers lack the skills
• Hardware specialisation
• Developers’ time is expensive; translating code to run on the GPU is expensive
• Getting decent optimisation requires manual trial and error
An automated approach would be ideal
BACKGROUND: WHAT’S CURRENTLY AVAILABLE?
Domain: Automatic Parallelisation Compilers
  Pros: Does not require any skill in, or knowledge of, parallelisation
  Cons: Only targets very specific loops where dependencies are fully understood
Domain: CUDA/OpenCL
  Pros: When implemented well, offers the best performance
  Cons: Difficult to learn, harder to master. Very manual
Domain: Directive-based
  Pros: Considerably easier to implement
  Cons: Still requires some skill, practice, and trial and error
BACKGROUND: OPENACC
• x20 speed-up
OUR GOAL: AUTOMATICALLY ADD OPENACC DIRECTIVES
[Diagram: OPENACC_GI. Components: SOURCE CODE, LEXICAL ANALYSER, PROGRAM DATA, OPENACC GRAMMAR, GRAMMAR, CFG-GP, FITNESS FUNCTION. Edge labels: "Patch" (at the fitness function) and "Creates Patch" (output of OPENACC_GI).]
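The slides do not show how the lexical analyser turns source code into program data. As a rough illustration only, the sketch below assumes it records the line of every `for` loop and emits those as the alternatives of the `<loop_line_number>` rule seen on the GRAMMAR slide. The helper names `loop_line_numbers` and `loop_rule`, and the regular expression, are hypothetical.

```python
import re

# Assumption: a loop is any line that starts with "for (".
# The real lexical analyser is not described in the slides.
FOR_LOOP = re.compile(r"^\s*for\s*\(")


def loop_line_numbers(filename, source):
    """Return "<line>@<file>" tokens for every line that opens a for loop."""
    return [
        f"{number}@{filename}"
        for number, line in enumerate(source.splitlines(), start=1)
        if FOR_LOOP.match(line)
    ]


def loop_rule(tokens):
    """Render the tokens as a BNF rule in the style of the GRAMMAR slide."""
    alternatives = " | ".join(f'"{token}"' for token in tokens)
    return f"<loop_line_number> ::= {alternatives}"


# Hypothetical input file with a single loop on its third line.
example = """int main(void) {
    int total = 0;
    for (int i = 0; i < 10; i++) {
        total += i;
    }
    return total;
}
"""

tokens = loop_line_numbers("example1.c", example)
print(loop_rule(tokens))
```

A regular expression is enough for this toy input; real C code (macros, loops split over lines, `while` loops) would need a proper lexer.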
GRAMMAR
<start> ::= <base> | <base> <start>
<base> ::= "#pragma acc " <choice>
<choice> ::= "loop " <private> <loop_line_number>
<private> ::= "private(" <variables> ") " | " "
<variables> ::= <variable> | <variable> "," <variables>
<variable> ::= <variable_placeholder>
<variable_placeholder> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" …
<loop_line_number> ::= "15@example1.c" | "145@example2.c"

Example derivation:
#pragma acc loop private(1,2) 15@example1.c

Resulting patch:
--- example1.c
+++ example1.c
@@ -15,0 +15,1 @@
+ #pragma acc loop private(x,y)
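To make the step from grammar to patch concrete, here is a minimal sketch that derives one directive at random and renders it as a patch. It is an illustration under stated assumptions, not the tool's implementation: the placeholder list is cut at "7" where the slide has "…", the `NAMES` table mapping placeholders to variable names is hypothetical, and the hunk header copies the form shown on the slide.

```python
import random
import re

# Grammar from the slide. Assumption: placeholders stop at "7".
GRAMMAR = {
    "<start>": [["<base>"], ["<base>", "<start>"]],
    "<base>": [["#pragma acc ", "<choice>"]],
    "<choice>": [["loop ", "<private>", "<loop_line_number>"]],
    "<private>": [["private(", "<variables>", ") "], [" "]],
    "<variables>": [["<variable>"], ["<variable>", ",", "<variables>"]],
    "<variable>": [["<variable_placeholder>"]],
    "<variable_placeholder>": [[str(i)] for i in range(1, 8)],
    "<loop_line_number>": [["15@example1.c"], ["145@example2.c"]],
}

# Hypothetical mapping from placeholder numbers to variable names.
NAMES = dict(enumerate(["x", "y", "z", "i", "j", "k", "n"], start=1))


def derive(symbol, rng):
    """Expand a symbol by picking one production at random, recursively."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    production = rng.choice(GRAMMAR[symbol])
    return "".join(derive(part, rng) for part in production)


def to_patch(sentence, names):
    """Turn '<directive> <line>@<file>' into a one-line insertion patch."""
    directive, location = sentence.rsplit(" ", 1)
    line, filename = location.split("@")
    # Swap each numeric placeholder for its variable name.
    directive = re.sub(r"\d+", lambda m: names[int(m.group())], directive.strip())
    return (
        f"--- {filename}\n+++ {filename}\n"
        f"@@ -{line},0 +{line},1 @@\n+{directive}\n"
    )


sentence = derive("<base>", random.Random(0))
print(sentence)
print(to_patch("#pragma acc loop private(1,2) 15@example1.c", NAMES))
```

Random derivation can produce directives that do not compile or do not help, for example a repeated variable in `private(...)`; in a search setting such candidates would be left to the fitness function to penalise.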
INITIAL INVESTIGATION
• Chose to run a very small example as a sanity check
• nVidia provide an n-body simulation example that already contains OpenACC directives
• These directives were stripped for OPENACC_GI to replicate
• Ran for 100 generations with a population of 100
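The run settings above (100 generations, population of 100) can be pictured with a generic generational loop. The slides give only those two numbers, so everything else here is an assumption: keeping the better half as parents, mutation-only offspring, and the toy fitness standing in for measured execution time. The function name `evolve` and its parameters are hypothetical.

```python
import random


def evolve(random_individual, mutate, fitness, rng,
           population_size=100, generations=100):
    """Minimise `fitness` with a simple generational loop (assumed scheme)."""
    population = [random_individual(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Lower fitness is better, as with execution time.
        ranked = sorted(population, key=fitness)
        parents = ranked[: population_size // 2]
        # Refill the population with mutated copies of random parents.
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return min(population, key=fitness)


# Toy stand-in problem: "execution time" is the distance from 42.
best = evolve(
    random_individual=lambda r: r.randint(0, 1000),
    mutate=lambda x, r: x + r.randint(-5, 5),
    fitness=lambda x: abs(x - 42),
    rng=random.Random(1),
)
print(best)
```

In the setting the slides describe, an individual would be a candidate patch and the fitness would come from applying it, compiling, and timing the program; those steps are omitted because the slides do not specify them.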
RESULTS
[Bar chart: Execution Time (ms), axis 0 to 350; bars: sequential, original, gi_best]
RESULTS
[Chart: Execution Time (ms), axis 11.6 to 13.0; series: original, gi_best]