Weak Memory Behaviors in GPU Applications Tyler Sorensen Supervisors: Alastair F. Donaldson and James Brotherston 15 July 2015 Imperial Concurrency Workshop
Overview • Current techniques for reasoning about GPU applications under weak memory models are limited to hand analysis • This is laborious and error prone, and it requires a formal model • We propose a new methodology based on stress and fuzz testing
Overview [Diagram: GPU application -> add stressing/fuzzing hooks and postcondition -> annotated GPU application] • Run the annotated application for many iterations and check for postcondition violations.
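The run-and-check loop described on this slide can be sketched as follows. This is a minimal illustration, not the tool from the talk: `count_violations` and `fake_dot_product` are hypothetical names, and the stand-in application fails on a fixed schedule purely so the loop has something to count.

```python
def count_violations(run_once, postcondition, iterations):
    """Run the application repeatedly and count postcondition failures."""
    failures = 0
    for i in range(iterations):
        result = run_once(i)
        if not postcondition(result):
            failures += 1
    return failures

# Hypothetical stand-in for an annotated GPU application: a dot product
# that drops its last term on every 10th run to imitate a rare bug.
def fake_dot_product(i):
    a, b = [1, 2, 3], [4, 5, 6]
    terms = [x * y for x, y in zip(a, b)]
    if i % 10 == 9:
        terms = terms[:-1]
    return sum(terms)

# Postcondition: the result must equal the known dot product, 32.
failures = count_violations(fake_dot_product, lambda r: r == 32, 100)
print(failures)
```

In a real setting `run_once` would launch the annotated kernel, and the postcondition would compare its output against a reference result.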
Overview • Buggy dot product routine • Running the program for 1 hour (~2 seconds per run), the number of failed postconditions is: • No stress: 0 • Stress/fuzzing: 396
Roadmap • Background • Stress testing details • Results
Weak memory models • Consider the test known as message passing (MP)
Message passing (MP) test • Tests how to implement a handshake idiom [Figure: MP test, highlighting in turn the data, the flag, and the stale-data outcome]
• The assertion cannot be satisfied by interleavings • Restricting executions to interleavings is known as Lamport’s sequential consistency (or SC)
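The claim that no interleaving satisfies the assertion can be checked by brute force. The sketch below assumes the usual shape of the MP test, since the slide figure was not recovered: T0 stores data then flag, T1 loads flag then data, and the assertion is "flag observed set, data still stale". The names `T0`, `T1`, `interleavings`, and `run` are illustrative.

```python
from itertools import combinations

# Assumed MP test: all locations start at 0.
T0 = [("store", "data", 1), ("store", "flag", 1)]
T1 = [("load", "flag", "r0"), ("load", "data", "r1")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem = {"data": 0, "flag": 0}
    regs = {}
    for op, loc, arg in schedule:
        if op == "store":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs["r0"], regs["r1"]

sc_outcomes = {run(s) for s in interleavings(T0, T1)}
weak_outcome = (1, 0)  # r0 == 1 (flag set) and r1 == 0 (stale data)
print(sorted(sc_outcomes))
print(weak_outcome in sc_outcomes)
```

Six interleavings exist for two threads of two instructions each, and none of them produces the stale-data outcome.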
Weak memory models • Can we assume the assertion will never pass? No!
Weak memory models • Alglave et al. report that this assertion passes 41 million times out of 5 billion test runs on a Tegra2 ARM processor [1] [1] http://diy.inria.fr/cats/tables.html
Weak memory models • What happened? • Architectures implement weak memory models, where the hardware is allowed to reorder certain memory instructions • Weak memory models can allow weak behaviors (executions that do not correspond to an interleaving)
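To see how reordering produces a weak behavior, the toy model below lets the hardware swap the two instructions within each thread before interleaving them. This is a deliberately crude assumption for illustration, not the memory model of any real ARM or GPU chip, and the MP test shape (T0 stores data then flag, T1 loads flag then data) is likewise assumed.

```python
from itertools import combinations, permutations

T0 = [("store", "data", 1), ("store", "flag", 1)]
T1 = [("load", "flag", "r0"), ("load", "data", "r1")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps the given per-thread order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one schedule against a single shared memory."""
    mem, regs = {"data": 0, "flag": 0}, {}
    for op, loc, arg in schedule:
        if op == "store":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs["r0"], regs["r1"]

def outcomes(allow_reordering):
    """Collect (r0, r1) results, optionally letting each thread reorder."""
    orders0 = list(permutations(T0)) if allow_reordering else [T0]
    orders1 = list(permutations(T1)) if allow_reordering else [T1]
    return {run(s) for a in orders0 for b in orders1
            for s in interleavings(a, b)}

print((1, 0) in outcomes(allow_reordering=False))
print((1, 0) in outcomes(allow_reordering=True))
```

Once the two stores in T0 may take effect out of order, T1 can observe the flag before the data, which is exactly the execution that no interleaving of the original program explains.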
GPU programming [Diagram: threads are organized into blocks (Block 0, Block 1, ..., Block n); within blocks, threads are grouped into warps; each block has its own shared memory; all blocks access global memory]
Roadmap • Background • Stress testing details • Results
GPU memory models • Previous work [1] showed that GPUs empirically have weak memory models • Done using a tool that ran litmus tests on GPUs • Required heuristics for weak behaviors to appear [1] GPU concurrency: Weak behaviours and programming assumptions. ASPLOS ’15.
Litmus tests
Memory stress [Diagram: test threads T0 and T1 run the test program on locations X and Y; extra threads 1..n loop, reading or writing to a scratchpad; in memory, X and Y are surrounded by scratchpad regions]
Memory stress • Can we extend memory stress for testing applications?
Memory stress [Diagram: blocks 0..n run the application on application memory; extra blocks 0..x run memory stress on scratchpad memory]
Memory stress • Goal: design stress to reveal weak behaviors with no a priori knowledge about the application • We investigate using the litmus tests MP (message passing), SB (store buffering), and LB (load buffering)
Memory stress • Where to stress: [Diagram: test locations X and Y placed D apart, with scratchpad location I stressed] • For each distance D: • For each scratchpad location I: • Run the MP, SB, and LB litmus tests at distance D, stressing only location I, for 1000 iterations
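The nested sweep over D and I can be sketched as a small driver. Everything below is an assumption made for illustration: `sweep` and `placeholder_runner` are hypothetical names, the distances and scratchpad size in the example call are arbitrary, and the placeholder always reports zero weak behaviors because the real runner would launch the litmus test on a GPU.

```python
from itertools import product

LITMUS_TESTS = ("MP", "SB", "LB")
ITERATIONS = 1000  # iterations per configuration, as stated on the slide

def sweep(distances, scratchpad_size, run_litmus):
    """Return {(test, D, I): weak_count} for every combination."""
    results = {}
    for d, i, test in product(distances, range(scratchpad_size), LITMUS_TESTS):
        # Stress only scratchpad location i while the test runs at distance d.
        results[(test, d, i)] = run_litmus(test, d, i, ITERATIONS)
    return results

# Placeholder runner (hypothetical): a real one would execute the litmus
# test on the device and return how many iterations showed a weak behavior.
def placeholder_runner(test, distance, stressed_index, iterations):
    return 0

results = sweep(distances=[0, 32, 64], scratchpad_size=4,
                run_litmus=placeholder_runner)
print(len(results))
```

Keying the results by (test, D, I) keeps the data in the same shape as the plots that follow, where each bar corresponds to one such configuration.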
Memory stress [Figure: layout of the results plots, labeled by litmus test, distance D between X and Y, and stressed index I] • Each vertical bar represents the magnitude of weak behaviors observed
Memory stress • Visualization samples
Memory stress • What does this tell us? • To reveal weak behaviors we only need to stress 1 in every 32 locations* • We call a contiguous region of 32 elements a patch * 64 for some chips
Memory stress • How many patches can we effectively stress? • If D is unknown (as in applications), we would like to stress as many disjoint patches as possible
Memory stress • Scratchpad has a size of 64 patches • We try stressing n randomly selected patches, for n from 1 to 64
Memory stress [Figure: effectiveness of stressing n random patches; zoom in on the first 8] • Stressing 2 random patches is most effective
Memory stress • Now we have a memory stressing strategy! • Stress two random patches in the scratchpad • Patch size may change per chip
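The resulting strategy can be sketched as a helper that picks which scratchpad locations to stress. The function name `pick_stress_locations` is hypothetical, and choosing a random offset inside each selected patch is an assumption; the slides only state that stressing one location per patch is enough.

```python
import random

def pick_stress_locations(patch_size=32, num_patches=64,
                          patches_to_stress=2, rng=random):
    """Return one scratchpad index inside each randomly chosen patch."""
    # Sample distinct patches so the stressed regions are disjoint.
    chosen = rng.sample(range(num_patches), patches_to_stress)
    # Assumption: any single location within a patch is equally good.
    return [p * patch_size + rng.randrange(patch_size) for p in chosen]

# Seeded generator so the example is repeatable.
locations = pick_stress_locations(rng=random.Random(0))
print(locations)

# On chips where the patch size is 64, only the parameter changes.
wide_locations = pick_stress_locations(patch_size=64, rng=random.Random(0))
print(wide_locations)
```

Exposing the patch size as a parameter reflects the note that it may change per chip.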
Roadmap • Background • Stress testing details • Results
Application • N-body particle simulation in the Lonestar GPU benchmark [1] • Documented to have communication across blocks • No other information needed a priori for our testing • Postcondition checks the final location of particles [1] see: http://iss.ices.utexas.edu/?p=projects/galois/lonestargpu
Application • Executing the application for 1 hour (~2 seconds per run), the number of erroneous runs on a Quadro K5200: • No stress: 0 • With stress: 48
Comparing stresses • Does it matter how we stress? • We compare our systematic stressing method to 2 other stressing strategies