Weak Memory Behaviors in GPU Applications
Tyler Sorensen
Supervisors: Alastair F. Donaldson and James Brotherston
15 July 2015, Imperial Concurrency Workshop

Overview • Current techniques for reasoning about GPU applications under weak memory models are limited to hand analysis • This is laborious and error prone, and requires a formal model • We propose a new methodology based on stress and fuzz testing

Overview • Add stressing/fuzzing hooks and a postcondition to the GPU application to obtain an annotated GPU application • Run the annotated application for many iterations and check for postcondition violations
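The testing loop described above can be sketched as follows. This is a minimal illustration, not the authors' tool: `count_violations`, `run_once`, and `postcondition` are hypothetical names, and `run_once` stands in for one launch of the annotated GPU application.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def count_violations(run_once: Callable[[int], T],
                     postcondition: Callable[[T], bool],
                     iterations: int) -> int:
    """Run the (hypothetical) annotated application repeatedly.

    run_once(i) performs iteration i and returns the application's result;
    postcondition(result) returns True when the result is acceptable.
    Returns the number of iterations whose postcondition failed.
    """
    violations = 0
    for i in range(iterations):
        result = run_once(i)
        if not postcondition(result):
            violations += 1
    return violations
```

In practice `run_once` would launch the kernel with the stressing and fuzzing hooks enabled, and `postcondition` would compare the output against a trusted reference.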

Overview • Buggy dot product routine • Running the program for 1 hour (~2 seconds per run), the number of failed postconditions is: no stress: 0; stress/fuzzing: 396

Roadmap • Background • Stress testing details • Results

Weak memory models • Consider the test known as message passing (MP)

Message passing (MP) test • Tests how to implement a handshake idiom • [Figure: MP test code, annotated with the data location, the flag location, and the stale-data outcome] • The assertion cannot be satisfied by interleavings; this is known as Lamport's sequential consistency (or SC)
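The claim that no interleaving satisfies the assertion can be checked by brute force. The sketch below assumes the usual shape of the MP test, since the code figure did not survive in this transcript: T0 stores to the data location `x` and then to the flag `y`; T1 loads `y` into `r1` and then `x` into `r2`; the assertion is `r1 == 1 and r2 == 0` (flag seen, stale data read). All names are illustrative.

```python
from itertools import combinations

# Assumed MP test (not taken from the slides):
#   T0: x = 1; y = 1        T1: r1 = y; r2 = x
T0 = [("store", "x", 1), ("store", "y", 1)]
T1 = [("load", "y", "r1"), ("load", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if k in chosen else next(ib) for k in range(n)]


def run(schedule):
    """Execute a schedule against a single shared memory; return (r1, r2)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, loc, arg in schedule:
        if op == "store":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs["r1"], regs["r2"]


def sc_outcomes():
    """All (r1, r2) results reachable under sequential consistency."""
    return {run(s) for s in interleavings(T0, T1)}
```

Under these assumptions `sc_outcomes()` never contains `(1, 0)`, which is the outcome the assertion describes.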

Weak memory models • Can we assume the assertion will never pass? No!

Weak memory models • Alglave et al. report that this assertion passes 41 million times out of 5 billion test runs on a Tegra2 ARM processor [1] [1] http://diy.inria.fr/cats/tables.html

Weak memory models • What happened? • Architectures implement weak memory models, where the hardware is allowed to reorder certain memory instructions • Weak memory models can allow weak behaviors (executions that do not correspond to an interleaving)
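A single witness execution shows how reordering produces a weak behavior. This is a toy model, not the memory model of any real GPU or ARM chip: it assumes the MP test has T0 store to `x` then `y` and T1 load `y` into `r1` then `x` into `r2`, and it assumes the hardware makes T0's two stores visible in the opposite order.

```python
def run(schedule):
    """Execute a schedule against a single shared memory; return (r1, r2)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, loc, arg in schedule:
        if op == "store":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs["r1"], regs["r2"]


# Program order for T0 is x = 1; y = 1. In this schedule the store to the
# flag y becomes visible first (the assumed reordering), T1 then runs in
# full, and the store to the data x lands last.
weak_schedule = [
    ("store", "y", 1),    # T0's second store, made visible early
    ("load", "y", "r1"),  # T1 sees the flag set
    ("load", "x", "r2"),  # T1 reads stale data
    ("store", "x", 1),    # T0's first store, made visible late
]

weak_outcome = run(weak_schedule)
```

Here `weak_outcome` is `(1, 0)`: the flag is observed but the data is stale, an outcome that no interleaving of the two threads in program order can produce.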

GPU programming • Threads are organized into blocks (block 0, block 1, ..., block n) • Within blocks, threads are grouped into warps • Each block has its own shared memory • All blocks access global memory

GPU memory models • Previous work [1] showed that GPUs empirically have weak memory models • Done using a tool which ran litmus tests on GPUs • Required heuristics for weak behaviors to appear [1] GPU concurrency: Weak behaviours and programming assumptions. ASPLOS ’15.

Litmus tests

Memory stress • Threads T0 and T1 run the test program • Extra threads 1 to n loop, reading or writing to a scratchpad • Memory holds the test locations X and Y alongside the scratchpad

Memory stress • Can we extend memory stress for testing applications?

Memory stress • Blocks 0 to n run the application • Extra blocks 0 to x run the memory stress • Memory holds the application memory and the scratchpad

Memory stress • Goal: design stress to reveal weak behaviors with no a priori knowledge about the application • We investigate using the litmus tests MP (message passing), SB (store buffering), and LB (load buffering)

Memory stress • Where to stress: • For each distance D between the test locations X and Y: • For each scratchpad location I: • Run the MP, SB, and LB litmus tests at distance D, stressing only location I, for 1000 iterations
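The nested sweep above can be sketched as a parameter loop. This is an illustration under assumptions: `sweep` and `run_litmus` are hypothetical names, and `run_litmus(test, distance, location)` stands in for one GPU run of a litmus test that returns True when a weak behavior is observed.

```python
from typing import Callable, Dict, Iterable, Tuple

LITMUS_TESTS = ("MP", "SB", "LB")
ITERATIONS = 1000  # iterations per (test, D, I) configuration, as on the slide


def sweep(distances: Iterable[int],
          scratchpad_locations: Iterable[int],
          run_litmus: Callable[[str, int, int], bool]
          ) -> Dict[Tuple[str, int, int], int]:
    """Count weak behaviors for every (test, distance D, stressed index I).

    run_litmus is a hypothetical hook: it runs one iteration of the named
    test with its two locations D apart while stressing only location I.
    """
    locations = list(scratchpad_locations)  # reused for every distance
    counts: Dict[Tuple[str, int, int], int] = {}
    for d in distances:
        for i in locations:
            for test in LITMUS_TESTS:
                weak = sum(1 for _ in range(ITERATIONS)
                           if run_litmus(test, d, i))
                counts[(test, d, i)] = weak
    return counts
```

The resulting dictionary has one entry per plotted bar: the key identifies the litmus test, distance, and stressed index, and the value is the number of weak behaviors seen in 1000 iterations.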

Memory stress • [Figure: for each litmus test, weak behaviors plotted against distance D and stressed index I] • A vertical bar represents the magnitude of weak behaviors observed

Memory stress • Visualization samples

Memory stress • What does this tell us? • To reveal weak behaviors we only need to stress 1 in every 32 locations (64 for some chips) • We call a contiguous region of 32 elements a patch
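The patch idea amounts to simple index arithmetic. The helpers below are a sketch with hypothetical names; choosing the first location of each patch as its representative is an assumption made here, since the slide only says that one location in every 32 is enough.

```python
PATCH_SIZE = 32  # elements per patch; the slide notes 64 for some chips


def patch_of(location: int, patch_size: int = PATCH_SIZE) -> int:
    """Return the index of the patch that contains a scratchpad location."""
    return location // patch_size


def one_location_per_patch(scratchpad_size: int,
                           patch_size: int = PATCH_SIZE) -> list:
    """Return one representative location (the first) from each patch."""
    return list(range(0, scratchpad_size, patch_size))
```

For example, locations 0 through 31 all map to patch 0, and location 32 is the first element of patch 1.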

Memory stress • How many patches can we effectively stress? • If D is unknown (as in applications), we would like to stress as many disjoint patches as possible

Memory stress • The scratchpad has a size of 64 patches • We try stressing n randomly selected patches, for n from 1 to 64

Zoom in on the first 8

Stressing 2 random patches is most effective

Memory stress • Now we have a memory stressing strategy! • Stress two random patches in the scratchpad • Patch size may change per chip
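Selecting the patches to stress might look like the sketch below. The function name and parameters are hypothetical; the defaults (64 patches of 32 elements, 2 patches stressed) follow the numbers given on the slides, and the patch size should be adjusted per chip.

```python
import random
from typing import List, Optional, Tuple


def pick_stress_patches(num_patches: int = 64,
                        patch_size: int = 32,
                        patches_to_stress: int = 2,
                        rng: Optional[random.Random] = None
                        ) -> List[Tuple[int, int]]:
    """Pick distinct random patches and return their [start, end) ranges.

    Each range covers the scratchpad indices of one chosen patch; the
    stressing threads would then read or write inside these ranges.
    """
    rng = rng or random.Random()
    chosen = rng.sample(range(num_patches), patches_to_stress)
    return [(p * patch_size, (p + 1) * patch_size) for p in chosen]
```

Passing a seeded `random.Random` makes a run reproducible, which helps when a failing postcondition needs to be investigated.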

Application • N-body particle simulation in the Lonestar GPU benchmark [1] • Documented to have communication across blocks • No other information is needed a priori for our testing • The postcondition checks the final location of particles [1] see: http://iss.ices.utexas.edu/?p=projects/galois/lonestargpu

Application • Executing the application for 1 hour (~2 seconds per run), the number of erroneous runs on a Quadro K5200: no stress: 0; with stress: 48

Comparing stresses • Does it matter how we stress? • We compare our systematic stressing method to 2 other stressing strategies
