  1. Memory Models and OpenMP. Hans-J. Boehm, 6/16/2010

  2. Disclaimers: • Much of this work was done by others or jointly. I'm relying particularly on: – Basic approach: Sarita Adve, Mark Hill, Ada 83 … – JSR 133: Also Jeremy Manson, Bill Pugh, Doug Lea – C++0x: Lawrence Crowl, Clark Nelson, Paul McKenney, Herb Sutter, … – Improved hardware models: Bratin Saha, Peter Sewell's group, … • But some of it is still controversial. – This reflects my personal views. • I'm not an OpenMP expert (though I'm learning). • My experience is not primarily with numerical code.

  3. The problem • Shared memory parallel programs are built on shared variables visible to multiple threads of control. • But what do they mean? – Are concurrent accesses allowed? – What is a concurrent access? – When do updates become visible to other threads? – Can an update be partially visible? • There was much confusion as of ~2006: – Standard compiler optimizations "broke" C code. – Posix committee members disagreed about basic rules. – Unclear whether the rules were implementable on e.g. X86. – …

  4. Outline • Emerging consensus: – Interleaving semantics (Sequential Consistency) – But only for data-race-free programs • Brief discussion of consequences – Software requirements – Hardware requirements • How OpenMP fits in: – Largely sequentially consistent for DRF. – But some remaining differences. – Current flush-based formulation is problematic.

  5. Naive threads programming model (Sequential Consistency) • Threads behave as though their memory accesses were simply interleaved (sequential consistency). – Any execution corresponds to some interleaving of Thread 1's and Thread 2's individual accesses, with each thread's accesses appearing in its own program order. (A sketch appears below.)
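As an illustration of the interleaving model, here is a minimal sketch (my own example, not the code from the original slide; as real C++ it is technically a data race, which is exactly the issue the rest of the talk addresses):

    // Sketch only: two threads whose accesses to ordinary shared variables
    // are, under the naive model, simply interleaved.
    // (As real C++ this is a data race; it illustrates the naive model only.)
    #include <cstdio>
    #include <thread>

    int x = 0, y = 0;   // ordinary (non-atomic) shared variables

    void thread1() { x = 1; x = 2; }
    void thread2() { y = x; }          // under interleaving, y may end up 0, 1, or 2

    int main() {
        std::thread t1(thread1), t2(thread2);
        t1.join();
        t2.join();
        std::printf("y = %d\n", y);
    }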

  6. Locks/barriers restrict interleavings • Each thread brackets its accesses with lock(l) … unlock(l). – The two critical sections can only be executed one entirely before the other: Thread 1's section then Thread 2's, or the reverse, since the second lock(l) must follow the first unlock(l). (A sketch appears below.)
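A hedged sketch of the idea (my own code and variable names, not the slide's): with both threads using the same lock, only the two whole-critical-section orders remain.

    // Sketch: mutual exclusion removes the mixed interleavings.
    #include <cstdio>
    #include <mutex>
    #include <thread>

    std::mutex l;
    int x = 0;

    void increment() {
        l.lock();          // the second lock(l) must follow the first unlock(l),
        int r = x;         // so the three accesses in this critical section
        r = r + 1;         // cannot interleave with the other thread's section
        x = r;
        l.unlock();
    }

    int main() {
        std::thread t1(increment), t2(increment);
        t1.join();
        t2.join();
        std::printf("x = %d\n", x);   // always 2
    }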

  7. But this doesn't quite work … • Limits reordering and other hardware/compiler transformations – "Dekker's" example (everything initially zero) should allow r1 = r2 = 0: Thread 1: x = 1; r1 = y; Thread 2: y = 1; r2 = x; (see the sketch below) • Sensitive to memory access granularity: Thread 1: x = 300; Thread 2: x = 100; – may result in x = 356 with sequentially consistent byte accesses (interleaving the byte stores can combine the high byte of 300 = 0x012C with the low byte of 100 = 0x0064, giving 0x0164 = 356).
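For concreteness, a hedged C++11-style sketch of the Dekker pattern (my own rendering, not the slide's code): with relaxed atomics, which permit the reorderings that real hardware and compilers want to perform, the outcome r1 == r2 == 0 is allowed, precisely the outcome sequential consistency forbids.

    // Sketch: the Dekker pattern with relaxed ordering.
    // r1 == r2 == 0 is a permitted outcome here; under sequential
    // consistency it would be impossible.
    #include <atomic>
    #include <cstdio>
    #include <thread>

    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0;

    void thread1() {
        x.store(1, std::memory_order_relaxed);
        r1 = y.load(std::memory_order_relaxed);
    }

    void thread2() {
        y.store(1, std::memory_order_relaxed);
        r2 = x.load(std::memory_order_relaxed);
    }

    int main() {
        std::thread t1(thread1), t2(thread2);
        t1.join();
        t2.join();
        std::printf("r1 = %d, r2 = %d\n", r1, r2);
    }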

  8. Real threads programming model • An interleaving exhibits a data race if two consecutive steps – access the same scalar variable* – at least one access is a store (i.e. the two accesses conflict) – are performed by different threads • Sequential consistency only for data-race-free programs! – Avoid anything else. • Data races are prevented by – locks (or atomic sections) to restrict interleaving – declaring synchronization variables (stay tuned …) (A sketch follows below.) *Bit-fields get special treatment
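A small hedged example of the definition (my own code; counter and m are illustrative names): two unsynchronized conflicting accesses from different threads are a data race, and a lock removes it.

    // Sketch: a data race, and a lock-based fix.
    #include <mutex>
    #include <thread>

    int counter = 0;      // ordinary shared variable
    std::mutex m;

    void racy() {         // if two threads ran this, the two counter accesses
        counter++;        // would conflict without synchronization: a data race
    }

    void race_free() {    // the lock prevents the conflicting accesses from
        std::lock_guard<std::mutex> g(m);   // ever being adjacent unsynchronized steps
        counter++;
    }

    int main() {
        std::thread t1(race_free), t2(race_free);   // only the race-free version is run
        t1.join();
        t2.join();
    }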

  9. Data Races • Are defined in terms of sequentially consistent executions. • If x and y are initially zero, this does not have a data race: Thread 1: if (x) y = 1; Thread 2: if (y) x = 1; – In every sequentially consistent execution neither condition holds, so neither store is executed and no conflicting accesses occur.

  10. Synchronization variables • Java: volatile, java.util.concurrent.atomic. • C++0x: atomic<int> • C1x: atomic types (_Atomic, atomic_int, …) • OpenMP 4.0 proposal: sequentially consistent atomics (a seq_cst form of #pragma omp atomic) • Guarantee indivisibility of operations. • "Don't count" in determining whether there is a data race: – Programs with "races" on synchronization variables are still sequentially consistent. – Though there may be "escapes" (Java, C++0x; not discussed here). • Dekker's algorithm "just works" with synchronization variables (see the sketch below).
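As a hedged counterpart to the earlier relaxed-ordering sketch (my code, using the C++0x/C++11 interface named on the slide): with default sequentially consistent atomics, the Dekker pattern "just works" and r1 == r2 == 0 can no longer occur.

    // Sketch: Dekker with synchronization variables (default seq_cst atomics).
    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int> x{0}, y{0};    // synchronization variables
    int r1 = 0, r2 = 0;

    void thread1() { x.store(1); r1 = y.load(); }   // seq_cst by default
    void thread2() { y.store(1); r2 = x.load(); }

    int main() {
        std::thread t1(thread1), t2(thread2);
        t1.join();
        t2.join();
        assert(r1 == 1 || r2 == 1);   // r1 == r2 == 0 is impossible now
    }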

  11. SC for DRF programming model advantages over SC • Supports important hardware & compiler optimizations. • DRF restriction ⇒ independence from memory access granularity. – Hardware independence. – Synchronization-free library calls are atomic. – Really a different and better programming model than SC.

  12. Basic SC for DRF implementation model (1) [Diagram: synchronization operations alternating with synchronization-free code regions.] • Sync operations sequentially consistent. • Very restricted reordering of memory operations around synchronization operations: – Compiler either understands these, or treats them as opaque, potentially updating any location. – Synchronization operations include instructions to limit or prevent hardware reordering. • Usually "memory fences" (unfortunately?)

  13. SC for DRF implementation model (2) [Diagram: the same alternation of synchronization operations and synchronization-free code regions.] • Code may be reordered between synchronization operations. – Another thread can only tell if it accesses the same data between the reordered operations. – Such an access would be a data race. • If data races are disallowed (e.g. Posix, Ada, C++0x, OpenMP 3.0, not Java), the compiler may assume that variables don't change asynchronously. (A sketch of such a transformation follows below.)
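A hedged sketch of what this permits (my own example; data and m are illustrative names): inside a synchronization-free region the compiler may combine or reorder accesses to a shared variable, because any thread that could observe the difference would have to race on it.

    // Sketch: a transformation that is legal for data-race-free programs.
    #include <mutex>

    std::mutex m;
    int data = 0;

    void original() {
        m.lock();
        data = data + 1;   // within the synchronization-free region these two
        data = data + 1;   // updates may be combined into a single store ...
        m.unlock();        // ... but may not be moved past the unlock()
    }

    void as_if_transformed() {   // observably identical for data-race-free programs
        m.lock();
        data += 2;
        m.unlock();
    }

    int main() {
        original();
        as_if_transformed();
    }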

  14. Possible effect of the "no asynchronous changes" compiler assumption • Assume a switch statement on a shared variable x, e.g. switch (x) { case 0: …; case 1: …; case 2: …; }, compiled as a branch table. • The compiler may assume x stays within the case-label range. • An asynchronous change to x (a data race) then causes a wild branch. – Not just a wrong value. • Rare, but possible in current compilers?

  15. Some variants – C++ draft (C++0x), C draft (C1x): SC for DRF*; data races are errors. – Java: SC for DRF**; complex race semantics. – Ada83+, Posix threads: SC for DRF (sort of). – OpenMP, Fortran 2008: SC for DRF (except atomics, sort of). – .Net: getting there, we hope ☺ * Except explicitly specified memory ordering. ** Except some j.u.c.atomic.

  16. An important note • SC for DRF is a major improvement, but not the whole answer. • There are serious remaining problems for – Debugging. – Programs that need to support "sand-boxed" code, e.g. in Java. • We really want – sequential consistency for data-race-free programs. – at worst fail-stop behavior for data races. • But that's a hard research problem, and a different talk.

  17. Outline • Emerging consensus: – Interleaving semantics (Sequential Consistency) – But only for data-race-free programs • Brief discussion of consequences – Software requirements – Hardware requirements • How OpenMP fits in: – Largely sequentially consistent for DRF. – But some remaining differences. – Current flush-based formulation is problematic.

  18. Compilers must not introduce data races • Single-thread compilers currently may add data races (PLDI 05): a store to one field (x.a) of a struct may be compiled as a read-modify-write of the whole containing word, so that, run in parallel with an update of the adjacent field x.b, it may lose the x.b update (see the sketch below). • Still broken in gcc in bit-field-related cases.
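A hedged reconstruction in the spirit of the PLDI 05 example (the exact field widths and the constants 42 and 17 below are my own assumptions): the compiled store to x.a rewrites the word that also holds x.b.

    // Sketch: how a single-thread compilation strategy introduces a data race.
    // Bit-field widths and constants are illustrative.
    struct S { int a : 17; int b : 15; };
    S x;

    void update_a() {      // source: x.a = 42;
        // Plausible generated code: load the containing word, update the
        // bits for a, and store the whole word back, rewriting b as well:
        //   tmp = x; tmp.a = 42; x = tmp;
        x.a = 42;
    }

    void update_b() {      // executed by another thread: x.b = 17;
        x.b = 17;          // can be lost if update_a() stores the whole word
    }

    int main() {           // shown sequentially here; the loss can occur only
        update_a();        // when update_a() and update_b() run concurrently
        update_b();
    }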

  19. A more subtle way to introduce data races • Source code: a loop over a list that increments a possibly-shared count only for elements satisfying a test; count is neither read nor written when the test never succeeds. • After register promotion: the compiler loads count into a register before the loop, increments the register inside it, and stores the register back to count unconditionally afterwards. • The unconditional load and store can race with another thread's use of count even on executions where the source code never touches it (see the sketch below).
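A hedged sketch of that transformation (identifiers, the list type, and the test are my own illustrative choices):

    // Sketch: register promotion can introduce a data race on count.
    struct Node { int data; Node* next; };

    int count = 0;     // possibly shared with other threads
    Node* q = nullptr;

    void source_version() {
        for (Node* p = q; p != nullptr; p = p->next)
            if (p->data > 0) ++count;   // count touched only when the test succeeds
    }

    void after_register_promotion() {   // a transformation some compilers perform
        int reg = count;                // unconditional load of count
        for (Node* p = q; p != nullptr; p = p->next)
            if (p->data > 0) ++reg;
        count = reg;                    // unconditional store: introduces a potential
    }                                   // data race even if the loop updates nothing

    int main() {
        source_version();
        after_register_promotion();
    }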

  20. Synchronization primitives need careful definition • More on this later …
