Extreme Specialization: Too Much Bloody Determinism
CREST Workshop, 22/03/12
Steven Hand
Multicore, Manycore & Mayhem
• The era of M*core is upon us
  – Standard desktop machines now quad core (and standard servers are 2x or 4x this)
  – 8- and 12-core processors around the corner
  – Intel MIC & Tilera & foo & bar & baz => AIEE!!!
• Considerable reaction from academia & industry
  – Moore's law is dead!
  – We need new paradigms! (or at least new software)
• This talk will cover some of my thoughts on this
  – Warning: speculative, argumentative, XXXative
  – and quite possibly plain wrong!
[Figure: 10-core Xeon (Westmere-EX) die photo, with an extrapolation to ~16K cores by 2020]
Is there really a problem?
• Today's server systems work pretty well
  – HPC and similar: extremely parallel, scale easily
  – Existing server apps: extremely parallel, scale easily
  – OSes fine too: TxLinux (SOSP'07) shows max 12%
  – Brief panic (Corey, OSDI'08) but then all fine (OSDI'10)
• Transactional memory not required for performance
  – Roy (HotPar'09) shows zero speed-up for Apache
  – TxLinux shows 4-8% benefit from HTM (1% for x16!)
• And if they don't, VMMs (or other strongly partitioned OSes like Barrelfish) provide a decent solution
  – Disco (SOSP'97) was rather prescient…
But what about new applications?
• One argument is that (most) programmers just shouldn't worry about parallelism
  – although, anecdotally, many seem to :-(
• Instead focus on strategies (like divide and conquer)
  – Or on annotations (OpenMP, *Ss, …)
  – Or on libraries (Intel's TBB, java.util.concurrent, …)
  – Or on task-parallel programming frameworks (e.g. Cilk, MapReduce/Phoenix, Ciel, …); see the sketch below
• Last can potentially support:
  – Transparent scaling (up and down = FT story), and
  – Code mobility (desktop, cloud, mobile, GPGPU (?), …)
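To make the task-parallel framework bullet concrete, here is a minimal OCaml sketch of the kind of interface Cilk/MapReduce/Ciel-style systems expose: the programmer writes only a map and a reduce, and the framework is then free to run the per-item work on as many cores (or machines) as it likes. This toy version runs sequentially; the names are illustrative and not taken from any of those systems.

```ocaml
(* Minimal sketch of a MapReduce-style interface: the programmer supplies
   only [map] and [reduce]; the framework decides how (and where) the
   per-item work actually runs. This toy version is sequential. *)
let map_reduce ~(map : 'a -> 'b) ~(reduce : 'b -> 'b -> 'b) ~(init : 'b)
    (inputs : 'a list) : 'b =
  inputs
  |> List.map map            (* each call is independent: free to parallelise *)
  |> List.fold_left reduce init

(* Example: word count over a list of lines. *)
let word_count lines =
  map_reduce
    ~map:(fun line -> List.length (String.split_on_char ' ' line))
    ~reduce:( + ) ~init:0 lines

let () =
  Printf.printf "%d words\n" (word_count [ "hello world"; "foo bar baz" ])
```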
Cloud Run-time Environments
• If we move to new programming paradigms, great potential for scalability and fault tolerance
• But MapReduce/Dryad/Ciel are user-space frameworks in a traditional OS (in a VM!)
  – Do we really need all these layers?
• One possibility is to build a "custom" OS for the cloud (or at least for data-intensive computing)
  – E.g. Xen powers most cloud computing platforms
  – It forms a stable virtual hardware interface
  – Therefore, we can compile apps directly into Xen "kernels"
MirageOS: Specialized Kernels
MirageOS: Current Design
• Memory layout
  – 64-bit para-virtual memory layout
  – No context switching
  – Zero-copy I/O to Xen
  – Super-page mappings for heap
• Concurrency
  – Cooperative threading and events (sketched below)
  – Fast inter-domain communication
  – Works across cores and hosts
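The "cooperative threading and events" point is the Lwt programming model that Mirage builds on. Below is a minimal sketch of that style using stock Lwt; `Lwt_unix.sleep` stands in for whatever a Mirage kernel would actually block on (Xen event channels), and the thread names are made up.

```ocaml
(* Cooperative threading in the Lwt style: threads are values, yield points
   are explicit binds, so there is no preemption and no need for locks. *)
let rec ticker name delay n =
  if n = 0 then Lwt.return_unit
  else
    Lwt.bind (Lwt_unix.sleep delay) (fun () ->
        Printf.printf "%s tick\n%!" name;
        ticker name delay (n - 1))

let () =
  Lwt_main.run
    (Lwt.join [ ticker "fast" 0.1 5; ticker "slow" 0.3 3 ])
```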
DNS: BIND (C) vs Deens (ML)
DNS: with functional memoisation
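The memoisation used here can be illustrated generically: a combinator that caches the results of a pure function, so repeated queries for the same hot name are answered from a table instead of being recomputed. This is a generic sketch, not the actual Mirage DNS code; `resolve` is a hypothetical function that builds a response.

```ocaml
(* Generic memoisation combinator: cache the results of a pure function so
   repeated queries (e.g. hot DNS names) are answered from the table. *)
let memoise (f : 'a -> 'b) : 'a -> 'b =
  let cache : ('a, 'b) Hashtbl.t = Hashtbl.create 64 in
  fun x ->
    match Hashtbl.find_opt cache x with
    | Some y -> y
    | None ->
        let y = f x in
        Hashtbl.add cache x y;
        y

(* Hypothetical use: [resolve] is whatever builds a DNS response packet. *)
(* let cached_resolve = memoise resolve *)
```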
SQLite performance vs PV Linux
MirageOS: Status
• Open source, and has a self-hosted(!) web site
• Alpha-quality code, but under active development at Cambridge & elsewhere
  – Code, tutorial and slides on the web site
  – Recent work includes OpenFlow software
  – Supported by EPSRC, Verisign and DARPA
• URL: http://openmirage.org/
Peering into The Future
• Unlikely that everyone will move to MirageOS and OCaml overnight ;-)
• Q: can we develop tools and systems which help regular programmers to exploit M*-core?
  – not about "auto-parallelization" in the traditional sense (i.e. extracting fine-grained parallelism)
  – don't want to make SPECint (or Parsec) faster
• Our focus is on two related strands:
  – semi-automatic transformation of programs into task-parallel / data-flow form (cf. SOAAP), and
  – semi-automatic transformation of single-threaded code to exploit additional cores
The Death of Multiprogramming
• Widely overlooked problem with M*-core:
  – What do we do when a thread blocks?
  – Traditional solution (run another thread) doesn't work so well with a very large number of cores
• How can we reduce wait time?
  – The amount of time 'the thread' spends unable to run
• One possibility is extreme specialization:
  – Combines ideas from partial evaluation, memoization, dynamic specialization and speculation!
Specializing File I/O
• One student looking at desktop applications
  – e.g. at start of day, load XML configuration file from disk to generate a set of program variables
  – can concretize values at compile stage, and partially evaluate (lots of constant propagation!); see the before/after sketch below
  – can also elide unreachable paths (dead code elimination), unroll loops, and inline functions
  – can even eliminate threads (or aio)
  – e.g. for font search paths, plugin scans, etc.
• So far seems promising… at least for start-up…
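A sketch of what that start-up specialisation looks like in miniature: a generic path that parses a config file on every run, and the straight-line code a specialisation pass could emit once the values found on this particular machine are treated as constants. A key=value file stands in for the XML config, and the file name, key and resulting path are all illustrative.

```ocaml
(* Before: generic start-up path; file I/O and parsing on the critical path. *)
let lookup_generic ~file ~key =
  let ic = open_in file in
  let rec scan () =
    match input_line ic with
    | line -> (
        match String.index_opt line '=' with
        | Some i when String.sub line 0 i = key ->
            close_in ic;
            Some (String.sub line (i + 1) (String.length line - i - 1))
        | _ -> scan ())
    | exception End_of_file ->
        close_in ic;
        None
  in
  scan ()

(* After: the same query, concretised at "compile stage"; no I/O, no parsing,
   no unreachable branches (and no option: the key is known to be present). *)
let font_path_specialised () = "/usr/share/fonts/truetype"
```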
Dealing with Uncertainty
• At some stage your analysis breaks down
  – i.e. cannot continue with sound optimizations
• This is an opportunity to gamble:
  – Guess which path will be taken (i.e. speculate)
  – Can also speculate on data values
• In vanilla form, this is just symbolic execution
  – Remember the path predicates
  – Generate code guarded appropriately (see the sketch below)
  – Keep the original code around just in case
• Have some more extreme run-time options too:
  – e.g. force values into well-behaved ranges (Rinard)
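A minimal sketch of the "generate code guarded appropriately, keep the original around" idea: a variant specialised under a guessed value, dispatched behind the path predicate, with the general code as the fall-back. The block-size example and all names are invented for illustration.

```ocaml
(* General-purpose code: cost depends on the run-time block size. *)
let process_general ~block_size data =
  List.map (fun x -> x * block_size) data

(* Specialised variant, generated under the guess [block_size = 4096];
   a real specialiser could unroll, strength-reduce, etc. under that guess. *)
let process_specialised_4096 data =
  List.map (fun x -> x lsl 12) data          (* multiply by 4096 becomes a shift *)

(* Guarded dispatch: check the predicate that justified the specialisation;
   if the gamble was wrong, fall back to the original code. *)
let process ~block_size data =
  if block_size = 4096 then process_specialised_4096 data
  else process_general ~block_size data
```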
A Use for Many-Core?
• May well be many plausible values, each with associated paths:
  – Great!
  – Use lots of single-core almost-replicas, each specialized for specific cases
  – Fire up more as and when you encounter more uncertainty (e.g. I/O operations)
  – Garbage collect as needed (see the sketch below)
  – (Reserve one core for the general case if you want ;-)
• System is now deterministic in K different universes
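One way to read this slide operationally: while the uncertain value is still being resolved (e.g. an I/O operation is in flight), fork one almost-replica per guess, then keep the one whose guess turned out right and reclaim the rest. The sketch below shows only the control pattern using plain Unix processes; a real system would also need to hand the winner's state back, and all names and signatures here are illustrative.

```ocaml
(* Fork one speculative replica per guessed value; keep the winner, reap the
   rest. Returns false if no guess matched, i.e. fall back to general code. *)
let speculate ~guesses ~(run_ahead : int -> unit) ~(slow_io : unit -> int) =
  let children =
    List.map
      (fun guess ->
        match Unix.fork () with
        | 0 ->
            run_ahead guess;            (* child: compute assuming [guess]  *)
            exit 0
        | pid -> (guess, pid))
      guesses
  in
  let actual = slow_io () in            (* meanwhile the real value arrives *)
  List.iter
    (fun (guess, pid) ->
      if guess = actual then ignore (Unix.waitpid [] pid)        (* winner  *)
      else begin
        Unix.kill pid Sys.sigkill;                               (* reclaim *)
        ignore (Unix.waitpid [] pid)
      end)
    children;
  List.mem actual guesses
```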
Wrapping it up
• Parallel programming can/should be a specialty
  – don't expect 'regular' programmers to write assembly
• Develop a set of useful frameworks/languages
  – Different solutions for different patterns
  – Already made a great start on this
  – Personally expect (hope?) <20 will be enough
• Real challenge is how to use many cores to make life better for the masses
  – app-per-core (or partially evaluated app-per-core) seems like it should work to me
• But then again, I could be wrong ;-)