Faculty of Computer Science, Institute of Systems Architecture, Operating Systems Group

Where Have All the Cycles Gone? Investigating the Runtime Overheads of OS-Assisted Replication

Björn Döbel, Hermann Härtig
TU Dresden Operating Systems Group
Koblenz, 16.09.2013
ASTEROID – OS-Assisted Replication

[Architecture diagram: applications, drivers, and an "Enc. Proc." component run on top of the Romain replication service, the L4Re runtime environment, and the Fiasco.OC microkernel; applications are replicated.]

[1] Döbel, Härtig, Engel: Operating System Support for Redundant Multithreading, EMSOFT 2012

TU Dresden, 07.05.14 Where have all the cycles gone? Folie 2 / 23
ASTEROID – OS-Assisted Replication

- Interpose on system calls & CPU exceptions
- Replicate memory (no need for ECC)
- Unmodified binary applications

[Architecture diagram as on the previous slide: replicated applications on top of Romain, L4Re, and Fiasco.OC.]
How Much Does Replicated Execution Cost?

- Resource overhead
  - Roughly: N× replication → N× resources
  - Trade-off: optimizations vs. error coverage
- Fault coverage
  - No complete measurements yet
  - Estimate: matches compiler-assisted approaches
- Runtime overhead
  - This is what should be optimized for
  - This paper: SPEC INT 2006
Experiment Setup

[Diagram: one core of an Intel X5650 @ 2.66 GHz with its private L1 and L2 caches.]
Experiment Setup (continued)

[Diagram: two CPU sockets, each with a shared L3 cache; 12 GB RAM.]
Experiment Setup (continued)

- L4/Fiasco.OC, 32 bit + Romain
- SPEC INT 2006 benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk
Engage!
The Problem: CPU Assignment

[Timeline diagram, native execution: an application runs on CPU0; a print() call wakes a higher-priority logger, which preempts the application on the same CPU.]
The Problem: CPU Assignment (continued)

[Timeline diagram, replicated execution: application replicas run on CPU0–CPU3; the higher-priority logger woken by print() preempts one replica's CPU, stalling the whole synchronized replica group.]
Where Does Overhead Come From?

[Timeline diagram: each replica executes independently until it reaches a system call; the master then validates the replicas' states and executes the call once. Two overhead components are marked: sync time (replicas waiting for each other) and notification time (signaling the master).]
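The master's "validate states" step can be sketched as a majority vote over the replicas' externally visible state when they trap. This is an illustrative Python model, not Romain's actual implementation; the state tuples are made-up examples:

```python
# Illustrative model of the master's state validation: each replica
# reaches a system call and reports its state; the master compares the
# states and uses majority voting (TMR) to mask a faulty replica.
from collections import Counter

def validate_states(states):
    """Return (agreed_state, faulty_indices) by majority vote.

    `states` is a list of hashable snapshots (e.g. tuples of registers
    and system-call arguments) captured when each replica trapped.
    """
    votes = Counter(states)
    agreed, count = votes.most_common(1)[0]
    if count == len(states):
        return agreed, []                      # all replicas agree
    if count > len(states) // 2:
        faulty = [i for i, s in enumerate(states) if s != agreed]
        return agreed, faulty                  # minority gets recovered
    raise RuntimeError("no majority - unrecoverable divergence")

# Three replicas issue write(fd=1, len=5); replica 2 was hit by a fault.
good = ("sys_write", 1, 5)
state, faulty = validate_states([good, good, ("sys_write", 1, 7)])
```

With DMR (two replicas), a mismatch can only be detected, not masked; the majority branch above only recovers a faulty replica when at least three copies run.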
Where Does Overhead Come From?

Source                | Condition           | Overhead vs. native
Per-replica execution | unmodified          | ±0
Sync time             | no background load  | ~0
State comparison      |                     | ~100 cycles
System call           | mostly unmodified   | ~0
Notifications         | local core          | ~2,000 cycles
                      | on socket           | ~6,000 cycles
                      | cross-socket        | ~14,300 cycles

Rule: Prefer placing replicas on the same CPU socket.
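The per-event costs in the table translate directly into a back-of-the-envelope model of synchronization overhead per placement. This Python sketch uses the measured cycle counts from the table; the system-call count and the one-notification-per-replica assumption are hypothetical simplifications:

```python
# Rough cost model built from the measured per-event overheads.
# All values are cycles; the event counts are made up for illustration.
COMPARE = 100                                   # state comparison per syscall
NOTIFY = {"local": 2_000, "socket": 6_000, "cross": 14_300}

def sync_overhead(syscalls, replicas, placement):
    """Cycles spent notifying the master and comparing states.

    Simplification: each system call costs one notification per replica
    plus one state comparison.
    """
    return syscalls * (replicas * NOTIFY[placement] + COMPARE)

# A syscall-heavy run: 1 million system calls, triple modular redundancy.
same_socket = sync_overhead(1_000_000, 3, "socket")
cross_socket = sync_overhead(1_000_000, 3, "cross")
```

At ~14,300 vs. ~6,000 cycles per notification, cross-socket placement more than doubles this component, which is where the same-socket rule comes from.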
Replicating SPEC INT 2006
Idea: Reduce Memory Management Overhead

- Assumption: memory management is expensive
- Idea: reduce overhead by using x86 huge pages (4 MB)
- Works for the microbenchmark
- SPEC CPU: (nearly) no difference

Microbenchmark runtimes:
4 kB pages — native: 0.72 s, 1x: 0.80 s, 2x: 2.23 s, 3x: 3.12 s
4 MB pages — native: 0.38 s, 1x: 0.38 s, 2x: 0.53 s, 3x: 0.91 s
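Why huge pages help replication is easy to see from the number of mappings the master must establish once per replica. A small illustration; the 512 MB working-set size is a made-up example:

```python
# Replicating memory means the master serves every page fault / mapping
# once per replica. Huge pages shrink the number of such events.
KB = 1024
WORKING_SET = 512 * 1024 * KB        # hypothetical 512 MB working set

def mappings(page_size, replicas=3):
    """Mapping events the master must handle for `replicas` copies."""
    pages = -(-WORKING_SET // page_size)   # ceiling division
    return pages * replicas

small = mappings(4 * KB)             # 4 kB pages
huge = mappings(4 * 1024 * KB)       # 4 MB x86 huge pages
```

The 1024× reduction in master-handled events explains the microbenchmark win; SPEC CPU sees little benefit because its page faults are not the dominant cost.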
Secondary Effects: Cache Miss Rates

Benchmark       | DMR L2 misses | DMR L3 misses | TMR L2 misses | TMR L3 misses
429.mcf         | 2,600         | 1,300,000     | 11,000,000    | 5,200,000
462.libquantum  | 2,500         | 570           | 440,000       | 387,000
471.omnetpp     | 270,000       | 6,900,000     | 35,000,000    | 21,200,000

(471.omnetpp: L2 misses grow ~130×, L3 misses ~3× from DMR to TMR.)

Rule: Prefer placing replicas on a different CPU socket.
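The DMR→TMR blow-up factors quoted on the slide can be recomputed directly from the miss counts in the table:

```python
# L2/L3 miss counts from the table, as (DMR, TMR) pairs.
misses = {
    "429.mcf":        {"L2": (2_600, 11_000_000),   "L3": (1_300_000, 5_200_000)},
    "462.libquantum": {"L2": (2_500, 440_000),      "L3": (570, 387_000)},
    "471.omnetpp":    {"L2": (270_000, 35_000_000), "L3": (6_900_000, 21_200_000)},
}

def blowup(bench, level):
    """Factor by which misses grow when going from DMR to TMR."""
    dmr, tmr = misses[bench][level]
    return tmr / dmr

l2_factor = blowup("471.omnetpp", "L2")   # ~130x
l3_factor = blowup("471.omnetpp", "L3")   # ~3x
```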
Secondary Effects: Cache Miss Rates (with cross-socket placement)

Benchmark       | DMR L2 misses       | DMR L3 misses           | TMR L2 misses              | TMR L3 misses
429.mcf         | 2,600 → 2,600       | 1,300,000 → 930,000     | 11,000,000 → 11,000,000    | 5,200,000 → 3,600,000
462.libquantum  | 2,500 → 2,500       | 570 → 323               | 440,000 → 385,000          | 387,000 → 8,700
471.omnetpp     | 270,000 → 290,000   | 6,900,000 → 5,500,000   | 35,000,000 → 34,900,000    | 21,200,000 → 16,400,000
SPEC INT: Improved L3 Miss Rates
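A placement policy following this observation would spread replicas across sockets so they do not compete for the same shared L3. A hypothetical helper; the topology dictionary and the round-robin policy are illustrative, not Romain's actual scheduler:

```python
# Hypothetical replica-placement helper: pick cores round-robin across
# sockets so replicas land on different shared L3 caches where possible.
TOPOLOGY = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}  # socket -> cores (example)

def place_replicas(n, topology=TOPOLOGY):
    """Return `n` core IDs, alternating between sockets."""
    sockets = sorted(topology)
    cores = []
    next_core = {s: 0 for s in sockets}
    while len(cores) < n:
        s = sockets[len(cores) % len(sockets)]
        if next_core[s] >= len(topology[s]):
            raise ValueError("not enough cores on socket %d" % s)
        cores.append(topology[s][next_core[s]])
        next_core[s] += 1
    return cores

placement = place_replicas(3)   # -> [0, 4, 1]
```

Note the tension with the earlier same-socket rule: cross-socket placement trades higher notification latency for lower cache pressure, so the better policy depends on whether a workload is syscall-bound or cache-bound.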
And Now for Something Slightly Different...

[Architecture diagram as before, with Romain, L4Re, and Fiasco.OC highlighted as the Reliable Computing Base (RCB) [2].]

[2] Engel, Döbel: The Reliable Computing Base – A new Paradigm ..., SOBRES 2012
Protecting the RCB

- We have full RCB source code, so we can apply compiler-level techniques.
- Encoded compiler, anyone? → approximate the overhead
Protecting the RCB: 2nd Try

t_prot := t_app + C · (t_kern + t_master + t_kern′) + t_hw

Assuming t_hw = t_kern = t_kern′ = 0:

t_prot := t_native + C · t_Replicated
Protecting the RCB: 2nd Try (continued)

- Selected values for C [3]: C_SWIFT = 1.09, C_ANBD = 3.89

[3] Schiffel, Schmitt, Süßkraut, Fetzer: Software-Implemented Hardware Error Detection: Costs and Gains, DEPEND 2010
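Plugging these constants into the simplified model t_prot := t_native + C · t_Replicated gives a quick estimate of total overhead. A sketch; the split between native runtime and time spent inside the RCB is a made-up example:

```python
# Simplified RCB cost model from the slides:
#   t_prot = t_native + C * t_replicated
# where t_replicated is time spent in the replication service (RCB) and
# C is the slowdown factor of the chosen compiler-level protection.
C_SWIFT = 1.09
C_ANBD = 3.89

def protected_runtime(t_native, t_replicated, c):
    """Total runtime with the RCB portion slowed down by factor c."""
    return t_native + c * t_replicated

# Hypothetical split: 100 s native run, 5 s of that spent inside the RCB.
t_swift = protected_runtime(100.0, 5.0, C_SWIFT)   # ~105.45 s
t_anbd = protected_runtime(100.0, 5.0, C_ANBD)     # ~119.45 s
```

Because only the (small) RCB fraction is multiplied by C, even the expensive AN/B/D encoding stays tolerable, which is what makes the combination look feasible.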
Summary

- Romain: < 5% overhead for triple (3x) replication of most SPEC INT 2006 benchmarks
- Replica-core placement matters
- RCB protection may add further overhead; combining it with compiler-level methods seems feasible