Faculty of Computer Science, Institute of Systems Architecture, Operating Systems Group

Where Have All the Cycles Gone? Investigating the Runtime Overheads of OS-Assisted Replication

Björn Döbel, Hermann Härtig
TU Dresden Operating Systems Group
Koblenz, 16.09.2013
ASTEROID – OS-Assisted Replication

[Architecture diagram: applications, drivers, and an "Enc. Proc." component run on top of the Romain replication service, the L4Re runtime environment, and the Fiasco.OC microkernel; applications are replicated.]

[1] Döbel, Härtig, Engel: Operating System Support for Redundant Multithreading, EMSOFT 2012

TU Dresden, 07.05.14 Where have all the cycles gone? Folie 2 / 23
ASTEROID – OS-Assisted Replication

- Interpose on system calls & CPU exceptions
- Replicate memory (no need for ECC)
- Unmodified binary applications

[Architecture diagram as on the previous slide: replicated applications on top of Romain, L4Re, and Fiasco.OC.]
How Much Does Replicated Execution Cost?

- Resource overhead
  - Roughly: N× replication → N× resources
  - Trade-off: optimizations vs. error coverage
- Fault coverage
  - No complete measurements yet
  - Estimate: matches compiler-assisted approaches
- Runtime overhead
  - This is what should be optimized for
  - This paper: SPEC INT 2006
Experiment Setup

[Diagram: one core of an Intel X5650 @ 2.66 GHz with its private L1 and L2 caches.]
Experiment Setup (continued)

[Diagram: two CPU sockets, each with a shared L3 cache; 12 GB RAM.]
Experiment Setup (continued)

- L4/Fiasco.OC, 32 bit + Romain
- SPEC INT 2006 benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk
Engage!
The Problem: CPU Assignment

[Timeline diagram, native execution: an application runs on CPU0; a print() call wakes a higher-priority logger, which preempts the application on the same CPU.]
The Problem: CPU Assignment (continued)

[Timeline diagram, replicated execution: application replicas run on CPU0–CPU3; the higher-priority logger woken by print() preempts one replica's CPU, stalling the whole synchronized replica group.]
Where Does Overhead Come From?

[Timeline diagram: each replica executes independently until it reaches a system call; the master then validates the replicas' states and executes the call once. Two overhead components are marked: sync time (replicas waiting for each other) and notification time (signaling the master).]
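The master's "validate states" step can be sketched as a majority vote over the replicas' externally visible state when they trap. This is an illustrative Python model, not Romain's actual implementation; the state tuples are made-up examples:

```python
# Illustrative model of the master's state validation: each replica
# reaches a system call and reports its state; the master compares the
# states and uses majority voting (TMR) to mask a faulty replica.
from collections import Counter

def validate_states(states):
    """Return (agreed_state, faulty_indices) by majority vote.

    `states` is a list of hashable snapshots (e.g. tuples of registers
    and system-call arguments) captured when each replica trapped.
    """
    votes = Counter(states)
    agreed, count = votes.most_common(1)[0]
    if count == len(states):
        return agreed, []                      # all replicas agree
    if count > len(states) // 2:
        faulty = [i for i, s in enumerate(states) if s != agreed]
        return agreed, faulty                  # minority gets recovered
    raise RuntimeError("no majority - unrecoverable divergence")

# Three replicas issue write(fd=1, len=5); replica 2 was hit by a fault.
good = ("sys_write", 1, 5)
state, faulty = validate_states([good, good, ("sys_write", 1, 7)])
```

With DMR (two replicas), a mismatch can only be detected, not masked; the majority branch above only recovers a faulty replica when at least three copies run.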
Where Does Overhead Come From?

Source                | Condition           | Overhead vs. native
Per-replica execution | unmodified          | ±0
Sync time             | no background load  | ~0
State comparison      |                     | ~100 cycles
System call           | mostly unmodified   | ~0
Notifications         | local core          | ~2,000 cycles
                      | on socket           | ~6,000 cycles
                      | cross-socket        | ~14,300 cycles

Rule: Prefer placing replicas on the same CPU socket.
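The per-event costs in the table translate directly into a back-of-the-envelope model of synchronization overhead per placement. This Python sketch uses the measured cycle counts from the table; the system-call count and the one-notification-per-replica assumption are hypothetical simplifications:

```python
# Rough cost model built from the measured per-event overheads.
# All values are cycles; the event counts are made up for illustration.
COMPARE = 100                                   # state comparison per syscall
NOTIFY = {"local": 2_000, "socket": 6_000, "cross": 14_300}

def sync_overhead(syscalls, replicas, placement):
    """Cycles spent notifying the master and comparing states.

    Simplification: each system call costs one notification per replica
    plus one state comparison.
    """
    return syscalls * (replicas * NOTIFY[placement] + COMPARE)

# A syscall-heavy run: 1 million system calls, triple modular redundancy.
same_socket = sync_overhead(1_000_000, 3, "socket")
cross_socket = sync_overhead(1_000_000, 3, "cross")
```

At ~14,300 vs. ~6,000 cycles per notification, cross-socket placement more than doubles this component, which is where the same-socket rule comes from.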
Replicating SPEC INT 2006
Idea: Reduce Memory Management Overhead

- Assumption: memory management is expensive
- Idea: reduce overhead by using x86 huge pages (4 MB)
- Works for the microbenchmark
- SPEC CPU: (nearly) no difference

Microbenchmark runtimes:
4 kB pages — native: 0.72 s, 1x: 0.80 s, 2x: 2.23 s, 3x: 3.12 s
4 MB pages — native: 0.38 s, 1x: 0.38 s, 2x: 0.53 s, 3x: 0.91 s
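Why huge pages help replication is easy to see from the number of mappings the master must establish once per replica. A small illustration; the 512 MB working-set size is a made-up example:

```python
# Replicating memory means the master serves every page fault / mapping
# once per replica. Huge pages shrink the number of such events.
KB = 1024
WORKING_SET = 512 * 1024 * KB        # hypothetical 512 MB working set

def mappings(page_size, replicas=3):
    """Mapping events the master must handle for `replicas` copies."""
    pages = -(-WORKING_SET // page_size)   # ceiling division
    return pages * replicas

small = mappings(4 * KB)             # 4 kB pages
huge = mappings(4 * 1024 * KB)       # 4 MB x86 huge pages
```

The 1024× reduction in master-handled events explains the microbenchmark win; SPEC CPU sees little benefit because its page faults are not the dominant cost.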
Secondary Effects: Cache Miss Rates

Benchmark       | DMR L2 misses | DMR L3 misses | TMR L2 misses | TMR L3 misses
429.mcf         | 2,600         | 1,300,000     | 11,000,000    | 5,200,000
462.libquantum  | 2,500         | 570           | 440,000       | 387,000
471.omnetpp     | 270,000       | 6,900,000     | 35,000,000    | 21,200,000

(471.omnetpp: L2 misses grow ~130×, L3 misses ~3× from DMR to TMR.)

Rule: Prefer placing replicas on a different CPU socket.
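The DMR→TMR blow-up factors quoted on the slide can be recomputed directly from the miss counts in the table:

```python
# L2/L3 miss counts from the table, as (DMR, TMR) pairs.
misses = {
    "429.mcf":        {"L2": (2_600, 11_000_000),   "L3": (1_300_000, 5_200_000)},
    "462.libquantum": {"L2": (2_500, 440_000),      "L3": (570, 387_000)},
    "471.omnetpp":    {"L2": (270_000, 35_000_000), "L3": (6_900_000, 21_200_000)},
}

def blowup(bench, level):
    """Factor by which misses grow when going from DMR to TMR."""
    dmr, tmr = misses[bench][level]
    return tmr / dmr

l2_factor = blowup("471.omnetpp", "L2")   # ~130x
l3_factor = blowup("471.omnetpp", "L3")   # ~3x
```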
Secondary Effects: Cache Miss Rates (with cross-socket placement)

Benchmark       | DMR L2 misses       | DMR L3 misses           | TMR L2 misses              | TMR L3 misses
429.mcf         | 2,600 → 2,600       | 1,300,000 → 930,000     | 11,000,000 → 11,000,000    | 5,200,000 → 3,600,000
462.libquantum  | 2,500 → 2,500       | 570 → 323               | 440,000 → 385,000          | 387,000 → 8,700
471.omnetpp     | 270,000 → 290,000   | 6,900,000 → 5,500,000   | 35,000,000 → 34,900,000    | 21,200,000 → 16,400,000
SPEC INT: Improved L3 Miss Rates
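A placement policy following this observation would spread replicas across sockets so they do not compete for the same shared L3. A hypothetical helper; the topology dictionary and the round-robin policy are illustrative, not Romain's actual scheduler:

```python
# Hypothetical replica-placement helper: pick cores round-robin across
# sockets so replicas land on different shared L3 caches where possible.
TOPOLOGY = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}  # socket -> cores (example)

def place_replicas(n, topology=TOPOLOGY):
    """Return `n` core IDs, alternating between sockets."""
    sockets = sorted(topology)
    cores = []
    next_core = {s: 0 for s in sockets}
    while len(cores) < n:
        s = sockets[len(cores) % len(sockets)]
        if next_core[s] >= len(topology[s]):
            raise ValueError("not enough cores on socket %d" % s)
        cores.append(topology[s][next_core[s]])
        next_core[s] += 1
    return cores

placement = place_replicas(3)   # -> [0, 4, 1]
```

Note the tension with the earlier same-socket rule: cross-socket placement trades higher notification latency for lower cache pressure, so the better policy depends on whether a workload is syscall-bound or cache-bound.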
And Now for Something Slightly Different...

[Architecture diagram as before, with Romain, L4Re, and Fiasco.OC highlighted as the Reliable Computing Base (RCB) [2].]

[2] Engel, Döbel: The Reliable Computing Base – A new Paradigm ..., SOBRES 2012
Protecting the RCB

- We have full RCB source code, so we can apply compiler-level techniques.
- Encoded compiler, anyone? → approximate the overhead
Protecting the RCB: 2nd Try

t_prot := t_app + C · (t_kern + t_master + t_kern′) + t_hw

Assuming t_hw = t_kern = t_kern′ = 0:

t_prot := t_native + C · t_Replicated
Protecting the RCB: 2nd Try (continued)

- Selected values for C [3]: C_SWIFT = 1.09, C_ANBD = 3.89

[3] Schiffel, Schmitt, Süßkraut, Fetzer: Software-Implemented Hardware Error Detection: Costs and Gains, DEPEND 2010
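Plugging these constants into the simplified model t_prot := t_native + C · t_Replicated gives a quick estimate of total overhead. A sketch; the split between native runtime and time spent inside the RCB is a made-up example:

```python
# Simplified RCB cost model from the slides:
#   t_prot = t_native + C * t_replicated
# where t_replicated is time spent in the replication service (RCB) and
# C is the slowdown factor of the chosen compiler-level protection.
C_SWIFT = 1.09
C_ANBD = 3.89

def protected_runtime(t_native, t_replicated, c):
    """Total runtime with the RCB portion slowed down by factor c."""
    return t_native + c * t_replicated

# Hypothetical split: 100 s native run, 5 s of that spent inside the RCB.
t_swift = protected_runtime(100.0, 5.0, C_SWIFT)   # ~105.45 s
t_anbd = protected_runtime(100.0, 5.0, C_ANBD)     # ~119.45 s
```

Because only the (small) RCB fraction is multiplied by C, even the expensive AN/B/D encoding stays tolerable, which is what makes the combination look feasible.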
Summary

- Romain: < 5% overhead for triple (3x) replication of most SPEC INT 2006 benchmarks
- Replica-core placement matters
- RCB protection may add further overhead; combining it with compiler-level methods seems feasible