Responding in a timely manner Martin Thompson - @mjpt777
Hard Real-time
Soft Real-time
Squidgy Real-time
The Unaware
1. How to Test and Measure 2. A little bit of Theory 3. A little bit of Practice 4. Common Pitfalls 5. Useful Algorithms and Techniques
Test & Measure
System Under Test
Distributed Load Generation Agents System Under Test
Distributed Load Generation Agents + Observer → System Under Test
Pro Tip: Set up a continuous performance testing environment
Pro Tip: Record Everything
Latency Histograms: Mode, Median, Mean
System: 1000 TPS, mean RT 50µs
What is the mean if you add in a 25ms GC pause per second? ~300µs
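A rough back-of-envelope check of where that figure comes from, assuming the requests arriving during the pause simply queue behind it:

```latex
\begin{align*}
\text{requests arriving during the pause:}\quad & 1000\,\text{TPS} \times 25\,\text{ms} = 25 \\
\text{their mean extra wait:}\quad & 25\,\text{ms} / 2 = 12.5\,\text{ms} \\
\text{extra latency spread over all 1000 requests:}\quad & \frac{25 \times 12.5\,\text{ms}}{1000} \approx 312\,\mu\text{s}
\end{align*}
```

A single 25ms pause per second drags a 50µs mean into the hundreds-of-microseconds range, even though almost every individual request was fast.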
Forget averages, it’s all about percentiles
Coordinated Omission Source: Gil Tene (Azul Systems)
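One way to see the pitfall in code (a hypothetical sketch, not code from the talk): when a load generator blocks behind a slow response, the samples it should have taken in the meantime are silently omitted, so a compensating recorder must back-fill them, much as HdrHistogram's `recordValueWithExpectedInterval` does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: compensate for coordinated omission by back-filling
// the samples a blocked load generator failed to issue on schedule.
class CompensatingRecorder {
    private final long expectedIntervalNs; // intended time between requests
    final List<Long> samples = new ArrayList<>();

    CompensatingRecorder(long expectedIntervalNs) {
        this.expectedIntervalNs = expectedIntervalNs;
    }

    void record(long latencyNs) {
        samples.add(latencyNs);
        // If one response stalled longer than the send interval, the requests
        // that should have been issued meanwhile would also have waited:
        // synthesize those progressively shorter waits.
        for (long missed = latencyNs - expectedIntervalNs;
             missed >= expectedIntervalNs;
             missed -= expectedIntervalNs) {
            samples.add(missed);
        }
    }
}
```

Without the back-fill, a 5ms stall at a 1ms send interval shows up as one bad sample instead of five, and the percentiles lie.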
Pro Tip: Don’t deceive yourself
Theory
Queuing Theory — [Chart: Response Time vs Utilisation (0.0–1.0); response time climbs steeply as utilisation approaches 1.0]
Queuing Theory Kendall Notation M/D/1
Queuing Theory
r = s(2 − ρ) / 2(1 − ρ)
  r = mean response time
  s = service time
  ρ = utilisation
Note: ρ = λ × s, where λ = mean arrival rate
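A quick sketch of the M/D/1 formula above (names are mine), showing why response time explodes as utilisation approaches 1.0:

```java
// r = s(2 - ρ) / 2(1 - ρ): mean response time for an M/D/1 queue.
final class QueueingMath {
    static double responseTime(double serviceTime, double utilisation) {
        return serviceTime * (2.0 - utilisation) / (2.0 * (1.0 - utilisation));
    }

    public static void main(String[] args) {
        // Response time as a multiple of service time at rising utilisation.
        for (double rho = 0.1; rho < 1.0; rho += 0.2) {
            System.out.printf("utilisation=%.1f  r=%.2f x service time%n",
                rho, responseTime(1.0, rho));
        }
    }
}
```

At ρ = 0.5 the mean response time is only 1.5× the service time; at ρ = 0.9 it is already 5.5×, and it diverges as ρ → 1.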
Pro Tip: Ensure that you have sufficient capacity
Queuing Theory Little’s Law: L = λ * W L = mean queue length λ = mean arrival rate W = mean time in system
Pro Tip: Bound queues to meet response time SLAs
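The sizing rule falls straight out of Little's Law (a sketch with hypothetical names): if arrivals come at λ per second and the SLA allows W seconds in the system, any backlog beyond λ × W already implies a blown SLA, so cap the queue there and shed the excess.

```java
import java.util.concurrent.ArrayBlockingQueue;

final class BoundedBySla {
    // L = λ × W: the longest queue compatible with the response-time SLA.
    static int boundFor(double arrivalRatePerSecond, double slaSeconds) {
        return (int) Math.ceil(arrivalRatePerSecond * slaSeconds);
    }

    public static void main(String[] args) {
        // 1000 requests/s with a 5ms SLA -> never queue more than 5.
        ArrayBlockingQueue<Runnable> queue =
            new ArrayBlockingQueue<>(boundFor(1000.0, 0.005));
        // offer() fails fast when the bound is hit, instead of letting
        // latency grow without limit.
        boolean accepted = queue.offer(() -> { });
        System.out.println(accepted ? "accepted" : "shed load");
    }
}
```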
Can we go parallel to speedup?
Amdahl’s Law — [Diagram: a sequential process A→B, then A parallelised across threads, then B parallelised; the remaining sequential fraction limits the speedup]
Amdahl's Law
Universal Scalability Law
C(N) = N / (1 + α(N − 1) + β·N·(N − 1))
  C = capacity or throughput
  N = number of processors
  α = contention penalty
  β = coherence penalty
Universal Scalability Law — [Chart: Speedup vs Processors (1–1024, log scale), Amdahl vs USL; the USL curve peaks and then declines]
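The shape of that chart can be reproduced with a few lines (a sketch; the α and β values below are illustrative, not measurements). Amdahl's Law is the β = 0 special case: with any coherence penalty at all, adding processors eventually makes throughput worse.

```java
// Universal Scalability Law: C(N) = N / (1 + α(N-1) + βN(N-1)).
final class Usl {
    static double speedup(int n, double alpha, double beta) {
        return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1));
    }

    public static void main(String[] args) {
        // Illustrative penalties: 5% contention, 0.01% coherence.
        for (int n = 1; n <= 1024; n *= 2) {
            System.out.printf("N=%4d  speedup=%.2f%n",
                n, speedup(n, 0.05, 0.0001));
        }
    }
}
```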
What about the service time?
Order of Algorithms
Practice
Pitfalls
Modern Processors SMIs? P & C States??? Hyperthreading?
Non-Uniform Memory Architecture (NUMA)
[Diagram: two sockets, each with cores, L1/L2/L3 caches, memory controller and DRAM, linked by QPI, with PCI-e 3 IO]
Approximate access costs (assuming a 3GHz processor):
  Registers/Buffers: <1ns
  L1: ~4 cycles, ~1ns
  L2: ~12 cycles, ~3ns
  L3: ~40 cycles, ~15ns (~60 cycles, ~20ns for a dirty hit)
  QPI hop to the other socket: ~40ns
  DRAM: ~65ns
Virtual Memory Management Page Flushing & IO Scheduling Transparent Huge Pages Swap??? vm.min_free_kbytes
Safepoints in the JVM Garbage Collection, De-optimisation, Biased Locking, Stack traces, etc.
Virtualization System Calls
Notification

public class SomethingUseful
{
    // Lots of useful stuff

    public void handOffSomeWork()
    {
        // prepare for handoff
        synchronized (this)
        {
            someObject.notify();
        }
    }
}
Law of Leaky Abstractions “All non-trivial abstractions, to some extent, are leaky.” - Joel Spolsky
Law of Leaky Abstractions “The detail of underlying complexity cannot be ignored.”
Mechanical Sympathy
Responding in the presence of failure
Algorithms & Techniques
Clean Room Experiments
• sufficient CPUs
• intel_idle.max_cstate=0
• cpufreq
• isolcpus
• numactl, cgroups, affinity
• “Washed” SSDs
• network buffer sizing
• jHiccup
• tune your stack!
• Mechanical Sympathy
Profiling
Pro Tip: Incorporate telemetry and histograms
Smart Batching — [Chart: Latency vs Load; typical vs possible curves]
Smart Batching — [Diagram: multiple Producers feed a Batcher, which amortises expensive costs]
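A minimal sketch of the idea (hypothetical code, not from the talk): whatever accumulated while the previous expensive operation was in progress gets drained and handled as one batch, so the per-item cost shrinks as load rises.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical smart batcher: producers enqueue; the single consumer drains
// everything available and pays the expensive cost once per batch.
class SmartBatcher {
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    int flushes;       // how many expensive operations were paid
    int itemsWritten;  // how many items those operations covered

    void offer(String item) {
        queue.add(item);
    }

    void drainAndFlush() {
        List<String> batch = new ArrayList<>();
        String item;
        while ((item = queue.poll()) != null) {
            batch.add(item);
        }
        if (!batch.isEmpty()) {
            // One expensive operation (syscall, network write, fsync)
            // amortised over the whole batch.
            flushes++;
            itemsWritten += batch.size();
        }
    }
}
```

Under light load each item gets its own flush; under heavy load many items share one, which is why the "possible" latency curve stays flat where the "typical" one climbs.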
Pro Tip: Amortise the Expensive Costs
Applying Backpressure — [Diagram: Customers → Gateway Services (threads, network stacks) → Transaction Service (threads, network stack, IO, storage)]
Non-Blocking Design — “Get out of your own way!”
• Don’t hog any resource
• Always try to make progress
• Enables Smart Batching
Pro Tip: Beware of hogging resources in synchronous designs
Lock-Free Concurrent Algorithms
• Agree protocols of interaction
• Don’t get a 3rd party involved, i.e. the OS
• Keep to user-space
• Beat the “notify()” problem
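As a sketch of such a protocol (hypothetical, not code from the talk): a single-producer single-consumer ring buffer where the two threads coordinate purely through sequence counters in user space, with no lock to contend and no call into the OS, sidestepping the `notify()` hand-off cost entirely.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal SPSC ring buffer: the agreed protocol is that exactly one thread
// writes (advancing tail) and exactly one thread reads (advancing head).
class SpscQueue {
    private final Object[] buffer;
    private final int mask;                        // capacity must be a power of 2
    private final AtomicLong head = new AtomicLong(); // next slot to read
    private final AtomicLong tail = new AtomicLong(); // next slot to write

    SpscQueue(int capacityPowerOfTwo) {
        buffer = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    boolean offer(Object e) {                      // producer thread only
        long t = tail.get();
        if (t - head.get() == buffer.length) {
            return false;                          // full: apply backpressure
        }
        buffer[(int) (t & mask)] = e;
        tail.lazySet(t + 1);                       // ordered publish to consumer
        return true;
    }

    Object poll() {                                // consumer thread only
        long h = head.get();
        if (h == tail.get()) {
            return null;                           // empty
        }
        int index = (int) (h & mask);
        Object e = buffer[index];
        buffer[index] = null;                      // release for GC
        head.lazySet(h + 1);
        return e;
    }
}
```

The bounded capacity also gives backpressure for free: a full buffer rejects the offer instead of blocking in the kernel.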
Observable State Machines
Pro Tip: Observable state machines make monitoring easy
Cluster for Response and Resilience — [Diagram: a Sequencer feeding replicated Service A … Service N]
Data Structures and O(?) Models Is there a world beyond maps and lists?
In closing…
The Internet of Things (IoT) “There will be X connected devices by 2020...” Where X is 20 to 75 Billion
If you cannot control arrival rates...
...you have to think hard about improving service times!
...and/or you have to think hard about removing all contention!
Questions?
Blog: http://mechanical-sympathy.blogspot.com/
Twitter: @mjpt777
“It does not matter how intelligent you are, if you guess and that guess cannot be backed up by experimental evidence – then it is still a guess.” - Richard Feynman