Principled workflow-centric tracing of distributed systems



  1. Principled workflow-centric tracing of distributed systems. Raja Sambasivan, Ilari Shafer, Jonathan Mace, Ben Sigelman, Rodrigo Fonseca, Greg Ganger

  2. Today’s distributed systems. E.g., Twitter. Twitter “death star”: https://twitter.com/adrianco/status/441883572618948608

  3. Today’s distributed systems are amazingly complex (e.g., Netflix, Twitter). Machine-centric tools (GDB, gprof, strace, Linux perf counters) are insufficient. Netflix “death star”: http://www.slideshare.net/adriancockcroft/fast-delivery-devops-israel

  4. Workflow-centric tracing provides the needed coherent view. [Diagram: a Get request’s workflow across the client, server, app server, table store, and distributed FS, with per-component latencies (17 µs, 25 ms, 27 ms); metadata (e.g., IDs) is propagated with the request and trace points are recorded (e.g., at functions).]
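
As a rough illustration of the mechanism on this slide (not any particular infrastructure from the talk), the sketch below shows the two ingredients of workflow-centric tracing: request metadata (an ID) propagated along every call, and trace points that record timestamped events which can later be stitched into one coherent view. All component and function names here are hypothetical.

    import time
    import uuid

    TRACE_LOG = []  # stand-in for wherever trace records are stored

    def trace_point(request_id, component, event):
        # A trace point: records which request touched which component, and when.
        TRACE_LOG.append((request_id, component, event, time.time()))

    def table_store_get(request_id, key):
        trace_point(request_id, "table_store", "get_start")
        value = f"value-for-{key}"                 # placeholder for real work
        trace_point(request_id, "table_store", "get_end")
        return value

    def app_server_get(request_id, key):
        trace_point(request_id, "app_server", "get_start")
        value = table_store_get(request_id, key)   # the ID flows with the call
        trace_point(request_id, "app_server", "get_end")
        return value

    def client_get(key):
        request_id = str(uuid.uuid4())             # metadata minted at the entry point
        trace_point(request_id, "client", "get_start")
        value = app_server_get(request_id, key)
        trace_point(request_id, "client", "get_end")
        return value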

  5. Workflow-centric tracing is useful and being adopted: Stardust [SIGM’06], Stardust✚ [NSDI’11], X-Trace [NSDI’07], X-Trace✚ [WREN’10], Retro [NSDI’15], Pivot Tracing [SOSP’15], Pip [NSDI’06], Pinpoint [NSDI’04], Mace [PLDI’07], Dapper [TR10-14], HTrace, Zipkin, UberTrace. Management tasks by category: diagnosis (identifying anomalous workflows, identifying workflows with steady-state problems, profiling); resource management (attribution, performance tuning); and dynamic monitoring, which spans multiple categories. But, no clarity for tracing developers.

  6. But, no clarity for tracing developers. Expectation: Spectroscope could be built directly on the existing Stardust. Reality: Stardust had to be redesigned as Stardust✚ before Spectroscope could use it.

  7. We provide clarity for tracing developers. [Diagram: building a tracing infrastructure involves a series of design choices (1–6), and the right choices depend on the task (Tasks A–D).] Methodology: use our experiences to distill design axes, identify the choices best suited to different tasks, and compare them to the best existing infrastructures.

  8. Key results: (1) Different design decisions are needed for diagnosis and resource management. (2) Batching causes design decisions across some axes to interact poorly. (3) Existing tracing infrastructures suited to a task make choices similar to our suggestions.

  9. Anatomy & design axes Causal relationships? How to de fj ne Management tasks a request? Conc./Sync. d needed? d n Trace construction n a Inter-request a b b - needed? f - o n - I t Trace storage How will u O trace points be added? ! ! In-band / Sample? out-of-band? What to use to App Server Table store File system reduce ovhd? Tracing infrastructure 9

  10. How original Stardust defined requests. [Diagram: a WRITE’s trace contains only WRITE START, CACHE WRITE (10 µs), INSERT BLOCK, and WRITE REPLY (2 µs), yet the response time is ~20 ms, leaving ~20 ms of latency unaccounted for.] Trace not useful for diagnosis tasks.

  11. Two valid ways to define a request’s workflow. [Diagram: two WRITE requests; the first WRITE’s cached block is later evicted and flushed to disk (EVICT BLOCK, DISK START, 20,000 µs, DISK END) while the second WRITE runs, and this ~20 ms of latent work is attached to the first WRITE’s trace.] Resource management: assign latent work to the original submitter.

  12. Two valid ways to define a request’s workflow. [Diagram: the same two WRITEs, but here the ~20 ms of latent disk work is attached to the second WRITE, on whose critical path it executes.] Diagnosis: assign latent work to the request on whose critical path it is executed.
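
The two request definitions on slides 11 and 12 amount to two attribution policies for latent work. A minimal sketch of the difference, using made-up request names rather than data from the talk:

    def attribute(latent_event, policy):
        # Latent work (e.g., a cache eviction plus its disk write) can be
        # charged in two valid ways, depending on the management task.
        if policy == "resource_management":
            # charge the request that originally created the buffered data
            return latent_event["submitter"]
        if policy == "diagnosis":
            # charge the request on whose critical path the work executed
            return latent_event["critical_path_request"]
        raise ValueError(f"unknown policy: {policy}")

    evict_and_flush = {"name": "EVICT BLOCK + ~20 ms disk write",
                       "submitter": "first WRITE",
                       "critical_path_request": "second WRITE"}

    assert attribute(evict_and_flush, "resource_management") == "first WRITE"
    assert attribute(evict_and_flush, "diagnosis") == "second WRITE"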

  13. Future research directions: reducing the difficulty of adding trace points; lowering overhead when identifying anomalous workflows; exploring new analyses.

  14. Summary: Key design choices dictate workflow-centric tracing’s utility for different tasks. We identify the choices best suited for different tasks.
