  1. Hunting for Bugs with Artemis. Overview: Artemis System Architecture, Dryad, Data Collection, GUI Plug-ins, Conclusions. Diagram labels: Distributed system, Logs, Data collection, Database, View, GUI, Plug-ins

  Hunting for Bugs with Artemis. Gabriela F. Creţu-Ciocârlie, Mihai Budiu, Moises Goldszmidt. Microsoft Research, Silicon Valley. WASL 2008

  4. Artemis Goal: One-stop shop for performance analysis of distributed systems

  5. Principles: 1) Modular: separate generic from application-specific parts 2) Extensible: add new analyses via plug-ins 3) Interactive: human expert is part of the analysis loop
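
The "extensible via plug-ins" principle can be illustrated with a small registry. This is only a sketch of the general pattern, not Artemis's actual plug-in interface, which the slides do not show. The names `PLUGINS`, `register`, `run_plugin`, and the `cpu` record field are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical types: a record is one row of collected measurements,
# and a plug-in maps a list of records to named summary values.
Record = Dict[str, float]
Plugin = Callable[[List[Record]], Dict[str, float]]

# Hypothetical registry: the generic core only knows plug-ins by name.
PLUGINS: Dict[str, Plugin] = {}


def register(name: str) -> Callable[[Plugin], Plugin]:
    """Decorator that adds an analysis function to the registry."""
    def wrap(fn: Plugin) -> Plugin:
        PLUGINS[name] = fn
        return fn
    return wrap


@register("mean_cpu")
def mean_cpu(records: List[Record]) -> Dict[str, float]:
    """Example analysis: average of the assumed 'cpu' field."""
    values = [r["cpu"] for r in records]
    return {"mean_cpu": sum(values) / len(values)}


def run_plugin(name: str, records: List[Record]) -> Dict[str, float]:
    """Generic entry point: look up a plug-in by name and run it."""
    return PLUGINS[name](records)
```

A new analysis is added by writing one more decorated function; the generic part (`run_plugin`) does not change.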

  7. Distributed system, Logs, Data collection, Database, View, GUI, Plug-ins (diagram labels; components are marked Distributed or Local)

  8. Distributed system, Logs, Data collection, Database, View, GUI, Plug-ins (diagram labels; components are marked Application-Specific or Generic)

  10. Dryad Application Structure: Input files, Channels, Stage, Vertices, Output files. Example vertex programs: sort, grep, awk, sed, perl

  11. Dryad System Architecture: Job manager, job schedule, control plane, data plane, cluster (nodes labeled V and Serv)

  13. Data collection: logs in Text, Binary, XML, and Perfmon formats (10GB-1TB); Copy; DryadLINQ application: Parse, Filter, Aggregate; Persisted data (100MB-1GB)
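
The parse, filter, and aggregate steps named on this slide can be sketched on a single machine as follows. The slides indicate that Artemis runs this stage as a DryadLINQ application over the cluster; the plain-Python version below only illustrates how such a stage shrinks raw logs into a small summary. The `machine,metric,value` line format and the function names are assumptions made for the example.

```python
from collections import defaultdict


def parse(line):
    """Parse one 'machine,metric,value' line; return None if malformed.

    The three-field CSV layout is an assumed format for illustration.
    """
    parts = line.strip().split(",")
    if len(parts) != 3:
        return None
    machine, metric, value = parts
    try:
        return machine, metric, float(value)
    except ValueError:
        return None


def summarize(lines, wanted_metric):
    """Filter to one metric, then aggregate to a per-machine mean."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        record = parse(line)
        # Filter: drop malformed lines and metrics we do not need.
        if record is None or record[1] != wanted_metric:
            continue
        machine, _, value = record
        # Aggregate: keep only running totals, not the raw records.
        sums[machine] += value
        counts[machine] += 1
    return {m: sums[m] / counts[m] for m in sums}


logs = ["m1,cpu,10", "m1,cpu,30", "m2,cpu,50", "m1,mem,7", "garbage"]
print(summarize(logs, "cpu"))
```

Only the aggregated values would be persisted, which is consistent with the size reduction the slide reports between the raw logs and the persisted data.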

  16. Machine Utilization Plug-in

  17. Complex statistics: HiLighter plug-in. Inputs: Key Performance Indicator, Metrics. Method: binary search over logistic regression with L1 regularization. Output: Correlated metrics
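
One plausible reading of "binary search over logistic regression with L1 regularization" is sketched below: fit an L1-penalized logistic model that predicts a binary KPI label from the metrics, and binary-search the penalty strength until at most `k` metrics keep a nonzero weight. This is a guess at the technique, not HiLighter's implementation. The solver (proximal gradient descent), the function names, and all parameter defaults are assumptions, and the search assumes that the number of surviving metrics does not increase as the penalty grows.

```python
import math


def soft_threshold(w, t):
    """Proximal step for the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, steps=500, lr=0.1):
    """Proximal gradient descent for logistic loss plus lam * ||w||_1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def select_metrics(X, y, k, lo=0.0, hi=1.0, iters=20):
    """Binary-search the penalty; return indices of at most k metrics."""
    best = fit_l1_logistic(X, y, hi)[0]
    for _ in range(iters):
        mid = (lo + hi) / 2
        w, _ = fit_l1_logistic(X, y, mid)
        if sum(1 for wj in w if wj != 0.0) <= k:
            best, hi = w, mid   # few enough metrics: try a weaker penalty
        else:
            lo = mid            # too many metrics: strengthen the penalty
    return [j for j, wj in enumerate(best) if wj != 0.0]
```

In this sketch each row of `X` holds the metric values for one observation and `y` marks whether the KPI was bad (1) or good (0) for that observation; the returned indices are the metrics most associated with the KPI under the chosen penalty.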

  18. Interactive Analysis: KPI Selection, Feature Computation, Visualization, HiLighter

  20. Conclusions. Layers, top to bottom: Automatic diagnosis, Statistical analyses, Feature extraction, Summarization, Raw data, Distributed system. Markers on the diagram: Goal, Artemis today

