
Building resilience: How outages shaped Etsy’s systems



  1. Building resilience How outages shaped Etsy’s systems

  2. Act 1

  3. Quick! Be resilient! http://www.flickr.com/photos/niaid/11854196633/sizes/l/

  4. Quick! Be resilient! • Actually, it’s a slow process • Iterative • Introspective • Horizontal and vertical development

  5. Quick! Be resilient! http://www.flickr.com/photos/ogcodes/6091644301/sizes/l/

  6. Quick! Be resilient! http://www.flickr.com/photos/studio360/1150744342/sizes/o/

  7. Quick! Be resilient! http://www.flickr.com/photos/studio360/1150744368/sizes/o/

  8. Quick! Be resilient! http://www.flickr.com/photos/ogcodes/6091644301/sizes/l/

  9. Quick! Be resilient! Next generation Current generation

  10. Quick! Be resilient! http://www.flickr.com/photos/jurvetson/8671257096/

  11. Quick! Be resilient! http://cudebi.wordpress.com/2012/09/19/tah-pagh-tahbe-o-el-reconocimiento-de-william-shakespeare-en-el-universo-de-star-trek/

  12. Resilience Engineering http://www.flickr.com/photos/freefoto/728651045/sizes/o/

  13. Resilience Engineering • “To Engineer is Human” and “To Forgive Design” - Henry Petroski • “The Field Guide to Understanding Human Error” and “Just Culture” - Sidney Dekker

  14. Act 2

  15. Building resilience at Etsy • Continuous deployment • Metrics, metrics, metrics • Peer review • Postmortems

  16. Building resilience at Etsy • Continuous deployment • Metrics, metrics, metrics • Peer review • Postmortems } Culture

  17. Postmortems Or: How to win at failing

  18. Constructive cultures • No blame • Open discussion • Focus on improvements

  19. Constructive cultures • No blame • Open discussion • Focus on improvements } Culture

  20. Destructive cultures “The nail that sticks up, gets hammered down” - Japanese proverb

  21. The result?

  22. • #23: Fortune’s “Top 50 best small and medium businesses to work for” • Rapid code iterations and deploys • Lasting relationships • Generosity of spirit • …and much more

  23. Act 3

  24. Doing postmortems? Get Morgue http://github.com/etsy/morgue

  25. Morgue

  26. Morgue

  27. Morgue

  28. Forkistan • Mean time to detect: 0 min • Mean time to recover: 10 mins

  29. Yo Dawg, I Heard You Like Errors.. • Mean time to detect: 2 mins • Mean time to recover: 15 mins

  30. Smashing INT for Fun and Profit • Mean time to detect: 0 min • Mean time to recover: 4 hrs 52 mins

  31. Apache Amnesia • Mean time to detect: 2 hours • Mean time to recover: 5 mins

  32. Continuously Upgrading Databases • Mean time to detect: 2 mins • Mean time to recover: 1 hour (but, not really..)

  33. Q & A Avleen Vig Staff Operations Engineer Etsy, Inc @avleen
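
Slides 28 to 32 each give one incident's time to detect and time to recover. A minimal sketch of how those five figures could be rolled up into overall averages is below. The `INCIDENTS` list and the `summarize` helper are illustrative names, not part of Morgue or the talk, and the "1 hour" for slide 32 is taken at face value even though the slide qualifies it with "(but, not really..)".

```python
from statistics import mean, median

# (title, minutes to detect, minutes to recover), transcribed from slides 28-32.
# Hypothetical structure for illustration; not Morgue's data model.
INCIDENTS = [
    ("Forkistan", 0, 10),
    ("Yo Dawg, I Heard You Like Errors..", 2, 15),
    ("Smashing INT for Fun and Profit", 0, 4 * 60 + 52),  # 4 hrs 52 mins
    ("Apache Amnesia", 2 * 60, 5),                         # 2 hours to detect
    ("Continuously Upgrading Databases", 2, 60),           # "1 hour (but, not really..)"
]


def summarize(incidents):
    """Return mean and median detect/recover times, in minutes."""
    detect = [d for _, d, _ in incidents]
    recover = [r for _, _, r in incidents]
    return {
        "mean_detect": mean(detect),
        "median_detect": median(detect),
        "mean_recover": mean(recover),
        "median_recover": median(recover),
    }


if __name__ == "__main__":
    for name, minutes in summarize(INCIDENTS).items():
        print(f"{name}: {minutes:.1f} min")
```

With only five incidents, a single outlier (the 2 hour detection, the 4 hr 52 min recovery) dominates the mean, which is why the sketch reports the median alongside it.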
