a culture of failure
mathias meyer, @roidrage
travis-ci.org / travis-ci.com
failure
risk
28 january 1986
sts-51-l
73 seconds
there is no root cause
"What you call root cause is simply the place where you stop looking any further" -- Sidney Dekker
"Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure." -- Richard Cook
normalization of deviance
practical drift
unknown unknowns
things we do not know we don't know
risks we do not know we don't know
28 november 2007
http://xrscorp.com/blog/industry-news/unsafe-driving-basic/
http://www.datacenterknowledge.com/archives/2012/01/06/rackspace-cloud-will-expand-in-dallas/
redundancy
redundant redundancy
optimize for mean time to recovery
risk
acceptable risk
safety workload $$
cross boundaries
http://www.flickr.com/photos/rossbelmont/8014054698/
verify extreme risks
Bring down your site, kill your servers.
verify assumptions
http://www.bloomberg.com/news/2012-01-13/with-safety-alcoa-showed-mettle-part-3-commentary-by-bratton-and-tumin.html
learn from failure
transform your company
resilience
thank you