HYNEK SCHLAWACK SOLID SNAKES
ATTITUDE
INCENTIVES
IMPORTANT VS URGENT
SIMPLICITY
"THE PRICE OF RELIABILITY IS THE PURSUIT OF THE UTMOST SIMPLICITY." Sir C.A.R. Hoare
NORMAL ACCIDENTS
ESSENTIAL VS ACCIDENTAL
OPERATIONAL COMPLEXITY
[Diagram labels: Client, your DC, App, CDN, Work, Redis, DB, Queue, Cache]
MICROSERVICES
[Diagram labels: Service 1 to Service 8]
COMPLEXITY IS REALITY
PLAN FOR STUPIDITY
HUMAN ERRORS
"I DON’T BELIEVE IN HUMAN ERROR" John Allspaw, CTO at Etsy
DATA VALIDATION AT EDGES
NORMALIZATION
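One way to read "validation at edges" plus "normalization": untrusted input is checked and brought into a single canonical form once, at the boundary, so the rest of the application only ever sees well-formed values. The sketch below is a minimal illustration; `SignupRequest`, `parse_signup`, and the `username`/`age` fields are hypothetical names, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignupRequest:
    """Hypothetical validated and normalized view of untrusted input."""
    username: str
    age: int


def parse_signup(raw: dict) -> SignupRequest:
    """Validate and normalize raw input once, at the edge of the system."""
    username = raw.get("username")
    if not isinstance(username, str) or not username.strip():
        raise ValueError("username must be a non-empty string")
    try:
        age = int(raw.get("age"))
    except (TypeError, ValueError):
        raise ValueError("age must be an integer") from None
    if not 0 <= age <= 150:
        raise ValueError("age out of range")
    # Normalization: strip whitespace and lowercase so the core sees one form.
    return SignupRequest(username=username.strip().lower(), age=age)
```

Code behind the edge can then take a `SignupRequest` and skip defensive re-checks.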
PLOT TWIST!
FAILURE IS INEVITABLE
RELIABILITY
Twitter 2007, NASA 1969
FAILURE IS INEVITABLE ( ⌐■ _ ■ )
EXPECT
TIMEOUTS
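A minimal sketch of an explicit timeout on a blocking network read, using only the standard library. The helper name `read_reply` and the default of 2 seconds are assumptions for illustration; the right value depends on the service being called.

```python
import socket


def read_reply(conn: socket.socket, timeout: float = 2.0) -> bytes:
    """Read up to 1024 bytes, waiting at most `timeout` seconds.

    Without a timeout, recv() on a blocking socket can wait forever.
    With one, it raises socket.timeout (an alias of TimeoutError since
    Python 3.10) and the caller can decide what to do next.
    """
    conn.settimeout(timeout)
    return conn.recv(1024)
```

For outgoing connections, `socket.create_connection(address, timeout=...)` applies the same limit to the connect step as well.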
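CLOSED
  Local Client → call() → Circuit Breaker → call() → Remote API
  Local Client ← result ← Circuit Breaker ← result ← Remote API
CLOSED → OPEN
  Local Client → call() → Circuit Breaker → call() → Remote API
  Local Client ← timeout! ← Circuit Breaker ← timeout! ← Remote API
OPEN
  Local Client → call() → Circuit Breaker (Remote API is not called)
  Local Client ← circuit open! ← Circuit Breaker
OPEN → HALF-CLOSED
  Local Client → call() → Circuit Breaker → call() → Remote API
  Local Client ← result ← Circuit Breaker ← result ← Remote API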
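The state transitions in the circuit breaker diagrams can be sketched in a few lines. This is a simplified, single-threaded illustration and not a production implementation; the class name, `max_failures`, `reset_after`, and the injectable `clock` are all assumptions made for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the remote API while the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to stay open
        self.clock = clock                # injectable so tests need not sleep
        self.failures = 0
        self.opened_at = None             # None means CLOSED

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # OPEN: fail immediately, do not touch the remote API.
                raise CircuitOpenError("circuit open!")
            # Reset period elapsed: let a trial call through (the slides'
            # HALF-CLOSED state, often called half-open elsewhere).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success: close the circuit and forget earlier failures.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call as `breaker.call(fetch, url)`, where `fetch` stands for whatever function performs the request, and treat `CircuitOpenError` as an immediate failure.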
REDUNDANCY
DOCS
DEAL WITH IT (¬ ∎ _ ∎ )
DON’T MAKE IT WORSE
RETRIES
BACKOFF
EXPONENTIAL BACKOFF
EXPONENTIAL BACKOFF WITH JITTER
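A minimal sketch of retries with exponential backoff and jitter. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing, capped ceiling; that choice, the function names, and the default `base`, `cap`, and `attempts` values are assumptions, not something the slides specify.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=10.0):
    """Delay before retry number `attempt` (0-based), in seconds."""
    ceiling = min(cap, base * 2 ** attempt)  # exponential growth, capped
    return random.uniform(0, ceiling)        # jitter spreads clients apart


def retry(func, attempts=5, base=0.1, cap=10.0, sleep=time.sleep):
    """Call `func` until it succeeds or `attempts` calls have failed."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the last error propagate
            sleep(backoff_delay(attempt, base, cap))
```

The jitter matters because clients that failed at the same moment would otherwise all retry at the same moment too.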
Frontend: 3x → Backend
Backend: 9x → Internal Backend A, 9x → Internal Backend B
27x → Internal Backend C
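The 3x, 9x, 27x figures follow from multiplication: if every layer makes up to 3 attempts for each request it receives, the worst-case load grows as a power of the depth. The sketch below assumes 3 attempts per hop and that Internal Backend C sits three hops from the frontend; the exact topology is an assumption, since the diagram itself is not recoverable.

```python
def worst_case_calls(attempts_per_hop: int, hops: int) -> int:
    """Worst-case calls reaching a service `hops` layers below the caller."""
    return attempts_per_hop ** hops


for hops, service in enumerate(
    ["Backend", "Internal Backend A/B", "Internal Backend C"], start=1
):
    print(f"{service}: up to {worst_case_calls(3, hops)}x")
# Backend: up to 3x
# Internal Backend A/B: up to 9x
# Internal Backend C: up to 27x
```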
DON’T SWALLOW ERRORS
try:
    do_something()
    return True
except Exception:
    return False

try:
    do_something()
except Exception:
    raise AppException()

try:
    do_something()
    return True
except Exception as e:
    raise AppException() from e
# The raised AppException's __cause__ is e.
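A runnable version of the last snippet, showing that `raise ... from e` keeps the original exception reachable through `__cause__`. `AppException`, `do_something`, and `run` are stand-ins; here `do_something` simply raises a `KeyError` so there is something to chain.

```python
class AppException(Exception):
    """Stand-in for an application-level error type."""


def do_something():
    # Stand-in for real work that fails with a low-level error.
    raise KeyError("missing")


def run():
    try:
        do_something()
        return True
    except Exception as e:
        # Translate the error without losing where it came from.
        raise AppException("do_something failed") from e


try:
    run()
except AppException as err:
    cause = err.__cause__  # the original KeyError, kept for the traceback
```

An uncaught `AppException` raised this way prints both tracebacks, joined by "The above exception was the direct cause of the following exception".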
DON’T TRY TOO HARD
sys.exit(1)
CRASH-ONLY
FAIL FAST FAIL LOUDLY
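One possible reading of "fail fast, fail loudly" at startup: if a precondition is not met, log it at the highest level and stop with a non-zero exit status instead of limping along. The `DATABASE_URL` setting and the `load_config` and `main` helpers are hypothetical.

```python
import logging

log = logging.getLogger("app")


def load_config(env):
    """Return the settings the app cannot run without, or raise."""
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    return {"database_url": url}


def main(env) -> int:
    """Return a process exit status: 0 on success, 1 on startup failure."""
    try:
        load_config(env)
    except RuntimeError:
        # Loud: log with traceback. Fast: give up now rather than later.
        log.critical("cannot start", exc_info=True)
        return 1
    # ... the real application would run here ...
    return 0
```

An entry point would pass the result to the interpreter, for example `sys.exit(main(os.environ))`, so a supervisor sees the failure and can restart or alert.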
FOCUS ON RECOVERY
MTTR
ZERO EXPECTATIONS
FAULT TOLERANCE
FAULT TOLERANCE RECOVERY
OX.CX/SS @HYNEK VRMD.DE