
Illusions of Certainty: What the brain can teach us about software engineering

Julie Pitt, Co-founder, Order of Magnitude Labs, @yakticus. Relevant links: github.com/yakticus/IllusionsOfCertainty


  1. Illusions of Certainty: What the brain can teach us about software engineering. Julie Pitt, Co-founder, Order of Magnitude Labs, @yakticus

  2. relevant links found here: github.com/yakticus/IllusionsOfCertainty

  3. “For the things we have to learn before we can do them, we learn by doing them.” ― Aristotle, The Nicomachean Ethics

  4. today we will discuss
     ➔ a BIG reason why software projects are unpredictable
     ➔ how to help computers better understand what we mean
     ➔ how to make our software systems more resilient
     ➔ how to better understand what our software systems are doing

  5. life

  6. life: a generative model with an interface to the world [diagram labels: senses, the world, generative model, action]

  7. survival

  8. [diagram labels: working, working, not working]

  9. software as a generative model [diagram labels: input, the world, the code, output]

  10. software as a generative model [diagram labels: input, infinite precision, the world, the code, output]

  11. misjudging uncertainty in software [diagram labels: perception, reality]

  12. human precision: don’t hurt people; be nice to people; don’t kill humans; keep humans alive; respect human life

  13. machine precision: don’t hurt people; be nice to people; don’t kill humans; keep humans alive; respect human life

  14. machine precision: don’t hurt people; be nice to people; don’t kill humans; keep humans alive; respect human life

  15. the cliffs of infinite precision [diagram labels: the happy path, utterly broken]

  16. how do we get to this? [diagram labels: optimal, resilience, degraded]

  17. ways we can cheat
      ➔ property tests
      ➔ remedy-first design
      ➔ build intuitive insight

  18. property tests

  19. test suite as a generative model [diagram labels: x, test suite, system under test, y]

  20. individual test cases are often too precise [diagram labels: desired behavior, tests (“training examples”), software system, state space]

  21. testing an addition function: F# example [diagram labels: ✅ test passes, state space] (credit: http://fsharpforfunandprofit.com/posts/property-based-testing/)

  22. overfitting to tests [diagram labels: bug, state space] (credit: http://fsharpforfunandprofit.com/posts/property-based-testing/)

  23. property tests combat overfitting [diagram labels: bug, state space] (credit: http://fsharpforfunandprofit.com/posts/property-based-testing/)

  24. property tests: let’s review
      - test suites are generative models
      - describe the properties of your system
      - requires less precision
      - test the properties
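
A minimal sketch of the overfitting idea from slides 21 to 23, written in Python rather than the F# used in the credited article. Everything here is illustrative: `add_overfit`, `example_tests_pass`, and `check_properties` are hypothetical names, and the hand-rolled random generator stands in for a real property-testing library. The three properties checked (commutativity, zero as identity, adding 1 twice) are ordinary properties of addition.

```python
import random

def add_correct(x, y):
    return x + y

def add_overfit(x, y):
    # Hypothetical buggy implementation, "fit" to the two example tests below.
    if (x, y) == (1, 2):
        return 3
    if (x, y) == (2, 2):
        return 4
    return 0

def example_tests_pass(add):
    # Precise, hand-picked test cases ("training examples").
    return add(1, 2) == 3 and add(2, 2) == 4

def check_properties(add, trials=200, seed=0):
    """Return the first counterexample found, or None if every property held."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-1000, 1000)
        y = rng.randint(-1000, 1000)
        if add(x, y) != add(y, x):            # commutativity
            return (x, y)
        if add(x, 0) != x:                    # zero is the identity
            return (x, 0)
        if add(add(x, 1), 1) != add(x, 2):    # adding 1 twice equals adding 2
            return (x,)
    return None
```

The overfit implementation passes the example tests, yet `check_properties(add_overfit)` returns a counterexample, while `check_properties(add_correct)` returns `None`. The properties say less about any single input than an example does, but they constrain far more of the state space.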

  25. remedy-first design

  26. RESTful service (client falls off cliff)
      input: GET /api/metadata/12345
      output: { "status": "failure", "error": { "errorCode": 234, "description": "database timeout" } }

  27. each error has a precise cause [diagram labels: endpoint moved; read timeout; connection pool exhausted; failover endpoints expired; connect timeout; token expired; user error; key rotation; account problem; credentials revoked; insufficient permissions]

  28. remedies are imprecise [diagram: the causes from the previous slide grouped under four remedies: REDIRECT, RETRY, RE-AUTHENTICATE, DISPLAY ERROR]

  29. remedy tells the client how to ease pain
      { "status": "failure", "failure": { "action": "RETRY", "error": { "errorCode": 234, "description": "database timeout" } } }
      [annotation on "action": remedy (actionable)]
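
A rough sketch of how a client might consume the response on slide 29: it branches only on the remedy in `action` and never inspects `errorCode`. The remedy names come from slide 28; the step names on the right-hand side, the `"success"` status value, and the fallback for unknown remedies are assumptions made for illustration.

```python
import json

# The failure response from slide 29.
RESPONSE = '''{
  "status": "failure",
  "failure": {
    "action": "RETRY",
    "error": {"errorCode": 234, "description": "database timeout"}
  }
}'''

# Hypothetical client-side steps, one per remedy.
REMEDY_STEPS = {
    "REDIRECT": "follow_new_endpoint",
    "RETRY": "retry_with_backoff",
    "RE-AUTHENTICATE": "refresh_credentials",
    "DISPLAY ERROR": "show_error_to_user",
}

def choose_step(body):
    """Pick the client's next step from the remedy alone, ignoring the cause."""
    payload = json.loads(body)
    if payload.get("status") != "failure":
        return "use_result"
    action = payload.get("failure", {}).get("action")
    # Assumption: an unknown or missing remedy is surfaced to the user.
    return REMEDY_STEPS.get(action, "show_error_to_user")
```

With this shape the server can add new error causes without changing the client, as long as each new cause maps to one of the existing remedies.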

  30. What about failures that weren’t predicted?

  31. failure comes in many forms: AWS outage - 2012/10/22
      -> DNS change didn’t propagate
      -> indirectly triggered a latent memory leak
      -> insufficient alerting; failovers happened too little, too late
      -> API throttling affected some customers more than others
      -> many popular internet services down for hours

  32. failure comes in many forms: AWS scheduled maintenance - 2014/09/25
      -> time-sensitive security update on 10% of EC2 nodes
      -> required reboot of those nodes
      -> possible impact to customer applications running on those nodes

  33. failure comes in many forms: AWS DynamoDB outage - 2015/09/20
      -> DynamoDB failed in us-east-1 region
      -> dozens of dependent services also failed
      -> many prominent internet services were taken down for hours

  34. Netflix was prepared

  35. meet simian army
      - OSS project by Netflix
      - deliberately causes failures in a controlled manner
      - e.g., randomly takes down AWS EC2 nodes, a datacenter, or a region
      - validates whether the system handles failure

  36. simian army -> cultural change
      - failure is the norm
      - simulates the nature of failure and not the cause
      - we can’t predict all causes of failure

  37. remedy-first design: let’s review
      - design with remedies in mind
      - # remedies << # causes
      - test resilience during business hours
      - find out what you’re up against when wide awake
      - use a tool that is agnostic to causes
      - e.g., simian army
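
A small sketch of cause-agnostic failure injection, in the spirit of the review above. This is not Simian Army's API: `with_injected_failures`, `call_with_retry`, `fetch_metadata`, and `survival_rate` are hypothetical names, and the in-process wrapper is a stand-in for taking down real nodes. The injector simulates the fact of failure without modelling any cause, and the check measures whether a RETRY remedy lets callers survive it.

```python
import random

class InjectedFailure(Exception):
    """Raised by the injector; deliberately carries no cause."""

def with_injected_failures(func, failure_rate, rng):
    """Wrap func so that a fraction of calls fail at random."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFailure()
        return func(*args, **kwargs)
    return wrapper

def call_with_retry(func, attempts):
    """RETRY remedy: try up to `attempts` times, then re-raise the last failure."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFailure as err:
            last_error = err
    raise last_error

def fetch_metadata():
    # Stand-in for a call to a real dependency.
    return {"id": 12345}

def survival_rate(failure_rate, attempts, trials=1000, seed=1):
    """Fraction of requests that succeed while failures are being injected."""
    rng = random.Random(seed)
    flaky = with_injected_failures(fetch_metadata, failure_rate, rng)
    ok = 0
    for _ in range(trials):
        try:
            call_with_retry(flaky, attempts)
            ok += 1
        except InjectedFailure:
            pass
    return ok / trials
```

Comparing `survival_rate(0.3, 1)` with `survival_rate(0.3, 5)` shows how much the remedy buys, without the test ever naming a cause such as a timeout or an exhausted connection pool.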

  38. intuitive feedback

  39. is it working?

  40. logs: easy to produce

  41. logs: hard to consume

  42. charts

  43. charts: easier to consume, but still hard

  44. we want the whole picture

  45. solution: leverage our intuition

  46. thought experiment What if your software system’s interactions sounded like cars on the road?

  47. intuitive feedback: let’s review
      - humans want to know “is it working?”
      - the tools of today inhibit us from seeing the big picture
      - we need tools that leverage our intuition
      - e.g., vizceral & TBD

  48. conclusion

  49. curiosity-driven tests [diagram labels: senses, system under test, test agent (neural network), action]

  50. mapping the state space through exploration: begin testing random states without expectations [diagram label: state space]

  51. mapping the state space through exploration: gradually build a model containing expectations [diagram label: state space]

  52. mapping the state space through exploration: model capable of recognizing anomalies [diagram label: state space]
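
The three steps on slides 50 to 52 can be sketched in a few lines. Note the substitution: the talk envisions a neural-network test agent, while this example swaps in a plain mean-and-spread summary so that it stays self-contained. The system, the buggy state `4242`, and all function names are hypothetical.

```python
import random
import statistics

def system_under_test(state):
    # Hypothetical system: normally returns a value from 10 to 16,
    # but one state hides a latent bug.
    if state == 4242:
        return 500
    return 10 + (state % 7)

def explore(system, samples, rng):
    """Probe random states without expectations and record what comes back."""
    return [system(rng.randrange(0, 1000)) for _ in range(samples)]

def build_model(observations):
    """Summarize the observations as expectations: a mean and a spread."""
    return statistics.mean(observations), statistics.stdev(observations)

def is_anomaly(model, value, tolerance=3.0):
    """Flag values that fall far outside what exploration observed."""
    mean, spread = model
    return abs(value - mean) > tolerance * spread

rng = random.Random(7)
model = build_model(explore(system_under_test, 300, rng))
```

No expected outputs were written by hand; the expectations come entirely from exploration. Afterwards, `is_anomaly(model, system_under_test(4242))` flags the buggy state, while ordinary states such as `5` are accepted.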

  53. self-healing systems [diagram labels: senses: telemetry; ops agent (neural network); action: deployment, scaling, failover, etc.]

  54. let’s review [diagram labels: working, working, not working]

  55. goal: change the landscape

  56. the end.

  57. links github.com/yakticus/IllusionsOfCertainty
