
From Development to Production: Many Uses of Serverless Observability

From Development to Production: Many Uses of Serverless Observability EMRAH SAMDAN | SEPTEMBER 9, 2019 Community Day 2019 Sponsors Who am I? Developer for 6+ years Product manager for 2 years VP of Product for Thundra


  1. From Development to Production: Many Uses of Serverless Observability EMRAH SAMDAN | SEPTEMBER 9, 2019 Community Day 2019 Sponsors

  2. Who am I? ● Developer for 6+ years ● Product manager for 2 years ● VP of Product for Thundra ● Organizing committee of ServerlessDays İstanbul (on October 3rd!) @emrahsamdan

  3. Agenda ● Let’s define serverless (yes, once again!) ● Is observability a buzzword or a real thing? ● Observability challenges in serverless ● Observability-Driven Development ● How/when to test serverless applications ● What to check to monitor the serverless stack ● Troubleshooting serverless applications

  4. What’s serverless? “Serverless computing is a cloud-computing execution model in which the cloud provider runs the server, and dynamically manages the allocation of machine resources. Pricing is based on the actual amount of resources consumed by an application, rather than on pre-purchased units of capacity.” Wikipedia: https://en.wikipedia.org/wiki/Serverless_computing

  5. What is serverless? An operational construct? Things that run perfectly and that I don’t need to manage. Are Stripe and Auth0 serverless?

  6. @emrahsamdan

  7. What’s serverless? Utility computing: I pay only for what I use.

  8. What’s serverless? A doctrine, a thought model that helps you deliver faster and put your focus on the value you provide to your customers. (Ben Kehoe, Paul Johnston)

  9. I agree with you all. But! All the upsides can disappear when you don’t pay attention to what’s really happening with serverless.

  10. Shared Responsibility Model: The cloud vendor will handle scalability and reliability. But performance and security are still ON US.

  11. Serverless Observability ● Serverless is full of hidden traps that can harm its promise. ● It can be very costly. ● It can perform really poorly. ● You need to check what’s going on.

  12. What’s observability? https://medium.com/@copyconstruct/monitoring-and-observability-8417d1952e1c

  13. Are we ready for unknown unknowns? ● Known knowns: Things that we understand and are aware of. “Follow the metric charts.” ● Known unknowns: Things that we are aware of but don’t understand at a glance. “Yes, there is a peak over there. Let me dig into the traces and logs.” ● Unknown knowns: Things that we can understand but are not aware of. “I would have fixed this if I had that metric chart :(” ● Unknown unknowns: Things we neither understand nor are aware of. “Things are kaputt and I have no freaking idea with what I have.”

  14. The pillars ● Alerts ● Machine Learning and Insights ● Visualization ● Traces ● Metrics ● Logs

  15. Observability Challenges in Serverless ● No access to the underlying infrastructure ● You either take whatever the cloud vendor provides or accept that there will be an overhead. ● Overhead? ○ Gathering intelligence (should be acceptable) ○ Transporting it to where necessary (you only have the invocation lifetime to take the data out) ● Everything is event-driven and distributed.

  16. Fix the Charizard. If you can!

  17. Observability-Driven Development @emrahsamdan

  18. Observability-Driven Development ● Recall the unknown unknowns. ● What do you need when you have to troubleshoot an unknown unknown? ● Can an observability tool know your unknowns? ● If you don’t know what to know, what can you do? (Quadrant, repeated from the earlier slide) ● Known knowns: Things that we understand and are aware of. “Follow the metric charts.” ● Known unknowns: Things that we are aware of but don’t understand at a glance. “Yes, there is a peak over there. Let me dig into the traces and logs.” ● Unknown knowns: Things that we can understand but are not aware of. “I would have fixed this if I had that metric chart :(” ● Unknown unknowns: Things we neither understand nor are aware of. “Things are kaputt and I have no freaking idea with what I have.”

  19. ASK Your observability tool should give you auto-replies. But it should also let you ask wise questions.

  20. Observability-Driven Development ● Not a replacement for test-driven development. ● Think of the answers that you can give for any type of question. ● If you are thinking about questions, request that feature from your tool. ● Structured logging and manual instrumentation are the key. Retrieved from: https://dzone.com/articles/what-is-structured-logging
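As an illustration of the structured-logging point on this slide, here is a minimal sketch that is not taken from the talk. It emits one JSON object per log line so that fields can be filtered and aggregated later. The `handler` function, the `order_id` field, and the event names are hypothetical and exist only for the example.

```python
import json
import time


def log_event(message, **fields):
    """Emit one JSON object per log line so fields can be queried later."""
    record = {"timestamp": time.time(), "message": message, **fields}
    line = json.dumps(record, sort_keys=True, default=str)
    print(line)
    return line


def handler(event, context=None):
    # Hypothetical Lambda-style handler; "order_id" is an assumed field name.
    started = time.perf_counter()
    order_id = event.get("order_id")
    log_event("order.received", order_id=order_id)
    # ... business logic would run here ...
    duration_ms = (time.perf_counter() - started) * 1000
    log_event("order.processed", order_id=order_id,
              duration_ms=round(duration_ms, 3))
    return {"status": "ok", "order_id": order_id}
```

Because every line is a self-describing record, a question such as "which orders took longer than a second?" becomes a filter on `duration_ms` rather than a text search.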

  21. Observability-Driven Development (Cons) ● Observability coverage? ● Hard to get accustomed to. ● You can’t sample a thing.

  22. TESTING

  23. Testing challenges in Serverless ● Local testing is a pain. ○ How to mock the cloud resources? Is it actually correct to mock them? ○ How should you test the chain of many invocations? ○ How to integrate it with CI/CD tools? ● Integration testing with real resources is still the best approach, but again, how?

  24. Integration testing ● Serverless != Functions ● Test your business logic against the resources. ● See how your messages are transformed in the flow. ● Async events can cause problems that you can never guess.

  25. Integration Testing (Cons) ● Still, you’re dealing only with known knowns. ● Resources that are not pay-per-use. ○ Setting up a test environment. Still? ● Not with the production data.

  26. Chaos Testing Serverless Applications ● Serverless is a great fit for chaos engineering because: ○ It is distributed ○ There are lots of possible failures in an async environment ○ It is event-driven (so poisonous events) ○ Roles and permissions are so granular that access can slip away.

  27. Chaos testing on serverless. What? ● What would happen if an inner Lambda starts to respond slowly? ● Are you sure that you have tuned your timeouts properly? ● Test by injecting latency.
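The latency-injection idea on this slide could be sketched roughly as follows. This is a hand-rolled decorator written for illustration, not the API of any chaos library; the names `inject_latency` and `inner_handler` and the parameter values are assumptions. The `sleep` and `rng` arguments are injectable so the behavior can be checked without actually waiting.

```python
import functools
import random
import time


def inject_latency(delay_seconds, rate=1.0, sleep=time.sleep, rng=random.random):
    """Delay the wrapped handler by delay_seconds for a fraction `rate` of calls."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # rng() returns a float in [0, 1), so rate=1.0 always injects
            # the delay and rate=0.0 never does.
            if rng() < rate:
                sleep(delay_seconds)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical inner function: about half of its calls get 200 ms slower.
@inject_latency(delay_seconds=0.2, rate=0.5)
def inner_handler(event, context=None):
    return {"statusCode": 200}
```

Running the caller of such a function with the delay enabled is one way to find out whether its timeout is shorter than the slow path, before production traffic finds out.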

  28. Chaos testing on serverless. What? ● What if we lose the connection to Redis?

  29. Chaos testing on serverless. How? ● https://github.com/adhorn/aws-lambda-chaos-injection (Adrian Hornsby) ● https://github.com/gunnargrosch/ (Gunnar Grosch)

  30. MONITORING How large should my screen be to see the charts for thousands of functions?

  31. How to discover an anomaly in serverless?

  32. @emrahsamdan

  33. Serverless is more than functions, so is monitoring. ● Issues can stay local before you notice them. ● It is slow. Why? ○ API slowdown? ○ Throttle on any resource? ○ Bad coding practice? ● Invocation counts go crazy. ○ Seasonal peak? ○ Successful product? ○ Retry storms?

  34. @emrahsamdan

  35. Monitoring Latency ● Abnormal latency is mostly not related to the function code. ○ Idly waiting for a third-party API ○ Throttled resource ● Set aggregated alerts ○ Alert on transaction duration ○ Alert on function duration ○ Alert on operation duration
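One possible reading of "aggregated alerts" on this slide is to alert on a high percentile of durations at each level (transaction, function, operation) rather than on individual invocations. The sketch below assumes that reading; the function names, the nearest-rank percentile choice, and the threshold values are illustrative and not from the talk.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_alerts(durations_ms, thresholds_ms, pct=99):
    """Return the levels whose pct-th percentile duration exceeds its threshold."""
    alerts = []
    for level, samples in durations_ms.items():
        if samples and percentile(samples, pct) > thresholds_ms[level]:
            alerts.append(level)
    return alerts


# Hypothetical samples: only the end-to-end transaction is slow, which hints
# that the time is spent outside the function code itself.
durations = {
    "transaction": [100, 120, 900],
    "function": [40, 45, 50],
    "operation": [5, 6, 7],
}
thresholds = {"transaction": 500, "function": 100, "operation": 10}
print(latency_alerts(durations, thresholds))
```

Comparing the three levels side by side helps separate "my code is slow" from "something my code waits on is slow".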

  36. Storm of retries and errors ● When your code fails for some reason, your function will retry several times. ○ Sync events: You should control it. ○ Async events: Different retry mechanisms. ○ Stream-based events: Risk of losing data. ● Does this solve it? ● Check ○ Iterator age ○ Number of retries ○ Number of errors ○ Memory usage ○ Cold starts

  37. TROUBLESHOOTING Bad things happen in serverless, too. Now, it’s time to battle!

  38. Failure modes of serverless ● https://github.com/adhorn/aws-lambda-chaos-injection ● https://github.com/gunnargrosch/

  39. Failure modes of serverless ● Badly tuned memory ● Timeout ● Error in code or in a managed resource

  40. Consequences ● Downtime ● Huge bills ● Unhappy customers

  41. Challenges of Troubleshooting ● How to trace the distributed async events with non-aggregated traces, metrics and logs? ● How to trace requests to external resources? ● How to trace the async distributed events?

  42. Distributed Tracing ● Trace the distributed transactions: a chain of multiple invocations ● Understand what is wrong at a glance ● But?! What if I have a bad coding practice in the code?

  43. Local Tracing ● Instrument the code itself and check against code quality. ● Good for discovering ○ Bad coding practices ○ Values of local variables in the code ○ Debugging the code without breakpoints

  44. Actionable Alerts in Serverless ● Alert on code errors ○ Stack trace ○ Code line that caused it ○ Values of local variables ● Alert on latencies and timeout errors ○ Slow API communications ○ Slow DB interaction due to bad queries

  45. How to respond to the issues on serverless ● The issue may not be your code. ● Check the third parties. ● Check other metrics ○ Iterator age of streams ○ Throttles on resources ● Have some runbooks ○ Exponential backoffs to APIs, alternative APIs ○ Healthy on-call structures
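The "exponential backoffs to APIs, alternative APIs" runbook item could look roughly like the sketch below. It is an assumption-laden illustration rather than the speaker's implementation: `call_with_backoff`, its parameters, and the full-jitter delay choice are all hypothetical.

```python
import random
import time


def call_with_backoff(primary, fallback=None, max_attempts=4,
                      base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call primary(); retry with exponential backoff, then try fallback()."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return primary()
        except Exception as exc:  # a real runbook would catch narrower errors
            last_error = exc
            if attempt < max_attempts - 1:
                # Cap doubles on each attempt: 0.1, 0.2, 0.4, ... up to max_delay.
                cap = min(max_delay, base_delay * 2 ** attempt)
                # Random delay up to the cap, so concurrent retries spread out
                # instead of arriving together as a retry storm.
                sleep(random.uniform(0, cap))
    if fallback is not None:
        # Alternative API, used only after the primary has exhausted retries.
        return fallback()
    raise last_error
```

A call might look like `call_with_backoff(lambda: fetch_rates(), fallback=lambda: fetch_cached_rates())`, where both function names are placeholders for a primary API and its alternative.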
