LESSONS LEARNED FROM REVIEWING 150 INFRASTRUCTURES_ JON TOPPER


  1. LESSONS LEARNED FROM REVIEWING 150 INFRASTRUCTURES_ JON TOPPER | @jtopper | he/him/his

  2. $ whoami
      • Founder/CEO/CTO, The Scale Factory
      • Working in hosting/infrastructure for 20 years
      • Infrastructure / AWS / DevOps

  4. REVIEWS RUN_
      [Chart: reviews run, y-axis 0 to 180, x-axis Mar-2018 to Jan-2020]

  5. TODAY'S AGENDA_
      • What is Well-Architected?
      • What is a Well-Architected Review?
      • Common Review Findings

  6. WHAT IS WELL-ARCHITECTED?_

  7. WELL ARCHITECTED ORIGINS_
      • Catalogue of emergent good practices
      • Observed by AWS Field Solutions Architects
      • Codified and shared
      • Platform agnostic*

  8. White Papers
      Review Tool

  9. Operational Excellence | Security | Reliability | Performance Efficiency | Cost Optimisation

  10. Lenses
      • Serverless Applications
      • High Performance Computing
      • IoT (Internet of Things)

  11. USING WELL-ARCHITECTED_
      • Gap analysis / planning
      • Teaching
      • Team alignment

  12. WHAT IS A WELL-ARCHITECTED REVIEW?_

  13. WELL ARCHITECTED REVIEW_
      • Foundational questions
      • Up to 4 hours
      • Qualitative

  14. Pillars: Operational Excellence | Security | Reliability | Performance Efficiency | Cost Optimisation
      Well Architected Core: 11 8 9 46 9 9
      Serverless Applications: 1 1 3 2 2 9
      High Performance Computing: 2 4 3 3 4 16
      IoT (Internet of Things): 10 11 35 4 6 4

  15. QUESTION OPS 1_ How do you determine what your priorities are?
      • Evaluate external customer needs
      • Evaluate internal customer needs
      • Evaluate compliance requirements
      • Evaluate threat landscape
      • Evaluate tradeoffs
      • Manage benefits and risks
      • None of these

  16. QUESTION OPS 1_ How do you determine what your priorities are?
      • Evaluate external customer needs (WA)
      • Evaluate internal customer needs (WA)
      • Evaluate compliance requirements (WA)
      • Evaluate threat landscape (NI)
      • Evaluate tradeoffs (NI)
      • Manage benefits and risks (NI)
      • None of these (CI)

  17. QUESTION OPS 1_ How do you determine what your priorities are?
      High Risk
      • Evaluate external customer needs (WA)
      • Evaluate internal customer needs (WA)
      • Evaluate compliance requirements (WA)
      • Evaluate threat landscape (NI)
      • Evaluate tradeoffs (NI)
      • Manage benefits and risks (NI)
      • None of these (CI)

  18. QUESTION OPS 1_ How do you determine what your priorities are?
      Medium Risk
      • Evaluate external customer needs (WA)
      • Evaluate internal customer needs (WA)
      • Evaluate compliance requirements (WA)
      • Evaluate threat landscape (NI)
      • Evaluate tradeoffs (NI)
      • Manage benefits and risks (NI)
      • None of these (CI)

  19. QUESTION OPS 1_ How do you determine what your priorities are?
      Medium Risk
      • Evaluate external customer needs (WA)
      • Evaluate internal customer needs (WA)
      • Evaluate compliance requirements (WA)
      • Evaluate threat landscape (NI)
      • Evaluate tradeoffs (NI)
      • Manage benefits and risks (NI)
      • None of these (CI)

  20. QUESTION OPS 1_ How do you determine what your priorities are?
      Well Architected
      • Evaluate external customer needs (WA)
      • Evaluate internal customer needs (WA)
      • Evaluate compliance requirements (WA)
      • Evaluate threat landscape (NI)
      • Evaluate tradeoffs (NI)
      • Manage benefits and risks (NI)
      • None of these (CI)

  21. COMMON REVIEW FINDINGS_

  22. THE GOOD_

  23. QUESTION OPS 1_ How do you determine what your priorities are?
      Well Architected: 77% | WA Rank: 1
      • Evaluate external customer needs: 93% (WA)
      • Evaluate internal customer needs: 87% (WA)
      • Evaluate compliance requirements: 90% (WA)
      • Evaluate threat landscape: 85% (NI)
      • Evaluate tradeoffs: 89% (NI)
      • Manage benefits and risks: 89% (NI)
      • None of these: 0% (CI)

  24. QUESTION PERF 3_ How do you select your storage solution?
      Well Architected: 70% | WA Rank: 2
      • Understand storage characteristics and requirements: 84% (WA)
      • Evaluate available configuration options: 78% (NI)
      • Make decisions based on access patterns and metrics: 73% (NI)
      • None of these: 5% (CI)

  25. QUESTION REL 5_ How do you implement change?
      Well Architected: 63% | WA Rank: 3
      • Deploy changes in a planned manner: 83% (WA)
      • Deploy changes with automation: 67% (NI)
      • None of these: 6% (CI)
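
The percentages on the findings slides read as the share of reviews in which each practice was selected. That reading is an assumption; the talk transcript does not define them. Under that assumption, a minimal Python sketch of the tally follows. The function name and the four example reviews are hypothetical and are not data from the talk.

```python
from collections import Counter


def practice_percentages(reviews, practices):
    """Return {practice: whole-number percent of reviews that selected it}.

    `reviews` is a list of sets, one set of selected practice names per review.
    Assumption: a slide percentage is "reviews selecting X / all reviews".
    """
    if not reviews:
        raise ValueError("at least one review is required")
    # Count each practice at most once per review.
    counts = Counter(name for selected in reviews for name in set(selected))
    return {name: round(100 * counts[name] / len(reviews)) for name in practices}


# Practice names from REL 5; the answers below are made up for illustration.
REL_5 = [
    "Deploy changes in a planned manner",
    "Deploy changes with automation",
    "None of these",
]
example_reviews = [
    {"Deploy changes in a planned manner", "Deploy changes with automation"},
    {"Deploy changes in a planned manner"},
    {"Deploy changes in a planned manner"},
    {"None of these"},
]
percentages = practice_percentages(example_reviews, REL_5)
for name in REL_5:
    print(f"{name}: {percentages[name]}%")
```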

  26. THE BAD_

  27. QUESTION REL 9_ How do you plan for disaster recovery?
      High Risk: 79% (87%) | HRI Rank: 1
      • Define recovery objectives for downtime and data loss: 33% (WA)
      • Use defined recovery strategies to meet the recovery objectives: 33% (WA)
      • Test disaster recovery implementation to validate the implementation: 25% (WA)
      • Manage configuration drift on all changes: 39% (NI)
      • Automate recovery: 16% (NI)
      • None of these: 31% (CI)

  28. QUESTION SEC 11_ How do you respond to a [security] incident?
      High Risk: 75% (93%) | HRI Rank: 2
      • Identify key personnel and external resources: 51% (WA)
      • Identify tooling: 27% (WA)
      • Develop incident response plans: 39% (WA)
      • Automate containment capability: 0% (NI)
      • Identify forensic capabilities: 11% (NI)
      • Pre-provision access: 27% (NI)
      • Pre-deploy tools: 10% (NI)
      • Run game days: 3% (NI)
      • None of these: 35% (CI)

  29. QUESTION SEC 8_ How do you classify your data?
      High Risk: 75% (88%) | HRI Rank: 3
      • Define data classification requirements: 61% (WA)
      • Define data protection controls: 39% (WA)
      • Implement data identification: 17% (WA)
      • Automate identification and classification: 4% (NI)
      • Identify the types of data: 59% (NI)
      • None of these: 23% (CI)

  30. QUESTION COST 9_ How do you evaluate new services?
      High Risk: 71% (79%) | HRI Rank: 4
      • Establish a cost optimisation function: 34% (WA)
      • Develop a workload review process: 26% (WA)
      • Review and implement services in an unplanned way: 84% (NI)
      • Review and analyse this workload regularly: 43% (NI)
      • Keep up to date with new service releases: 63% (NI)
      • None of these: 1% (CI)

  31. QUESTION REL 8_ How do you test resilience?
      High Risk: 67% (92%) | HRI Rank: 5
      • Use playbooks for unanticipated failures: 25% (WA)
      • Conduct root cause analysis and share results: 73% (WA)
      • Inject failures to test resiliency: 6% (NI)
      • Conduct game days regularly: 0% (NI)
      • None of these: 16% (CI)
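
REL 9 asks for defined recovery objectives for downtime and data loss, and for tests that validate the recovery implementation. One way to connect the two is to record each recovery test and compare it with the objectives. The sketch below assumes Python; the class names, the function name and the example figures are hypothetical and not from the talk. RTO and RPO are the usual names for the downtime and data-loss objectives.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # recovery time objective: maximum tolerable downtime
    rpo: timedelta  # recovery point objective: maximum tolerable data loss window


@dataclass(frozen=True)
class DrillResult:
    downtime: timedelta   # how long the recovery test took to restore service
    data_loss: timedelta  # age of the newest data that could be recovered


def evaluate_drill(objectives, result):
    """Return a list of missed objectives; an empty list means the test passed."""
    failures = []
    if result.downtime > objectives.rto:
        failures.append(f"downtime {result.downtime} exceeded RTO {objectives.rto}")
    if result.data_loss > objectives.rpo:
        failures.append(f"data loss {result.data_loss} exceeded RPO {objectives.rpo}")
    return failures


# Made-up figures for illustration only.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(minutes=15))
drill = DrillResult(downtime=timedelta(hours=5), data_loss=timedelta(minutes=5))
problems = evaluate_drill(objectives, drill)
for problem in problems:
    print(problem)
```

Keeping the objectives as data means the same check can run after every recovery test, so a missed objective shows up as a failed test and not as a surprise during a real incident.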

  32. THE NOTABLE_

  33. QUESTION OPS 3_ How do you reduce defects, ease remediation, and improve flow into production?
      Well Architected: 14% | WA Rank: 23
      • Use version control: 90% (WA)
      • Test and validate changes: 87% (WA)
      • Use config management systems: 78% (NI)
      • Use build/deploy systems: 82% (NI)
      • Perform patch management: 37% (NI)
      • Share design standards: 57% (NI)
      • Implement practices to improve code quality: 83% (NI)
      • Use multiple environments: 81% (NI)
      • Make frequent, small, reversible changes: 63% (NI)
      • Fully automate integration and deployment: 52% (NI)
      • None of these: 3% (CI)

  34. QUESTION OPS 6_ How do you understand the health of your workload?
      Well Architected: 46% | WA Rank: 21
      • Identify key performance indicators: 53% (WA)
      • Define workload metrics: 62% (WA)
      • Collect and analyse workload metrics: 72% (WA)
      • Establish workload metric baselines: 51% (NI)
      • Learn expected patterns of activity for workload: 54% (NI)
      • Alert when workload outcomes are at risk: 40% (NI)
      • Alert when workload anomalies are detected: 34% (NI)
      • Validate the achievement of outcomes and the effectiveness of KPIs and metrics: 37% (NI)
      • None of these: 14% (CI)

  35. QUESTION SEC 2_ How do you control human access?
      High Risk: 47% (88%) | HRI Rank: 20
      • Define human access requirements: 70% (WA)
      • Grant least privileges: 58% (WA)
      • Allocate unique credentials per person: 90% (WA)
      • Manage credentials based on lifecycle: 70% (NI)
      • Automate credential management: 13% (NI)
      • Grant access through roles or federation: 62% (NI)
      • None of these: 3% (CI)
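
Two of the less frequently selected OPS 6 practices on slide 34 are "Establish workload metric baselines" and "Alert when workload anomalies are detected". A minimal sketch of how the two relate, assuming Python and a simple mean and standard deviation baseline. The function names, the threshold of three standard deviations and the sample values are illustrative assumptions, not something the talk prescribes.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarise historical metric samples as (mean, sample standard deviation).

    Needs at least two samples, because statistics.stdev requires them.
    """
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any different value is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Made-up request latencies in milliseconds, for illustration only.
history = [100, 102, 98, 101, 99]
baseline = build_baseline(history)
for latest in (101, 120):
    status = "ALERT" if is_anomalous(latest, baseline) else "ok"
    print(f"latency {latest} ms: {status}")
```

A real workload would more likely lean on the monitoring system's own anomaly detection, but the order of work is the same: collect metrics, establish a baseline, then alert on deviation from it.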
