The 1 Year and 1 hour Capacity Plan in the Drupal World
About me
● Principal SRE @Acquia (Cloud Data Team)
● Joined in December 2011
● Location: Lisbon, Portugal
● Co-authored Seeking SRE w/ Machine Learning for SRE (O’Reilly)
● Founder and Lead of the Portuguese Drupal Association
● Fun Facts:
○ Presented in DevOps events including DrupalCons.
○ Dedicated father of 2 kids and still manages to study and write.
○ First Linux installation: Slackware in 1994.
○ Former theatre actor.

Agenda
● The problem
● What is Capacity
● Why do Capacity Planning
● Relation to Site Reliability Engineering
● Budget & Capacity Planning
● Load Testing
● Performance Tuning vs. Capacity Planning
● What to measure
● How to measure
● How to track capacity
● Forecasting
● First Easy Steps
● Conclusions
The Problem
Site Launch & User Expectations
(Image: Falcon Heavy launch, SpaceX)

Typical Drupal Site Launch
- Disable devel
- Check the upload sizes & execution time
- Configure cron
- Check recipient email addresses
- Set the file permissions
- Protect your root account
- Check permissions
- Turn off error reporting
- Handle 404 errors gracefully
- Combine Pathauto with Global Redirect
- Check robots.txt
- Create a maintenance page
- Configure caching
- CSS and JavaScript optimisation
- Check unpublished content is not visible
- Configure statistics
- Monitor the site
- ** Plan for Failure **
What about Capacity Planning??
User Expectations
● The end goal of capacity planning is a smooth and speedy experience for the users
● Expectations vary depending on the type of application and on which portion of it the users interact with
(Image: Drupal click screenshot)

No silver bullet
● A site can have plenty of capacity and still be slow or unavailable
● Capacity is only one part of making the end-user experience fast
● We want to measure and track to make forecasts
● An intolerable amount of latency should raise a flag
What is Capacity
The resources required to run your services in the context you have chosen to run them
(Image: Carbon Fiber Tank, SpaceX)

Capacity in Site Reliability Engineering (SRE)
● Capacity: the maximum amount of output a product deployment is capable of completing in a given period of time
● Capacity planning: the process that determines the resources needed (people, instances, CPU, memory, time and more) for the company to meet changing demands for its services
● In the Drupal world we focus mostly on capacity for serving web traffic
Resource management
● Ensure proper resources are available to handle load
● Define procurement and an approval process
● Justify capital needs
● Manage resources after deployment
Source: The Art of Capacity Planning, Arun Kejariwal and John Allspaw, O'Reilly Media, Inc.

Why do Capacity Planning
(Image: Kroger grocery store, Lexington Kentucky, 1947, by Brett Streutket)
Quick and Dirty Math

Stay Fast and Reliable
● Only spend as much as you actually need
● Be ahead of sharp growth
● Avoid emergencies

Site Reliability Engineering
(Image: Rocket Laboratory, 1952, NASA/William A. Bowles)
“...an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s)...”
Ben Treynor - Google

Demand Forecasting and Capacity Planning
● Ensuring that there is sufficient capacity and redundancy
● Serve projected future demand with the required availability
● Ensure the required capacity is in place by the time it is needed
● Take both organic and inorganic growth into account
(Image: https://unsplash.com/photos/mexeVPlTB6k)
How SRE advocates for Capacity Planning
● Perform regular load testing
● Incorporate SLOs on Capacity
● Capacity is critical to availability, therefore the SRE team leads capacity planning initiatives and provisioning
(Image: https://unsplash.com/photos/DX9X0g0Cg88)

Budget & Capacity Planning
(Image: Vintage Grow Your Money by Chris Potter, ccPixs.com)
Keeping the costs low
● Meet with Finance, Engineering and Product
● Gather Systems and Application metrics
● Use that data to justify the investment
Three forces that impact Capacity Planning
(Diagram labels: Product, Engineering, Finance, Plan)

Load Testing
“Hope is not a strategy”
(Image: St. Margrethen - Load Test by Kecko)
Load testing a Drupal stack
● How to load test? “Hit it until it breaks”
● Include the points of failure in the calculations
● Determining backend limits can be tricky
● Use those resource ceilings as a basis while predicting future growth
(Diagram: https://docs.acquia.com/acquia-cloud/arch/)

A Few Load testing Tools
Simulate:
● Loadrunner
○ http://bit.ly/microfocus-loadrunner
● Iago
○ https://github.com/twitter/iago
● JMeter
○ http://jmeter.apache.org/
Collect:
● Prometheus
○ http://www.prometheus.io/
● Signalfx
○ http://www.signalfx.com/
● Cacti
○ http://cacti.net
● Ganglia
○ http://ganglia.info
● Nagios
○ http://nagios.org/
(Image: https://www.gocomics.com/calvinandhobbes/1986/11/26)
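The "hit it until it breaks" idea can be sketched as a step-load loop: raise concurrency step by step and stop at the first step where errors or latency cross a limit. This is only a minimal sketch and not a substitute for the tools listed above; `run_step`, `find_ceiling`, the thresholds and the example URL are hypothetical names and values chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_step(request, concurrency, requests_per_worker=5):
    """Call request() from `concurrency` threads; return (error_rate, worst_latency_s)."""
    def worker(_index):
        results = []
        for _ in range(requests_per_worker):
            start = time.perf_counter()
            try:
                request()
                ok = True
            except Exception:
                ok = False
            results.append((ok, time.perf_counter() - start))
        return results

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        samples = [s for batch in pool.map(worker, range(concurrency)) for s in batch]
    errors = sum(1 for ok, _ in samples if not ok)
    return errors / len(samples), max(latency for _, latency in samples)


def find_ceiling(request, steps, max_error_rate=0.01, max_latency_s=1.0):
    """Return the last concurrency level that stayed within both limits, or None."""
    last_good = None
    for concurrency in steps:
        error_rate, worst = run_step(request, concurrency)
        if error_rate > max_error_rate or worst > max_latency_s:
            break
        last_good = concurrency
    return last_good


# Example (hypothetical URL; only load test systems you are allowed to break):
#   import urllib.request
#   hit = lambda: urllib.request.urlopen("http://staging.example.com/", timeout=5).read()
#   print(find_ceiling(hit, steps=[1, 2, 4, 8, 16, 32]))
```

The concurrency level returned is the kind of resource ceiling that later forecasts can be measured against.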
Performance Tuning vs. Capacity Planning
(different goals)
(Image: Top Speed by Alexander Nie)

What to measure
defining the metrics
(Image: End-of-life by Dennis van Zuijlekom)
Divide & Conquer
● Splitting nodes
● Understand capacity demands of each node
● Measure more distinctly
● How requests or queries per second affect resources

Identifying the key resources to measure
● Disk space (MB)
● Disk throughput (IOPS)
● CPU performance (FLOPS)
● RAM (MB)
● Network bandwidth (Mbps)
● Network IP pool (Netmask)
● Others
How to measure
(Image: Living Computer Museum, Seattle)

Tools to measure on Linux servers
http://www.brendangregg.com/Perf/linux_perf_tools_full.png
Collecting resources on web servers
● Example script that sends metrics to statsd
● Low footprint using /proc, df and ps
TODO: CODE
● For a constant, reliable monitoring service use collectd (https://collectd.org) or Telegraf (https://www.influxdata.com/time-series-platform/telegraf/)

How to track Capacity
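The "Collecting resources on web servers" slide leaves its example script as a TODO. The following is not that script, only a minimal sketch of the same idea: read a few values from /proc, get disk usage (via `shutil.disk_usage`, which reports roughly what `df` does) and send them as statsd gauges over UDP. The statsd address, the `web01` prefix and the metric names are assumptions made for illustration.

```python
import shutil
import socket
from pathlib import Path

STATSD_ADDR = ("127.0.0.1", 8125)  # assumption: statsd listening locally on its default UDP port
PREFIX = "web01"                   # hypothetical metric prefix


def parse_meminfo(text):
    """Return {field: value in kB} parsed from the contents of /proc/meminfo."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def collect(root="/"):
    """Gather a few gauges; /proc entries are skipped when they do not exist."""
    metrics = {}
    usage = shutil.disk_usage(root)
    metrics["disk.used_mb"] = usage.used // 2**20
    metrics["disk.free_mb"] = usage.free // 2**20
    loadavg = Path("/proc/loadavg")
    if loadavg.exists():
        metrics["load.1min"] = float(loadavg.read_text().split()[0])
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        mem = parse_meminfo(meminfo.read_text())
        if "MemAvailable" in mem:
            metrics["mem.available_mb"] = mem["MemAvailable"] // 1024
    return metrics


def to_statsd_lines(metrics, prefix=PREFIX):
    """Format metrics as statsd gauges: <prefix>.<name>:<value>|g"""
    return [f"{prefix}.{name}:{value}|g" for name, value in sorted(metrics.items())]


def send(lines, addr=STATSD_ADDR):
    """Fire-and-forget UDP send, one datagram per metric."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("ascii"), addr)


# Typical use from cron, once a minute:
#   send(to_statsd_lines(collect()))
```

Because statsd uses UDP, a missing or slow collector does not block the web server, which keeps the footprint low.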
Store and display time-series
● Signalfx
● CoScale
● Cacti
● Riemann
● Ganglia
● Prometheus
● Graphite
● Sensu
● Idera
● Datadog
● Bijk
● Ruxit
● X-Pack
● LogicMonitor
● vRealize Hyperic HQ
● Sematext

A couple of load testing tips
● Load testing tutorials:
○ https://www.tutorialspoint.com/jmeter
○ https://www.blazemeter.com/load-testing
● Docker app for Grafana:
○ https://github.com/kamon-io/docker-grafana-graphite
Forecasting
(predicting trends)
(Image: Numbers And Finance by SeniorLiving.org)

Predict the future?
● Use Context & Math
● Make educated guesses
● Long-term view is generally steady
● Generate estimates to sustain growth
● Use an adjustable process
● Forecast guides autoscaling policies
Ceilings and Historical data
● Daily storage consumption example
● Metric: total available disk space
● Cumulative total provides a historical perspective
● We can predict future needs
● Storage will probably be exhausted at the point where the trend line reaches the ceiling

Curve fitting
● Creative & Scientific (e.g. a straight line, y = mx + b)
● Stay ahead of growth
● Use time-series data
● Forecast by constructing new data points beyond the known ones
● Reconciliation of what we know and the best fit equation
● Consider context before math
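The straight-line case, y = mx + b, is simple enough to work through with plain least squares. The sketch below fits a line to daily disk usage and solves for the day the line reaches the ceiling. The sample numbers and the 500 GB ceiling are made up for illustration, and `fit_line` and `day_ceiling_reached` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def day_ceiling_reached(m, b, ceiling):
    """Solve m*x + b = ceiling for x; returns None when usage is not growing."""
    if m <= 0:
        return None
    return (ceiling - b) / m


# Made-up sample: disk used (GB) on days 0..6, on a hypothetical 500 GB volume.
days = [0, 1, 2, 3, 4, 5, 6]
used_gb = [100, 110, 120, 130, 140, 150, 160]

m, b = fit_line(days, used_gb)
print(f"y = {m:.2f}x + {b:.2f}")                                   # y = 10.00x + 100.00
print("ceiling reached on day", day_ceiling_reached(m, b, 500))   # day 40.0
```

Real usage is rarely this clean, which is where "consider context before math" comes in: a known launch or clean-up job can matter more than the fitted slope.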
Forecasting Peak-Driven Resource Usage
● Track how the peaks change over time
● Extrapolate from that data to predict future needs
● Identify the server resource ceilings
● Find a relation between resources and application-level work
● Decide if we should scale vertically or horizontally
● and perform proactive autoscaling

Automating Forecasts with fityk & cfityk
● Fityk is open source software for nonlinear fitting of analytical functions to data.
● Incorporate cfityk scripts into automated curve fitting, like:
    cfityk ricardo-disk.fit
    @0 < ricardo-disk.csv
    guess Quadratic
    fit
    info formula
    quit
Returns the formula: 4888.18 + 363.063 * x + 8.91132 + -1.55119*x + 0.0660771*x^2
Homepage: https://fityk.nieto.pl/
Small demo: https://youtube.com/watch?v=EZnyq1Hr_7I
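Once a formula like the one above is available, forecasting means evaluating it beyond the known data and solving for the point where it meets a resource ceiling. The sketch below does this for the quadratic returned by fityk, with like terms combined. The slide does not state the units of x or of the fitted value, so the ceiling of 20000 is purely hypothetical, as are the names `forecast` and `x_at_ceiling`.

```python
import math

# Coefficients copied from the returned formula, like terms combined:
# 4888.18 + 363.063*x + 8.91132 + -1.55119*x + 0.0660771*x^2
A0 = 4888.18 + 8.91132
A1 = 363.063 - 1.55119
A2 = 0.0660771


def forecast(x):
    """Evaluate the fitted curve at x."""
    return A0 + A1 * x + A2 * x * x


def x_at_ceiling(ceiling):
    """Smallest x >= 0 where the fitted curve equals `ceiling`, or None if there is none."""
    c = A0 - ceiling
    disc = A1 * A1 - 4 * A2 * c
    if disc < 0:
        return None
    roots = [(-A1 + sign * math.sqrt(disc)) / (2 * A2) for sign in (1, -1)]
    future = [r for r in roots if r >= 0]
    return min(future) if future else None


ceiling = 20000  # hypothetical ceiling, in the same (unstated) unit as the fit
x = x_at_ceiling(ceiling)
print(f"curve reaches {ceiling} at x = {x:.1f}")
```

Running this on a schedule against fresh data turns the fit into an early warning: the estimated x shrinks as the ceiling gets closer.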
Forecasting with Machine Learning
● Most popular method for curve-fitting in fityk is Levenberg-Marquardt
● ML is also an option for forecasting (book I co-authored)
● Code examples and guides: https://github.com/ricardoamaro/MachineLearning4SRE
Seeking SRE: Conversations About Running Production Systems at Scale
Publisher: O'Reilly Media

Start with Easy Steps
Get Started
1. Select a process owner.
2. Identify the resources to be measured.
3. Measure these resources.
4. Compare to maximum capacity.
5. Collect workload forecasts.
6. Use forecasts for IT resource requirements.
7. Map requirements onto existing utilizations.
8. Predict when the system will be out of capacity.
9. Update forecasts and utilizations.

Set a Goal!
● Two Classes:
○ Load: usually expressed as the arrival rate or peak rate of requests hitting the service, e.g. a target of 10,000 concurrent authenticated Drupal users
○ Performance: usually expressed in the form of Service Level Objectives, e.g. the 99th percentile of all requests should return in less than 500 ms
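A performance goal such as "99th percentile under 500 ms" is easy to check against measured latencies. The sketch below uses the nearest-rank definition of a percentile, which is one of several common definitions; monitoring tools may interpolate and report slightly different values. `percentile` and `meets_slo` are hypothetical helper names, and the sample latencies are made up.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_slo(latencies_ms, pct=99, threshold_ms=500):
    """True when the pct-th percentile latency is below the threshold."""
    return percentile(latencies_ms, pct) < threshold_ms


# Made-up samples: 100 requests, one slow outlier versus two slow outliers.
one_outlier = [100] * 99 + [900]
two_outliers = [100] * 98 + [900, 900]
print(meets_slo(one_outlier))   # True: the single outlier falls above the 99th percentile
print(meets_slo(two_outliers))  # False: the 99th percentile is now 900 ms
```

The same check can be run on load test results to see at which load level the goal stops being met.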