Presto Summit NYC 2019

Presto Summit NYC 2019, December 11, 2019. Slack handles: @cheolsoo; @abhonsule



  1. Presto Summit NYC 2019 December 11, 2019 Slack handles: @cheolsoo; @abhonsule slack-corp.com

  2. Mission Make people’s working lives simpler, more pleasant and more productive.

  3. Slack

  4. Data Engineering at Slack ● Custodian of all data generated within Slack, the product. ● We provide the infrastructure and tooling necessary for stakeholders to reliably access product data for user-facing features, product and business insights. ● Figures on the slide: 215B Logs; +270M Daily Messages; 700B Daily Records; 250B Messages Table

  5. Presto at Slack ● Airflow: DAGs running on ETL scheduling system ● Databooks: tool used by Analysts, Data scientists, Marketing, Sales, Finance ● Analytics.ts: Slack’s internal analytics portal (Product Managers, Engineers, Analysts, Data scientists, Sales, Marketing, Finance) ● AB Testing framework: Slack’s AB testing/Experiments framework ● Sqooper: batch ingestion system ● BI portal: BI tool used by Corp/Biztech ● clog queries: query client logs

  6. Presto at Slack ● Past: Presto on EMR, single cluster ● Present: Starburst on EC2, multiple clusters ● Future: federated clusters

  7. Query success rate

  8. Query count

  9. Multiple clusters ● Static load balancing ● Per cluster config properties ● Per cluster capacity planning
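
Static load balancing, as listed on this slide, can be pictured as a fixed routing table from client to cluster. The sketch below is illustrative only: the cluster names, URLs, and client keys are hypothetical (the client names are borrowed from the "Presto at Slack" slide), and the deck does not say how the mapping is actually implemented.

```python
# Hypothetical static routing table: each client is pinned to one cluster.
# Cluster names and URLs are made up for illustration.
CLUSTERS = {
    "adhoc": "http://presto-adhoc.example.internal:8889",
    "etl": "http://presto-etl.example.internal:8889",
}

# Assumed assignment of clients to clusters; not taken from the slides.
CLIENT_TO_CLUSTER = {
    "airflow": "etl",
    "sqooper": "etl",
    "databooks": "adhoc",
    "bi-portal": "adhoc",
}

DEFAULT_CLUSTER = "adhoc"


def route(client: str) -> str:
    """Return the coordinator URL for a client using the fixed mapping."""
    cluster = CLIENT_TO_CLUSTER.get(client, DEFAULT_CLUSTER)
    return CLUSTERS[cluster]
```

With a fixed mapping like this, config properties and capacity can be planned per cluster around the clients pinned to it; the trade-off is that the routing does not react to load.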

  10. Shadow clusters ● Read-only shadow cluster in parallel ● Useful for testing config changes or version upgrades
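
One way to read "read-only shadow cluster in parallel" is that only statements without side effects are replayed against the shadow cluster, so a config change or version upgrade can be exercised without writing anything twice. The sketch below assumes that reading; the keyword list and the `targets_for` helper are hypothetical and deliberately conservative.

```python
# Leading keywords treated as read-only (an assumption, not from the slides).
READ_ONLY_KEYWORDS = {"select", "with", "show", "describe"}


def is_read_only(sql: str) -> bool:
    """Best-effort check based on the first whitespace-delimited token.

    Anything that is not clearly read-only is treated as a write.
    """
    tokens = sql.strip().split(None, 1)
    return bool(tokens) and tokens[0].lower() in READ_ONLY_KEYWORDS


def targets_for(sql: str, primary: str, shadow: str) -> list:
    """Return the clusters that should run the statement."""
    if is_read_only(sql):
        return [primary, shadow]  # mirror to the shadow cluster
    return [primary]  # writes only ever reach the primary
```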

  11. Terraform module ● Provision a cluster with 25-lines of code ● ASG optionally with spot ● Dedicated HMS per cluster

  12. Resource groups ● Per cluster resource groups config ● Per group scheduling policies ● Fair (ad-hoc) vs weighted_fair (etl)
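
To make the fair versus weighted_fair distinction concrete, here is a minimal sketch that builds a Presto resource groups JSON document in Python. The field names (`rootGroups`, `subGroups`, `schedulingPolicy`, `schedulingWeight`, `selectors`) follow the file-based resource group manager; the group names, limits, weights, and selector patterns are invented for illustration and are not Slack's actual configuration.

```python
import json


def build_resource_groups() -> dict:
    """Build an example resource groups config (all values hypothetical)."""
    return {
        "rootGroups": [
            {
                "name": "global",
                "softMemoryLimit": "80%",
                "hardConcurrencyLimit": 100,
                "maxQueued": 1000,
                "subGroups": [
                    {
                        # Ad-hoc users: plain fair scheduling.
                        "name": "adhoc",
                        "softMemoryLimit": "40%",
                        "hardConcurrencyLimit": 40,
                        "maxQueued": 200,
                        "schedulingPolicy": "fair",
                    },
                    {
                        # ETL: sub-groups share slots according to their weights.
                        "name": "etl",
                        "softMemoryLimit": "60%",
                        "hardConcurrencyLimit": 60,
                        "maxQueued": 800,
                        "schedulingPolicy": "weighted_fair",
                        "subGroups": [
                            {
                                "name": "airflow",
                                "softMemoryLimit": "40%",
                                "hardConcurrencyLimit": 40,
                                "maxQueued": 600,
                                "schedulingWeight": 3,
                            },
                            {
                                "name": "sqooper",
                                "softMemoryLimit": "20%",
                                "hardConcurrencyLimit": 20,
                                "maxQueued": 200,
                                "schedulingWeight": 1,
                            },
                        ],
                    },
                ],
            }
        ],
        # Selectors are matched in order; the last one is a catch-all.
        "selectors": [
            {"source": "airflow", "group": "global.etl.airflow"},
            {"source": "sqooper", "group": "global.etl.sqooper"},
            {"group": "global.adhoc"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_resource_groups(), indent=2))
```

Generating the file from code is one way to keep a separate config per cluster, as the slide describes; a hand-written JSON file per cluster would work equally well.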

  13. JVM JMX exporter ● -javaagent:/usr/local/jmx_exporter/jmx_exporter.jar=7071:/usr/local/jmx_exporter/exporter.yml ● Prometheus: self.consul_job('presto', datacenters=[env + '-us-east-1-dw1'], services=['presto'])
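
With the `-javaagent` flag above, the JMX exporter serves metrics over HTTP on port 7071 in the Prometheus text format, which Prometheus then scrapes. As a rough illustration of what that output looks like to a consumer, the sketch below parses simple samples from such text. It assumes samples carry no trailing timestamps, and the metric names used with it are hypothetical, since the real names depend on the rules in `exporter.yml`.

```python
def parse_metrics(text: str) -> dict:
    """Parse 'name value' samples from Prometheus text-format output.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    Assumes no trailing timestamps on sample lines.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field; labels stay in the name.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples
```

For example, feeding it the body returned by `http://localhost:7071/metrics` on a worker would give a dictionary keyed by metric name (including any label set).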

  14. Grafana dashboard

  15. Graceful decommission / Autoscaling ● curl -XPUT localhost:8889/v1/info/state -d '"SHUTTING_DOWN"' -H "Content-type: application/json" ● Chef role: "auto_scaling_group": { "prepare_for_termination_cmd": "<cmd>" }
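
The curl call on this slide asks a worker to finish its running tasks and then exit. The same request can be sketched in Python with only the standard library; the endpoint and port come from the slide, the body is sent as a JSON string, and the function name and defaults are hypothetical.

```python
import json
import urllib.request


def build_shutdown_request(host: str = "localhost", port: int = 8889) -> urllib.request.Request:
    """Build the PUT request that puts a Presto worker into SHUTTING_DOWN."""
    # The body must be a JSON string, i.e. the bytes "SHUTTING_DOWN" with quotes.
    body = json.dumps("SHUTTING_DOWN").encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/info/state",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

A termination hook could send it with `urllib.request.urlopen(build_shutdown_request(), timeout=10)` before the instance is removed from the autoscaling group; how long to wait for the worker to drain afterwards is left out of this sketch.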

  16. Federated clusters ● Dynamic load balancing ● High availability ● Minimize the impact of rogue queries
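
Dynamic load balancing across federated clusters could, for instance, route each new query to the healthy cluster with the shortest queue. The sketch below shows that one policy under stated assumptions: the `ClusterStatus` fields and the tie-breaking order are hypothetical, and the deck does not describe the policy Slack planned to use.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ClusterStatus:
    """Snapshot of one cluster (fields are illustrative assumptions)."""
    name: str
    healthy: bool
    running_queries: int
    queued_queries: int


def pick_cluster(statuses: Iterable[ClusterStatus]) -> Optional[ClusterStatus]:
    """Pick the healthy cluster with the fewest queued, then running, queries."""
    candidates = [s for s in statuses if s.healthy]
    if not candidates:
        return None  # no cluster available; the caller decides how to fail
    return min(candidates, key=lambda s: (s.queued_queries, s.running_queries, s.name))
```

Skipping unhealthy clusters gives a basic form of high availability, and steering new queries away from a backed-up cluster limits how far one rogue query can spread its impact.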

  17. Q&A Slack handles: @cheolsoo; @abhonsule slack-corp.com
