
Hadoop Infrastructure @Uber Past, Present and Future - Mayank Bansal



  1. Hadoop Infrastructure @Uber Past, Present and Future Mayank Bansal U B E R | Data

  2. Uber’s Mission “Transportation as reliable as running water, everywhere, for everyone” 75+ Countries 500+ Cities And growing…

  3. How Uber works

  4. How Uber works

  5. How Uber works

  6. Data Driven Decisions

  7. Data Infra Once Upon a time.. (2014) Applications ETL EMR S3 Business Ops Kafka Logs A/B Experiments … Adhoc Analytics Vertica Key-Val DB Data Warehouse City Ops Data Science RDBMS DBs

  8. Data Infrastructure Today Service Accounts ETL Machine Learning Kafka8 Logs Experimentation HDFS Data Science … Spark | Presto Hive Adhoc Analytics Schemaless DB Ops/Data Science City Ops Data Science SOA DBs

  9. Few Takeaways … ● Strict Schema Management ○ Because our largest data audience is SQL savvy! (1000s of Uber Ops!) ○ SQL = Strict Schema ● Big Data Processing Tools Unlocked - Hive, Presto and Spark ○ Migrate SQL-savvy users from Vertica to Hive & Presto (1000s of Ops & 100s of data scientists & analysts) ○ Spark for more advanced users - 100s of data scientists

  10. Hadoop Evolution @ ebay Hadoop Evolution @ Uber 2014: 1X Nodes, 1X PB Data. 2015: 10X Nodes, 4X PB Data. 2016: 90X Nodes, 40X PB Data. 3000+ nodes, 30,000+ cores, 50+ PB

  11. Hadoop Cluster Utilization • Over-provisioning for the peak loads • Over-capacity in anticipation of future growth

  12. Hadoop Evolution @ ebay Mesos Evolution @ Uber 2014: 0 Nodes. 2015: X Nodes. 2016: 300X Nodes.

  13. Mesos Cluster Utilization • Over-provisioning for the peak loads • Over-capacity in anticipation of future growth

  14. End Goal Online Presto

  15. What we need? GLOBAL VIEW OF RESOURCES

  16. Available Resource Managers

  17. Mesos vs YARN
      YARN | Mesos
      Single-level scheduler | Two-level scheduler
      Uses cgroups for isolation | Uses cgroups for isolation
      CPU, memory as a resource | CPU, memory and disk as a resource
      Works well with Hadoop workloads | Works well with longer running services
      Supports time-based reservations | Does not have support for reservations
      Dominant resource scheduling | Scheduling is done by frameworks and depends on case to case basis
      Callouts: Scales Better; Similar Isolation; Disk is better; This is Important; Better for batch; Imp for batch SLA’s
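
The "dominant resource scheduling" entry in the comparison above can be illustrated with a small sketch of Dominant Resource Fairness (DRF). This is not YARN's implementation. The cluster size, the two users and their per-task demands are hypothetical values chosen for illustration, and the loop drops a user once its next task no longer fits, which is a simplification.

```python
# Toy DRF allocator: repeatedly give one task to the user whose dominant
# share (largest fraction of any single resource) is currently lowest.
# All capacities and demands below are illustrative assumptions.
CAPACITY = {"cpu": 9.0, "mem": 18.0}

DEMANDS = {
    "A": {"cpu": 1.0, "mem": 4.0},  # memory-heavy tasks
    "B": {"cpu": 3.0, "mem": 1.0},  # CPU-heavy tasks
}


def dominant_share(allocated, capacity):
    """Largest fraction of any one resource held by a user."""
    return max(allocated[r] / capacity[r] for r in capacity)


def drf_allocate(capacity, demands):
    """Return the number of tasks each user gets under the toy DRF loop."""
    used = {r: 0.0 for r in capacity}
    alloc = {u: {r: 0.0 for r in capacity} for u in demands}
    tasks = {u: 0 for u in demands}
    active = set(demands)
    while active:
        # Lowest dominant share goes first; ties are broken by user name.
        user = min(sorted(active),
                   key=lambda u: dominant_share(alloc[u], capacity))
        need = demands[user]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
                alloc[user][r] += need[r]
            tasks[user] += 1
        else:
            # Simplification: a user whose next task does not fit is done.
            active.discard(user)
    return tasks


result = drf_allocate(CAPACITY, DEMANDS)
print(result)
```

With these numbers both users end with the same dominant share of two thirds: A holds 12 of 18 memory units and B holds 6 of 9 CPUs.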

  18. Let’s tie them together In a Nutshell YARN is good for Hadoop Mesos is good for Longer Running Services

  19. U B E R | Data

  20. • Myriad is a Mesos Framework for Apache YARN • Mesos manages Data Center resources • YARN manages Hadoop workloads • Myriad • Gets resources from Mesos • Launches Node Managers
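
The flow on this slide (Myriad gets resources from Mesos, then launches Node Managers) can be sketched as a toy model. None of the class or method names below come from Mesos, YARN or Myriad; `Offer`, `NodeManager`, `ToyMyriad` and `on_offers` are hypothetical, and the fixed Node Manager size and the one-Node-Manager-per-host rule are assumptions made to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """A resource offer for one host (hypothetical stand-in for a Mesos offer)."""
    host: str
    cpus: float
    mem_gb: float


@dataclass
class NodeManager:
    """A launched Node Manager with the capacity it advertises to YARN."""
    host: str
    cpus: float
    mem_gb: float


@dataclass
class ToyMyriad:
    """Accepts offers that fit one fixed-size Node Manager, declines the rest."""
    nm_cpus: float
    nm_mem_gb: float
    node_managers: list = field(default_factory=list)

    def on_offers(self, offers):
        declined = []
        for offer in offers:
            already_has_nm = any(nm.host == offer.host
                                 for nm in self.node_managers)
            fits = offer.cpus >= self.nm_cpus and offer.mem_gb >= self.nm_mem_gb
            if fits and not already_has_nm:
                # Accept: carve a Node Manager out of the offered resources.
                self.node_managers.append(
                    NodeManager(offer.host, self.nm_cpus, self.nm_mem_gb))
            else:
                declined.append(offer)
        return declined

    def yarn_capacity(self):
        """Total capacity YARN can schedule on: only what was handed over."""
        return (sum(nm.cpus for nm in self.node_managers),
                sum(nm.mem_gb for nm in self.node_managers))


myriad = ToyMyriad(nm_cpus=4.0, nm_mem_gb=16.0)
declined = myriad.on_offers([
    Offer("h1", 8.0, 32.0),
    Offer("h2", 2.0, 4.0),    # too small for a Node Manager
    Offer("h3", 16.0, 64.0),
])
print([nm.host for nm in myriad.node_managers], myriad.yarn_capacity())
```

In this model YARN only ever sees the capacity of the launched Node Managers, so each layer schedules over its own slice of the cluster.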

  21. Myriad’s Limitations Static Resource Partitioning • YARN will handle the resources handed over to it • Mesos will work on the rest of the resources

  22. Myriad’s Limitations Resource Over Subscription • YARN will never be able to do over subscription • Node Manager will go away • Fragmentation of resources • Mesos over subscription can kill YARN too

  23. Myriad’s Limitations • No Global Quota Enforcement • No Global Priorities

  24. Myriad’s Limitations • Elastic Resource Management • Bin Packing • Stability • Long List …

  25. Unified Scheduler

  26. High Level Characteristics • Global Quota Management • Central Scheduling policies • Over subscription for both Online and Batch • Isolation and bin packing • SLA guarantees at Global Level

  27. Unified Scheduler

  28. Few Takeaways … • We need one scheduling layer across all workloads • Partitioning resources is not good • Can save at least 30% of resources • Stability and simplicity win in Production • Multi-level resource management and scheduling will not be scalable
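
The point about partitioning can be sketched with simple arithmetic: separate clusters must each be sized for their own peak, while a shared pool only needs to cover the peak of the combined load. The demand figures below are hypothetical, not Uber data, and the resulting saving is specific to these made-up numbers rather than a reproduction of the 30% figure on the slide.

```python
# Hypothetical demand per time window, in nodes. Online load peaks in the
# third window, batch load peaks in the first. Not measured data.
ONLINE_DEMAND = [40, 90, 100, 60]
BATCH_DEMAND = [100, 50, 40, 90]


def partitioned_capacity(*workloads):
    """Each workload gets its own cluster sized for its own peak."""
    return sum(max(w) for w in workloads)


def shared_capacity(*workloads):
    """One pool sized for the peak of the summed demand."""
    return max(sum(window) for window in zip(*workloads))


partitioned = partitioned_capacity(ONLINE_DEMAND, BATCH_DEMAND)
shared = shared_capacity(ONLINE_DEMAND, BATCH_DEMAND)
saving = (partitioned - shared) / partitioned

print(f"partitioned={partitioned} shared={shared} saving={saving:.0%}")
```

The saving comes entirely from the peaks not coinciding. If both workloads peaked in the same window, the two sizes would be equal.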

  29. U B E R | Data

  30. Questions? mabansal@uber.com mayank@apache.org

  31. Thank You!!!
