Hadoop Infrastructure @Uber: Past, Present and Future

Hadoop Infrastructure @Uber: Past, Present and Future
Mayank Bansal, UBER | Data

Uber’s Mission
“Transportation as reliable as running water, everywhere, for everyone”
75+ countries, 500+ cities, and growing…

How Uber works

Data-Driven Decisions

Data Infra Once Upon a Time... (2014)
[Architecture diagram. Labels: Applications, ETL, EMR, S3, Business Ops, Kafka Logs, A/B Experiments, …, Adhoc Analytics, Vertica, Key-Val DB, Data Warehouse, City Ops, Data Science, RDBMS DBs]

Data Infrastructure Today
[Architecture diagram. Labels: Service Accounts, ETL, Machine Learning, Kafka8 Logs, Experimentation, HDFS, Data Science, …, Spark | Presto, Hive, Adhoc Analytics, Schemaless DB, Ops/Data Science, City Ops, Data Science, SOA DBs]

Few Takeaways …
● Strict Schema Management
  ○ Because our largest data audience is SQL-savvy (1000s of Uber Ops!)
  ○ SQL = Strict Schema
● Big Data Processing Tools Unlocked: Hive, Presto and Spark
  ○ Migrate SQL-savvy users from Vertica to Hive & Presto (1000s of Ops & 100s of data scientists & analysts)
  ○ Spark for more advanced users: 100s of data scientists

Hadoop Evolution @ Uber
• 2014: 1X Nodes, 1X PB Data
• 2015: 10X Nodes, 4X PB Data
• 2016: 90X Nodes, 40X PB Data
3000+ nodes, 30,000+ cores, 50+ PB

Hadoop Cluster Utilization
• Over-provisioning for peak loads
• Over-capacity in anticipation of future growth

Mesos Evolution @ Uber
• 2014: 0 Nodes
• 2015: X Nodes
• 2016: 300X Nodes

Mesos Cluster Utilization
• Over-provisioning for peak loads
• Over-capacity in anticipation of future growth

End Goal
[Diagram. Surviving labels: Online, Presto]

What do we need?
GLOBAL VIEW OF RESOURCES

Available Resource Managers

Mesos vs YARN
• Scheduler: YARN uses a single-level scheduler; Mesos uses a two-level scheduler. (Callout: scales better)
• Isolation: both use cgroups for isolation. (Callout: similar isolation)
• Resources: YARN treats CPU and memory as resources; Mesos treats CPU, memory and disk as resources. (Callout: disk is better)
• Workloads: YARN works well with Hadoop workloads; Mesos works well with longer-running services.
• Reservations: YARN supports time-based reservations; Mesos does not support reservations.
• Scheduling policy: YARN uses dominant resource scheduling; in Mesos, scheduling is done by the frameworks and varies case by case.
• Further callouts on the slide: "This is important", "Better for batch", "Important for batch SLAs".
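The comparison credits YARN with dominant resource scheduling. As a rough illustration of that idea (a minimal sketch of Dominant Resource Fairness, not YARN's actual implementation), the code below repeatedly gives one task to whichever user holds the smallest share of their most-used resource. The cluster size, user names and per-task demands are made-up values for illustration only.

```python
from typing import Dict

Resources = Dict[str, float]


def dominant_share(used: Resources, capacity: Resources) -> float:
    """Largest fraction of any one resource that this user currently holds."""
    return max(used[r] / capacity[r] for r in capacity)


def drf_allocate(capacity: Resources, demands: Dict[str, Resources]) -> Dict[str, int]:
    """Hand out one task at a time to the user with the smallest dominant share.

    Assumes every per-task demand is positive for at least one resource.
    """
    used_total = {r: 0.0 for r in capacity}
    used_by = {u: {r: 0.0 for r in capacity} for u in demands}
    tasks = {u: 0 for u in demands}
    active = set(demands)
    while active:
        # sorted() makes tie-breaking deterministic (alphabetical by user name).
        user = min(sorted(active), key=lambda u: dominant_share(used_by[u], capacity))
        demand = demands[user]
        if any(used_total[r] + demand[r] > capacity[r] for r in capacity):
            active.remove(user)  # this user's next task no longer fits
            continue
        for r in capacity:
            used_total[r] += demand[r]
            used_by[user][r] += demand[r]
        tasks[user] += 1
    return tasks


# Hypothetical cluster: 9 CPUs and 18 GB of memory shared by two users.
capacity = {"cpu": 9, "mem_gb": 18}
demands = {
    "A": {"cpu": 1, "mem_gb": 4},  # memory-heavy tasks
    "B": {"cpu": 3, "mem_gb": 1},  # CPU-heavy tasks
}
print(drf_allocate(capacity, demands))  # {'A': 3, 'B': 2}
```

In this example both users end up holding two thirds of their dominant resource (memory for A, CPU for B), which is the balance a dominant-resource policy aims for.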

Let’s tie them together
In a Nutshell
• YARN is good for Hadoop
• Mesos is good for longer-running services

• Myriad is a Mesos framework for Apache YARN
• Mesos manages data center resources
• YARN manages Hadoop workloads
• Myriad
  ○ Gets resources from Mesos
  ○ Launches Node Managers
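The flow described above (resources come from Mesos, Node Managers are launched on them) can be sketched roughly as follows. This is a toy model under stated assumptions: `Offer`, `NodeManager` and `launch_node_managers` are hypothetical names invented for illustration and are not part of the Myriad or Mesos APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Offer:
    """Hypothetical stand-in for a resource offer from the Mesos side."""
    host: str
    cpus: float
    mem_mb: int


@dataclass
class NodeManager:
    """Hypothetical stand-in for a YARN Node Manager launched on an offer."""
    host: str
    cpus: float
    mem_mb: int


def launch_node_managers(offers: List[Offer], nm_cpus: float, nm_mem_mb: int,
                         wanted: int) -> List[NodeManager]:
    """Accept offers big enough for one Node Manager until `wanted` are launched."""
    launched: List[NodeManager] = []
    for offer in offers:
        if len(launched) == wanted:
            break
        if offer.cpus >= nm_cpus and offer.mem_mb >= nm_mem_mb:
            # The Node Manager takes a fixed slice; YARN schedules inside it.
            launched.append(NodeManager(offer.host, nm_cpus, nm_mem_mb))
    return launched


offers = [Offer("h1", 8, 16384), Offer("h2", 2, 4096), Offer("h3", 16, 65536)]
for nm in launch_node_managers(offers, nm_cpus=4, nm_mem_mb=8192, wanted=2):
    print(nm.host, nm.cpus, nm.mem_mb)
```

Once a Node Manager holds its slice, YARN schedules Hadoop workloads inside it and Mesos schedules everything else.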

Myriad’s Limitations: Static Resource Partitioning
• YARN will handle the resources handed over to it
• Mesos will work on the rest of the resources
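A small arithmetic sketch of why a static split hurts: when each side can only use its own partition, a burst on one side cannot borrow idle capacity from the other. The core counts and demands below are made-up numbers, and the two helper functions are illustrative only.

```python
def static_partition_usage(yarn_cores: int, mesos_cores: int,
                           yarn_demand: int, mesos_demand: int) -> int:
    """Cores actually used when each side is capped at its own partition."""
    return min(yarn_demand, yarn_cores) + min(mesos_demand, mesos_cores)


def shared_pool_usage(total_cores: int, yarn_demand: int, mesos_demand: int) -> int:
    """Cores actually used when both sides draw from one shared pool."""
    return min(total_cores, yarn_demand + mesos_demand)


# Hypothetical 100-core cluster split 50/50, with a batch burst on the YARN side.
print(static_partition_usage(50, 50, yarn_demand=80, mesos_demand=20))  # 70
print(shared_pool_usage(100, yarn_demand=80, mesos_demand=20))          # 100
```

With the static split, 30 cores sit idle on the Mesos side while YARN work queues; a shared pool would run all of it.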

Myriad’s Limitations: Resource Over-Subscription
• YARN will never be able to do over-subscription
• Node Manager will go away
• Fragmentation of resources
• Mesos over-subscription can kill YARN too

Myriad’s Limitations
• No Global Quota Enforcement
• No Global Priorities

Myriad’s Limitations
• Elastic Resource Management
• Bin Packing
• Stability
• Long List …

Unified Scheduler

High Level Characteristics
• Global Quota Management
• Central Scheduling Policies
• Over-subscription for both Online and Batch
• Isolation and Bin Packing
• SLA Guarantees at Global Level
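Bin packing appears in the list above as a scheduler goal. One common heuristic for it is first-fit decreasing, sketched below; the talk does not say which algorithm the unified scheduler uses, so this is only an illustration, with made-up task sizes and a hypothetical per-host capacity.

```python
from typing import List


def first_fit_decreasing(task_sizes: List[int], host_capacity: int) -> List[List[int]]:
    """Place tasks, largest first, on the first host with enough room left."""
    hosts: List[List[int]] = []
    for size in sorted(task_sizes, reverse=True):
        if size > host_capacity:
            raise ValueError(f"task of size {size} does not fit on any host")
        for host in hosts:
            if sum(host) + size <= host_capacity:
                host.append(size)
                break
        else:
            hosts.append([size])  # no existing host had room, so open a new one
    return hosts


# Hypothetical task sizes (in cores) packed onto 10-core hosts.
print(first_fit_decreasing([4, 8, 1, 4, 2, 1], host_capacity=10))
# [[8, 2], [4, 4, 1, 1]]
```

Packing tasks tightly onto fewer hosts leaves whole hosts free, which reduces fragmentation.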

Unified Scheduler

Few Takeaways …
• We need one scheduling layer across all workloads
• Partitioning resources is not good
• We can save at least 30% of resources
• Stability and simplicity win in production
• Multiple levels of resource management and scheduling will not scale
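One way to see where savings from a single scheduling layer can come from: separate clusters must each be sized for their own peak, while a shared cluster only needs to cover the peak of the combined load. The sketch below uses made-up load profiles and does not reproduce the 30% figure from the talk.

```python
from typing import Sequence


def separate_capacity(online: Sequence[int], batch: Sequence[int]) -> int:
    """Capacity needed when each workload gets its own cluster (sum of peaks)."""
    return max(online) + max(batch)


def shared_capacity(online: Sequence[int], batch: Sequence[int]) -> int:
    """Capacity needed when both workloads share one cluster (peak of the sum)."""
    return max(o + b for o, b in zip(online, batch))


# Hypothetical load per time slot, in cores; online peaks when batch is quiet.
online = [20, 60, 100, 40]
batch = [100, 50, 20, 80]

apart = separate_capacity(online, batch)
together = shared_capacity(online, batch)
print(apart, together)                            # 200 120
print(f"saving: {1 - together / apart:.0%}")      # saving: 40%
```

The saving depends entirely on how much the peaks overlap; if both workloads peak at the same time, the two numbers are equal.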

Questions?
mabansal@uber.com
mayank@apache.org

Thank You!!!
