CS 744: MESOS
Shivaram Venkataraman
Fall 2020
ADMINISTRIVIA
- Assignment 1: How did it go? Fill out the poll
- Assignment 2 (distributed ML) out tonight
- Project details:
  - Create project groups of 3 students next week
  - Bid for projects / propose your own
  - Work on Introduction (1-2 pages), check-in session, poster, final report
COURSE FORMAT
- Paper reviews: "Compare, contrast and evaluate research papers"
- Discussion
- Applications: Machine Learning, SQL, Streaming, Graph (assignments)
- Computational Engines: MapReduce, Spark
- Scalable Storage Systems: GFS
- Resource Management → this lecture
- Datacenter Architecture

(Figure: MapReduce and Spark running on top of GFS)
BACKGROUND: OS SCHEDULING
- Each process (e.g., P1 = Chrome, P2 = vim, gcc, ...) has its own code/static data, heap, and stack
- How do we share the CPU between processes?
- Time sharing: run each process for a short quantum (e.g., ~10ms), then switch (toy sketch below)
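To make time sharing concrete, here is a minimal round-robin sketch (not from the slides; the process names and run times are made up) that hands each process a fixed ~10ms quantum before moving to the next:

from collections import deque

QUANTUM_MS = 10  # illustrative quantum, matching the ~10ms on the slide

def round_robin(jobs):
    """Run jobs (name -> remaining CPU ms) one quantum at a time."""
    queue = deque(jobs.items())
    clock = 0
    while queue:
        name, remaining = queue.popleft()
        ran = min(QUANTUM_MS, remaining)
        clock += ran
        print(f"t={clock:3d}ms  ran {name} for {ran}ms")
        if remaining > ran:
            queue.append((name, remaining - ran))  # back of the line

round_robin({"chrome": 25, "vim": 8, "gcc": 15})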
CLUSTER SCHEDULING
- Large number of machines: how far does one scheduler scale?
- Goals:
  - Fairness: sharing resources across space and time
  - Fault tolerance
  - Placement constraints / preferences (e.g., locality-aware)
TARGET ENVIRONMENT
- Not all resources are used: improve cluster utilization
- Multiple MapReduce versions; different kinds of applications on the same cluster
- Mix of frameworks: MPI, Spark, MR (e.g., a word-count MR job on 100 machines)
- Data sharing across frameworks
- Avoid statically partitioned per-framework clusters
DESIGN: TWO-LEVEL SCHEDULING
- Mesos master: thin scheduler that allocates resources across frameworks
- Single scheduler per framework decides which tasks to run where
- Easy to add new frameworks
- Goals: Flexibility, Scalability
RESOURCE OFFERS
- Master offers available resources to a framework (e.g., <2 CPUs, 1 GB> on a machine)
- An allocation policy in the master decides which framework gets each offer
- The framework's scheduler replies with tasks to launch, or rejects the offer (sketch below)
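A toy sketch of the offer/reply loop in plain Python (class and method names are illustrative, not the real Mesos API; per-task demands are made up): the framework inspects an offer and replies with as many tasks as fit, leaving the rest of the offer unused.

class Offer:
    """Resources the master offers on one machine."""
    def __init__(self, machine, cpus, mem_gb):
        self.machine, self.cpus, self.mem_gb = machine, cpus, mem_gb

class WordCountFramework:
    """Per-framework scheduler: decides what to run on offered resources."""
    TASK_CPUS, TASK_MEM_GB = 1, 0.5  # per-task demand (invented)

    def __init__(self, pending_tasks):
        self.pending = list(pending_tasks)

    def resource_offered(self, offer):
        """Reply with the tasks to launch; an empty reply rejects the offer."""
        launched = []
        while (self.pending and offer.cpus >= self.TASK_CPUS
               and offer.mem_gb >= self.TASK_MEM_GB):
            launched.append((self.pending.pop(), offer.machine))
            offer.cpus -= self.TASK_CPUS
            offer.mem_gb -= self.TASK_MEM_GB
        return launched

fw = WordCountFramework(["map-1", "map-2", "map-3"])
print(fw.resource_offered(Offer("m1", cpus=2, mem_gb=1.0)))
# [('map-3', 'm1'), ('map-2', 'm1')] -- map-1 waits for the next offer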
CONSTRAINTS
- Examples of constraints:
  - Data locality → soft constraint
  - GPU machines → hard constraint
- Constraints in Mesos: frameworks can reject offers
- "Filters": Boolean predicates a framework gives the master to avoid useless offers
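The filters on this slide can be pictured as Boolean predicates over an offer. The attribute dictionary below is invented for illustration (real Mesos filters are narrower, e.g. "decline offers from these machines for N seconds"):

def gpu_filter(offer_attrs):
    """Hard constraint: only machines with a GPU are acceptable."""
    return offer_attrs.get("gpu", False)

def locality_filter(offer_attrs, preferred_machines):
    """Soft constraint: prefer machines that hold our input data."""
    return offer_attrs["machine"] in preferred_machines

offer = {"machine": "m2", "gpu": False}
print(gpu_filter(offer))                     # False -> reject this offer
print(locality_filter(offer, {"m2", "m4"}))  # True  -> data-local machine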
DESIGN DETAILS
- Allocation works best when tasks are short-lived: resources free up and are re-offered quickly
- Guaranteed allocation per framework; long-running tasks can be preempted (revocation) when another framework is below its guarantee
- Isolation between frameworks via containers (Docker)
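A minimal sketch of the revocation rule above (CPU numbers and task names are invented): tasks beyond a framework's guaranteed allocation are the ones eligible for preemption.

def revocable_tasks(running_tasks, guaranteed_cpus):
    """Return tasks beyond the guaranteed CPU allocation, in launch order."""
    used_cpus = 0
    revocable = []
    for task, cpus in running_tasks:        # oldest task first
        if used_cpus + cpus <= guaranteed_cpus:
            used_cpus += cpus               # within the guarantee: safe
        else:
            revocable.append(task)          # over the guarantee: preemptible
    return revocable

tasks = [("etl-1", 4), ("etl-2", 4), ("ad-hoc", 4)]
print(revocable_tasks(tasks, guaranteed_cpus=8))  # ['ad-hoc']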
FAULT TOLERANCE
- Master holds only soft state (can be rebuilt from slaves and framework schedulers), so a master failure doesn't affect running jobs
- Failures detected via heartbeats
PLACEMENT PREFERENCES
- What is the problem? More frameworks prefer a machine than there are machines available in the cluster
- How do we do allocations? Weighted lottery: make offers with probability proportional to each framework's overall share, so resources end up apportioned by framework size (sketch below)
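A minimal sketch of the weighted lottery (the framework shares below are assumed for the example; random.choices does the proportional draw):

import random

shares = {"spark": 0.5, "mapreduce": 0.3, "mpi": 0.2}  # intended allocations (assumed)

def pick_framework():
    """Offer the contended resources to one framework, weighted by share."""
    names, weights = zip(*shares.items())
    return random.choices(names, weights=weights, k=1)[0]

wins = {name: 0 for name in shares}
for _ in range(10_000):       # simulate many contended offers
    wins[pick_framework()] += 1
print(wins)                   # roughly {'spark': 5000, 'mapreduce': 3000, 'mpi': 2000}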
CENTRALIZED VS DECENTRALIZED
- Centralized: can compute a globally optimal allocation, but must scale to ~100s of frameworks with ~100s of apps each, and grows more complex as new frameworks are added
- Decentralized: scales as new frameworks arrive; scheduling complexity is handled by each framework's developer
CENTRALIZED VS DECENTRALIZED
- Framework complexity: with resource offers, every framework must implement its own scheduling logic
- If offers are too small: fragmentation and starvation
- Inter-dependent framework constraints are hard to express via offers
COMPARISON: YARN (Apache Hadoop)
- Per-job scheduler (Application Master, AM) rather than per-framework
- AM asks the Resource Manager (RM) for resources; the RM replies with allocations
COMPARISON: BORG (Google)
- Single centralized scheduler
- Requests (memory, CPU) specified in a config
- Priority per user / service
- Better packing; support for quotas / reservations
SUMMARY
• Mesos: a scheduler to share a cluster between frameworks like Spark and MR
• Two-level scheduling with framework-specific schedulers
• Provides scalable, decentralized scheduling
• Pluggable policy? Next class!
DISCUSSION https://forms.gle/urHSeukfyipCKjue6
DISCUSSION: What are some problems that could come up if we scale from 10 frameworks to 1000 frameworks in Mesos?
- Fragmentation / starvation go up
- Master becomes a bottleneck? The time it takes for the master to reply, and for frameworks to wait for offers, goes up
- Preemption? Yes
- Soft state → failure recovery takes longer? (unclear)
- Rigid frameworks (e.g., MPI) hold on to their share instead of scaling elastically
DISCUSSION: List any one difference between an OS scheduler and Mesos
- Motivation (the first part of the lecture): data locality matters for Mesos but not for an OS scheduler
- Oversubscribed clusters: if a long-lived, coarse-grained Spark executor is preempted, its cache and shuffle files are blown away
- "Guaranteed share" has no direct OS-scheduler analogue
DISCUSSION: With resource offers, how does Mesos perform vs. an optimal policy?
- (i) Time to "ramp up" / schedule tasks
- (ii) Time to completion
- Comparisons with Borg, YARN
NEXT STEPS
- Next class: Scheduling Policy
- Assignment 2 will be released

Further reading
• https://www.umbrant.com/2015/05/27/mesos-omega-borg-a-survey/
• https://queue.acm.org/detail.cfm?id=3173558

(Annotation) Delay scheduling: rather than taking the first offer, a task waits (~5s) for an offer on its preferred machine m2; if no m2 offer arrives in time, it accepts m3 or m4 (sketch below).
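A small sketch of that delay-scheduling annotation (the machine names and the ~5s bound come from the slide; everything else is illustrative):

WAIT_S = 5.0  # the ~5s locality wait from the annotation

def choose_machine(offered, preferred, waited_s):
    """Pick a machine from the offered list, preferring data-local ones."""
    local = [m for m in offered if m in preferred]
    if local:
        return local[0]                     # data-local offer: take it now
    if waited_s < WAIT_S:
        return None                         # decline; keep waiting for m2
    return offered[0] if offered else None  # waited long enough: take anything

print(choose_machine(["m3", "m4"], {"m2"}, waited_s=1.0))  # None (still waiting)
print(choose_machine(["m2", "m3"], {"m2"}, waited_s=2.0))  # m2 (local)
print(choose_machine(["m3", "m4"], {"m2"}, waited_s=6.0))  # m3 (gave up on locality)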