CS 744: MESOS
Shivaram Venkataraman, Fall 2020
Good morning!
ADMINISTRIVIA
→ Fill out the poll!
→ Assignment 1: How did it go?
→ Assignment 2 (distributed ML) out tonight
→ Project details next week
↳ Groups of ~3 students; 1-page report and poster session
COURSE FORMAT
Paper reviews: "Compare, contrast and evaluate research papers"
Discussion
Topics: Scalable Storage Systems, Datacenter Architecture, Resource Management, Computational Engines, Machine Learning, SQL, Streaming, Graph Applications
Assignments so far: MapReduce, GFS, Spark
BACKGROUND: OS SCHEDULING
[Figure: several processes, each with code, static data, heap, and stack, sharing a single CPU]
How do we share the CPU between processes?
→ Time sharing: processes P1, P2, ... take turns running on the CPU
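To make time sharing concrete, here is a minimal round-robin sketch in Python; the Process class and the quantum value are made-up illustrations, not anything from the lecture:

```python
from collections import deque

# Hypothetical process: needs `remaining` time units in total.
class Process:
    def __init__(self, name, remaining):
        self.name = name
        self.remaining = remaining

def round_robin(processes, quantum=2):
    """Time sharing: each process runs for at most `quantum` units,
    then goes to the back of the queue until it finishes."""
    queue = deque(processes)
    while queue:
        p = queue.popleft()
        ran = min(quantum, p.remaining)
        p.remaining -= ran
        print(f"{p.name} ran for {ran} units")
        if p.remaining > 0:
            queue.append(p)  # not done yet: wait for the next turn

round_robin([Process("P1", 5), Process("P2", 3)])
```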
CLUSTER SCHEDULING
Challenges:
→ Scale: large number of machines; can a single scheduler keep up?
→ Fairness: sharing the cluster across multiple users (space sharing and time sharing)
→ Fault tolerance
→ Preferences: placement constraints, locality-aware scheduling
TARGET ENVIRONMENT
Multiple MapReduce versions
Mix of frameworks: MPI, Spark, MR
Data sharing across frameworks
Avoid per-framework clusters
↳ Static partitioning hurts utilization: not all resources are used, and different kinds of applications have different needs
DESIGN
Two approaches:
→ Single cluster-wide scheduler: handles scheduling across all frameworks
→ Per-framework master (Mesos): each framework runs its own scheduler
↳ Easy to add new frameworks → scalability, flexibility
RESOURCE OFFERS
→ The Mesos master offers resources (e.g., "2 CPUs, 1 GB" on a machine) to a framework, according to an allocation policy
→ The framework replies with tasks to launch on those resources, or declines the offer
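To make the offer/reply protocol concrete, here is a minimal sketch in Python. All names here (Offer, FrameworkScheduler, resource_offer) are hypothetical illustrations, not the real Mesos API:

```python
from dataclasses import dataclass

@dataclass
class Offer:
    machine: str
    cpus: int
    mem_gb: int

class FrameworkScheduler:
    """Per-framework scheduler: decides which offers to accept."""
    def __init__(self, task_cpus, task_mem_gb):
        self.task_cpus = task_cpus
        self.task_mem_gb = task_mem_gb

    def resource_offer(self, offer):
        # Accept as many tasks as fit in the offered resources.
        tasks = []
        cpus, mem = offer.cpus, offer.mem_gb
        while cpus >= self.task_cpus and mem >= self.task_mem_gb:
            tasks.append(("task", offer.machine))
            cpus -= self.task_cpus
            mem -= self.task_mem_gb
        return tasks  # empty list == decline the offer

# Master side: offer resources, launch whatever the framework returns.
fw = FrameworkScheduler(task_cpus=1, task_mem_gb=1)
print(fw.resource_offer(Offer("m1", cpus=2, mem_gb=1)))
# one task fits: 2 CPUs, 1 GB allows a single 1-CPU/1-GB task
```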
CONSTRAINTS
Examples of constraints: data locality (soft), GPU machines (hard)
Constraints in Mesos:
↳ Frameworks can reject offers that don't satisfy their constraints
↳ "Filters": Boolean functions registered at the master, so it can skip offers a framework would reject anyway
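Since a filter is just a Boolean function over offers, a minimal sketch might look like this (the Offer fields and filter names are assumptions for illustration):

```python
from collections import namedtuple

Offer = namedtuple("Offer", ["machine", "cpus", "gpus"])

# A filter is a Boolean function over offers; the master evaluates it
# to avoid sending offers the framework would reject anyway.
def gpu_filter(offer):
    return offer.gpus >= 1  # hard constraint: need a GPU machine

def locality_filter(preferred):
    # locality preference expressed as a filter on machine names
    return lambda offer: offer.machine in preferred

offers = [Offer("m1", 4, 0), Offer("m2", 2, 1)]
print([o.machine for o in offers if gpu_filter(o)])              # ['m2']
print([o.machine for o in offers if locality_filter({"m1"})(o)]) # ['m1']
```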
DESIGN DETAILS
Allocation: guaranteed allocation, revocation
↳ Works best when tasks are short-lived
↳ A long-running task can be preempted (revoked) if its framework is over its guaranteed share and other frameworks express interest in the resources
Isolation: containers (e.g., Docker)
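A minimal sketch of this revocation rule, under the stated assumptions (the usage/guaranteed bookkeeping and function names are hypothetical, not the paper's API):

```python
# Revocation rule (sketch): a framework's tasks are only revocable
# when it is using more than its guaranteed allocation and some other
# framework has expressed interest in the resources.
def can_revoke(framework, usage, guaranteed, waiting_frameworks):
    over_share = usage[framework] > guaranteed[framework]
    demand_exists = len(waiting_frameworks) > 0
    return over_share and demand_exists

usage = {"spark": 12, "mpi": 4}      # CPUs currently in use
guaranteed = {"spark": 8, "mpi": 8}  # guaranteed allocations
print(can_revoke("spark", usage, guaranteed, waiting_frameworks=["mpi"]))   # True
print(can_revoke("mpi", usage, guaranteed, waiting_frameworks=["spark"]))   # False
```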
FAULT TOLERANCE
→ The master keeps only soft state: it can be reconstructed from the frameworks and slaves
→ Master failure: standby masters take over (leader election via ZooKeeper); running jobs are not affected
PLACEMENT PREFERENCES
What is the problem?
↳ More frameworks may prefer a machine (e.g., have data on it) than there are machines available in the cluster
How do we do allocations?
↳ Weighted lottery: offer each machine to frameworks with probability proportional to their intended share, as sketched below
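A minimal sketch of a weighted lottery, with made-up framework shares; this illustrates the idea rather than the paper's exact mechanism:

```python
import random

# Weighted lottery: pick which framework gets the next offer with
# probability proportional to its share of the cluster.
def lottery(shares):
    total = sum(shares.values())
    ticket = random.uniform(0, total)
    for framework, share in shares.items():
        ticket -= share
        if ticket <= 0:
            return framework
    return framework  # floating-point edge case: last framework wins

shares = {"spark": 0.5, "mpi": 0.3, "mr": 0.2}
winners = [lottery(shares) for _ in range(10000)]
print(winners.count("spark") / 10000)  # ~0.5 over many draws
```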
CENTRALIZED VS DECENTRALIZED
Dimensions to compare:
→ Scalability: how well does each solution scale?
→ How does each solution handle new frameworks?
→ Complexity for the framework developer
CENTRALIZED VS DECENTRALIZED
Framework complexity
Fragmentation, starvation
↳ A task with large resource requirements can starve if individual offers are too small
Inter-dependent framework constraints ✓
COMPARISON: YARN
→ Part of Apache Hadoop
→ Per-job scheduler: the Application Master (AM) asks the Resource Manager (RM) for resources, and the RM replies with allocations
→ Request-based, in contrast to Mesos's per-framework schedulers and resource offers
COMPARISON: BORG
→ Single centralized scheduler
→ Requests (memory, CPU) specified in a job config
→ Priority per user / service
→ Support for quotas / reservations
→ Packs tasks onto machines
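For illustration only, here is a simple first-fit packing sketch in Python; Borg's actual scoring-based placement is more sophisticated than this:

```python
# First-fit packing (sketch): place each task on the first machine
# with enough free CPU, to keep machines densely utilized.
def first_fit(task_cpus, free):
    for machine, cpus in free.items():
        if cpus >= task_cpus:
            free[machine] -= task_cpus
            return machine
    return None  # no machine fits: the task waits

free = {"m1": 4, "m2": 8}
for task in [3, 2, 4]:
    print(first_fit(task, free), free)
# m1 {'m1': 1, 'm2': 8}
# m2 {'m1': 1, 'm2': 6}
# m2 {'m1': 1, 'm2': 2}
```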
SUMMARY
→ Mesos: decentralized, two-level scheduling
→ Resource offers let each per-framework scheduler decide what to accept
DISCUSSION
https://forms.gle/urHSeukfyipCKjue6
What are some problems that could come up if we scale from 10 frameworks to 1000 frameworks in Mesos?
→ Fragmentation / starvation go up
→ The Mesos master becomes a bottleneck: it takes time to wait for frameworks to reply to offers
→ More preemption? Yes; the master only keeps soft state
→ Failure recovery takes longer? (why is unclear)
→ Rigid frameworks (e.g., MPI) hold on to their share
List any one difference between an OS scheduler and Mesos
→ Motivation (from the earlier part of the lecture): Mesos schedules across a cluster of machines, not a single machine
→ Data locality matters in cluster scheduling
→ If tasks are preempted, their caches are blown away; shuffle files are long-lived
→ Coarse-grained mode ("guaranteed share" via a long-lived executor backend): does it perform "better"? Consider (i) time to schedule and (ii) time to completion
→ Policy: comparisons with YARN, Borg
NEXT STEPS
Next class: Scheduling Policy
Further reading
→ Assignment 2 will be released
→ Delay scheduling preview: wait for an offer on a preferred machine (e.g., m2) rather than immediately accepting offers on m3, m4