CS 744: RAY
Shivaram Venkataraman, Fall 2020
ADMINISTRIVIA
- Assignment grades: by late next week
- Project proposal, aka Introduction (due 10/16): Introduction, Related Work, Timeline (with evaluation plan)
- Midterm: Oct 22; details early next week
MACHINE LEARNING: STACK
- Data parallelism → PyTorch
- Performance, portability → TVM
- Pipeline parallelism → PipeDream
REINFORCEMENT LEARNING
RL SETUP
An agent interacts with an environment: at each step it observes the state, takes an action according to its policy, and receives a reward. The learning algorithm updates the policy to maximize cumulative reward. An RL application interleaves three workloads: simulation (to generate experience), training (to improve the policy), and serving (to compute actions from the current policy).
RL REQUIREMENTS
- No static execution plan → need flexibility; fine-grained computation where future computation depends on the output of past tasks
- Simulation: each simulation could take milliseconds or hours; simulator state vs. stateless processing
- Training: data pre-processing
- Serving: very low-latency execution
- Very high throughput (on the order of millions of tasks per second)
RAY API
Tasks:
    futures = f.remote(args)
    objects = ray.get(futures)
    ready = ray.wait(futures, k, timeout)
Actors:
    actor = Class.remote(args)
    futures = actor.method.remote(args)
Notes:
- f.remote returns immediately with a future; ray.get blocks until the results are ready
- ray.wait returns once k of the futures are ready, or when the timeout expires
- Futures can be passed as arguments to other tasks, and you can spawn tasks (or wait for futures) from within a task
RAY API: NESTED TASKS
Tasks can spawn other tasks and wait on their futures, e.g.:

    def f(args):
        futures = [g.remote(i) for i in range(10)]
        ray.wait(futures)
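To see the semantics of the task API without a cluster, here is a minimal single-process stand-in written with the standard library. The `remote`/`get`/`wait` names mirror Ray's API shape, but this is an illustrative toy, not Ray itself:

```python
# Toy stand-in for Ray's task API using a thread pool (illustrative only).
from concurrent.futures import ThreadPoolExecutor, wait as _wait, FIRST_COMPLETED

_pool = ThreadPoolExecutor(max_workers=4)

def remote(fn):
    """Decorator: fn.remote(*args) submits fn and returns a future."""
    fn.remote = lambda *args: _pool.submit(fn, *args)
    return fn

def get(futures):
    """Block until results are ready (like ray.get)."""
    if isinstance(futures, list):
        return [f.result() for f in futures]
    return futures.result()

def wait(futures, k=1, timeout=None):
    """Return (ready, not_ready) once k futures finish or timeout expires."""
    done, pending = _wait(futures, timeout=timeout, return_when=FIRST_COMPLETED)
    while len(done) < k and pending:
        more, pending = _wait(pending, timeout=timeout, return_when=FIRST_COMPLETED)
        done |= more
    return list(done), list(pending)

@remote
def square(x):
    return x * x

futures = [square.remote(i) for i in range(5)]
ready, not_ready = wait(futures, k=5)
print(sorted(get(futures)))  # [0, 1, 4, 9, 16]
```

Real Ray additionally places tasks across machines and stores results in a shared object store; this sketch only reproduces the future-based calling convention.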
COMPUTATION MODEL
- .remote on a function creates a task; .remote on a class spawns an actor
- Lineage: the computation graph records how every object was produced
- Dotted lines: control edges (a task spawning another task)
- Solid lines: data edges (objects/futures flowing between tasks)
- Stateful edges connect successive method calls on an actor: actions on an actor happen sequentially
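The graph above can be modeled concretely. The following toy (not Ray internals) represents tasks as nodes, data edges as object ids passed between them, and lineage as a map from each object to the task that produces it:

```python
# Toy model of the computation graph (illustrative only).

class Graph:
    def __init__(self):
        self.tasks = {}      # task id -> (fn, input object ids)
        self.lineage = {}    # object id -> task id that produces it
        self.objects = {}    # object id -> computed value

    def add_task(self, tid, fn, inputs=()):
        out = f"obj:{tid}"
        self.tasks[tid] = (fn, list(inputs))
        self.lineage[out] = tid          # lineage: who produces this object
        return out                       # a "future": the object's id

    def get(self, oid):
        """Evaluate an object by walking data edges backwards via lineage."""
        if oid not in self.objects:
            fn, inputs = self.tasks[self.lineage[oid]]
            self.objects[oid] = fn(*[self.get(i) for i in inputs])
        return self.objects[oid]

g = Graph()
a = g.add_task("A", lambda: 3)
b = g.add_task("B", lambda: 4)
c = g.add_task("C", lambda x, y: x + y, inputs=[a, b])  # data edges A->C, B->C
print(g.get(c))  # 7
```

Because the graph records lineage, any object can be recomputed on demand, which is exactly the property the fault-tolerance mechanism exploits.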
ARCHITECTURE
Key idea: make tasks deterministic, so state can be reconstructed by replaying them. The design separates an application layer (drivers, workers, actors) from a system layer (Global Control Store, schedulers, object stores).
GLOBAL CONTROL STORE (GCS)
Sort of a database: externalizes all system state.
- Object table: all objects, their locations and metadata
- Task table: tasks and their lineage
- Function table: definitions of remote functions
Replicate and shard to scale more easily; externalizing state simplifies the design of the other components and fault tolerance.
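A minimal sketch of the idea, assuming simple hash-based sharding (toy code, not the actual GCS implementation): all metadata lives in key-value tables spread across shards, so the store scales independently of the schedulers and workers that read it.

```python
# Toy sketch of a sharded Global Control Store (illustrative only).

NUM_SHARDS = 4
shards = [{"object": {}, "task": {}, "function": {}} for _ in range(NUM_SHARDS)]

def _shard(key):
    """Pick a shard by hashing the key (consistent within one process)."""
    return shards[hash(key) % NUM_SHARDS]

def gcs_put(table, key, value):
    _shard(key)[table][key] = value

def gcs_get(table, key):
    return _shard(key)[table].get(key)

# Example entries: object metadata and a task's lineage record.
gcs_put("object", "obj1", {"locations": ["node2"], "size": 1024})
gcs_put("task", "taskA", {"outputs": ["obj1"], "inputs": []})
print(gcs_get("object", "obj1")["locations"])  # ['node2']
```

In the real system each shard would also be replicated so that losing one machine does not lose the metadata.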
RAY SCHEDULER
Bottom-up scheduling: a .remote call first goes to the local scheduler, which can exploit locality. If the node is busy (e.g., the task queue length exceeds a threshold, or a task has waited past a timeout), the task is forwarded to the global scheduler, which uses the Global Control Store to determine each node's load and pick a placement.
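The decision rule can be sketched in a few lines. This is a toy model of bottom-up scheduling (the threshold value and least-loaded policy are illustrative assumptions, not Ray's exact heuristics):

```python
# Toy sketch of bottom-up scheduling (illustrative only): try the local
# scheduler first; if its queue is too long, forward to a global scheduler
# that picks the least-loaded node using load info from the GCS.

QUEUE_THRESHOLD = 2  # hypothetical local-queue limit

def schedule(task, local_node, nodes):
    """nodes: node name -> current queue length (as reported to the GCS)."""
    if nodes[local_node] < QUEUE_THRESHOLD:
        nodes[local_node] += 1           # run locally: preserves locality
        return local_node
    target = min(nodes, key=nodes.get)   # global scheduler: least loaded
    nodes[target] += 1
    return target

nodes = {"n1": 0, "n2": 0, "n3": 0}
placements = [schedule(f"t{i}", "n1", nodes) for i in range(5)]
print(placements)  # first two stay on n1, later tasks spill to other nodes
```

The point of the two-level design is that the common case (an idle local node) never touches the global scheduler at all.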
FAULT TOLERANCE
- Tasks: re-execute by replaying the lineage of the lost objects
- Actors: periodically checkpoint; restore from the latest checkpoint and replay the messages since then
- GCS: shards are replicated
- Schedulers: stateless! Nothing to recover → just launch a new scheduler and re-spawn
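Lineage-based task recovery can be demonstrated in miniature. A toy sketch, assuming deterministic tasks (the names `run_task`/`recover` are illustrative, not Ray API):

```python
# Toy sketch of lineage-based recovery (illustrative only): because tasks
# are deterministic, a lost object can be recomputed by replaying the
# tasks recorded in its lineage.

lineage = {}   # object id -> (fn, input object ids)
store = {}     # object id -> value (stand-in for the object store)

def run_task(oid, fn, inputs=()):
    lineage[oid] = (fn, inputs)
    store[oid] = fn(*[store[i] for i in inputs])

def recover(oid):
    """Recompute a lost object (and any lost inputs) from lineage."""
    fn, inputs = lineage[oid]
    args = [store[i] if i in store else recover(i) for i in inputs]
    store[oid] = fn(*args)
    return store[oid]

run_task("x", lambda: 10)
run_task("y", lambda v: v + 1, inputs=["x"])
del store["x"]; del store["y"]   # simulate losing a node's objects
print(recover("y"))  # 11 -- replayed transitively from lineage
```

This is why determinism is the key idea in the architecture: replaying the same tasks is guaranteed to reproduce the same objects.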
SUMMARY
Ray: unified system for ML training, serving, and simulation
- Flexible API with support for stateless Tasks and stateful Actors
- Distributed scheduling; Global Control Store
DISCUSSION https://forms.gle/PN5FSJB6vVkDjoih8
Consider you are implementing two apps: deep learning model training and a sorting application. When would you use tasks vs. actors, and why?
- Tasks: stateless, deterministic operations; easy load balancing and fine-grained recovery; can depend on external state
- Actors: hold state (e.g., model weights, data iterators); useful when there are dependencies between iterations
- Sorting: deterministic, divides into smaller independent parts → tasks
- Model training: weights are state, dependencies between iterations, multiple data-parallel workers → actors
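The contrast can be made concrete with a toy sketch (plain Python, no Ray): the sort steps are pure functions that could run anywhere and be replayed for recovery, while the trainer must be an actor because each call mutates weights that persist across calls.

```python
# Illustrative toy: stateless tasks for sorting vs. a stateful actor
# for training (function and class names are hypothetical).

def sort_chunk(chunk):               # stateless task: deterministic,
    return sorted(chunk)             # re-runnable anywhere for recovery

def merge(a, b):                     # stateless task
    return sorted(a + b)

class Trainer:                       # actor: holds mutable state
    def __init__(self):
        self.weight = 0.0
    def step(self, grad, lr=0.1):    # each call depends on prior calls
        self.weight -= lr * grad
        return self.weight

data = [5, 3, 8, 1, 9, 2]
halves = [sort_chunk(data[:3]), sort_chunk(data[3:])]
print(merge(*halves))                # [1, 2, 3, 5, 8, 9]

t = Trainer()
for g in [1.0, -0.5, 2.0]:
    w = t.step(g)                    # sequential: order of calls matters
print(round(w, 2))
```

In Ray terms, `sort_chunk`/`merge` would be `@ray.remote` functions and `Trainer` a `@ray.remote` class; the scheduler can freely re-run the former, but the latter's method calls form a sequential, stateful chain.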
Follow-up: if a node holding a GCS replica goes down, is it better to create a new replica on another node? What happens to the system after recovery?
NEXT STEPS
- Next class: Clipper — last lecture on ML!