CS 744: PYWREN, Shivaram Venkataraman, Fall 2020


  1. Hello! CS 744: PYWREN, Shivaram Venkataraman, Fall 2020

  2. ADMINISTRIVIA - Project check-ins: deadline moved from tonight to Friday, Nov 20th - In-class project presentations on Dec 8th and Dec 10th: talks of about 5 min per project - Midterm: grades coming soon on Canvas; regrade requests can be submitted - Project grade breakdown: Intro 5%, Mid-semester check-in 5%, Presentation 10%, Final report 10%

  3. NEW HARDWARE MODELS - Course context: Big Data systems (storage, computation engines), the evolution of new hardware, and the implications of Big Data analysis for society

  4. Infiniband Networks - Compute Accelerators - Serverless Computing - Non-Volatile Memory

  5. SERVERLESS COMPUTING - No servers?

  6. MOTIVATION: USABILITY - A scientist doing data analysis in the cloud (Google, Azure, etc.) has to decide: What instance type? What base image? How many to spin up? What price? Spot? - This makes the cloud difficult to use

  7. (Figure slide; no text recovered.)

  8. ABSTRACTION LEVEL? - Application: e.g., logistic regression or a SQL query (Snowflake) - Compute framework: e.g., Spark (RDDs), computing on a subset of machines - Hardware: server or VM, e.g., Amazon EC2, CloudLab, private cluster, …

  9. STATELESS DATA PROCESSING - In Spark / MR, intermediate state was kept on the local disk of the compute resource - With stateless functions, local storage is ephemeral, so intermediate state needs to be remote: S3, Redis

  10. “SERVERLESS” COMPUTING - Provided by the cloud provider: submit a function (lambda) to be executed - Time bound: 300 seconds, single-core (now 900 seconds) - Storage: 512 MB in /tmp - Memory: up to 3GB RAM - Languages: Python, Java, node.js

  11. PYWREN API - Language integrated: plain Python (e.g., test.py) - Automatically captures dependencies (libraries) and ships them to the cloud, using cloudpickle - map: apply a function, similar to PySpark - get: blocking, similar to the Ray API

  12. PYWREN: HOW IT WORKS - Your laptop: future = runner.map(fn, data) - Amazon S3: distributed key-value store (put / get) - The cloud: invoke Lambda; containers fetch the function and data - Your laptop: future.result() fetches the result

  13. PYWREN: HOW IT WORKS - Your laptop: future = runner.map(fn, data): serialize func and data, put on S3, invoke Lambda - The cloud: pull job from S3, download Anaconda runtime, run code with Python, pickle result, stick in S3 - Your laptop: future.result(): poll S3, unpickle result and return
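The flow on this slide can be imitated locally. The sketch below is not PyWren itself: `FakeS3`, `fake_lambda`, `Future`, and `LocalRunner` are hypothetical names, an in-memory dict stands in for S3, and the "Lambda" runs synchronously in the same process. It uses the standard `pickle` module, which stores functions by reference, so the demo maps an importable function (`math.factorial`); shipping arbitrary user functions by value is what cloudpickle is for.

```python
import math
import pickle
import time
import uuid


class FakeS3:
    """In-memory stand-in for S3 (assumption: real PyWren talks to S3)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, blob):
        self._objects[key] = blob

    def get(self, key):
        return self._objects.get(key)  # None means "not written yet"


def fake_lambda(store, job_key, result_key):
    """Cloud side: pull job from the store, run it, pickle result, store it."""
    func, arg = pickle.loads(store.get(job_key))
    store.put(result_key, pickle.dumps(func(arg)))


class Future:
    """Laptop side handle: polls the store until the result shows up."""

    def __init__(self, store, result_key):
        self._store = store
        self._result_key = result_key

    def result(self, poll_interval=0.01, max_polls=100):
        for _ in range(max_polls):
            blob = self._store.get(self._result_key)
            if blob is not None:
                return pickle.loads(blob)  # unpickle and return
            time.sleep(poll_interval)
        raise TimeoutError("result never appeared in the store")


class LocalRunner:
    def __init__(self, store):
        self.store = store

    def map(self, func, data):
        futures = []
        for item in data:
            job_id = uuid.uuid4().hex
            job_key, result_key = f"jobs/{job_id}", f"results/{job_id}"
            # Serialize func and data, put on the store.
            self.store.put(job_key, pickle.dumps((func, item)))
            # A real system would invoke the function asynchronously here.
            fake_lambda(self.store, job_key, result_key)
            futures.append(Future(self.store, result_key))
        return futures


runner = LocalRunner(FakeS3())
futures = runner.map(math.factorial, [3, 4, 5])
results = [f.result() for f in futures]
print(results)
```

Note that the laptop and the "cloud" only ever communicate through the store, which is the point of the stateless design.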

  14. STATELESS FUNCTIONS: WHY NOW? - What are the trade-offs? - All data is read over the network, so more network I/O is needed - But network bandwidth is pretty good, comparable to local SSD bandwidth - S3 could be the bottleneck?

  15. MAP AND REDUCE? - How is the shuffle phase in MR done now? - Sort benchmark, same as in the MapReduce paper - Input data → mappers partition keys into buckets → intermediate key-value files → reducers → output data - Intermediate files are small: a blob store is not good for them, so use an in-memory store like Redis
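A minimal sketch of a sort shuffle that goes through a shared key-value store instead of local disk. Everything here is illustrative: the dict `store` stands in for a remote store such as Redis, and `map_task`, `reduce_task`, and the bucket boundaries are hypothetical names and values, not taken from the paper.

```python
import bisect


def map_task(mapper_id, records, boundaries, store):
    """Range-partition records into buckets and write one object per bucket."""
    buckets = {}
    for rec in records:
        reducer_id = bisect.bisect_right(boundaries, rec)
        buckets.setdefault(reducer_id, []).append(rec)
    for reducer_id, recs in buckets.items():
        store[(mapper_id, reducer_id)] = recs  # small intermediate object


def reduce_task(reducer_id, num_mappers, store):
    """Read this reducer's bucket from every mapper, then sort locally."""
    merged = []
    for mapper_id in range(num_mappers):
        merged.extend(store.get((mapper_id, reducer_id), []))
    return sorted(merged)


store = {}  # stand-in for a remote key-value store
chunks = [[42, 7, 93, 15], [61, 3, 88, 29], [50, 71, 12, 99]]
boundaries = [33, 66]  # three reducers: < 33, 33..65, >= 66

for mapper_id, chunk in enumerate(chunks):
    map_task(mapper_id, chunk, boundaries, store)

output = []
for reducer_id in range(len(boundaries) + 1):
    output.extend(reduce_task(reducer_id, len(chunks), store))

print(output)
print(len(store), "intermediate objects")
```

With M mappers and R reducers the shuffle writes up to M × R small objects, which is why the choice of intermediate store matters.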

  16. PARAMETER SERVERS - ML models (e.g., ad click prediction) are stored in a parameter server; reads are sparse - Use lambdas to run “workers” (or VMs etc.): get parameters, compute, update - Parameter server as a service? e.g., Redis - How do you measure or profile function requirements? Run the function locally, use a profiler? - Fault tolerance: checkpoint (before the time limit) and resume [recent work]
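A toy sketch of the get / compute / update loop, assuming a sparse logistic-regression model. `ParameterServer` and `worker_step` are hypothetical names, the server is an in-process dict rather than a remote service, and the feature keys and learning rate are made up for illustration.

```python
import math


class ParameterServer:
    """In-process stand-in for a remote parameter store."""

    def __init__(self):
        self._weights = {}

    def get(self, keys):
        # Sparse read: only the requested parameters are returned.
        return {k: self._weights.get(k, 0.0) for k in keys}

    def update(self, deltas):
        for key, delta in deltas.items():
            self._weights[key] = self._weights.get(key, 0.0) + delta


def worker_step(ps, features, label, lr=0.5):
    """One stateless worker step: sparse get, local gradient, push update."""
    weights = ps.get(features.keys())
    z = sum(weights[k] * v for k, v in features.items())
    pred = 1.0 / (1.0 + math.exp(-z))
    # Gradient of the logistic loss with respect to each active weight.
    ps.update({k: -lr * (pred - label) * v for k, v in features.items()})
    return pred


ps = ParameterServer()
example = {"ad:17": 1.0, "user:3": 1.0}  # hypothetical sparse features
first = worker_step(ps, example, label=1)
second = worker_step(ps, example, label=1)
print(first, second)
```

Because the worker keeps no state between calls, each step could run in a separate function invocation, at the cost of a round trip to the server per step.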

  17. WHEN SHOULD WE USE SERVERLESS? - Yes: use when you need elasticity; use when you don't need local state - Maybe not: when you need local state (actors); iterative workloads might need fine-grained communication across workers and state from the prior iteration; lambdas might not all be active at the same time

  18. SUMMARY Motivation: Usability of big data analytics Approach: Language-integrated cloud computing Features - Breakdown computation into stateless functions - Schedule on serverless containers - Use external storage for state management Open question on scheduling, overheads

  19. DISCUSSION https://forms.gle/PAMDKmwHepmPWDrBA

  20. DISCUSSION NOTES - Compute and storage scale independently - Increasing the number of workers by K improves performance - Hard to know how to choose the number of workers - Compute is very short compared to I/O - More Redis shards reduce the time to read / write to Redis

  21. Consider you are a cloud provider (e.g., AWS) implementing support for serverless. What could be some of the new challenges in scheduling these workloads? How would you go about addressing them? - Mapping lambda functions → machines: how do we do this? - Locality: does one lambda talk to some Redis shard? Can we infer it? - When do we reuse a container, and when do we schedule a new one? - Need to find the optimal configuration: use ML? - Resource requirements are fixed: 1 core, 900 seconds, up to 3GB

  22. OPEN QUESTIONS - Scalable scheduling: Low latency with large number of functions? - Debugging: Correlate events across functions? - Launch overheads: Fraction of time spent in setup (OpenLambda) - Resource limits: 15 minute AWS Lambda (Oct 2018)

  23. Cold start vs. warm start - App is kept warm for 5 mins, so the container can be reused if you run again within that window - Keep-alive policy: see the Azure paper
