
CS 744: CLIPPER, Shivaram Venkataraman, Fall 2020



  1. Good morning! CS 744: CLIPPER, Shivaram Venkataraman, Fall 2020

  2. ADMINISTRIVIA Course Project Proposals - Due on Friday! - See Piazza for template - Submission instructions soon Midterm details - Open book, open notes - Held in class time 9.30-10.45am Central Time - Type / Upload photos (extra 15 mins)

  3. MACHINE LEARNING: INFERENCE

  4. GOALS - Interactive latencies (tail latency < 100ms, e.g. 99.9th percentile latency) - High throughput to handle load (many users, many requests) - Improved prediction accuracy - Generality (?): handle as many models / frameworks as possible

  5. ARCHITECTURE (diagram) - Requests arrive over HTTP

  6. MODEL CONTAINERS - Interface is implemented once per framework - Run using Docker containers - Can be replicated across machines
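
The "implemented once per framework" idea can be sketched as a small common interface. This is only an illustration: `ModelContainer`, `predict_batch`, and `MeanModelContainer` are hypothetical names chosen here, not Clipper's actual API, and the toy model stands in for a real framework shim.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class ModelContainer(ABC):
    """Hypothetical uniform interface; one subclass per ML framework."""

    @abstractmethod
    def predict_batch(self, inputs: Sequence[Sequence[float]]) -> List[float]:
        """Return one prediction per input in the batch."""


class MeanModelContainer(ModelContainer):
    """Toy stand-in for a framework-specific shim (assumption, not a real model)."""

    def predict_batch(self, inputs: Sequence[Sequence[float]]) -> List[float]:
        # "Predict" the mean of each feature vector.
        return [sum(x) / len(x) for x in inputs]


def serve(container: ModelContainer, batch: Sequence[Sequence[float]]) -> List[float]:
    # The serving layer only depends on the common interface,
    # never on the framework behind it.
    return container.predict_batch(batch)
```

Because the serving layer calls only `predict_batch`, adding a framework means writing one new subclass rather than touching the serving code.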

  7. MODEL ABSTRACTION LAYER - Caching - Improve performance for frequent queries (e.g. predict movies for a user id) - LRU eviction policy - Important for feedback
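
A prediction cache with LRU eviction could look roughly like the sketch below. The class name `PredictionCache`, the `(model_id, query)` key, and the `model_fn` callback are assumptions made for illustration; queries are assumed to be hashable.

```python
from collections import OrderedDict


class PredictionCache:
    """Minimal LRU cache keyed by (model_id, query); names are illustrative."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def predict(self, model_id, query, model_fn):
        key = (model_id, query)
        if key in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Cache miss: evaluate the model and remember the result.
        self.misses += 1
        result = model_fn(query)
        self._entries[key] = result
        if len(self._entries) > self.capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return result
```

The same lookup path helps with feedback: when feedback for a recent query arrives, the prediction that was served can be found without re-running the model.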

  8. BATCHING, QUEUING Goals, Insight - Increase latency (within SLO) for improved throughput - Reduce RPC overheads (amortized over the batch) - GPU / BLAS acceleration (hardware parallelism) Approach - Per container queues. - Why? Batch sizes could vary for each model and hardware (CPU vs. GPU)
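
Per-container queues can be sketched as follows: each container drains its own queue up to its own maximum batch size, so a GPU-backed model can take larger batches than a CPU-backed one. `ContainerQueue`, `dispatch`, and the batch limits are hypothetical names and values for illustration.

```python
from collections import deque


class ContainerQueue:
    """One queue per model container, with its own batch-size limit."""

    def __init__(self, max_batch_size: int):
        self.max_batch_size = max_batch_size
        self._pending = deque()

    def enqueue(self, query) -> None:
        self._pending.append(query)

    def next_batch(self) -> list:
        # Take up to max_batch_size queries in arrival order.
        batch = []
        while self._pending and len(batch) < self.max_batch_size:
            batch.append(self._pending.popleft())
        return batch


def dispatch(queues: dict, containers: dict) -> dict:
    """Send one batch per container; one call amortizes RPC overhead over the batch."""
    results = {}
    for name, queue in queues.items():
        batch = queue.next_batch()
        if batch:
            results[name] = containers[name](batch)
    return results
```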

  9. ADAPTIVE BATCHING AIMD: Additive Increase, Multiplicative Decrease - Why? Observe latency: increase the batch size carefully while latency stays within the SLO, and back off multiplicatively once it is exceeded Delayed: Wait until batch exists - Why? [Figure: batch size vs. time]
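
The AIMD rule can be sketched as a small controller. The step of 1, the 0.9 backoff factor, and the name `AIMDBatchSizer` are assumptions for illustration; the latency model in the usage note is synthetic.

```python
class AIMDBatchSizer:
    """Additive-increase, multiplicative-decrease controller for batch size."""

    def __init__(self, slo_ms: float, additive_step: int = 1,
                 backoff: float = 0.9, initial: int = 1):
        self.slo_ms = slo_ms
        self.additive_step = additive_step
        self.backoff = backoff          # assumed multiplicative factor (< 1)
        self.batch_size = initial

    def update(self, observed_latency_ms: float) -> int:
        if observed_latency_ms <= self.slo_ms:
            # Within the SLO: probe for more throughput, one small step at a time.
            self.batch_size += self.additive_step
        else:
            # SLO violated: shrink multiplicatively, never below one query.
            self.batch_size = max(1, int(self.batch_size * self.backoff))
        return self.batch_size
```

For example, with a synthetic container whose latency is 2 ms per query and a 20 ms SLO, repeatedly calling `sizer.update(2.0 * sizer.batch_size)` grows the batch one query at a time and backs off whenever the SLO is exceeded, so the batch size oscillates just around the largest size that fits the SLO.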

  10. MODEL SELECTION - Improve accuracy → ensembles

  11. SINGLE MODEL SELECTION Multi-Arm Bandit formulation - Explore vs Exploit - Regret: Loss by not picking optimal action - Goal: Minimize regret Clipper - Exp3 algorithm - Update the weight of each model based on feedback - Single evaluation - Scales to more models
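
A simplified, loss-based Exp3 selector is sketched below to make the explore/exploit loop concrete. It assumes losses in [0, 1]; the learning rate `eta`, the renormalization step, and the class name `Exp3Selector` are illustrative choices, not details taken from the slides.

```python
import math
import random


class Exp3Selector:
    """Simplified Exp3: sample one model per query, reweight it from feedback."""

    def __init__(self, num_models: int, eta: float = 0.1, seed: int = 0):
        self.weights = [1.0] * num_models
        self.eta = eta
        self._rng = random.Random(seed)

    def probabilities(self) -> list:
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def select(self) -> int:
        # Explore vs. exploit: sample a model in proportion to its weight.
        probs = self.probabilities()
        return self._rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def update(self, model: int, loss: float) -> None:
        # Importance-weight the loss by the selection probability so that
        # rarely chosen models are not unfairly penalized.
        p = max(self.probabilities()[model], 1e-12)
        self.weights[model] *= math.exp(-self.eta * loss / p)
        # Renormalize to keep the weights numerically well behaved.
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
```

Only the selected model is evaluated for each query, which is why this approach needs a single evaluation and scales to more models.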

  12. MULTI MODELS (ensembles) Ensemble - Combine output from models (weighted average, i.e. a linear combination) - How do we get the weights? Exp4 update Robust Prediction - React to model changes - Output confidence score (e.g. a binary cat / dog classifier with a confidence threshold)
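
For a classifier such as the cat / dog example, the weighted combination can be sketched as a weighted vote whose winning share doubles as a confidence score. The function name `ensemble_predict`, the 0.5 default threshold, and the fallback value are assumptions; how the weights are learned is out of scope for this sketch.

```python
from collections import defaultdict


def ensemble_predict(predictions: dict, weights: dict,
                     threshold: float = 0.5, default=None):
    """Weighted vote over model outputs.

    predictions: model name -> predicted label
    weights:     model name -> non-negative weight
    Returns (label, confidence); falls back to `default` when the
    confidence is below `threshold`.
    """
    totals = defaultdict(float)
    for model, label in predictions.items():
        totals[label] += weights[model]
    total_weight = sum(totals.values())
    if total_weight == 0:
        return default, 0.0
    label = max(totals, key=totals.get)
    # Confidence: share of the total weight that agrees with the winner.
    confidence = totals[label] / total_weight
    if confidence < threshold:
        return default, confidence
    return label, confidence
```

With weights 0.5, 0.25, and 0.25 and votes cat, dog, cat, the sketch returns `("cat", 0.75)`; raising the threshold above 0.75 makes it fall back to the default instead.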

  13. STRAGGLER MITIGATION - Why do stragglers occur? With N model containers, some of them might be slow to reply. Wait for them? Add more replicas? Approach - Approximate result based on whatever has finished; late replies are dropped - ML-specific: better approximations are possible
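
The "use whatever has finished" approach can be sketched with a deadline on a set of concurrent model calls. The function name `predict_with_deadline`, the plain average used to combine replies, and the thread-pool setup are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def predict_with_deadline(models: dict, query, deadline_s: float):
    """Query all models in parallel; combine only the replies that beat the deadline.

    models: model name -> callable(query) -> float
    Returns (estimate, names_of_models_used); estimate is None if nothing finished.
    """
    executor = ThreadPoolExecutor(max_workers=max(1, len(models)))
    try:
        futures = {executor.submit(fn, query): name for name, fn in models.items()}
        done, _late = wait(futures, timeout=deadline_s)
        # Keep successful, on-time replies; stragglers are simply ignored.
        on_time = {futures[f]: f.result() for f in done if f.exception() is None}
    finally:
        # Do not block the response on the stragglers.
        executor.shutdown(wait=False)
    if not on_time:
        return None, []
    estimate = sum(on_time.values()) / len(on_time)
    return estimate, sorted(on_time)
```

The estimate is approximate because it averages fewer models than the full ensemble, which trades a little accuracy for a bounded response time.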

  14. SUMMARY • Clipper: ML inference Workloads + Requirements • Layered architecture provides generality • Caching, Batching, Replication to improve latency, throughput • Multi-Arm bandits to improve accuracy

  15. DISCUSSION https://forms.gle/FCVhPURqz7HSbDtg6

  16. Consider a scenario where you run a model serving service that hosts a number of different applications. The traffic for some applications is sporadic (e.g. only a few hours where they are used). What are some advantages / disadvantages of using Clipper for such a service? Advantages / Disadvantages - Points raised: adaptive batching; delayed batching; multiple replicas and elasticity; containerization

  17. [Figure] Notes: ensembles are accurate; latency inflation is very low
