
RISE to the Challenges of AI Systems - Joseph E. Gonzalez



  1. RISE to the Challenges of AI Systems. Joseph E. Gonzalez, Assistant Professor, UC Berkeley. jegonzal@cs.berkeley.edu

  2. Training: Big Data → Big Model, on large-scale parallel and distributed systems

  3. Training: Big Data → Big Model

  4. Training: Big Data → Big Model (systems: VW, CoCoA, Splash)

  5. How to do Research in AI Systems
     Ø Manage complexity: seek parsimony in system design. Great systems research is often about what features are taken away. Do a few things well and be composable.
     Ø Identify trade-offs: with each design decision, what do you gain and lose? Which trade-offs are fundamental?
     Ø Evaluate your system. Positive: how fast and scalable is it, and why? Negative: when does it fail, and what are its limitations?

  6. Hemingway*: Modeling Throughput and Convergence for ML Workloads (with Shivaram Venkataraman, Xinghao Pan, Zizheng Tai)
     Ø What is the best algorithm and level of parallelism for an ML task?
     Ø Trade-off: parallelism, coordination, and convergence
     Ø Research challenge: can we model this trade-off explicitly?
     Ø Systems metric: I(p), iterations per second as a function of cores p; we can estimate I from data on many systems
     Ø ML metric: L(i, p), loss as a function of iterations i and cores p; we can estimate L from data for our problem
     *follow-up work to Shivaram's Ernest system in NSDI'16

  7. Hemingway*: Modeling Throughput and Convergence for ML Workloads (with Shivaram Venkataraman, Xinghao Pan, Zizheng Tai)
     Ø What is the best algorithm and level of parallelism for an ML task?
     Ø Trade-off: parallelism, coordination, and convergence
     Ø Research challenge: can we model this trade-off explicitly?
     Ø Combine the two models: loss(t, p) = L(t · I(p), p), where L(i, p) is the loss as a function of iterations i and cores p, and I(p) is iterations per second as a function of cores p
     Ø How long does it take to get to a given loss?
     Ø Given a time budget and a number of cores, which algorithm will give the best result?
     *follow-up work to Shivaram's Ernest system in NSDI'16
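The combined model on this slide can be sketched in a few lines: fit a systems model I(p) from profiled throughput, assume a fitted ML model L(i, p), and compose them as loss(t, p) = L(t · I(p), p) to pick a core count for a time budget. All numbers and functional forms below are illustrative assumptions, not the models from the Hemingway paper.

```python
import numpy as np

# Systems model I(p): iterations/sec as a function of cores p, fit by
# least squares in the basis [1, p, log p] (illustrative profiling data).
p_obs = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
ips_obs = np.array([10.0, 18.0, 30.0, 44.0, 60.0])
X = np.column_stack([np.ones_like(p_obs), p_obs, np.log(p_obs)])
coef, *_ = np.linalg.lstsq(X, ips_obs, rcond=None)

def I(p):
    """Predicted iterations per second at p cores."""
    return coef @ np.array([1.0, p, np.log(p)])

def L(i, p):
    """Assumed fitted ML model: loss after i iterations at p cores.
    More cores mean faster iterations but less progress per iteration
    (coordination cost), captured here by an ad-hoc sqrt(p) penalty."""
    return 0.05 + 1.0 / (1.0 + 0.01 * i / np.sqrt(p))

def loss_at_time(t, p):
    """Combined model from the slide: loss(t, p) = L(t * I(p), p)."""
    return L(t * I(p), p)

# Given a 60-second budget, pick the core count with the lowest loss.
budget = 60.0
best_p = min([1, 2, 4, 8, 16], key=lambda p: loss_at_time(budget, p))
```

The point of the composition is that neither curve alone answers the question: I(p) favors more cores, L(i, p) penalizes them, and only loss(t, p) exposes the trade-off.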

  8. [Figures: system performance (time per iteration) as a function of parallelism; convergence (training loss) as a function of parallelism and iterations; convergence as a function of time and parallelism.] Hemingway: Modeling Distributed Optimization Algorithms. Xinghao Pan, Shivaram Venkataraman, Zizheng Tai, Joseph Gonzalez. NIPS'16 ML-Sys Workshop.

  9. Takeaway: try to decouple system improvements from algorithm improvements; use data collection + sparse modeling to understand your system.

  10. Training: Big Data → Big Model (systems: VW, CoCoA, Splash)

  11. Training: Big Data → Big Model

  12. Learning: Big Data → Training → Big Model → ?

  13. Learning: Big Data → Training → Big Model → Conference Papers

  14. Learning: Big Data → Training → Big Model → Conference Papers, Dashboards and Reports

  15. Learning: Big Data → Training → Big Model → Conference Papers, Dashboards and Reports → Drive Actions

  16. Learning: Big Data → Training → Big Model → Drive Actions

  17. Learning: Big Data → Training → Big Model → Inference

  18. Learning (Big Data → Training → Big Model) and Inference (Query → Decision → Application)

  19. Learning (Big Data → Training → Big Model) and Inference (Query → Decision → Application). Inference is often overlooked. Timescale: ~10 milliseconds. Billions of queries a day → costly.

  20. Why is inference challenging? Need to render low-latency (< 10 ms) predictions for complex models (queries → features → top-K, e.g. SELECT * FROM users JOIN items, click_logs, pages WHERE …) under heavy load with system failures.

  21. Inference is moving beyond the cloud: augmented reality, home security, home automation, mobile, self-driving cars, robotics, personal assistants

  22. Inference is moving beyond the cloud.
     Opportunities: Ø reduce latency and improve privacy Ø address network partitions
     Research challenges: Ø minimize power consumption Ø limited hardware & long life-cycles Ø develop new hybrid models to leverage the cloud and edge devices

  23. Robust inference is critical: self-“parking” cars, self-“driving” cars, chat AIs

  24. Learning (Big Data → Training → Big Model) and Inference (Query → Decision → Application), with Feedback from the application back to training

  25. Learning (Big Data → Training) and Inference (Decision → Application → Feedback). Training timescale: hours to weeks; training is often re-run and is sensitive to feedback loops.

  26. Closing the loop: why is it challenging? Ø implicit and delayed feedback Ø self-reinforcing feedback loops Ø the world changes (d/dt), at varying rates

  27. Learning (adaptive, ~1 second) and Inference (responsive, ~10 ms): Big Data → Training → Big Model → Query → Decision → Application → Feedback

  28. Learning: adaptive (~1 second). Inference: responsive (~10 ms). And ?

  29. Learning: adaptive (~1 second). Inference: responsive (~10 ms). And: secure.

  30. Intelligence in sensitive contexts: augmented reality, home monitoring, voice technologies, medical imaging. Protect the data, the model, and the query.

  31. Protect the data, the model, and the query. High-value data is sensitive (medical info, home video, finance data). Models capture the value in the data (a core asset, and sensitive themselves). Queries can be as sensitive as the data.

  32. Opaque: Analytics on Secure Enclaves. Exploit hardware support to enable computing on encrypted data (SQL, ML, graph).
     Ø Today: prototype system running in Apache Spark; supports SQL queries (o-filter, o-groupby, o-join, with Catalyst query optimization) in an untrusted cloud; ~50% reduction in perf.
     Ø Future: enable prediction serving on encrypted queries. Wenting et al. (NSDI'17)

  33. Adaptive, responsive, secure

  34. riselab UC Berkeley

  35. Clipper: A Low-Latency Online Prediction Serving System (NSDI'17). Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica

  36. Learning (Big Data → Training) and Inference (Query → Decision → Application), with Feedback

  37. Learning: slow-changing parameters (Big Data → slow Training). Inference: fast-changing parameters (Query → Decision → Application → Feedback).

  38. Hybrid offline + online learning. Prediction: f(x; θ)^T w_u.
     Ø Update the “feature” functions f(·; θ) offline using batch solvers: leverage high-throughput systems (TensorFlow); exploit slow change in population statistics.
     Ø Update the user weights w_u online: simple to train + a more robust model; addresses rapidly changing user statistics.
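The split on this slide can be sketched as follows: a feature function f(x; θ) whose parameters θ are trained offline (here a fixed random projection stands in for a batch-trained network), and per-user weights w_u updated online with a single SGD step per feedback event. The dimensions, learning rate, and loss are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_FEAT = 5, 8
theta = rng.normal(size=(D_IN, D_FEAT))  # "slow" parameters, fixed online

def f(x):
    """Offline-trained feature function f(x; theta) (stand-in)."""
    return np.tanh(x @ theta)

w_u = np.zeros(D_FEAT)  # "fast" per-user weights, updated online

def predict(x):
    # The prediction is the inner product f(x; theta)^T w_u from the slide.
    return f(x) @ w_u

def observe(x, y, lr=0.1):
    """Online update: one SGD step on (predict(x) - y)^2 w.r.t. w_u only."""
    global w_u
    w_u -= lr * (predict(x) - y) * f(x)

# Simulate feedback from a user whose true weights are w_star.
w_star = rng.normal(size=D_FEAT)
errs = []
for _ in range(500):
    x = rng.normal(size=D_IN)
    y = f(x) @ w_star
    errs.append(float((predict(x) - y) ** 2))
    observe(x, y)
```

Because the online problem is linear in w_u, each update is cheap and robust, which is what makes the sub-millisecond partial updates on the later slide plausible.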

  39. Common modeling structure: f(x; θ)^T w_u. Examples: matrix factorization (users × items), deep learning (input features), ensemble methods.

  40. Clipper online learning for recommendations (simulated news recommendation). Partial updates: 0.4 ms; retraining: 7.1 seconds; >4 orders-of-magnitude faster adaptation. [Figure: error vs. number of examples]

  41. Learning: slow-changing parameters (Big Data → slow Training). Inference: fast-changing parameters (Application → Feedback).

  42. Learning: slow-changing parameters (Big Data → slow Training → Caffe). Clipper: fast-changing parameters (Application → Feedback).

  43. Clipper serves predictions across ML frameworks. Applications: fraud detection, content recommendation, personal assistants, robotic control, machine translation. Frameworks: VW, Create, Caffe.

  44. Clipper key insight: the challenges of prediction serving can be addressed between end-user applications and machine learning frameworks (e.g. Caffe, VW, Create). As a result, Clipper is able to:
     Ø hide complexity by providing a common interface to applications
     Ø bound latency and maximize throughput through caching, adaptive batching, and model replication
     Ø enable robust online learning and personalization through model selection and ensemble algorithms
     all without modifying machine learning frameworks or front-end applications

  45. Clipper architecture: applications (fraud detection, content rec., personal asst., robotic control, machine translation) → Clipper → ML frameworks (VW, Create, Caffe)

  46. Clipper architecture: applications → Predict / Observe (RPC/REST interface) → Clipper → ML frameworks (VW, Create, Caffe)

  47. Clipper architecture: applications → Predict / Observe (RPC/REST interface) → Clipper → RPC → model wrappers (MW), e.g. a Caffe container

  48. Clipper architecture: applications → Predict / Observe (RPC/REST interface) → Clipper.
     Ø Model Selection Layer: improve accuracy through bandit methods, ensembles, online learning, and personalization.
     Ø Model Abstraction Layer: provide a common interface to models while bounding latency and maximizing throughput.
     Ø RPC → model wrappers (MW), e.g. a Caffe container
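The selection layer's idea can be illustrated with a minimal ensemble sketch: keep a weight per deployed model, answer with the weighted average, and exponentially down-weight models that incur loss on observed feedback. This is a Hedge-style multiplicative-weights update; the bandit methods the slide names (e.g. Exp3/Exp4-style algorithms) are more sophisticated, and all models and constants here are made up.

```python
import math
import random

random.seed(0)

models = {
    "good":  lambda x: x,                         # accurate by construction
    "noisy": lambda x: x + random.gauss(0, 1.0),  # unreliable model
    "bad":   lambda x: -x,                        # mis-trained model
}
weights = {name: 1.0 for name in models}
ETA = 0.5  # learning rate for the multiplicative update

def predict_all(x):
    """Query every deployed model once (cache per-model outputs)."""
    return {name: model(x) for name, model in models.items()}

def ensemble(preds):
    """Weighted-average ensemble prediction."""
    total = sum(weights.values())
    return sum(weights[name] * p for name, p in preds.items()) / total

def observe(preds, y):
    """Feedback: down-weight each model by its (clipped) squared error."""
    for name, p in preds.items():
        weights[name] *= math.exp(-ETA * min((p - y) ** 2, 1.0))

# Simulate feedback; the ground truth matches the "good" model.
for _ in range(200):
    x = random.gauss(0, 1)
    observe(predict_all(x), x)
```

After a few hundred feedback events the ensemble is dominated by the accurate model, which is the personalization effect the slide describes: the same machinery run per user selects the best model for each user.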

  49. Clipper architecture: applications → Predict / Observe (RPC/REST interface) → Clipper.
     Ø Model Selection Layer: anytime predictions.
     Ø Model Abstraction Layer: caching, adaptive batching.
     Ø RPC → model wrappers (MW), e.g. a Caffe container

  50. Model Abstraction Layer (detail): caching, adaptive batching; RPC → model wrappers (MW), e.g. a Caffe container

  51. Model Abstraction Layer: caching, adaptive batching; RPC → model wrappers (MW), e.g. a Caffe container. Provide a common interface to models while bounding latency and maximizing throughput.
     Ø Models run in separate processes as Docker containers
     Ø Resource isolation
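The adaptive batching named on these slides can be sketched as a toy single-threaded loop: queue incoming queries, flush a batch when it reaches the current target size or when the oldest query nears the latency objective, and adapt the target with an additive-increase / multiplicative-decrease rule (the strategy described for Clipper). The model call, SLO, and constants below are made up for illustration.

```python
import time
from collections import deque

SLO = 0.010      # 10 ms latency objective
batch_size = 1   # current target batch size, adapted online
queue = deque()  # pending (arrival_time, query) pairs

def model_batch_predict(batch):
    """Stand-in for an RPC to a model container; per-query cost falls
    as the batch grows, which is why batching raises throughput."""
    time.sleep(0.001 + 0.0001 * len(batch))
    return [q * 2 for _, q in batch]

def maybe_flush(now):
    """Flush if the batch is full or the oldest query nears its deadline."""
    global batch_size
    if not queue:
        return []
    oldest_arrival = queue[0][0]
    if len(queue) < batch_size and now - oldest_arrival < SLO / 2:
        return []
    batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
    out = model_batch_predict(batch)
    latency = time.monotonic() - batch[0][0]
    if latency <= SLO:
        batch_size += 1                       # additive increase
    else:
        batch_size = max(1, batch_size // 2)  # multiplicative decrease
    return out

# Feed queries, then drain the queue by forcing the deadline condition.
results = []
for q in range(30):
    queue.append((time.monotonic(), q))
    results.extend(maybe_flush(time.monotonic()))
while queue:
    results.extend(maybe_flush(time.monotonic() + SLO))
```

Running the model in a separate container behind this queue is what gives the resource isolation bullet its force: the batching policy never needs to know which framework is on the other side of the RPC.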
