
RISE to the Challenges of AI Systems - Joseph E. Gonzalez



  1. RISE to the Challenges of AI Systems. Joseph E. Gonzalez, Assistant Professor, UC Berkeley. jegonzal@cs.berkeley.edu

  2. Training: Big Data → Big Model, on large-scale parallel and distributed systems

  3. Training: Big Data → Big Model

  4. Training: Big Data → Big Model (systems: VW, CoCoA, Splash)

  5. How to do Research in AI Systems
     Ø Manage complexity: seek parsimony in system design. Great systems research is often about what features are taken away. Do a few things well and be composable.
     Ø Identify trade-offs: with each design decision, what do you gain and lose? Which trade-offs are fundamental?
     Ø Evaluate your system. Positive: how fast and scalable is it, and why? Negative: when does it fail, and what are its limitations?

  6. Hemingway*: Modeling Throughput and Convergence for ML Workloads (with Shivaram Venkataraman, Xinghao Pan, Zizheng Tai)
     Ø What is the best algorithm and level of parallelism for an ML task?
     Ø Trade-off: parallelism, coordination, and convergence
     Ø Research challenge: can we model this trade-off explicitly?
     Ø Systems metric: I(p), iterations per second as a function of cores p; we can estimate I from data on many systems
     Ø ML metric: L(i, p), loss as a function of iterations i and cores p; we can estimate L from data for our problem
     *follow-up work to Shivaram's Ernest system in NSDI'16

  7. Hemingway*: Modeling Throughput and Convergence for ML Workloads (with Shivaram Venkataraman, Xinghao Pan, Zizheng Tai)
     Ø What is the best algorithm and level of parallelism for an ML task?
     Ø Trade-off: parallelism, coordination, and convergence
     Ø Research challenge: can we model this trade-off explicitly?
     Ø Combine the two models: loss(t, p) = L(t · I(p), p), where L(i, p) is the loss as a function of iterations i and cores p, and I(p) is iterations per second as a function of cores p
     Ø How long does it take to get to a given loss?
     Ø Given a time budget and a number of cores, which algorithm will give the best result?
     *follow-up work to Shivaram's Ernest system in NSDI'16
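The combined model on this slide can be sketched in a few lines: fit a systems model I(p) from profiled throughput, assume a fitted ML model L(i, p), and compose them as loss(t, p) = L(t · I(p), p) to pick a core count for a time budget. All numbers and functional forms below are illustrative assumptions, not the models from the Hemingway paper.

```python
import numpy as np

# Systems model I(p): iterations/sec as a function of cores p, fit by
# least squares in the basis [1, p, log p] (illustrative profiling data).
p_obs = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
ips_obs = np.array([10.0, 18.0, 30.0, 44.0, 60.0])
X = np.column_stack([np.ones_like(p_obs), p_obs, np.log(p_obs)])
coef, *_ = np.linalg.lstsq(X, ips_obs, rcond=None)

def I(p):
    """Predicted iterations per second at p cores."""
    return coef @ np.array([1.0, p, np.log(p)])

def L(i, p):
    """Assumed fitted ML model: loss after i iterations at p cores.
    More cores mean faster iterations but less progress per iteration
    (coordination cost), captured here by an ad-hoc sqrt(p) penalty."""
    return 0.05 + 1.0 / (1.0 + 0.01 * i / np.sqrt(p))

def loss_at_time(t, p):
    """Combined model from the slide: loss(t, p) = L(t * I(p), p)."""
    return L(t * I(p), p)

# Given a 60-second budget, pick the core count with the lowest loss.
budget = 60.0
best_p = min([1, 2, 4, 8, 16], key=lambda p: loss_at_time(budget, p))
```

The point of the composition is that neither curve alone answers the question: I(p) favors more cores, L(i, p) penalizes them, and only loss(t, p) exposes the trade-off.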

  8. [Figures: system performance (time per iteration) as a function of parallelism; convergence (training loss) as a function of parallelism and iterations; convergence as a function of time and parallelism.] Hemingway: Modeling Distributed Optimization Algorithms. Xinghao Pan, Shivaram Venkataraman, Zizheng Tai, Joseph Gonzalez. NIPS'16 ML-Sys Workshop.

  9. Takeaway: try to decouple system improvements from algorithm improvements; use data collection + sparse modeling to understand your system.

  10. Training: Big Data → Big Model (systems: VW, CoCoA, Splash)

  11. Training: Big Data → Big Model

  12. Learning: Big Data → Training → Big Model → ?

  13. Learning: Big Data → Training → Big Model → Conference Papers

  14. Learning: Big Data → Training → Big Model → Conference Papers, Dashboards and Reports

  15. Learning: Big Data → Training → Big Model → Conference Papers, Dashboards and Reports → Drive Actions

  16. Learning: Big Data → Training → Big Model → Drive Actions

  17. Learning: Big Data → Training → Big Model → Inference

  18. Learning (Big Data → Training → Big Model) and Inference (Query → Decision → Application)

  19. Learning (Big Data → Training → Big Model) and Inference (Query → Decision → Application). Inference is often overlooked. Timescale: ~10 milliseconds. Billions of queries a day → costly.

  20. Why is inference challenging? Need to render low-latency (< 10 ms) predictions for complex models (queries → features → top-K, e.g. SELECT * FROM users JOIN items, click_logs, pages WHERE …) under heavy load with system failures.

  21. Inference is moving beyond the cloud: augmented reality, home security, home automation, mobile, self-driving cars, robotics, personal assistants

  22. Inference is moving beyond the cloud.
     Opportunities: Ø reduce latency and improve privacy Ø address network partitions
     Research challenges: Ø minimize power consumption Ø limited hardware & long life-cycles Ø develop new hybrid models to leverage the cloud and edge devices

  23. Robust inference is critical: self-“parking” cars, self-“driving” cars, chat AIs

  24. Learning (Big Data → Training → Big Model) and Inference (Query → Decision → Application), with Feedback from the application back to training

  25. Learning (Big Data → Training) and Inference (Decision → Application → Feedback). Training timescale: hours to weeks; training is often re-run and is sensitive to feedback loops.

  26. Closing the loop: why is it challenging? Ø implicit and delayed feedback Ø self-reinforcing feedback loops Ø the world changes (d/dt), at varying rates

  27. Learning (adaptive, ~1 second) and Inference (responsive, ~10 ms): Big Data → Training → Big Model → Query → Decision → Application → Feedback

  28. Learning: adaptive (~1 second). Inference: responsive (~10 ms). And ?

  29. Learning: adaptive (~1 second). Inference: responsive (~10 ms). And: secure.

  30. Intelligence in sensitive contexts: augmented reality, home monitoring, voice technologies, medical imaging. Protect the data, the model, and the query.

  31. Protect the data, the model, and the query. High-value data is sensitive (medical info, home video, finance data). Models capture the value in the data (a core asset, and sensitive themselves). Queries can be as sensitive as the data.

  32. Opaque: Analytics on Secure Enclaves. Exploit hardware support to enable computing on encrypted data (SQL, ML, graph).
     Ø Today: prototype system running in Apache Spark; supports SQL queries (o-filter, o-groupby, o-join, with Catalyst query optimization) in an untrusted cloud; ~50% reduction in perf.
     Ø Future: enable prediction serving on encrypted queries. Wenting et al. (NSDI'17)

  33. Adaptive, responsive, secure

  34. riselab UC Berkeley

  35. Clipper: A Low-Latency Online Prediction Serving System (NSDI'17). Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, Ion Stoica

  36. Learning (Big Data → Training) and Inference (Query → Decision → Application), with Feedback

  37. Learning: slow-changing parameters (Big Data → slow Training). Inference: fast-changing parameters (Query → Decision → Application → Feedback).

  38. Hybrid offline + online learning. Prediction: f(x; θ)^T w_u.
     Ø Update the “feature” functions f(·; θ) offline using batch solvers: leverage high-throughput systems (TensorFlow); exploit slow change in population statistics.
     Ø Update the user weights w_u online: simple to train + a more robust model; addresses rapidly changing user statistics.
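The split on this slide can be sketched as follows: a feature function f(x; θ) whose parameters θ are trained offline (here a fixed random projection stands in for a batch-trained network), and per-user weights w_u updated online with a single SGD step per feedback event. The dimensions, learning rate, and loss are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_FEAT = 5, 8
theta = rng.normal(size=(D_IN, D_FEAT))  # "slow" parameters, fixed online

def f(x):
    """Offline-trained feature function f(x; theta) (stand-in)."""
    return np.tanh(x @ theta)

w_u = np.zeros(D_FEAT)  # "fast" per-user weights, updated online

def predict(x):
    # The prediction is the inner product f(x; theta)^T w_u from the slide.
    return f(x) @ w_u

def observe(x, y, lr=0.1):
    """Online update: one SGD step on (predict(x) - y)^2 w.r.t. w_u only."""
    global w_u
    w_u -= lr * (predict(x) - y) * f(x)

# Simulate feedback from a user whose true weights are w_star.
w_star = rng.normal(size=D_FEAT)
errs = []
for _ in range(500):
    x = rng.normal(size=D_IN)
    y = f(x) @ w_star
    errs.append(float((predict(x) - y) ** 2))
    observe(x, y)
```

Because the online problem is linear in w_u, each update is cheap and robust, which is what makes the sub-millisecond partial updates on the later slide plausible.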

  39. Common modeling structure: f(x; θ)^T w_u. Examples: matrix factorization (users × items), deep learning (input features), ensemble methods.

  40. Clipper online learning for recommendations (simulated news recommendation). Partial updates: 0.4 ms; retraining: 7.1 seconds; >4 orders-of-magnitude faster adaptation. [Figure: error vs. number of examples]

  41. Learning: slow-changing parameters (Big Data → slow Training). Inference: fast-changing parameters (Application → Feedback).

  42. Learning: slow-changing parameters (Big Data → slow Training → Caffe). Clipper: fast-changing parameters (Application → Feedback).

  43. Clipper serves predictions across ML frameworks. Applications: fraud detection, content recommendation, personal assistants, robotic control, machine translation. Frameworks: VW, Create, Caffe.

  44. Clipper key insight: the challenges of prediction serving can be addressed between end-user applications and machine learning frameworks (e.g. Caffe, VW, Create). As a result, Clipper is able to:
     Ø hide complexity by providing a common interface to applications
     Ø bound latency and maximize throughput through caching, adaptive batching, and model replication
     Ø enable robust online learning and personalization through model selection and ensemble algorithms
     all without modifying machine learning frameworks or front-end applications

  45. Clipper architecture: applications (fraud detection, content rec., personal asst., robotic control, machine translation) → Clipper → ML frameworks (VW, Create, Caffe)

  46. Clipper architecture: applications → Predict / Observe (RPC/REST interface) → Clipper → ML frameworks (VW, Create, Caffe)

  47. Clipper architecture: applications → Predict / Observe (RPC/REST interface) → Clipper → RPC → model wrappers (MW), e.g. a Caffe container

  48. Clipper architecture: applications → Predict / Observe (RPC/REST interface) → Clipper.
     Ø Model Selection Layer: improve accuracy through bandit methods, ensembles, online learning, and personalization.
     Ø Model Abstraction Layer: provide a common interface to models while bounding latency and maximizing throughput.
     Ø RPC → model wrappers (MW), e.g. a Caffe container
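The selection layer's idea can be illustrated with a minimal ensemble sketch: keep a weight per deployed model, answer with the weighted average, and exponentially down-weight models that incur loss on observed feedback. This is a Hedge-style multiplicative-weights update; the bandit methods the slide names (e.g. Exp3/Exp4-style algorithms) are more sophisticated, and all models and constants here are made up.

```python
import math
import random

random.seed(0)

models = {
    "good":  lambda x: x,                         # accurate by construction
    "noisy": lambda x: x + random.gauss(0, 1.0),  # unreliable model
    "bad":   lambda x: -x,                        # mis-trained model
}
weights = {name: 1.0 for name in models}
ETA = 0.5  # learning rate for the multiplicative update

def predict_all(x):
    """Query every deployed model once (cache per-model outputs)."""
    return {name: model(x) for name, model in models.items()}

def ensemble(preds):
    """Weighted-average ensemble prediction."""
    total = sum(weights.values())
    return sum(weights[name] * p for name, p in preds.items()) / total

def observe(preds, y):
    """Feedback: down-weight each model by its (clipped) squared error."""
    for name, p in preds.items():
        weights[name] *= math.exp(-ETA * min((p - y) ** 2, 1.0))

# Simulate feedback; the ground truth matches the "good" model.
for _ in range(200):
    x = random.gauss(0, 1)
    observe(predict_all(x), x)
```

After a few hundred feedback events the ensemble is dominated by the accurate model, which is the personalization effect the slide describes: the same machinery run per user selects the best model for each user.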

  49. Clipper architecture: applications → Predict / Observe (RPC/REST interface) → Clipper.
     Ø Model Selection Layer: anytime predictions.
     Ø Model Abstraction Layer: caching, adaptive batching.
     Ø RPC → model wrappers (MW), e.g. a Caffe container

  50. Model Abstraction Layer (detail): caching, adaptive batching; RPC → model wrappers (MW), e.g. a Caffe container

  51. Model Abstraction Layer: caching, adaptive batching; RPC → model wrappers (MW), e.g. a Caffe container. Provide a common interface to models while bounding latency and maximizing throughput.
     Ø Models run in separate processes as Docker containers
     Ø Resource isolation
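The adaptive batching named on these slides can be sketched as a toy single-threaded loop: queue incoming queries, flush a batch when it reaches the current target size or when the oldest query nears the latency objective, and adapt the target with an additive-increase / multiplicative-decrease rule (the strategy described for Clipper). The model call, SLO, and constants below are made up for illustration.

```python
import time
from collections import deque

SLO = 0.010      # 10 ms latency objective
batch_size = 1   # current target batch size, adapted online
queue = deque()  # pending (arrival_time, query) pairs

def model_batch_predict(batch):
    """Stand-in for an RPC to a model container; per-query cost falls
    as the batch grows, which is why batching raises throughput."""
    time.sleep(0.001 + 0.0001 * len(batch))
    return [q * 2 for _, q in batch]

def maybe_flush(now):
    """Flush if the batch is full or the oldest query nears its deadline."""
    global batch_size
    if not queue:
        return []
    oldest_arrival = queue[0][0]
    if len(queue) < batch_size and now - oldest_arrival < SLO / 2:
        return []
    batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
    out = model_batch_predict(batch)
    latency = time.monotonic() - batch[0][0]
    if latency <= SLO:
        batch_size += 1                       # additive increase
    else:
        batch_size = max(1, batch_size // 2)  # multiplicative decrease
    return out

# Feed queries, then drain the queue by forcing the deadline condition.
results = []
for q in range(30):
    queue.append((time.monotonic(), q))
    results.extend(maybe_flush(time.monotonic()))
while queue:
    results.extend(maybe_flush(time.monotonic() + SLO))
```

Running the model in a separate container behind this queue is what gives the resource isolation bullet its force: the batching policy never needs to know which framework is on the other side of the RPC.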
