Machine Learning Pipelines
Marco Serafini
COMPSCI 532, Lecture 21
Training vs. Inference
• Training: data → model
  • Computationally expensive
  • No hard real-time requirements (typically)
• Inference: data + model → prediction
  • Computationally cheaper
  • Real-time requirements (sometimes sub-millisecond)
• Today we talk about inference
Lifecycle
Challenge: Different Frameworks
• Different training frameworks, each has its strengths
  • E.g., Caffe for computer vision, HTK for speech recognition
• Each uses different formats → tailored deployment
• Best tool may change over time
• Solution: model abstraction
Challenge: Prediction Latency
• Many ML models have high prediction latency
  • Some are too slow to use online, e.g., when choosing an ad
  • Combining model outputs makes it worse
• Trade-off between accuracy and latency
• Solutions
  • Adaptive batching
  • Enable mixing models with different complexity
  • Straggler mitigation when using multiple models
Challenge: Model Selection
• How to decide which models to deploy?
• Selecting the best model offline is expensive
• Best model changes over time
  • Concept drift: relationships in the data change over time
  • Feature corruption
• Combining multiple models can increase accuracy
• Solution: automatically select among multiple models
Overview
• Requests flow top to bottom and back
• We start by reviewing the Model Abstraction Layer (Project 3)
Caching
• Stores prediction results
• Avoids rerunning inference on recent predictions
• Enables correlating predictions with later feedback
• Useful when selecting one model
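A minimal sketch of such a prediction cache (the class and helper names are illustrative, not Clipper's actual API; inputs are assumed to be hashable):

```python
# Sketch of a prediction cache keyed by (model, input): repeated queries skip
# inference, and later feedback can be joined with the cached prediction.

class PredictionCache:
    def __init__(self):
        self._cache = {}  # (model_name, input) -> prediction

    def predict(self, model, x, run_inference):
        key = (model, x)
        if key not in self._cache:            # cache miss: run the model once
            self._cache[key] = run_inference(model, x)
        return self._cache[key]

    def observe_feedback(self, model, x, label):
        # Correlate feedback with the earlier prediction, if it is still cached.
        prediction = self._cache.get((model, x))
        return prediction, label
```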
Batching
• Maximize batch size given an upper bound on latency
• Advantages of batching
  • Fewer RPC requests
  • Data-parallel optimizations (e.g., using GPUs)
• Different queue/batch size per model container
• Some systems, like TensorFlow, require static batch sizes
• Adaptive batch sizing: AIMD
  • Additively increase the batch size until the latency threshold is exceeded
  • Then back off multiplicatively by 10%
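A rough sketch of the AIMD rule described above; the parameter names and default values (additive_step, the 10% backoff) are illustrative:

```python
def adapt_batch_size(batch_size, measured_latency, slo,
                     additive_step=2, backoff=0.9):
    # AIMD: grow the batch additively while latency stays under the SLO;
    # once the SLO is exceeded, back off multiplicatively (here by 10%).
    if measured_latency > slo:
        return max(1, int(batch_size * backoff))
    return batch_size + additive_step
```

Each model container would run this loop on its own queue, which is how different models end up with different batch sizes.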
Benefits of (Adaptive) Batching
• Up to 26x throughput increase
Per-Model Batch Size
• Different models have different optimal batch sizes
• Latency grows linearly with batch size, so it is easy to predict with AIMD
Delayed Batching
• When a batch is dispatched and the queue does not yet hold a full batch, wait for more requests before dispatching
• Not always beneficial
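One way to express that delay decision, as a sketch with illustrative names (queue, optimal_batch_size, max_delay):

```python
import time

def next_batch(queue, optimal_batch_size, max_delay):
    # Delayed batching: if the queue holds less than a full batch, wait up to
    # max_delay seconds for more requests before dispatching a partial batch.
    deadline = time.time() + max_delay
    while len(queue) < optimal_batch_size and time.time() < deadline:
        time.sleep(0.001)
    batch = queue[:optimal_batch_size]
    del queue[:optimal_batch_size]
    return batch
```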
Model Containers
Model Containers
• Docker containers
• API to be implemented
• State (parameters) passed during initialization
• No other state management
• Clipper replicates containers as needed
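A sketch of the kind of interface a model container implements; the class and method names here are illustrative, not the exact Clipper container API:

```python
class ModelContainer:
    """Runs inside a Docker container; the serving layer talks to it over RPC."""

    def __init__(self, model_path):
        # All state (the trained parameters) is passed in at initialization;
        # the container keeps no other state, so it can be replicated freely.
        self.model = self._load(model_path)

    def predict_batch(self, inputs):
        # Single entry point: take a batch of inputs, return one prediction each.
        return [self.model.predict(x) for x in inputs]

    def _load(self, path):
        ...  # framework-specific deserialization (TensorFlow, Caffe, scikit-learn, ...)
```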
Effect of Replication
• 10 Gbps network: GPU is the bottleneck, scales out
• 1 Gbps network: network is the bottleneck, does not scale out
Model Selection
Model Selection
• Enables running multiple models
• Advantages
  • Combine outputs from different models (if run in parallel)
  • Estimate prediction accuracy (through comparison)
  • Switch to a better model (when feedback is available)
• Disadvantage of running models in parallel: stragglers
  • They can often be ignored with minimal accuracy loss
• Context: different model selection state per user or session
Model Selection API
• S: selection policy state
• X: input
• Y: prediction/feedback
• Feedback is incorporated to update the selection state S
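The operations above can be read as an interface roughly like the following; this is a sketch following the slide's S/X/Y notation, not a concrete library API:

```python
class SelectionPolicy:
    # S: per-user/session selection state, X: query input, Y: prediction or feedback.

    def init(self):
        """Create fresh selection state S."""

    def select(self, s, x):
        """Choose which model(s) should handle query x, given state s."""

    def combine(self, s, x, predictions):
        """Merge the predictions from the selected models into one answer."""

    def observe(self, s, x, feedback):
        """Incorporate feedback Y for query x and return the updated state S."""
```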
Single-Model Selection
• Multi-armed bandit
  • Select one action, observe the outcome
  • Decide whether to explore a new action or exploit the current one
• Exp3 algorithm
  • Choose an action based on a probability distribution
  • Adjust the probability of the chosen action based on the observed loss
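A compact sketch of Exp3 over k models; the exploration rate gamma and the assumption that losses are scaled into [0, 1] are illustrative choices, not values from the slides:

```python
import math
import random

class Exp3:
    def __init__(self, num_models, gamma=0.1):
        self.k = num_models
        self.gamma = gamma
        self.weights = [1.0] * num_models

    def probabilities(self):
        total = sum(self.weights)
        # Mix the weight-proportional distribution with uniform exploration.
        return [(1 - self.gamma) * w / total + self.gamma / self.k
                for w in self.weights]

    def select(self):
        # Sample a model from the current distribution (explore vs. exploit).
        probs = self.probabilities()
        chosen = random.choices(range(self.k), weights=probs)[0]
        return chosen, probs

    def observe(self, chosen, loss, probs):
        # Importance-weighted update: a model that incurs low loss when chosen
        # gains weight, so it is selected more often in the future.
        reward = 1.0 - loss                  # assumes loss scaled into [0, 1]
        estimate = reward / probs[chosen]
        self.weights[chosen] *= math.exp(self.gamma * estimate / self.k)
```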
Multi-Model Ensembles
Ensembles and Changing Accuracy
Ensembles and Stragglers
Personalized Model Selection
• Model selection can be done per user
TensorFlow Serving
• Inference mechanism of TensorFlow
• Can run TensorFlow models
• Also uses batching (static batch sizes)
• Missing features
  • Latency objectives
  • Support for multiple models
  • Feedback