Tuning Convolution with Cuttlefish

    def loopConvolve(image, filters): …
    def fftConvolve(image, filters): …
    def mmConvolve(image, filters): …

    tuner = Tuner([loopConvolve, fftConvolve, mmConvolve])

    for image, filters in convolutions:
        convolve, token = tuner.choose()
        start = now()
        result = convolve(image, filters)
        elapsedTime = now() - start
        reward = computeReward(elapsedTime)
        tuner.observe(token, reward)
        output result
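The loop above is pseudocode. Below is a minimal runnable sketch of a tuner exposing the same choose/observe interface; it uses epsilon-greedy selection as a simple stand-in (Cuttlefish itself uses Thompson sampling), and the `Tuner` class, `epsilon` parameter, and token-as-index scheme are illustrative assumptions, not the real implementation.

```python
import random

class Tuner:
    """Sketch of a choose/observe tuner. Epsilon-greedy here is a
    placeholder for the bandit algorithm described later in the talk."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = arms
        self.epsilon = epsilon
        self.counts = [0] * len(arms)   # times each arm was observed
        self.totals = [0.0] * len(arms)  # cumulative reward per arm

    def choose(self):
        # Try each arm once first, then explore with probability
        # epsilon and otherwise exploit the best observed mean reward.
        untried = [i for i, c in enumerate(self.counts) if c == 0]
        if untried:
            i = untried[0]
        elif random.random() < self.epsilon:
            i = random.randrange(len(self.arms))
        else:
            i = max(range(len(self.arms)),
                    key=lambda j: self.totals[j] / self.counts[j])
        # The token ties a later reward back to this choice; here it
        # is simply the arm index.
        return self.arms[i], i

    def observe(self, token, reward):
        self.counts[token] += 1
        self.totals[token] += reward
```

Driving it with the slide's loop shape (arm functions, timing, and `computeReward` supplied by the caller) would look just like the pseudocode above.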
Cuttlefish

I. Problem & Motivation
II. The Cuttlefish API
III. Bandit-based Online Tuning
IV. Distributed Tuning Approach
V. Contextual Tuning
VI. Handling Nonstationary Settings
VII. Other Operators
VIII. Conclusion
Approach: Tuning

Multi-armed Bandit Problem
• K possible choices (called arms)
• Arms have unknown reward distributions
• At each round: select an arm and observe a reward

Goal: Maximize cumulative reward (by balancing exploration & exploitation)
Thompson Sampling

[Figure: belief distributions about the expected reward of each of Arms 1-4; sampling from these beliefs means better arms are chosen more often]
Thompson Sampling

• Gaussian runtimes with initially unknown means and variances
• Belief distributions form t-distributions
  • Depend only on sample mean, variance, count
• No meta-parameters, yet works well for diverse operators
• Constant memory overhead, 0.03 ms per tuning round
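As a concrete illustration of these bullets, here is a sketch of one way per-arm beliefs that "depend only on sample mean, variance, count" could be maintained and sampled: Welford's online update for the statistics, and a Student-t sample (scaled by the standard error of the mean) for the belief. The class and function names are assumptions for this sketch, not Cuttlefish's actual code.

```python
import math
import random

class GaussianThompsonArm:
    """One arm with Gaussian rewards of unknown mean and variance.
    Keeps only count, mean, and sum of squared deviations (M2)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def observe(self, reward):
        # Welford's online algorithm: constant memory per arm.
        self.n += 1
        delta = reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reward - self.mean)

    def sample_belief(self):
        # With fewer than 2 observations the variance is undefined;
        # return +inf to force the arm to be explored.
        if self.n < 2:
            return float("inf")
        var = self.m2 / (self.n - 1)
        # Student-t sample with n-1 degrees of freedom, built from
        # standard normals: t = Z / sqrt(chi2_k / k).
        k = self.n - 1
        chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(k))
        t = random.gauss(0, 1) / math.sqrt(chi2 / k)
        return self.mean + t * math.sqrt(var / self.n)

def choose(arms):
    """Thompson sampling: pick the arm whose sampled belief is best."""
    return max(range(len(arms)), key=lambda i: arms[i].sample_belief())
```

Because the sampled beliefs concentrate around the true means as counts grow, better arms win the `max` more and more often, which is exactly the exploration/exploitation balance pictured on the previous slide.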
Convolution Evaluation

• Prototype in Apache Spark
• Tune between three convolution algorithms (Nested Loops, FFT, or Matrix Multiply)
  • Reward: -1 * elapsedTime (maximizes throughput)
• Convolve 8,000 Flickr images with sets of filters (~32 GB)
  • Vary number & size of filters
• Run on an 8-node (AWS EC2 4-core r3.xlarge) cluster
  • 32 total cores, ~252 images per core
• *Very* compute intensive
  • (Some configurations take up to 45 min on a single node)
Convolution Results

[Figure: relative throughput, normalized against the highest-throughput algorithm]
Cuttlefish

I. Problem & Motivation
II. The Cuttlefish API
III. Bandit-based Online Tuning
IV. Distributed Tuning Approach
V. Contextual Tuning
VI. Handling Nonstationary Settings
VII. Other Operators
VIII. Conclusion
Challenges in Distributed Tuning

1. Choosing and observing occur throughout a cluster
   • To maximize learning, machines need to communicate
2. Synchronization & communication overheads
3. Feedback delay
   • How many times is `choose` called before an earlier reward is observed?
   • Fortunately, learning with delayed rewards remains theoretically sound
Distributed Tuning Approach

[Figure: two designs. Centralized Tuner: every machine calls choose/observe against one shared tuner. Independent Tuners with a Centralized Store: each machine runs its own tuner and pushes local state to / pulls global state from a centralized model store.]

Peer-to-Peer is also a possibility, but requires more communication
Distributed Tuning Approach

[Figure: each worker thread keeps its own local state plus cached non-local state; a model store (on the master or a parameter server) holds the per-worker state that forms the global model]

• When choosing: aggregate local & non-local state
• When observing: update the local state
• Model store aggregates non-local state
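This aggregation step works cleanly when beliefs depend only on sample count, mean, and variance, because those sufficient statistics merge exactly. A sketch of the merge a model store could perform (the `Stats` name is an assumption; the merge formula is the standard parallel-variance combination of Chan et al.):

```python
class Stats:
    """Sufficient statistics (count, mean, M2) for one arm.
    Workers update their own Stats locally; the model store merges
    per-worker Stats into a global view without seeing raw rewards."""

    def __init__(self, n=0, mean=0.0, m2=0.0):
        self.n, self.mean, self.m2 = n, mean, m2

    def observe(self, x):
        # Welford's online update, run on each worker independently.
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)

    def merged(self, other):
        # Exact combination of two partial statistics: the result is
        # identical to having observed both streams on one machine.
        if self.n == 0:
            return Stats(other.n, other.mean, other.m2)
        if other.n == 0:
            return Stats(self.n, self.mean, self.m2)
        n = self.n + other.n
        d = other.mean - self.mean
        mean = self.mean + d * other.n / n
        m2 = self.m2 + other.m2 + d * d * self.n * other.n / n
        return Stats(n, mean, m2)
```

When choosing, a worker merges its local `Stats` with the non-local `Stats` pulled from the store; when observing, it touches only its local copy, so no synchronization is needed on the hot path.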
Results with Distributed Approach

[Figure: relative throughput, normalized against the highest-throughput algorithm]
Results with Distributed Approach

[Figure: throughput normalized against an ideal oracle that always picks the fastest algorithm]
Cuttlefish

I. Problem & Motivation
II. The Cuttlefish API
III. Bandit-based Online Tuning
IV. Distributed Tuning Approach
V. Contextual Tuning (by learning cost models)
VI. Handling Nonstationary Settings
VII. Other Operators
VIII. Conclusion
Contextual Tuning

• The best physical operator for each round may depend on the current context
  • e.g. convolution performance depends on the image & filter dimensions
• Users may know important context features
  • e.g. from the asymptotic algorithmic complexity
• Users can specify context in Tuner.choose
Contextual Tuning Algorithm

• Linear contextual Thompson sampling learns a linear model that maps features to rewards
• Feature normalization & regularization
  • Increased robustness towards feature choices
• Effectively learns a cost model
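To make the algorithm concrete, here is a sketch of linear Thompson sampling via Bayesian ridge regression: each arm maintains a Gaussian posterior over a weight vector mapping context features to reward, and choosing samples a weight vector per arm and scores the current context. The class names, `lam` prior strength, and unit noise assumption are illustrative; real feature vectors should be normalized, as the slide notes.

```python
import numpy as np

class LinearTSArm:
    """Per-arm Bayesian linear regression of reward on context.
    lam * I acts as a ridge-style regularizing prior, which is part
    of what makes the method robust to feature choices."""

    def __init__(self, dim, lam=1.0):
        self.B = lam * np.eye(dim)  # posterior precision matrix
        self.f = np.zeros(dim)      # sum of reward-weighted contexts

    def observe(self, x, reward):
        # Rank-one posterior update from one (context, reward) pair.
        self.B += np.outer(x, x)
        self.f += reward * x

    def sample_belief(self, x, rng):
        # Sample weights from N(B^-1 f, B^-1), then score context x.
        mu = np.linalg.solve(self.B, self.f)
        cov = np.linalg.inv(self.B)
        theta = rng.multivariate_normal(mu, cov)
        return float(x @ theta)

def choose(arms, x, rng):
    """Pick the arm whose sampled linear model predicts the best
    reward for the current context x."""
    return max(range(len(arms)), key=lambda i: arms[i].sample_belief(x, rng))
```

The posterior mean `B^-1 f` is exactly a regularized least-squares fit of reward against features, which is why this effectively learns a cost model for each physical operator.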