
Bayesian Optimization of Gaussian Processes applied to Performance Tuning
Ramki Ramakrishna (@ysr1729), #TwitterVMTeam
QCon Sao Paulo, 2019

A JVM Engineer talks to a Data Scientist

Many Hundreds of Services

Several Tens of Thousands of Physical Servers

Several Million CPU Cores

Several Hundred Thousand Twitter JVMs

A Few Hundred Tunable JVM Parameters

Mining for Gold
• 1930s South Africa
• Prospecting for gold and other minerals
• Daniel Krige, 1951: “Kriging” in geostatistics
• Jonas Mockus, 1970s
• Jones et al., 1980s
• Rasmussen & Williams, 1990s: Gaussian Processes

Applications
• design of expensive experiments
• optimal designs
• optimization of engineered materials
• hyperparameter tuning (architectural parameters) of neural networks

Engineering as Optimization
• linear or non-linear objective function
• finite convex or non-convex space; rectangular, linear (affine), or non-linear constraints
• black box objective function
• black box constraints
• noisy objective function
• noisy constraints

Black Box Modeling
• Model the unknown objective function
• Model the unknown constraints
• Model is a “surrogate”
• Evaluations are expensive

Models and Model Parameters
• Parametric models
• Non-parametric models

Probabilistic Models
• A measure of our uncertainty
• A measure of measurement/observation noise

Gaussian Process $\mathcal{GP}(\mu, \kappa)$
• $\mu(x)$: mean function
• $\kappa(x, x')$: covariance function

Gaussian Process
Two different views:
• a collection of (possibly uncountably many) Gaussian random variables, any finite subset of which is jointly Gaussian with a given mean and covariance
• a Gaussian distribution over functions
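The "distribution over functions" view becomes concrete once the GP is restricted to finitely many inputs. At those inputs it is an ordinary multivariate Gaussian, so sample functions can be drawn using a Cholesky factor of the covariance matrix. The sketch below is minimal and pure Python. It assumes a zero mean function and a squared-exponential covariance; both are illustrative choices, not taken from the talk. The small `jitter` added to the diagonal is a common numerical safeguard.

```python
import math
import random


def se_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a: symmetric positive-definite list of lists)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def sample_gp_prior(xs, n_samples=3, jitter=1e-6, seed=0):
    """Draw n_samples vectors of function values at inputs xs from GP(0, se_kernel).

    Restricted to xs, the GP is N(0, K); one draw is L z with z ~ N(0, I) and K = L L^T.
    """
    rng = random.Random(seed)
    n = len(xs)
    k = [[se_kernel(xi, xj) + (jitter if i == j else 0.0) for j, xj in enumerate(xs)]
         for i, xi in enumerate(xs)]
    l = cholesky(k)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        samples.append([sum(l[i][j] * z[j] for j in range(i + 1)) for i in range(n)])
    return samples


xs = [i / 5 for i in range(11)]  # 11 inputs on [0, 2]
for f in sample_gp_prior(xs):
    print([round(v, 2) for v in f])
```

Each printed row is one random function evaluated at the 11 inputs. Neighbouring values are strongly correlated because the kernel's length scale (1.0) is much larger than the input spacing (0.2).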

Gaussian Process

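$\mathcal{GP}_n$
$\mu_n(x) = \kappa^\top (K + \sigma^2_{\text{noise}} I)^{-1} Y$
$\kappa_n(x, x') = \kappa(x, x') - \kappa^\top (K + \sigma^2_{\text{noise}} I)^{-1} \kappa'$
The symbols are defined as follows:
• $K$ is the $n \times n$ matrix with entries $\kappa(x_i, x_j)$ over the observed inputs $x_1, \dots, x_n$.
• $\kappa = [\kappa(x, x_1), \dots, \kappa(x, x_n)]^\top$ and $\kappa' = [\kappa(x', x_1), \dots, \kappa(x', x_n)]^\top$.
• $Y$ is the vector of noisy observations.
• The prior mean is taken to be zero.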
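These two formulas can be evaluated directly. The sketch below is minimal and assumes scalar inputs, a zero prior mean, and a squared-exponential kernel (all illustrative). It does not invert $K + \sigma^2_{\text{noise}} I$. Instead it factors that matrix as $L L^\top$ and uses $\kappa^\top (K + \sigma^2_{\text{noise}} I)^{-1} \kappa = \lVert L^{-1} \kappa \rVert^2$, which is cheaper and numerically safer.

```python
import math


def se_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance kappa(a, b) with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / l[j][j]
    return l


def solve_lower(l, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(l[i][k] * y[k] for k in range(i))) / l[i][i])
    return y


def solve_upper_t(l, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def gp_posterior(x_train, y_train, x, noise_var=1e-4, kernel=se_kernel):
    """Return (mu_n(x), kappa_n(x, x)) for a zero-mean GP conditioned on (x_train, y_train)."""
    n = len(x_train)
    l = cholesky([[kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
                   for j in range(n)] for i in range(n)])
    k_vec = [kernel(x, xi) for xi in x_train]            # the vector kappa
    alpha = solve_upper_t(l, solve_lower(l, y_train))   # (K + s^2 I)^-1 Y
    mean = sum(k * a for k, a in zip(k_vec, alpha))       # kappa^T (K + s^2 I)^-1 Y
    v = solve_lower(l, k_vec)                            # L^-1 kappa
    var = kernel(x, x) - sum(vi * vi for vi in v)        # kappa(x, x) - |L^-1 kappa|^2
    return mean, max(var, 0.0)                           # clamp tiny negative round-off


x_train = [0.0, 1.0, 2.0]
y_train = [math.sin(t) for t in x_train]
for x in (1.0, 1.5, 10.0):
    print(x, gp_posterior(x_train, y_train, x))
```

At a training input the posterior mean is close to the observation and the variance is close to the noise level. Far from the data (x = 10) the posterior falls back to the prior, with mean near 0 and variance near 1.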

Covariance Kernel Functions
• Squared exponential (SE)
• Matérn kernels with half-integer smoothness $\nu = n/2$, n odd (e.g. 3/2, 5/2)
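Each of these kernels can be written in closed form as a function of the distance $r = |x - x'|$, a length scale $\ell$, and a signal variance $\sigma_f^2$. The parameter names are this sketch's own. The SE kernel and the common half-integer Matérn kernels ($\nu = 1/2, 3/2, 5/2$) are:

```python
import math


def squared_exponential(r, length_scale=1.0, variance=1.0):
    """SE kernel: variance * exp(-r^2 / (2 l^2)); infinitely differentiable sample paths."""
    return variance * math.exp(-0.5 * (r / length_scale) ** 2)


def matern12(r, length_scale=1.0, variance=1.0):
    """Matern nu = 1/2 (exponential kernel): continuous but non-differentiable sample paths."""
    return variance * math.exp(-r / length_scale)


def matern32(r, length_scale=1.0, variance=1.0):
    """Matern nu = 3/2."""
    s = math.sqrt(3.0) * r / length_scale
    return variance * (1.0 + s) * math.exp(-s)


def matern52(r, length_scale=1.0, variance=1.0):
    """Matern nu = 5/2."""
    s = math.sqrt(5.0) * r / length_scale
    return variance * (1.0 + s + s * s / 3.0) * math.exp(-s)


for r in (0.0, 0.5, 1.0, 2.0):
    print(r, [round(k(r), 4) for k in (matern12, matern32, matern52, squared_exponential)])
```

With $\nu = 1/2, 3/2, 5/2$ the GP is 0, 1, and 2 times mean-square differentiable, and the SE kernel is the $\nu \to \infty$ limit. In Bayesian optimization, Matérn 5/2 is often preferred because SE's strong smoothness assumption rarely holds for real systems.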

Covariance Kernel Functions

Acquisition Function
$\mathcal{GP}\text{ prior} + \text{Data}_n \xrightarrow{\text{Bayes}} \mathcal{GP}_n \xrightarrow{?} x_{n+1}$

Acquisition Function
$\mathcal{GP}\text{ prior} + \text{Data}_{n+1} \xrightarrow{\text{Bayes}} \mathcal{GP}_{n+1} \xrightarrow{?} x_{n+2}$
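Repeating this prior-plus-data update is the whole Bayesian optimization loop. The sketch below is self-contained and one-dimensional. It assumes a zero-mean GP with a squared-exponential kernel, expected improvement (EI) as the acquisition function, minimization, and a fixed grid of candidate points. The objective is a hypothetical, cheap stand-in for an expensive benchmark run.

```python
import math


def kernel(a, b, length_scale=0.3):
    """Squared-exponential covariance with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(a):
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / l[j][j]
    return l


def forward(l, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(l[i][k] * y[k] for k in range(i))) / l[i][i])
    return y


def backward(l, y):
    """Solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def posterior(xs, ys, noise_var):
    """Bayes step: condition the GP prior on Data_n, return x -> (mu_n(x), sigma_n(x))."""
    n = len(xs)
    l = cholesky([[kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0)
                   for j in range(n)] for i in range(n)])
    alpha = backward(l, forward(l, ys))

    def predict(x):
        k = [kernel(x, xi) for xi in xs]
        v = forward(l, k)
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        var = max(kernel(x, x) - sum(vi * vi for vi in v), 1e-12)
        return mean, math.sqrt(var)

    return predict


def expected_improvement(mean, std, best):
    """EI for minimization under f(x) ~ N(mean, std^2)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayes_opt(objective, n_iter=15, noise_var=1e-6):
    grid = [i / 200 for i in range(201)]                # candidate x values in [0, 1]
    xs = [0.0, 1.0]                                     # initial design
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        predict = posterior(xs, ys, noise_var)          # GP prior + Data_n -> GP_n
        best = min(ys)
        x_next = max((x for x in grid if x not in xs),  # acquisition step: GP_n -> x_{n+1}
                     key=lambda x: expected_improvement(*predict(x), best))
        xs.append(x_next)
        ys.append(objective(x_next))                    # Data_{n+1}
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


# Hypothetical stand-in for an expensive benchmark; its minimum is at x = 0.3.
x_best, y_best = bayes_opt(lambda x: (x - 0.3) ** 2)
print(x_best, y_best)
```

The sketch excludes already-evaluated grid points so that the nearly noise-free covariance matrix stays well conditioned. A real tuner would instead model the measurement noise explicitly.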

Acquisition Functions
• Thompson Sampling from the posterior GP (TS)
• Probability of Improvement (PI)
• Upper Confidence Bound (UCB)
• Expected Improvement (EI)

Thompson Sampling
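Thompson sampling draws one plausible objective function from the posterior GP and proposes that function's minimizer. The sketch below is minimal and works on a finite grid of candidates. It assumes a zero-mean GP, a squared-exponential kernel, minimization, and a small diagonal jitter for numerical stability (all illustrative choices).

```python
import math
import random


def kernel(a, b, length_scale=0.3):
    """Squared-exponential covariance with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(a):
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / l[j][j]
    return l


def forward(l, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(l[i][k] * y[k] for k in range(i))) / l[i][i])
    return y


def thompson_sample(x_train, y_train, grid, rng, noise_var=1e-6, jitter=1e-6):
    """Draw one function from the GP posterior on `grid` and return its minimizer."""
    n, m = len(x_train), len(grid)
    l = cholesky([[kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
                   for j in range(n)] for i in range(n)])
    beta = forward(l, y_train)                                          # L^-1 Y
    v = [forward(l, [kernel(g, xi) for xi in x_train]) for g in grid]   # L^-1 kappa(g)
    mean = [sum(vg[k] * beta[k] for k in range(n)) for vg in v]         # posterior mean
    cov = [[kernel(grid[i], grid[j]) - sum(a * b for a, b in zip(v[i], v[j]))
            + (jitter if i == j else 0.0) for j in range(m)] for i in range(m)]
    lc = cholesky(cov)                                                  # posterior covariance factor
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    f = [mean[i] + sum(lc[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
    return grid[min(range(m), key=f.__getitem__)]


x_train, y_train = [0.0, 0.5, 1.0], [1.0, -1.0, 1.0]
grid = [i / 20 for i in range(21)]
rng = random.Random(0)
print([thompson_sample(x_train, y_train, grid, rng) for _ in range(5)])
```

Each call draws a fresh sample. Over repeated calls, each point is proposed with roughly the posterior probability that it is the minimizer, which is how Thompson sampling balances exploration and exploitation.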

Probability of Improvement
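PI scores a candidate by the posterior probability that it beats the best value seen so far. The sketch below assumes minimization, which matches targets such as GC overhead or CPU utilization, and a Gaussian posterior $N(\mu, \sigma^2)$ at the candidate. The optional margin `xi` is a common variant and an assumption here.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mean, std, best, xi=0.0):
    """P[f(x) < best - xi] when the posterior at x is N(mean, std^2) (minimization).

    xi >= 0 demands a margin of improvement; xi = 0 is the plain PI criterion.
    """
    if std <= 0.0:
        return 1.0 if mean < best - xi else 0.0
    return normal_cdf((best - xi - mean) / std)


best = 1.0
safe = probability_of_improvement(mean=0.95, std=0.01, best=best)  # small but near-certain gain
bold = probability_of_improvement(mean=0.50, std=1.00, best=best)  # large but uncertain gain
print(round(safe, 4), round(bold, 4))
```

Here PI prefers the near-certain but tiny gain over the uncertain but potentially large one. This is why PI is often described as greedy, and a positive `xi` counteracts it.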

Upper Confidence Bound
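UCB scores each candidate by an optimistic estimate: the posterior mean plus a multiple of the posterior standard deviation. A larger multiple means more exploration. The sketch below is minimal and uses hypothetical posterior values. Because tuning targets are usually minimized, it also shows the mirror-image lower confidence bound. The name `explore_weight` and its default of 2.0 are this sketch's choices.

```python
def upper_confidence_bound(mean, std, explore_weight=2.0):
    """Optimistic score for maximization: mean + explore_weight * std."""
    return mean + explore_weight * std


def lower_confidence_bound(mean, std, explore_weight=2.0):
    """Mirror image for minimization: the smallest score is the most promising."""
    return mean - explore_weight * std


# Hypothetical posterior (mean, std) at two candidate settings, minimizing a cost.
candidates = {"x1": (1.0, 0.1), "x2": (1.2, 0.5)}
explore = min(candidates, key=lambda c: lower_confidence_bound(*candidates[c], explore_weight=2.0))
exploit = min(candidates, key=lambda c: lower_confidence_bound(*candidates[c], explore_weight=0.0))
print(explore, exploit)  # prints "x2 x1"
```

With `explore_weight=2.0` the less certain setting x2 wins (0.2 < 0.8). With 0.0 the criterion reduces to picking the lowest posterior mean, which is x1.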

Expected Improvement
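EI scores a candidate by the expected amount by which it beats the best value seen so far. Under a Gaussian posterior this has a closed form. The sketch below assumes minimization and treats the margin `xi` as an optional assumption. It cross-checks the formula against a plain Monte Carlo estimate of the same expectation.

```python
import math
import random


def expected_improvement(mean, std, best, xi=0.0):
    """Closed-form E[max(best - xi - f(x), 0)] for f(x) ~ N(mean, std^2) (minimization)."""
    improvement = best - xi - mean
    if std <= 0.0:
        return max(improvement, 0.0)
    z = improvement / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + std * pdf


def expected_improvement_mc(mean, std, best, n=200_000, seed=0):
    """Monte Carlo estimate of the same expectation, as a sanity check."""
    rng = random.Random(seed)
    return sum(max(best - rng.gauss(mean, std), 0.0) for _ in range(n)) / n


print(expected_improvement(0.2, 0.5, 0.0), expected_improvement_mc(0.2, 0.5, 0.0))
```

Unlike PI, EI weighs how much improvement to expect, not just how likely it is. A candidate whose mean is worse than the current best can still score well if its uncertainty is large.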


Maximizing the Acquisition Function
• piecewise infinitely smooth
• gradient-based techniques work
• modified Monte Carlo techniques are typically used
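The sketch below follows a two-step recipe: cheap random screening first, then gradient-based refinement from the most promising points. It makes three assumptions:
• a hypothetical two-bump function stands in for a real acquisition function;
• the search box is the unit square;
• gradients come from central finite differences rather than analytic derivatives.

Production libraries differ in the details.

```python
import math
import random


def toy_acquisition(p):
    """Hypothetical smooth, multimodal stand-in for an acquisition function on [0, 1]^2."""
    x, y = p
    return (math.exp(-((x - 0.2) ** 2 + (y - 0.7) ** 2) / 0.01)
            + 0.5 * math.exp(-((x - 0.8) ** 2 + (y - 0.3) ** 2) / 0.02))


def maximize_acquisition(acq, dim=2, n_random=2000, n_starts=5, steps=200,
                         lr=0.002, h=1e-5, seed=0):
    """Monte Carlo screening followed by projected gradient ascent from the best points."""
    rng = random.Random(seed)
    # 1) Monte Carlo: evaluate the (cheap) acquisition at many random points in the box.
    candidates = [[rng.random() for _ in range(dim)] for _ in range(n_random)]
    starts = sorted(candidates, key=acq, reverse=True)[:n_starts]
    best_p, best_v = None, -math.inf
    for p in starts:
        # 2) Gradient ascent with central-difference gradients, clipped to [0, 1]^dim.
        for _ in range(steps):
            grad = []
            for d in range(dim):
                up, down = p[:], p[:]
                up[d] += h
                down[d] -= h
                grad.append((acq(up) - acq(down)) / (2.0 * h))
            p = [min(1.0, max(0.0, pi + lr * gi)) for pi, gi in zip(p, grad)]
        value = acq(p)
        if value > best_v:
            best_p, best_v = p, value
    return best_p, best_v


point, value = maximize_acquisition(toy_acquisition)
print([round(c, 3) for c in point], round(value, 3))
```

Random screening keeps the refinement from stalling on the lower bump at (0.8, 0.3). Because each refinement step is cheap, using several starts costs little.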

Optimizing Performance Parameters

    myExpt = Optimizer.declareDevice(Parm1: {Int, Min1, Max1},
                                     Parm2: {Real, Min2, Max2},
                                     Parm3: {Enum, enum1, enum2, enum3} …)
    myExpt.setSLA(…)                              // set performance SLA
    myExpt.setTerminationCriteria(…)              // set termination criteria
    while (!myExpt.shouldTerminate()) {
        parmSuggestion = myExpt.suggest()         // get another test suggestion
        newRun = myDevice.test(parmSuggestion)    // test device at given setting
        if (myExpt.isValid(newRun)) {             // is SLA met?
            myExpt.update(parmSuggestion, newRun) // update w/new result
        }
    }
    return myExpt.bestConfig()
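The sketch below is a runnable Python version of the loop's control flow. Every name in it is hypothetical, and the slide's `suggest()` is replaced by uniform random sampling, so the sketch shows only the experiment structure, not a GP or an acquisition step. The mixed Int/Real/Enum space, the SLA check, the fixed evaluation budget used as the termination criterion, and the toy device are all illustrative.

```python
import random

# Hypothetical space mirroring declareDevice(Parm1: {Int, ...}, Parm2: {Real, ...}, Parm3: {Enum, ...}).
SPACE = {
    "parm1": ("int", (1, 64)),
    "parm2": ("real", (0.1, 0.9)),
    "parm3": ("enum", ("enum1", "enum2", "enum3")),
}


def suggest(rng):
    """Stand-in for myExpt.suggest(): uniform random sampling, NOT a GP/acquisition step."""
    config = {}
    for name, (kind, spec) in SPACE.items():
        if kind == "int":
            config[name] = rng.randint(*spec)      # inclusive bounds
        elif kind == "real":
            config[name] = rng.uniform(*spec)
        else:
            config[name] = rng.choice(spec)
    return config


def run_on_device(config):
    """Hypothetical device under test: returns (cost to minimize, p99 latency in ms)."""
    cost = (config["parm1"] - 32) ** 2 / 1024 + (config["parm2"] - 0.5) ** 2
    latency_ms = 10 + (5 if config["parm3"] == "enum3" else 0) + config["parm1"] / 8
    return cost, latency_ms


def run_experiment(sla_latency_ms=15.0, budget=50, seed=0):
    rng = random.Random(seed)
    valid_runs = []                                # (config, cost) pairs that met the SLA
    for _ in range(budget):                        # termination criterion: fixed budget
        config = suggest(rng)                      # get another test suggestion
        cost, latency_ms = run_on_device(config)   # test device at given setting
        if latency_ms <= sla_latency_ms:           # is SLA met?
            valid_runs.append((config, cost))      # update with the new result
    return min(valid_runs, key=lambda run: run[1]) if valid_runs else None  # bestConfig()


print(run_experiment())
```

Both the loop above and this sketch discard runs that miss the SLA. As a result, the optimizer learns nothing about where the SLA boundary lies. Modeling the constraint as its own black box, as the black-box modeling slide suggests, is one way to make use of those runs.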

Bayesian Optimization

Constraints

AUTOTUNE AS A SERVICE

GizmoDuck & Garbage Collection Overhead via Tuning JVM Parameters

TweetyPie & CPU Utilization via Tuning Graal JIT Parameters
