

  1. Bayesian Optimization of Gaussian Processes applied to Performance Tuning. Ramki Ramakrishna (@ysr1729, #TwitterVMTeam). QCon São Paulo, 2019.

  2. A JVM Engineer talks to a Data Scientist

  3. Many Hundreds of Services

  4. Several Tens of Thousands of Physical Servers

  5. Several Millions of CPU Cores

  6. Several Hundreds of Thousands of Twitter JVMs

  7. A Few Hundred Tunable JVM Parameters

  8. Mining for Gold
     • 1930s South Africa: prospecting for gold and other minerals
     • Danie Krige, 1951: “Kriging” in geostatistics
     • Jonas Mockus, 1970s: Bayesian optimization
     • Jones et al., 1998
     • Rasmussen & Williams, 2006: Gaussian Processes

  9. Applications
     • design of expensive experiments
     • optimal designs
     • optimization of engineered materials
     • hyperparameter tuning (architectural parameters) of neural networks

  10. Engineering as Optimization
      • linear or non-linear objective function
      • finite convex or non-convex space; rectangular, linear (affine), or non-linear constraints
      • black-box objective function
      • black-box constraints
      • noisy objective function
      • noisy constraints

  11. Black-Box Modeling
      • Model the unknown objective function
      • Model the unknown constraints
      • The model is a “surrogate”
      • Evaluations are expensive

  12. Models and Model Parameters
      • Parametric models
      • Non-parametric models

  13. Probabilistic Models
      • A measure of our uncertainty
      • A measure of measurement/observation noise

  14. Gaussian Process GP(μ, κ)
      • μ(x): mean function
      • κ(x, x′): covariance function

  15. Gaussian Process
      • Two different views:
      • a vector of possibly uncountably many Gaussian variables with a given mean and joint covariance structure
      • a Gaussian distribution over functions
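The “distribution over functions” view can be made concrete by drawing samples on a finite grid. A minimal NumPy sketch (the zero mean and squared-exponential kernel are illustrative choices, not from the slides):

```python
import numpy as np

def se_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential covariance: k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)                    # finite grid standing in for the index set
K = se_kernel(x, x) + 1e-9 * np.eye(len(x))   # tiny jitter for numerical stability
# Each row is one "function" drawn from GP(0, k), evaluated on the grid.
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
print(samples.shape)  # (3, 50): three draws from the prior
```

On a finite grid the two views coincide: a draw from the multivariate Gaussian is simultaneously a vector of jointly Gaussian variables and one sampled function.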

  16.–23. Gaussian Process [figure-only slides]

  24. GPₙ
      μₙ(x) = κᵀ (K + σ²_noise I)⁻¹ Y
      κₙ(x, x′) = κ(x, x′) − κᵀ (K + σ²_noise I)⁻¹ κ′
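The two posterior formulas translate directly into code. A minimal NumPy sketch, assuming a squared-exponential kernel and treating Y as the vector of noisy observations (the test points and noise level are illustrative):

```python
import numpy as np

def se_kernel(a, b, ls=1.0):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=0.1):
    """Posterior mean and covariance, following the slide's formulas:
       mu_n(x)    = k^T (K + s^2 I)^{-1} Y
       k_n(x, x') = k(x, x') - k^T (K + s^2 I)^{-1} k'
    """
    K = se_kernel(x_train, x_train) + noise**2 * np.eye(len(x_train))
    k_star = se_kernel(x_train, x_test)      # cross-covariances kappa
    K_inv = np.linalg.inv(K)
    mu = k_star.T @ K_inv @ y_train
    cov = se_kernel(x_test, x_test) - k_star.T @ K_inv @ k_star
    return mu, cov

x_tr = np.array([-2.0, 0.0, 2.0])
y_tr = np.sin(x_tr)
mu, cov = gp_posterior(x_tr, y_tr, np.array([0.0]), noise=1e-3)
print(mu, cov)  # near a low-noise training point: mean ~ y, variance ~ 0
```

Note the posterior variance collapses toward the noise level at observed points, which is exactly the behavior the figure slides above illustrate.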

  25. Covariance Kernel Functions
      • Squared exponentials (SE)
      • Matérn kernels with half-integer smoothness ν = n/2
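For reference, the SE kernel and the two most common half-integer Matérn kernels, written as functions of the distance r (the ν = 3/2 and ν = 5/2 choices and unit length scale are the usual conventions, not specifics from the slides):

```python
import numpy as np

def se(r, ls=1.0):
    """Squared exponential: infinitely differentiable sample paths."""
    return np.exp(-0.5 * (r / ls) ** 2)

def matern32(r, ls=1.0):
    """Matern nu = 3/2: once-differentiable sample paths."""
    s = np.sqrt(3.0) * np.abs(r) / ls
    return (1.0 + s) * np.exp(-s)

def matern52(r, ls=1.0):
    """Matern nu = 5/2: twice-differentiable sample paths, a common BO default."""
    s = np.sqrt(5.0) * np.abs(r) / ls
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

r = np.array([0.0, 0.5, 1.0])
print(se(r), matern32(r), matern52(r))  # all equal 1 at r = 0 and decay with distance
```

The kernel choice encodes an assumption about how smooth the unknown objective is; Matérn kernels are often preferred in Bayesian optimization because SE's infinite smoothness is rarely realistic for measured performance metrics.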

  26.–27. Covariance Kernel Functions [figure-only slides]

  28. Acquisition Function
      GP prior + Dataₙ →(Bayes)→ GPₙ →(?)→ xₙ₊₁

  29. Acquisition Function
      GP prior + Dataₙ₊₁ →(Bayes)→ GPₙ₊₁ →(?)→ xₙ₊₂

  30. Acquisition Functions
      • Thompson Sampling from the posterior GP (TS)
      • Probability of Improvement (PI)
      • Upper Confidence Bound (UCB)
      • Expected Improvement (EI)

  31. Thompson Sampling [figure-only slide]

  32. Probability of Improvement [figure-only slide]

  33. Upper Confidence Bound [figure-only slide]

  34. Expected Improvement [figure-only slide]

  35. Acquisition Functions
      • Thompson Sampling from the posterior GP (TS)
      • Probability of Improvement (PI)
      • Upper Confidence Bound (UCB)
      • Expected Improvement (EI)
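Of these, Expected Improvement has a convenient closed form under the Gaussian posterior. A minimal stdlib-only sketch for a minimization problem (the xi exploration margin is a common convention, assumed here rather than taken from the slides):

```python
import math

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for minimization: E[max(best - f(x) - xi, 0)] with f(x) ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(best - mu - xi, 0.0)       # deterministic prediction
    z = (best - mu - xi) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal cdf
    return (best - mu - xi) * Phi + sigma * phi

# At equal posterior mean, the more uncertain point has higher EI:
print(expected_improvement(0.0, 1.0, best=0.0))
print(expected_improvement(0.0, 2.0, best=0.0))
```

That ordering is the exploration/exploitation trade-off in miniature: EI rewards both a promising mean and a wide posterior.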

  36. Maximizing the Acquisition Function
      • piecewise infinitely smooth
      • gradient-based techniques work
      • modified Monte-Carlo techniques are typically used
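The Monte-Carlo idea can be illustrated with plain random candidate scoring: a deliberately minimal stand-in for the modified techniques the slide mentions (function and parameter names here are made up for illustration):

```python
import random

def maximize_acquisition(acq, bounds, n_candidates=1000, seed=0):
    """Score random candidate points and keep the best one.
    Real implementations refine the winners with gradient steps; pure random
    scoring is the simplest version of the Monte-Carlo approach."""
    rng = random.Random(seed)
    best_x, best_v = None, float("-inf")
    for _ in range(n_candidates):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        v = acq(x)
        if v > best_v:
            best_x, best_v = x, v
    return best_x, best_v

# Toy acquisition peaked at (1, 1): the sampler should land nearby.
x, v = maximize_acquisition(lambda p: -((p[0] - 1) ** 2 + (p[1] - 1) ** 2),
                            [(0, 2), (0, 2)])
print(x, v)
```

The acquisition function is cheap to evaluate (it only queries the surrogate, never the real system), so spending thousands of candidate evaluations per suggestion is perfectly affordable.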

  37. Optimizing Performance Parameters

      myExpt = Optimizer.declareDevice(
          Parm1: {Int, Min1, Max1},
          Parm2: {Real, Min2, Max2},
          Parm3: {Enum, enum1, enum2, enum3},
          …)
      myExpt.setSLA(…)                              // set performance SLA
      myExpt.setTerminationCriteria(…)              // set termination criteria
      while (!myExpt.shouldTerminate()) {
          parmSuggestion = myExpt.suggest()         // get another test suggestion
          newRun = myDevice.test(parmSuggestion)    // test device at given setting
          if (myExpt.isValid(newRun)) {             // is SLA met?
              myExpt.update(parmSuggestion, newRun) // update w/ new result
          }
      }
      return myExpt.bestConfig()

  38. Bayesian Optimization (same pseudocode as slide 37)

  39. Constraints (same pseudocode as slide 37)
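The suggest/test/update loop in the pseudocode can be sketched end-to-end in a few dozen lines. This is a toy, not the actual Optimizer API: the quadratic objective, SE kernel, LCB acquisition, and every name below are illustrative assumptions.

```python
import numpy as np

def se_kernel(a, b, ls=0.3):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def posterior(x_tr, y_tr, x_te, noise=1e-3):
    # GP posterior mean and stddev (the formulas from slide 24)
    K = se_kernel(x_tr, x_tr) + noise * np.eye(len(x_tr))
    k = se_kernel(x_tr, x_te)
    K_inv = np.linalg.inv(K)
    mu = k.T @ K_inv @ y_tr
    var = np.clip(1.0 - np.sum(k * (K_inv @ k), axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def objective(x):
    # stand-in for an expensive black-box metric; true optimum at x = 0.7
    return (x - 0.7) ** 2

rng = np.random.default_rng(1)
X = list(rng.uniform(0, 1, 3))                # a few initial random runs
Y = [objective(x) for x in X]
for _ in range(15):                           # plays the role of shouldTerminate()
    cand = rng.uniform(0, 1, 200)             # Monte-Carlo candidate pool
    mu, sd = posterior(np.array(X), np.array(Y), cand)
    x_next = cand[np.argmin(mu - 2.0 * sd)]   # LCB acquisition (minimization)
    X.append(x_next)                          # "suggest", then "test", then "update"
    Y.append(objective(x_next))
best_x = min(X, key=objective)                # bestConfig()
print(best_x)
```

Each loop iteration mirrors one trip through the pseudocode's while loop: the surrogate is refit to all runs so far, the acquisition is maximized over random candidates, and the chosen setting is evaluated on the (here simulated) system.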

  40. AUTOTUNE AS A SERVICE

  41. GizmoDuck & Garbage Collection Overhead via Tuning JVM Parameters

  42. TweetyPie & CPU Utilization via Tuning Graal JIT Parameters
