How to Make Decisions (Optimally) Siddhartha Sen Microsoft Research NYC
AI for Systems • Vision: Infuse AI to optimize cloud infrastructure decisions, while being: • Minimally disruptive (agenda: Harvesting Randomness) • Synergistic with human solutions (agenda: HAIbrid algorithms) • Safe and reliable (agenda: Safeguards) • Impact: the criteria above differentiate us and ensure more widespread impact • Team: MSR NYC, MSR India • Azure: Azure Compute, Azure Frontdoor • Universities: Columbia, NYU, Princeton, Yale, UBC, U. Chicago, Cornell
Vision: Safe optimization without disruption • Evaluate alternatives without disrupting the system? [Diagram: Fuzzer → System (complex) → Optimizer → Safeguard]
Roadmap • A framework for making systematic decisions: Reinforcement Learning • A way to reason about decisions in the past: Counterfactual Evaluation • How to make this work in cloud systems? • Successes, fundamental obstacles, workarounds
Decisions in the real world • context → policy → action → reward • Which policy maximizes my total reward?
Reinforcement learning (RL) • Which policy maximizes my total reward?
Example: online news articles (MSN) • context: user, browse history • action: article on top • reward: clicked/ignored
Example: machine health (Azure cloud) • context: machine, failure history • action: wait time before reboot • reward: total downtime
Example: commute options • context: weather, traffic • action: bike, subway, car • reward: trip time, cost
Example: online dating • context: user, dating history • action: match • reward: length of relationship
Reinforcement learning reflects real life • Traditional (supervised) machine learning needs the answer as input: each input x comes labeled with its full answer y (dog, cat, …) • train a model: f(x) → y
Reinforcement learning reflects real life • RL interacts with the environment and learns from feedback: for a context x and action a, the reward r(x, a) only gives a partial answer • train a policy: π(x) → a
How to learn in an RL setting? • Explore to learn about new actions • Incorporate reward feedback • Do this systematically! (Humans are not good at this)
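To make "explore, then incorporate feedback" concrete, here is a minimal sketch of an epsilon-greedy bandit policy. It is context-free for brevity (a contextual policy would condition its value estimates on the user features); the class name, epsilon value, and reward scale are illustrative, not from the talk.

```python
import random

class EpsilonGreedyPolicy:
    """Explore with probability epsilon, otherwise exploit the best action so far."""

    def __init__(self, actions, epsilon=0.1):
        self.actions = list(actions)                 # e.g. candidate articles
        self.epsilon = epsilon                       # exploration rate
        self.value = {a: 0.0 for a in self.actions}  # running mean reward per action
        self.count = {a: 0 for a in self.actions}

    def choose(self, context):
        # Systematic exploration: a coin flip decides explore vs. exploit.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[a])

    def update(self, action, reward):
        # Incorporate reward feedback into the running estimate for that action.
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]
```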
Simple example: online news articles • Policy A (Career): Clicked • Policy B (Location): Ignored • This is an A/B test! (Humans are bad at this)
Simple example: online news articles • Policy A (Career): Clicked • Policy B (Location): Ignored • Policy space: a giant table • RL: richer policy space, richer representation
Aside: Deep Reinforcement Learning! • Superhuman ability at Go and Chess • Lots of engineering/tweaking • Learning from self‐play is not new • Far from an AI apocalypse • But (opinion): a glimpse of a more subtle, subconscious overtaking
Testing policies online is inefficient • Policy A (e.g. Career): Clicked • Policy B (e.g. Location): Ignored • Policy space: a giant table • Costly (production deployment) • Risky (live user traffic) • Slow (splits 100% of traffic)
Testing policies online is inefficient • Problem: randomizing over policies is costly, risky, and slow • Instead: randomize directly over actions • Collect data first, then evaluate policies after‐the‐fact (see the logging sketch below)
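Concretely, randomizing over actions means logging, for every decision, the action taken, the probability with which the deployed policy took it, and the observed reward. A minimal sketch of such an exploration log, with hypothetical field names:

```python
import random
from dataclasses import dataclass

@dataclass
class LoggedDecision:
    context: dict       # e.g. user features, machine state
    action: str         # the action actually taken
    propensity: float   # probability the logging policy assigned to that action
    reward: float       # observed outcome (click, downtime, latency, ...)

def log_decision(context, action_probs, reward_fn):
    """Sample an action from the deployed policy's distribution and record
    everything that offline evaluation will need later."""
    actions = list(action_probs)
    weights = [action_probs[a] for a in actions]
    action = random.choices(actions, weights=weights, k=1)[0]
    return LoggedDecision(context, action, action_probs[action], reward_fn(context, action))
```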
Test policies offline! • Log randomized decisions and their outcomes (Clicked/Ignored) for users with different features (career: Engineer/Teacher; location: Texas/Seattle; gender: Female/Male) • Later evaluate the Career policy, the Location policy, and the Gender policy, all on the same log
Counterfactual evaluation (testing policies offline) • Ask "what if" questions about the past: how would this new policy have performed if I had run it? • Basic idea: use the (randomized) decisions made by a deployed policy to match and evaluate the decisions the new policy would make • Problem: the deployed policy's decisions may be biased
Counterfactual evaluation (testing policies offline) • Ask "what if" questions about the past: how would this new policy have performed if I had run it? • Basic idea: use the (randomized) decisions made by a deployed policy to match and evaluate the decisions the new policy would make • Use the logged probabilities to over/underweight decisions (see the sketch below) • Test many different policies on the same dataset, offline!
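One standard way to "use probabilities to over/underweight decisions" is inverse propensity scoring (IPS): count a logged reward only when the new policy would have taken the same action, and divide by the probability the deployed policy assigned to it. A minimal sketch, assuming the LoggedDecision log format sketched earlier:

```python
def ips_estimate(log, new_policy):
    """Estimate the average reward the new policy would have earned,
    using only the deployed policy's (randomized) logged decisions."""
    total = 0.0
    for d in log:
        if new_policy(d.context) == d.action:
            # Reweight matched decisions by 1/propensity to undo logging bias.
            total += d.reward / d.propensity
    return total / len(log)
```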
RL + Counterfactual Evaluation • Very powerful combination: evaluate a billion policies offline, find the best one • An exponential boost over online A/B testing • Can we apply this paradigm to cloud systems?
Example: machine health (Azure Compute) • context: machine, failure history • action: wait time before reboot • reward: total downtime
Example: TCP config (Azure Frontdoor) • context: OS, locale, traffic type • action: TCP parameter settings • reward: CPU utilization
Example: replica selection (Azure LB) • context: request, replica loads • action: replica to handle request • reward: latency
What if… • … we waited a different amount of time before rebooting? • … we used different TCP settings on an edge proxy machine? • … we sent a request to a different replica? • Counterfactual evaluation!
Counterfactual evaluation in Systems • Opportunity: Many systems are naturally randomized • Load balancing, data replicas, cache eviction, fault handling, etc. • When we need to spread things, when choices are ambiguous Free exploration! • Opportunity: Many systems provide implicit feedback • Naïve defaults, conservative parameter settings • Worse settings yield more information Free feedback!
Counterfactual evaluation in Systems: challenges and techniques
• Challenge: mess of methods/techniques spanning multiple disciplines → Technique: taxonomy
• Challenge: huge action spaces (coverage) → Technique: spatial coarsening
• Challenge: stateful, non‐independent decisions → Technique: temporal coarsening, time horizons
• Challenge: dynamic environments → Technique: (baseline normalization)
Taxonomy for counterfactual evaluation (a DR sketch follows below)
• Feedback full? → Supervised Learning
• Feedback partial, no randomization? → direct method (or go randomize/explore)
• Feedback partial, with randomization, independent decisions? → Reinforcement Learning (contextual bandits): unbiased estimator (DR)
• Feedback partial, with randomization, non‐independent decisions? → Reinforcement Learning (general): unbiased estimator + time horizon (DR‐Time)
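The taxonomy's "unbiased estimator (DR)" refers to a doubly robust estimator, which combines a learned reward model with an IPS-style correction. A minimal sketch under the same assumed log format; reward_model(context, action) is a hypothetical learned predictor, not an API from the talk:

```python
def dr_estimate(log, new_policy, reward_model):
    """Doubly robust sketch: predict the reward of the new policy's action with a
    model, then correct the model's error on decisions where the log matches."""
    total = 0.0
    for d in log:
        chosen = new_policy(d.context)
        estimate = reward_model(d.context, chosen)          # direct-method part
        if chosen == d.action:
            # Importance-weighted correction of the model's error on the logged action.
            estimate += (d.reward - reward_model(d.context, d.action)) / d.propensity
        total += estimate
    return total / len(log)
```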
Example: Machine health in Azure Compute • Wait for some time, then reboot …
Example: Machine health in Azure Compute • Wait for some time, then reboot • Coarsen the action space to waits of {1, 2, …, 10} minutes (spatial coarsening)
Example: Machine health in Azure Compute
Decision | Action | [‐]Reward
Machine A | Wait 10 min | 5 min
Machine B | Wait 10 min | 3 min
Machine C | Wait 10 min | 10 min + reboot
Example: Machine health in Azure Compute
Decision | Action | [‐]Reward | Feedback
Machine A | Wait 10 min | 5 min | Wait 1, 2, …, 9
Machine B | Wait 10 min | 3 min | Wait 1, 2, …, 9
Machine C | Wait 10 min | 10 min + reboot | Wait 1, 2, …, 9
Example: Machine health in Azure Compute
Decision | Action | [‐]Reward | Feedback
Machine A | Wait 6 min | 5 min | Wait 1, 2, …, 9
Machine B | Wait 2 min | 3 min | Wait 1, 2, …, 9
Machine C | Wait 10 min | 10 min + reboot | Wait 1, 2, …, 9
Example: Machine health in Azure Compute
Decision | Action | [‐]Reward | Feedback
Machine A | Wait 6 min | 5 min | Wait 1, 2, …, 9
Machine B | Wait 2 min | 2 min + reboot | Wait 1
Machine C | Wait 10 min | 10 min + reboot | Wait 1, 2, …, 9
Implicit feedback (see the sketch below)
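A sketch of how the implicit feedback in the table above multiplies one logged decision into feedback for other wait thresholds: if the machine recovered on its own, every threshold at or above the recovery time would have seen the same downtime, and every shorter threshold would have ended in a reboot. The fixed reboot cost and helper names are assumptions for illustration:

```python
ASSUMED_REBOOT_COST_MIN = 3   # illustrative reboot duration, not a number from the talk

def implicit_feedback(wait_chosen, recovered_at, max_wait=10):
    """Return the downtime (negative reward) we can infer for each alternative
    wait threshold, given one logged decision.

    recovered_at: minutes until the machine came back on its own, or None if it
    never did before we rebooted it at `wait_chosen`."""
    feedback = {}
    for wait in range(1, max_wait + 1):
        if recovered_at is not None and wait >= recovered_at:
            feedback[wait] = recovered_at                     # would have recovered on its own
        elif recovered_at is None and wait > wait_chosen:
            continue                                          # unknown: maybe it recovers later
        else:
            feedback[wait] = wait + ASSUMED_REBOOT_COST_MIN   # would have rebooted at this threshold
    return feedback

# Machine A: waited 6, recovered at 5 -> feedback for every wait 1..10.
# Machine B: waited 2, never recovered -> feedback only for waits 1 and 2.
print(implicit_feedback(6, 5), implicit_feedback(2, None))
```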
Results: Machine health in Azure Compute [Chart comparing estimates: DR vs. DR + implicit feedback]
Results: Machine health in Azure Compute
Example: TCP config in Azure Frontdoor • TCP parameters: initial cwnd, initial RTO, min RTO, max SYN retransmit, delayed ACK freq, delayed ACK timeout
[Diagram: clients in Mumbai, India and Atlanta, USA → edge proxy clusters → WAN → cloud datacenter with Service 1 and Service 2 endpoints; requests and responses flow through the edge proxies]
Example: TCP config in Azure Frontdoor • TCP parameters: initial cwnd, initial RTO, min RTO, max SYN retransmit, delayed ACK freq, delayed ACK timeout • Pick from 17 different configurations, per hour per machine (spatial/temporal coarsening; see the sketch below)
[Same edge-proxy diagram as above]
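A sketch of what "pick from 17 configurations, per hour per machine" could look like as uniform randomization over a coarsened configuration grid. The parameter names come from the bullet list above, but the specific values and the 4x4+1 grid are invented for illustration:

```python
import random

# Hypothetical 17-entry grid; the talk does not specify the candidate values.
CONFIGS = [{"init_cwnd": cwnd, "min_rto_ms": rto}
           for cwnd in (10, 16, 32, 46) for rto in (20, 50, 100, 200)]
CONFIGS.append({"init_cwnd": 10, "min_rto_ms": 300})      # 4*4 + 1 = 17 configurations

def assign_config(machine_id, hour, rng=random):
    """Spatial coarsening: choose among 17 configs rather than the raw parameter space.
    Temporal coarsening: hold the choice fixed for a whole (machine, hour) slot."""
    config = rng.choice(CONFIGS)
    return {"machine": machine_id, "hour": hour,
            "config": config, "propensity": 1.0 / len(CONFIGS)}
```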
Example: TCP config in Azure Frontdoor • Dynamic workload and environment • Assign a "control" machine to each RL machine as a baseline, and report the delta (baseline normalization; see the sketch below)
[Same edge-proxy diagram as above]
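A minimal sketch of baseline normalization, assuming each RL machine's reward is reported as a delta against its paired control machine over the same window so that shared workload swings cancel out; the metric and pairing are illustrative:

```python
def baseline_normalized_reward(rl_metric, control_metric):
    """Report the RL machine's metric relative to its paired control machine,
    so that load shifts that hit both machines equally cancel out."""
    return rl_metric - control_metric

# Illustrative hourly CPU-utilization pairs (RL machine, control machine):
pairs = [(0.62, 0.70), (0.55, 0.66), (0.71, 0.69)]
deltas = [baseline_normalized_reward(rl, ctrl) for rl, ctrl in pairs]
```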
Results: TCP config in Azure Frontdoor
Estimate | Reward | Error
Ground truth | 0.713 | ‐‐
DR | 0.720 (0.637, 0.796) | 0.97%
Lesson: Unbiased estimator vs. biased policy [Chart over configurations 1–17; configuration 2 is the default]