Gumbel-Max Structural Causal Models Michael Oberst David Sontag - PowerPoint PPT Presentation

Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models
Michael Oberst (MIT, @MichaelOberst), David Sontag (MIT)

Motivation: Building trust in RL policies
► Goal: Apply reinforcement learning in high-risk settings (e.g., healthcare)
► Problem: How to safely evaluate a policy? There is no simulator, and off-policy evaluation can fail due to
  ► Confounding
  ► Small sample sizes
  ► Poorly specified rewards
► We could try to interpret the policy directly, but if that is not possible, what can we do?

Motivation: Building trust in RL policies
Suppose we are given:
• Markov Decision Process (MDP): $P(S', R \mid S, A)$
• Policy (e.g., learned using the MDP): $\pi(A \mid S)$
Notation: $S$: Current State, $A$: Action, $R$: Reward, $S'$: Next State
[Diagram labels: Observational Data, Markov Decision Process (MDP), Policy]

Using counterfactuals to “sanity check”
Notation: $S$: State, $A$: Action
Observed trajectory (over time):
• States: …patient has infection, …drug reaction, …significant agitation
• Actions: $A_1$: Antibiotics, $A_2$: Mechanical Ventilation, $A_3$: Sedation
If the new policy had been applied to this patient…
• $S_0$: …patient has infection, $A_1$: Antibiotics
• Model-based rollout: $S_1$: …infection cleared. Not a fair comparison.
• Counterfactual: $S_1$: …drug reaction. Counterfactual influenced by actual outcome.
• Full counterfactual trajectory: actions $A_1$: Antibiotics, $A_2$: No action, $A_3$: Discharge; states $S_0$: …patient has infection, $S_1$: …drug reaction, $S_2$: …patient recovers
Idea: If the counterfactual trajectory is unreasonable given the full context of the patient, the model / policy may be flawed

Using counterfactuals to “sanity check”
Approach:
1. Decomposition of reward over real episodes, to identify interesting cases
2. Examine counterfactual trajectories under new policy
3. Validate and/or criticize conclusions, using full patient information (e.g., chart review)
Example: See paper / poster for synthetic case study motivated by sepsis management
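The first step of the approach can be sketched in a few lines. This is only an illustration: the slides do not show the exact decomposition, so the function name `rank_episodes`, the dictionary layout, and the toy returns below are all assumptions. The sketch compares each episode's observed return with the average return of its simulated counterfactual trajectories, then sorts the episodes so that the largest gaps are reviewed first.

```python
from statistics import mean


def rank_episodes(observed_returns, counterfactual_returns):
    """Rank episodes by (mean counterfactual return - observed return).

    observed_returns: dict mapping episode id -> observed return (float).
    counterfactual_returns: dict mapping episode id -> list of returns from
        counterfactual trajectories simulated under the new policy.
    Both inputs are hypothetical structures chosen for this sketch.
    """
    gaps = {
        episode: mean(counterfactual_returns[episode]) - observed
        for episode, observed in observed_returns.items()
    }
    # Largest claimed improvement first: these are the cases to inspect.
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Toy numbers, invented purely to exercise the function.
observed = {"ep1": -1.0, "ep2": 1.0, "ep3": 0.0}
counterfactual = {
    "ep1": [1.0, 1.0, -1.0, 1.0],
    "ep2": [1.0, 1.0],
    "ep3": [0.0, -1.0],
}
ranking = rank_episodes(observed, counterfactual)
```

Episodes at the top of the ranking are where the new policy claims the largest gain over what actually happened, which makes them natural candidates for steps 2 and 3.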

Simulating counterfactual trajectories
What we need:
1. Observed trajectories
2. Policy to evaluate: $\pi(A \mid S)$
3. Model of discrete dynamics, e.g., Markov Decision Process, plus a Structural Causal Model (SCM):
   $S' = f(S, A, U_{s'})$, $U_{s'} \sim P(U_{s'})$
Notation: $S$: Current State, $A$: Action, $S'$: Next State
Problem: Choice of SCM is not identifiable from data!
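A toy example may help make the identifiability problem concrete. The two SCMs below are hypothetical and are not taken from the slides. Each maps a binary action and a fair-coin noise variable to a binary next state. They produce the same next-state distribution for every action, so data alone cannot tell them apart, yet they give opposite counterfactual answers.

```python
def scm_copy(action, u):
    """Hypothetical SCM 1: the next state copies the noise, ignoring the action."""
    return u


def scm_xor(action, u):
    """Hypothetical SCM 2: the next state flips the noise when the action is 1."""
    return u ^ action


def interventional(scm, action):
    """P(next state = 1 | action), with noise u ~ Bernoulli(0.5)."""
    return sum(0.5 * scm(action, u) for u in (0, 1))


def counterfactual(scm, obs_action, obs_next, cf_action):
    """P(next state = 1 under cf_action | observed obs_action and obs_next)."""
    # Abduction: keep only the noise values consistent with what was observed.
    consistent = [u for u in (0, 1) if scm(obs_action, u) == obs_next]
    # The prior on u is uniform, so the posterior is uniform over `consistent`.
    return sum(scm(cf_action, u) for u in consistent) / len(consistent)


# Both SCMs agree on every interventional distribution...
same_dynamics = all(
    interventional(scm_copy, a) == interventional(scm_xor, a) for a in (0, 1)
)
# ...but disagree on "what if action 1 had been taken instead of action 0?"
cf_copy = counterfactual(scm_copy, obs_action=0, obs_next=1, cf_action=1)
cf_xor = counterfactual(scm_xor, obs_action=0, obs_next=1, cf_action=1)
```

Here `same_dynamics` is true while `cf_copy` and `cf_xor` differ, which is why an extra assumption on the SCM is needed before counterfactual trajectories can be simulated.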

So, what should we use for the structural causal model (SCM)?
Key challenge: Non-identifiability
• There are multiple SCMs consistent with $P(S' \mid S, A)$ but with different counterfactual distributions
• For binary variables, assuming the property of monotonicity (Pearl, 2000) is sufficient to identify the counterfactual distribution
• But most real-world MDPs have non-binary states!
Theorem 1 (informal): The (newly defined) property of counterfactual stability generalizes monotonicity to categorical variables
Gumbel-Max SCM: Use the Gumbel-Max trick to sample from a categorical distribution with $k$ categories:
$g_j \sim \text{Gumbel}$
$S' = \arg\max_j \{ \log P(S' = j \mid S, A) + g_j \}$
Theorem 2: The Gumbel-Max SCM satisfies the counterfactual stability condition
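The Gumbel-Max construction can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are invented here, all probabilities are assumed strictly positive, and the posterior over the Gumbel noise is drawn with the standard "top-down" construction (sample the maximum, then truncate the remaining Gumbels below it). The counterfactual next state reuses that posterior noise under the transition probabilities of a different action.

```python
import numpy as np


def gumbel_max_sample(p, rng):
    """Draw one category from probabilities p via the Gumbel-Max trick."""
    g = rng.gumbel(size=len(p))  # g_j ~ Gumbel(0, 1), independent
    return int(np.argmax(np.log(p) + g)), g


def posterior_gumbels(p_obs, observed, rng):
    """Sample noise g such that argmax_j(log p_obs[j] + g[j]) == observed.

    Assumes every entry of p_obs is strictly positive.
    """
    log_p = np.log(p_obs)
    # The maximum of the shifted Gumbels is Gumbel(log sum_j p_j) and is
    # independent of which index attains it, so assign it to `observed`.
    top = rng.gumbel(loc=np.log(np.sum(p_obs)))
    # The other shifted Gumbels are Gumbel(log p_j) truncated to lie below top.
    free = rng.gumbel(loc=log_p)
    shifted = -np.log(np.exp(-free) + np.exp(-top))
    shifted[observed] = top
    return shifted - log_p  # recover g_j from (log p_j + g_j)


def counterfactual_next_states(p_obs, observed, p_cf, n_samples, rng):
    """Sample next states under p_cf, given `observed` was seen under p_obs."""
    log_p_cf = np.log(p_cf)
    draws = np.empty(n_samples, dtype=int)
    for i in range(n_samples):
        g = posterior_gumbels(p_obs, observed, rng)
        draws[i] = np.argmax(log_p_cf + g)
    return draws


# Usage with made-up transition probabilities for two actions.
rng = np.random.default_rng(0)
p_observed_action = np.array([0.5, 0.3, 0.2])
p_other_action = np.array([0.2, 0.3, 0.5])
draws = counterfactual_next_states(
    p_observed_action, observed=0, p_cf=p_other_action, n_samples=1000, rng=rng
)
print(np.bincount(draws, minlength=3) / len(draws))
```

When the counterfactual probabilities do not raise any other category's probability by a larger factor than the observed category's, every sample returns the observed category, which is the behaviour that counterfactual stability requires.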

Thank you! Come to our poster for more details: Pacific Ballroom #72
