Sequential Extensions of Causal and Evidential Decision Theory
Tom Everitt, Jan Leike, and Marcus Hutter
http://jan.leike.name/
ADT’15 — 29 September 2015
Outline
◮ Agent Models
◮ Decision Theory
◮ Sequential Decision Making
◮ Conclusion
◮ References
Dualistic Agent Model

(Diagram: the agent sends action a_t to the environment; the environment returns percept e_t.)

Goal: maximize expected utility E[∑_{t=1}^m u(e_t)]
Physicalistic Agent Model

(Diagram: the agent, containing a self-model and an environment model, sends action a_t; the environment, with hidden state s, returns percept e_t.)

Goal: maximize expected utility E[∑_{t=1}^m u(e_t)]
Newcomb’s Problem

Presented by [Nozick, 1969]. A reliable predictor has placed $1,000,000 in an opaque box iff it predicted you will take only that box; a transparent box always contains $1,000.

Actions: (1) take the opaque box or (2) take both boxes
Reasoning Causally

Causal decision theory (CDT): take the action that causes the best outcome

    arg max_{a∈A} ∑_{e∈E} µ(e | do(a)) u(e)        (CDT)

[Gibbard and Harper, 1978, Lewis, 1981, Skyrms, 1982, Joyce, 1999, Weirich, 2012]

In Newcomb’s problem: taking both boxes causes you to have $1,000 more
Reasoning Evidentially

Evidential decision theory (EDT): take the action that gives the best news about the outcome

    arg max_{a∈A} ∑_{e∈E} µ(e | a) u(e)        (EDT)

[Jeffrey, 1983, Briggs, 2014, Ahmed, 2014]

In Newcomb’s problem: taking just the opaque box is good news because that means it likely contains $1,000,000
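The two one-shot rules above can be contrasted on Newcomb’s problem itself. The sketch below is illustrative only: the 99% predictor accuracy and the 50% prior over box contents are assumptions, not figures from the talk. EDT conditions on the action (the action is evidence about the prediction); CDT intervenes with do(a), which severs that evidential link and falls back on the prior.

```python
# EDT vs CDT on Newcomb's problem (illustrative sketch).

ACTIONS = ["one-box", "two-box"]
PAYOFF = {("full", "one-box"): 1_000_000,
          ("full", "two-box"): 1_001_000,
          ("empty", "one-box"): 0,
          ("empty", "two-box"): 1_000}

ACCURACY = 0.99      # assumed P(prediction correct)
PRIOR_FULL = 0.5     # assumed marginal P(opaque box is full), used by CDT

def p_full_given(action):
    """P(full | a): the box is full iff one-boxing was predicted,
    so the chosen action is evidence about the box contents."""
    return ACCURACY if action == "one-box" else 1 - ACCURACY

def edt_value(action):
    p = p_full_given(action)          # condition on the action
    return p * PAYOFF[("full", action)] + (1 - p) * PAYOFF[("empty", action)]

def cdt_value(action):
    p = PRIOR_FULL                    # do(a) cuts the action -> state link
    return p * PAYOFF[("full", action)] + (1 - p) * PAYOFF[("empty", action)]

edt_choice = max(ACTIONS, key=edt_value)
cdt_choice = max(ACTIONS, key=cdt_value)
print(edt_choice, cdt_choice)  # EDT one-boxes, CDT two-boxes
```

Note that CDT two-boxes for any prior, since both boxes always yield exactly $1,000 more than the opaque box alone once the contents are held fixed.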
Newcomblike Problems

= problems where your actions are not independent of the (unobservable) environment state

Newcomblike problems are actually quite common!
◮ People predict each other all the time
◮ Prediction does not need to be perfect
◮ Example: an environment that knows your source code
◮ Example: a multi-agent setting with multiple copies of one agent
Sequential Decision Making
The Causal Graph

One-shot: a graph over the hidden state s, the action a, and the percept e.

Sequential: the hidden state s persists across time steps, with actions and percepts a_1 e_1 a_2 e_2 …
Notation
◮ æ_{<t} = a_1 e_1 … a_{t−1} e_{t−1} denotes the history
◮ µ : (A × E)* × A → ∆(E) denotes the environment model
◮ π : (A × E)* → A is my policy
◮ m ∈ ℕ is the horizon
Sequential Evidential Decision Theory

Sequential action-evidential decision theory (SAEDT):

    V_aev(æ_{<t} a_t) := ∑_{e_t} µ(e_t | æ_{<t} a_t) [u(e_t) + V_aev(æ_{<t} a_t e_t)]

Here µ(e_t | æ_{<t} a_t) conditions on the past and the current action; the bracket adds the current utility and the future utility.

Sequential policy-evidential decision theory (SPEDT):

    V_pev(æ_{<t} a_t) := ∑_{e_t} µ(e_t | æ_{<t} a_t, π_{t+1:m}) [u(e_t) + V_pev(æ_{<t} a_t e_t)]

Here µ additionally conditions on the future policy π_{t+1:m}.
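The SAEDT recursion is plain expectimax over percepts, with the next action chosen by maximizing the same value function. A minimal sketch, assuming `mu(e, history, a)` supplies the conditional probability and the percept set, action set, and horizon `m` are given (these interfaces are illustrative, not from the talk; SPEDT would differ only in what `mu` conditions on):

```python
def v_aev(history, action, mu, u, percepts, actions, m):
    """V_aev(ae_<t a_t): expected current-plus-future utility under
    SAEDT, using the plain conditional mu(e | history, a)."""
    t = len(history) // 2 + 1   # history = (a_1, e_1, ..., a_{t-1}, e_{t-1})
    total = 0.0
    for e in percepts:
        p = mu(e, history, action)
        future = 0.0
        if t < m:               # the last step has no continuation value
            ext = history + (action, e)
            future = max(v_aev(ext, nxt, mu, u, percepts, actions, m)
                         for nxt in actions)
        total += p * (u(e) + future)
    return total
```

For a uniform `mu` over two percepts with utilities 0 and 1, each step contributes expected utility 0.5, so the value grows linearly in the horizon.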
Sequential Causal Decision Theory

Sequential causal decision theory (SCDT):

    V_cau(æ_{<t} a_t) := ∑_{e_t∈E} µ(e_t | æ_{<t}, do(a_t)) [u(e_t) + V_cau(æ_{<t} a_t e_t)]

Here µ(e_t | æ_{<t}, do(a_t)) conditions on the past and intervenes on the current action; the bracket adds the current utility and the future utility.

Proposition (Policy-Causal = Action-Causal). For all histories æ_{<t} and percepts e_t:

    µ(e_t | æ_{<t}, do(a_t)) = µ(e_t | æ_{<t}, do(π_{t:m})).
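The interventional conditional in SCDT can be unpacked via the hidden state: do(a) blocks inference from the action back to s, so µ(e | do(a)) averages the state-conditional likelihood over the prior P(s), whereas the evidential µ(e | a) averages over the posterior P(s | a). A one-step sketch on a Newcomb-style model (the two-state prior, predictor accuracy, and payoff-valued percepts are illustrative assumptions, not from the talk):

```python
# mu(e | do(a)) vs mu(e | a) over a hidden state s (illustrative model).

PRIOR = {"full": 0.5, "empty": 0.5}   # assumed P(s)
PRED_ACCURACY = 0.99                  # assumed predictor accuracy

def p_action_given_state(a, s):
    """P(a | s): the box is full iff one-boxing was predicted."""
    if s == "full":
        return PRED_ACCURACY if a == "one-box" else 1 - PRED_ACCURACY
    return 1 - PRED_ACCURACY if a == "one-box" else PRED_ACCURACY

def likelihood(e, s, a):
    """P(e | s, a): the percept is the (deterministic) payoff."""
    payoff = {("full", "one-box"): 1_000_000, ("full", "two-box"): 1_001_000,
              ("empty", "one-box"): 0, ("empty", "two-box"): 1_000}[(s, a)]
    return 1.0 if e == payoff else 0.0

def mu_do(e, a):
    """mu(e | do(a)) = sum_s P(s) P(e | s, a): intervening keeps the prior."""
    return sum(PRIOR[s] * likelihood(e, s, a) for s in PRIOR)

def mu_cond(e, a):
    """mu(e | a) = sum_s P(s | a) P(e | s, a): conditioning updates on a."""
    z = sum(PRIOR[s] * p_action_given_state(a, s) for s in PRIOR)
    return sum(PRIOR[s] * p_action_given_state(a, s) / z * likelihood(e, s, a)
               for s in PRIOR)
```

With these numbers, mu_do(1_000_000, "one-box") is 0.5 (the prior) while mu_cond(1_000_000, "one-box") is 0.99 (the posterior), which is exactly the disagreement driving the two-boxing/one-boxing split.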
Examples

                        action-evidential   policy-evidential   causal
Newcomb                        ✓                   ✓              ×
Newcomb w/ precommit           ×                   ✓              ✓
Newcomb w/ looking             ×                   ×              ×
Toxoplasmosis                  ×                   ×              ✓
Seq. Toxoplasmosis             ×                   ×              ✓

Formal description in [Everitt et al., 2015] and source code at http://jan.leike.name
Conclusion
◮ How should physicalistic agents make decisions?
◮ Answer from (philosophical) decision theory: EDT, CDT
◮ Extended to sequential decision making

Which decision theory is better?
◮ In the end it matters whether you win (get the most utility)
◮ Neither EDT nor CDT models the environment as containing the agent itself
◮ Neither EDT nor CDT wins on every example
◮ How physicalistic agents make decisions optimally is unsolved