Sequential Extensions of Causal and Evidential Decision Theory

Tom Everitt, Jan Leike, and Marcus Hutter (http://jan.leike.name/)
ADT'15, 29 September 2015


  1. Sequential Extensions of Causal and Evidential Decision Theory Tom Everitt, Jan Leike, and Marcus Hutter http://jan.leike.name/ ADT’15 — 29 September 2015

  2. Outline Agent Models Decision Theory Sequential Decision Making Conclusion References

  4. Dualistic Agent Model
     [Figure labels: agent, environment, action \(a_t\), percept \(e_t\)]
     Goal: maximize expected utility \(\mathbb{E}\left[\sum_{t=1}^{m} u(e_t)\right]\)

  6. Physicalistic Agent Model
     [Figure labels: hidden state \(s\), action \(a_t\), percept \(e_t\), agent (with self-model and environment model), environment]
     Goal: maximize expected utility \(\mathbb{E}\left[\sum_{t=1}^{m} u(e_t)\right]\)

  8. Newcomb’s Problem Presented by [Nozick, 1969] Actions: (1) take the opaque box or (2) take both boxes

  11. Reasoning Causally
      Causal decision theory (CDT): take the action that causes the best outcome
      \[ \arg\max_{a \in \mathcal{A}} \sum_{e \in \mathcal{E}} \mu(e \mid \mathrm{do}(a))\, u(e) \tag{CDT} \]
      [Gibbard and Harper, 1978, Lewis, 1981, Skyrms, 1982, Joyce, 1999, Weirich, 2012]
      In Newcomb’s problem: taking both boxes causes you to have $1000 more

  14. Reasoning Evidentially
      Evidential decision theory (EDT): take the action that gives the best news about the outcome
      \[ \arg\max_{a \in \mathcal{A}} \sum_{e \in \mathcal{E}} \mu(e \mid a)\, u(e) \tag{EDT} \]
      [Jeffrey, 1983, Briggs, 2014, Ahmed, 2014]
      In Newcomb’s problem: taking just the opaque box is good news because that means it likely contains $1,000,000
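
The CDT and EDT rules can be compared numerically on Newcomb's problem. The sketch below is illustrative only: the predictor accuracy of 0.99, the prior of 0.5 that the opaque box is full, and all function names are assumptions, not taken from the slides. The hidden state is whether the opaque box is full. EDT conditions that state on the chosen action, while CDT, under do(a), leaves it at its prior.

```python
# Illustrative Newcomb's problem; the accuracy and prior are assumed values.
ACTIONS = ("one_box", "two_box")
STATES = ("full", "empty")  # hidden state: is the opaque box full?

def utility(state, action):
    """Payoff in dollars: 1,000,000 if the opaque box is full, plus 1000 for two-boxing."""
    return (1_000_000 if state == "full" else 0) + (1_000 if action == "two_box" else 0)

def edt_value(action, accuracy):
    """Expected utility under mu(e | a): the action is evidence about the state."""
    p_full = accuracy if action == "one_box" else 1.0 - accuracy
    probs = {"full": p_full, "empty": 1.0 - p_full}
    return sum(probs[s] * utility(s, action) for s in STATES)

def cdt_value(action, prior_full):
    """Expected utility under mu(e | do(a)): intervening leaves the state at its prior."""
    probs = {"full": prior_full, "empty": 1.0 - prior_full}
    return sum(probs[s] * utility(s, action) for s in STATES)

def best_action(value_fn, param):
    """arg max over actions of the given expected-utility function."""
    return max(ACTIONS, key=lambda a: value_fn(a, param))

edt_choice = best_action(edt_value, 0.99)  # assumed predictor accuracy
cdt_choice = best_action(cdt_value, 0.5)   # assumed prior that the box is full
print(edt_choice, cdt_choice)
```

Under these assumptions EDT takes only the opaque box and CDT takes both. For any prior, the causal value of two-boxing exceeds that of one-boxing by exactly $1000, which is the sense in which taking both boxes "causes you to have $1000 more".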

  20. Newcomblike Problems = problems where your actions are not independent of the (unobservable) environment state
      Newcomblike problems are actually quite common!
      ◮ People predict each other all the time
      ◮ Prediction does not need to be perfect
      ◮ Example: Environment that knows your source code
      ◮ Example: Multi-Agent setting with multiple copies of one agent
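
The definition of a Newcomblike problem can be checked directly on a joint distribution over hidden state and action. The following is a minimal sketch under stated assumptions: the function name `actions_depend_on_state`, the tolerance, and both probability tables are hypothetical illustrations, not part of the talk.

```python
from itertools import product

def actions_depend_on_state(joint, tol=1e-9):
    """Return True if P(s, a) != P(s) * P(a) for some pair (s, a)."""
    states = {s for s, _ in joint}
    actions = {a for _, a in joint}
    p_state = {s: sum(joint.get((s, a), 0.0) for a in actions) for s in states}
    p_action = {a: sum(joint.get((s, a), 0.0) for s in states) for a in actions}
    return any(
        abs(joint.get((s, a), 0.0) - p_state[s] * p_action[a]) > tol
        for s, a in product(states, actions)
    )

# Assumed numbers: a 99%-accurate predictor and an agent that one-boxes half the time.
newcomb = {("full", "one_box"): 0.495, ("empty", "one_box"): 0.005,
           ("full", "two_box"): 0.005, ("empty", "two_box"): 0.495}
# For contrast: a state that is drawn independently of the action.
independent = {("full", "one_box"): 0.25, ("empty", "one_box"): 0.25,
               ("full", "two_box"): 0.25, ("empty", "two_box"): 0.25}
print(actions_depend_on_state(newcomb), actions_depend_on_state(independent))
```

The check does not require the prediction to be perfect: any accuracy other than 0.5 in the first table makes the action informative about the state.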

  22. Sequential Decision Making

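  24. The Causal Graph
      One-shot: [Figure: causal graph over the nodes \(s\), \(a\), \(e\)]
      Sequential: [Figure: causal graph over the nodes \(s\), \(a_1\), \(e_1\), \(a_2\), \(e_2\), ...]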

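  25. Notation
      ◮ \(\text{æ}_{<t} = a_1 e_1 \ldots a_{t-1} e_{t-1}\) denotes the history
      ◮ \(\mu : (\mathcal{A} \times \mathcal{E})^* \times \mathcal{A} \to \Delta(\mathcal{E})\) denotes the environment model
      ◮ \(\pi : (\mathcal{A} \times \mathcal{E})^* \to \mathcal{A}\) is my policy
      ◮ \(m \in \mathbb{N}\) is the horizon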

  28. Sequential Evidential Decision Theory (notation as on slide 25)
      Sequential action-evidential decision theory (SAEDT):
      \[ V^{\mathrm{aev}}(\text{æ}_{<t} a_t) := \sum_{e_t} \underbrace{\mu(e_t \mid \text{æ}_{<t} a_t)}_{\mu(e_t \mid \text{past},\, a_t)} \Big( u(e_t) + \underbrace{V^{\mathrm{aev}}(\text{æ}_{<t} a_t e_t)}_{\text{future utility}} \Big) \]
      Sequential policy-evidential decision theory (SPEDT):
      \[ V^{\mathrm{pev}}(\text{æ}_{<t} a_t) := \sum_{e_t} \underbrace{\mu(e_t \mid \text{æ}_{<t} a_t, \pi_{t+1:m})}_{\mu(e_t \mid \text{past},\, \pi)} \Big( u(e_t) + \underbrace{V^{\mathrm{pev}}(\text{æ}_{<t} a_t e_t)}_{\text{future utility}} \Big) \]

  31. Sequential Causal Decision Theory (notation as on slide 25)
      Sequential causal decision theory (SCDT):
      \[ V^{\mathrm{cau}}(\text{æ}_{<t} a_t) := \sum_{e_t \in \mathcal{E}} \underbrace{\mu(e_t \mid \text{æ}_{<t}, \mathrm{do}(a_t))}_{\mu(e_t \mid \text{past},\, \mathrm{do}(a_t))} \Big( u(e_t) + \underbrace{V^{\mathrm{cau}}(\text{æ}_{<t} a_t e_t)}_{\text{future utility}} \Big) \]
      Proposition (Policy-Causal = Action-Causal). For all histories \(\text{æ}_{<t}\) and percepts \(e_t\):
      \[ \mu(e_t \mid \text{æ}_{<t}, \mathrm{do}(a_t)) = \mu(e_t \mid \text{æ}_{<t}, \mathrm{do}(\pi_{t:m})). \]
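
The sequential value functions share one recursion and differ only in which conditional distribution over the next percept is plugged in. The sketch below makes that explicit by passing the conditional as a function. It is a hedged illustration, not the authors' implementation: the type aliases, the `value` signature, the toy environment, and the convention that the value is zero beyond the horizon are all assumptions, and future actions are assumed to follow the policy.

```python
from typing import Callable, Dict, Tuple

Action = int
Percept = int
History = Tuple[Tuple[Action, Percept], ...]  # ae_{<t} as a tuple of (a_i, e_i) pairs
# cond(history, action) -> distribution over the next percept; it stands in for
# mu(e_t | ae_{<t} a_t), mu(e_t | ae_{<t} a_t, pi) or mu(e_t | ae_{<t}, do(a_t)).
Conditional = Callable[[History, Action], Dict[Percept, float]]
Policy = Callable[[History], Action]

def value(history: History, action: Action, cond: Conditional,
          policy: Policy, u: Callable[[Percept], float], m: int) -> float:
    """V(ae_{<t} a_t) = sum over e_t of cond(e_t | ...) * (u(e_t) + V(ae_{<t} a_t e_t))."""
    total = 0.0
    for percept, prob in cond(history, action).items():
        extended = history + ((action, percept),)
        if len(extended) < m:
            # Assumption: future actions are chosen by the policy pi.
            future = value(extended, policy(extended), cond, policy, u, m)
        else:
            future = 0.0  # assumed convention: no utility beyond the horizon m
        total += prob * (u(percept) + future)
    return total

# Hypothetical toy environment: percept 1 is more likely after action 1.
def toy_cond(history: History, action: Action) -> Dict[Percept, float]:
    p_one = 0.8 if action == 1 else 0.3
    return {1: p_one, 0: 1.0 - p_one}

always_one: Policy = lambda history: 1
v = value((), 1, toy_cond, always_one, u=float, m=2)
print(v)
```

In this toy environment the percept does not depend on any hidden state, so all three conditionals would coincide. The theories only come apart when the action carries information about the hidden state.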

  33. Examples

                              action-evidential   policy-evidential   causal
      Newcomb                         ✓                   ✓              ×
      Newcomb w/ precommit            ×                   ✓              ✓
      Newcomb w/ looking              ×                   ×              ×
      Toxoplasmosis                   ×                   ×              ✓
      Seq. Toxoplasmosis              ×                   ×              ✓

      Formal description in [Everitt et al., 2015] and source code at http://jan.leike.name

  41. Conclusion
      ◮ How should physicalistic agents make decisions?
      ◮ Answer from (philosophical) decision theory: EDT, CDT
      ◮ Extended to sequential decision making
      Which decision theory is better?
      ◮ In the end it matters whether you win (get the most utility)
      ◮ Neither EDT nor CDT models the environment as containing itself
      ◮ Neither EDT nor CDT wins on every example
      ◮ How physicalistic agents make decisions optimally is unsolved
