Learning Structured Decision Problems with Unawareness Craig Innes - PowerPoint PPT Presentation


  1. Learning Structured Decision Problems with Unawareness. Craig Innes (craig.innes@ed.ac.uk), Alex Lascarides (alex@inf.ed.ac.uk). Institute for Language, Cognition and Computation, University of Edinburgh.

  2. Why Unawareness? [Figure: decision network over Fertiliser, Precipitation, Grain, Yield, Protein and reward R.] $X = \{Prec, Protein, Yield\}$, $A = \{Grain, Fert\}$, $scope(R) = \{Yield, Protein\}$, $Pa_{Prot} = \{Grain\}$, $P(Prot = p \mid Grain = g) = \theta_{p \mid g}$

  3. Why Unawareness? [Figure: a larger decision network. Node label text as extracted: Local, Soil, Insect, Fertiliser, Pesticide, Precipitation, Concern, Type, Prevalence, Bad, Nitrogen, Grain, Harrow, Fungicide, Temperature, Infestation, Press, Gross, Protein, Fungus, Weeds, Crops, Yield, R.] $X^0 \subseteq X^+$, $A^0 \subseteq A^+$, $scope^0(R) \subseteq scope^+(R)$, $Pa_{Prot} = \{Grain\}$, $P(Prot = p \mid Grain = g) = \theta_{p \mid g}$
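The containment relations on this slide can be read as plain set comparisons. The sketch below is only an illustration, assuming the agent starts with the small vocabulary from the previous slide. The names `agent`, `true_model`, `is_partial_view` and `unaware_of` are hypothetical, and the roles given to the extra nodes (chance variable versus action) are guesses made for the example, not taken from the slides.

```python
# Agent's initial vocabulary, taken from the small network on the previous slide.
agent = {
    "X": {"Prec", "Protein", "Yield"},
    "A": {"Grain", "Fert"},
    "scope_R": {"Yield", "Protein"},
}

# Hypothetical slice of the larger problem. The extra names appear in the
# larger figure, but whether each is a chance variable or an action is assumed.
true_model = {
    "X": agent["X"] | {"Fungus", "Weeds", "Temperature"},
    "A": agent["A"] | {"Pesticide", "Fungicide"},
    "scope_R": set(agent["scope_R"]),  # left unchanged to avoid extra assumptions
}

KEYS = ("X", "A", "scope_R")

def is_partial_view(agent, true_model):
    """Check X0 <= X+, A0 <= A+ and scope0(R) <= scope+(R)."""
    return all(agent[k] <= true_model[k] for k in KEYS)

def unaware_of(agent, true_model):
    """Return, per component, what the agent has not yet discovered."""
    return {k: true_model[k] - agent[k] for k in KEYS}
```

Here `unaware_of(agent, true_model)["A"]` lists the actions the agent cannot yet consider at all, which is the gap the expert advice is meant to close.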

  4. Contributions. Our agent learns an interpretable model of a decision problem incrementally via evidence from domain trials and expert advice.

  5. Contributions. Our agent learns an interpretable model of a decision problem incrementally via evidence from domain trials and expert advice. Evidence may reveal actions/variables the agent was completely unaware of prior to learning.

  6. Contextual Advice. Types of Advice: (1) Advice on Better Actions; (2) Resolving Misunderstandings; (3) Unexpected Rewards; (4) Unknown Effects.

  7. Contextual Advice - Better Action. If the agent's performance in the last $k$ trials is below threshold $\beta$ of the true policy $\pi^+$, the expert says:

  8. Contextual Advice - Better Action. If the agent's performance in the last $k$ trials is below threshold $\beta$ of the true policy $\pi^+$, the expert says: “At time $t$ you should have done $a' = \langle A_1 = 0, A_2 = 1, A_3 = 0 \rangle$ rather than $a_t$”

  9. Contextual Advice - Better Action. If the agent's performance in the last $k$ trials is below threshold $\beta$ of the true policy $\pi^+$, the expert says: “At time $t$ you should have done $a' = \langle A_1 = 0, A_2 = 1, A_3 = 0 \rangle$ rather than $a_t$”
     • Action variable $A_3$ is part of the problem ($A_3 \in A$)
     • $A_3$ is relevant ($\exists X \in scope(R),\ anc(A_3, X)$)
     • There exists a better reward ($\exists s,\ s[B_t] = s_t[B_t] \wedge R^+(s) > r_t$)
     • $a'$ has a greater expected utility than $a_t$ ($EU(a' \mid s) > EU(a_t \mid s)$)
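The relevance condition, that the newly mentioned action is an ancestor of some variable in scope(R), can be checked with a simple graph traversal. This is a minimal sketch, assuming the network is stored as a child-to-parents mapping. The function names and the tiny example graph are hypothetical and not taken from the slides.

```python
def ancestors(node, parents):
    """Return every ancestor of `node`, given a child -> parents mapping."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def is_relevant(action, scope_r, parents):
    """True if `action` is an ancestor of at least one variable in scope(R)."""
    return any(action in ancestors(x, parents) for x in scope_r)

# Hypothetical graph: A3 -> B1 -> O1, with O1 in scope(R); A4 has no children.
parents = {"B1": ["A3"], "O1": ["B1"]}
scope_r = {"O1"}
```

With this graph, `is_relevant("A3", scope_r, parents)` holds while `is_relevant("A4", scope_r, parents)` does not, so advice mentioning `A3` tells the agent that some causal path from `A3` to the reward must exist.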

  10. Conserving Previous Beliefs. [Figure: network over Fertiliser, Precipitation, Grain, Yield, Protein and R.] $P(Pa_{Yield} \mid D_{0:t})$ over candidate parent sets: $Pa_{Yield} = \emptyset$; $Pa_{Yield} = \{Fert\}$; ...; $Pa_{Yield} = \{Fert, Prec, Grain\}$

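  11. Conserving Previous Beliefs. [Figure: network over Fertiliser, Precipitation, Grain, Fungus, Yield, Protein and R.] $P(Pa_{Yield} \mid D_{0:t})$ over candidate parent sets, each old set paired with its extension by Fungus: $Pa_{Yield} = \emptyset$ and $Pa_{Yield} = \{Fungus\}$; $Pa_{Yield} = \{Fert\}$ and $Pa_{Yield} = \{Fert, Fungus\}$; ...; $Pa_{Yield} = \{Fert, Prec, Grain\}$ and $Pa_{Yield} = \{Fert, Prec, Grain, Fungus\}$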

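  12. Conserving Previous Beliefs. [Figure: network over Fertiliser, Precipitation, Grain, Fungus, Yield, Protein and R.] $P(Pa_{Yield} \mid D_{0:t})$ over candidate parent sets, each old set paired with its extension by Fungus: $Pa_{Yield} = \emptyset$ and $Pa_{Yield} = \{Fungus\}$; $Pa_{Yield} = \{Fert\}$ and $Pa_{Yield} = \{Fert, Fungus\}$; ...; $Pa_{Yield} = \{Fert, Prec, Grain\}$ and $Pa_{Yield} = \{Fert, Prec, Grain, Fungus\}$

      $$P_{new}(Pa_X) = \begin{cases} (1 - \rho)\, P_{old}(Pa_X \mid D_{0:t}) & \text{if } Fungus \notin Pa_X \\ \rho\, P_{old}(Pa'_X \mid D_{0:t}) & \text{if } Pa_X = Pa'_X \cup \{Fungus\} \end{cases}$$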
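One way to read the update on this slide: each old parent set keeps a fraction (1 - ρ) of its posterior mass, and the remaining fraction ρ moves to the same set extended with the newly discovered variable. The sketch below assumes that reading, and assumes the new variable does not already occur in any old parent set. The function name and the example probabilities are hypothetical.

```python
def extend_parent_beliefs(p_old, new_var, rho):
    """Split each parent set's mass between itself and itself plus `new_var`.

    p_old maps frozenset parent sets to probabilities. Assumes `new_var`
    does not already appear in any key of p_old.
    """
    p_new = {}
    for parent_set, prob in p_old.items():
        p_new[parent_set] = (1 - rho) * prob            # new_var not a parent
        p_new[parent_set | {new_var}] = rho * prob      # new_var added as parent
    return p_new

# Hypothetical posterior over parent sets of Yield before Fungus is discovered.
p_old = {
    frozenset(): 0.2,
    frozenset({"Fert"}): 0.5,
    frozenset({"Fert", "Prec", "Grain"}): 0.3,
}
p_new = extend_parent_beliefs(p_old, "Fungus", rho=0.1)
```

Because every old entry is split into two parts that sum to its original mass, the new distribution still sums to one and the relative ranking of the old parent sets is preserved.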

  13. Experiments. Randomly Generated Networks: • 12 - 36 Variables • 3000 Trials • ε-greedy strategy • Expert Aid $\beta = 0.1$ [Figure: example randomly generated networks with nodes labelled A1-A12, B1-B12, O1-O12 and R; panel labels: Start, Learning, Goal.]
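For readers unfamiliar with the ε-greedy strategy named on this slide, the standard form is sketched below: with probability ε the agent explores by picking a random action, and otherwise it exploits the action with the highest estimated expected utility. This is the generic technique, not necessarily the authors' exact implementation. The function name and the utility table are hypothetical.

```python
import random

def epsilon_greedy(expected_utility, epsilon, rng=random):
    """Pick an action from a dict mapping action -> estimated expected utility."""
    actions = list(expected_utility)
    if rng.random() < epsilon:
        return rng.choice(actions)                 # explore: uniform random action
    return max(actions, key=expected_utility.get)  # exploit: best current estimate

# Hypothetical utility estimates for three joint actions.
eu = {"a0": 1.0, "a1": 3.0, "a2": 2.0}
choice = epsilon_greedy(eu, epsilon=0.1, rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes a run reproducible, which helps when comparing learning curves across configurations.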

  14. Results. [Plot: Cumulative Reward (y-axis, 0 to 60) against t (x-axis, 0 to 3000); series: default, truePolicy, random.]

  15. Results. [Plot: Cumulative Reward (y-axis, 40.0 to 60.0) against t (x-axis, 0 to 3000); series: default, nonCon, nonRelevant.]

  16. Results. [Plot: Cumulative Reward (y-axis, 40.0 to 60.0) against t (x-axis, 0 to 3000); series: default, lowTolerance, highTolerance.]

  17. Conclusions and Contact Details + Paper Link. Paper: Learning Structured Decision Problems with Unawareness. Authors: Craig Innes (craig.innes@ed.ac.uk), Alex Lascarides (alex@inf.ed.ac.uk). Poster Session: 6:30pm-9pm, Pacific Ballroom #35.
