Facilitating Testing and Debugging of Markov Decision Processes - PowerPoint PPT Presentation

Facilitating Testing and Debugging of Markov Decision Processes with Interactive Visualization. Sean McGregor, Hailey Buckingham, Thomas G. Dietterich, Rachel Houtman, Claire Montgomery, and Ronald Metoyer.


  1. Facilitating Testing and Debugging of Markov Decision Processes with Interactive Visualization. Sean McGregor, Hailey Buckingham, Thomas G. Dietterich, Rachel Houtman, Claire Montgomery, and Ronald Metoyer.

  2. What are Markov Decision Processes (MDPs)? Sequential decision making under uncertainty. Examples: wildfire suppression, autonomous helicopter, mountain car, logistics, medical diagnosis.

  3. Outline: (1) Markov Decision Processes (MDPs): basic introduction, testing; (2) MDPvis: design, testing examples, MDPvis use case study; (3) concluding.

  4. MDPs: Basic Introduction. Notation: M = ⟨S, A, P, R, γ, P₀⟩, where S is all states of the world; P₀ is the starting state distribution; A is the available actions; R(s, a) is the rewards; γ ∈ (0, 1) is the discount; P is the state transition probabilities (simulators); and π(s) → a is the policy. Reference: Puterman, M. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming (1st ed.). Wiley-Interscience.
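The notation above can be mirrored in a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `MDP`, its field names, and the two-state toy problem are all hypothetical, and the state set S is left implicit in the callables.

```python
import random
from dataclasses import dataclass
from typing import Callable, Hashable, Sequence

State = Hashable
Action = Hashable


@dataclass(frozen=True)
class MDP:
    """Container for M = <S, A, P, R, gamma, P0>; S is implicit in the callables."""
    actions: Sequence[Action]                                    # A
    transition: Callable[[State, Action, random.Random], State]  # P, as a simulator
    reward: Callable[[State, Action], float]                     # R(s, a)
    gamma: float                                                 # discount in (0, 1)
    initial_state: Callable[[random.Random], State]              # sampler for P0

    def __post_init__(self) -> None:
        if not 0.0 < self.gamma < 1.0:
            raise ValueError("gamma must lie strictly between 0 and 1")


# Hypothetical two-state toy problem, not the wildfire model from the slides.
toy = MDP(
    actions=("stay", "flip"),
    transition=lambda s, a, rng: s if a == "stay" else 1 - s,
    reward=lambda s, a: 1.0 if s == 1 else 0.0,
    gamma=0.96,
    initial_state=lambda rng: rng.choice((0, 1)),
)

Policy = Callable[[State], Action]   # pi(s) -> a
always_flip: Policy = lambda s: "flip"
```

Passing the transition function a random number generator treats P as a simulator that samples a next state, which matches the "(simulators)" reading on the slide rather than an explicit probability table.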

  5. MDPs: Basic Introduction. Motivating domain of wildfire: starting in 1935, the United States adopted the “10 AM policy” (suppress every fire by 10 AM on the day after it is reported). We need a more nuanced approach. References: Houtman, R. M., Montgomery, C. A., Gagnon, A. R., Calkin, D. E., Dietterich, T. G., McGregor, S., & Crowley, M. (2013). Allowing a Wildfire to Burn: Estimating the Effect on Future Fire Suppression Costs. International Journal of Wildland Fire, 22(7), 871–882. http://www.fs.fed.us/sites/default/files/2015-Fire-Budget-Report.pdf

  6. MDPs: Basic Introduction. Modeling wildfire: S is all the possible configurations of trees and ignitions; P₀ is a snapshot of the current forest, with a random fire; A is suppress or let-burn; R(s, a) is timber harvest and suppression expense; γ ∈ (0, 1) is 0.96 (Forest Service standard); P is several simulators; π(s) → a is suppress all fires. This represents a challenging and more general class of MDPs: high-dimensional states, a large state space, and integration of several simulators.

  7. MDPs: Basic Introduction. (Diagram components: Simulators, Optimizer, Rewards, Policy, P₀.)

  8. MDPs: Basic Introduction. Start with today's landscape (P₀).

  9. MDPs: Basic Introduction. Generate an ignition and weather.

  10. MDPs: Basic Introduction. Generate an ignition and weather.

  11. MDPs: Basic Introduction. Select an action.

  12. MDPs: Basic Introduction. Fire suppression effort. Fire suppression costs: $(95,000).

  13. MDPs: Basic Introduction. Update vegetation for wildfire.

  14. MDPs: Basic Introduction. Update vegetation for harvest. Harvest revenue: $20,000.

  15. MDPs: Basic Introduction. Generate an ignition and weather.

  16. MDPs: Basic Introduction. Select an action.

  17. MDPs: Basic Introduction. Fire suppression effort. Fire suppression costs: $(15,000).

  18. MDPs: Basic Introduction. Update vegetation for wildfire.

  19. MDPs: Basic Introduction. Update vegetation for harvest. Harvest revenue: $20,000.

  20. MDPs: Basic Introduction. (Continue until reaching the horizon.)
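The walkthrough above (sample a start state, select an action, collect rewards, update the state, repeat until the horizon) can be sketched as a rollout loop. This is an illustrative sketch under stated assumptions: `rollout`, `discounted_return`, and the `step` callback are hypothetical names, and the worked numbers assume that each ignition-to-harvest cycle counts as one time step so that the suppression cost and harvest revenue shown on the slides are netted per step.

```python
import random


def rollout(initial_state, policy, step, horizon, rng):
    """Simulate one trajectory; returns a list of (state, action, reward)."""
    state = initial_state
    trajectory = []
    for _ in range(horizon):
        action = policy(state)                          # pi(s) -> a
        next_state, reward = step(state, action, rng)   # simulators + rewards
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over the time steps of one rollout."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Net reward per time step from the slide figures (assumed to share a step):
# step 0: -95,000 suppression + 20,000 harvest; step 1: -15,000 + 20,000.
rewards = [-95_000 + 20_000, -15_000 + 20_000]
print(discounted_return(rewards, gamma=0.96))
```

With γ = 0.96 the two steps contribute −75,000 and 0.96 × 5,000 = 4,800, for a discounted return of about −70,200.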

  21. MDPs: Basic Introduction. A high-dimensional probabilistic time series. ...And this is just one of many!

  22. MDPs: Basic Introduction. Monte Carlo rollouts (diagram: several rollouts, each labeled P₀).

  23. MDPs: Basic Introduction. All visited states influence the optimizer.

  24. MDPs: Basic Introduction. Update policy.

  25. MDPs: Basic Introduction. The rollout distribution changes!
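A toy experiment can show why the rollout distribution changes when the policy changes. The sketch below is purely illustrative: the fuel model, the 0.5 ignition probability, and every function name are assumptions made for this example and are unrelated to the simulators used in the talk.

```python
import random
from statistics import mean

HORIZON = 10


def simulate(policy, rng, horizon=HORIZON):
    """One rollout of a toy fuel model; returns the fuel level at each step."""
    fuel = 0
    history = []
    for _ in range(horizon):
        fuel += 1                          # vegetation grows every step
        ignition = rng.random() < 0.5      # hypothetical ignition probability
        if ignition and policy(fuel) == "let-burn":
            fuel = 0                       # an unsuppressed fire consumes the fuel
        history.append(fuel)
    return history


def monte_carlo(policy, n_rollouts, seed):
    """Generate many rollouts under one policy from a seeded generator."""
    rng = random.Random(seed)
    return [simulate(policy, rng) for _ in range(n_rollouts)]


suppress_all = lambda fuel: "suppress"
let_burn = lambda fuel: "let-burn"

final_suppress = [h[-1] for h in monte_carlo(suppress_all, 200, seed=0)]
final_let_burn = [h[-1] for h in monte_carlo(let_burn, 200, seed=0)]
print(mean(final_suppress), mean(final_let_burn))
```

Under the suppress-all policy every rollout ends at the maximum fuel level, while let-burn rollouts spread over lower values, so an optimizer that updates the policy is then evaluated on a different set of visited states.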

  26. MDP Testing Challenges. Bugs are probabilistically expressed in a high-dimensional temporal dataset. The dataset changes with changes to parameters. The optimizer sees more of the state and policy space than the user. Testing requires exploring rollouts and parameters.

  27. MDP Debugging and Fault Isolation. Deactivate or modify components to isolate the fault, e.g., balance reward magnitude and frequency. Debug the MDP specification and integration with parameter changes.

  28. MDPs: Testing/Debugging. Testing and debugging process: (1) generate rollouts; (2) visualize the data; (3) change parameters.

  29. Outline: (1) Markov Decision Processes (MDPs): basic introduction, testing; (2) MDPvis: design, testing examples, MDPvis use case study; (3) concluding.

  30. MDPvis: Design. Introducing MDPvis.

  31. MDPvis: Design. What are the elements of the MDPvis design? Parameters; history; distributions at a time step; distributions through time; state snapshots.

  32. MDPvis: Design. Parameter areas.

  33. MDPvis: Design. History area.

  34. MDPvis: Design. Visualization areas.

  35. MDPvis: Design. State variable distributions for a fixed time step (time step 9).

  36. MDPvis: Design. State variable distributions for a fixed time step. π₁: Let-Burn; π₂: Suppress-All; comparison: π₁ – π₂.

  37. MDPvis: Design. State variable distributions for a fixed time step. Comparison: π₁ – π₂.

  38. MDPvis: Design. State variable distributions for a fixed time step. Comparison: π₁ – π₂.

  39. MDPvis: Design. State variable distributions for a fixed time step. Comparison: π₁ – π₂.

  40. MDPvis: Design. State variable distributions for a fixed time step. Comparison: π₁ – π₂.

  41. MDPvis: Design. State variable distributions for a fixed time step. Rescale. Comparison: π₁ – π₂.

  42. MDPvis: Design. State variable distributions for a fixed time step. Take the difference in counts. Comparison: π₁ – π₂.

  43. MDPvis: Design. State variable distributions for a fixed time step. Re-plot. Comparison: π₁ – π₂.

  44. MDPvis: Design. State variable distributions for a fixed time step. Let-Burn dominates Suppress-All in this time step. Comparison: π₁ – π₂.
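The comparison steps above (rescale, take the difference in counts, re-plot) can be sketched for one state variable at one time step. This is a minimal sketch rather than MDPvis's code: it reads "rescale" as placing both histograms on shared bin edges, which is an assumption, and the function names are hypothetical.

```python
def histogram(values, lo, hi, bins):
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # the maximum lands in the last bin
        counts[index] += 1
    return counts


def histogram_difference(values_pi1, values_pi2, bins=10):
    """Per-bin count of policy 1 minus policy 2 on a shared axis."""
    if not values_pi1 or not values_pi2:
        raise ValueError("both policies need at least one rollout value")
    lo = min(min(values_pi1), min(values_pi2))
    hi = max(max(values_pi1), max(values_pi2))
    if lo == hi:
        hi = lo + 1  # avoid zero-width bins when every value is identical
    counts_1 = histogram(values_pi1, lo, hi, bins)
    counts_2 = histogram(values_pi2, lo, hi, bins)
    return [a - b for a, b in zip(counts_1, counts_2)]


# Hypothetical values of one state variable at a fixed time step.
print(histogram_difference([0, 0, 1, 9], [0, 5, 9, 9], bins=3))
```

Positive bins mark values that occur more often under the first policy and negative bins mark values that occur more often under the second; with equal numbers of rollouts the differences sum to zero.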

  45. MDPvis: Design. State variable distributions through time (all time steps).

  46. MDPvis: Design. State variable distributions through time (all time steps). Plot labels: 100th percentile, 50th percentile, 0th percentile; event number.

  47. MDPvis: Design. State variable distributions through time. π₁: Let-Burn; π₂: Suppress-All; comparison: π₁ – π₂. Legend: π₁ percentile is greater; π₂ percentile is greater.

  48. MDPvis: Design. State variable distributions through time. π₁: Let-Burn; π₂: Suppress-All; comparison: π₁ – π₂. Let-Burn is always better across all time steps.
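The through-time view summarizes each time step by percentiles across rollouts and then compares two policies percentile by percentile. The sketch below assumes linearly interpolated percentiles and rollouts of equal length; how MDPvis actually computes its bands is not stated in the slides, and all names here are hypothetical.

```python
def percentile(sorted_values, p):
    """Linearly interpolated p-th percentile (0 <= p <= 100) of a sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def bands_through_time(rollouts, percentiles=(0, 50, 100)):
    """rollouts[i][t] is one state variable in rollout i at time step t."""
    bands = []
    for values_at_t in zip(*rollouts):      # transpose: one tuple per time step
        ordered = sorted(values_at_t)
        bands.append({p: percentile(ordered, p) for p in percentiles})
    return bands


def band_difference(bands_pi1, bands_pi2):
    """Per-time-step percentile of policy 1 minus the same percentile of policy 2."""
    return [{p: a[p] - b[p] for p in a} for a, b in zip(bands_pi1, bands_pi2)]
```

A positive entry in the difference corresponds to the "π₁ percentile is greater" case in the legend and a negative entry to the "π₂ percentile is greater" case.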

  49. MDPvis: Integration. State details: allow the MDP simulator to generate state visualizations.

  50. MDPs: Testing/Debugging. Parameter Space Analysis (PSA): “[PSA] is the systematic variation of model input parameters, generating outputs for each combination of parameters, and investigating the relation between parameter settings and corresponding outputs.” Categories: sensitivity, optimization, outliers, partition, uncertainty, fitting. Reference: Sedlmair, M., Heinzl, C., Bruckner, S., Piringer, H., & Möller, T. (2014). Visual parameter space analysis: A conceptual framework. IEEE Transactions on Visualization and Computer Graphics, 20(12).
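The quoted definition (vary the inputs systematically, generate an output for each combination, then relate settings to outputs) can be sketched as a grid sweep. This is one simple way to realize it, not a method prescribed by the cited paper; `parameter_sweep` and the toy model are hypothetical.

```python
from itertools import product


def parameter_sweep(model, grid):
    """Run `model` on every combination of the values listed in `grid`.

    `grid` maps a parameter name to the values to try; the result pairs each
    parameter setting with the output it produced.
    """
    names = list(grid)
    results = []
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        results.append((params, model(**params)))
    return results


def toy_model(gamma, horizon):
    """Hypothetical model: discounted sum of a unit reward per time step."""
    return sum(gamma ** t for t in range(horizon))


for params, output in parameter_sweep(toy_model, {"gamma": [0.5, 0.96], "horizon": [2, 10]}):
    print(params, round(output, 3))
```

Keeping the parameter setting next to each output is what makes the later analysis possible, for example checking how sensitive the output is to `gamma` at a fixed `horizon`.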

  51. MDPs: Testing/Debugging. Categories: Sensitivity · Optimization · Outliers · Partition · Uncertainty · Fitting. Is the policy sensitive to the state? (1) Interaction: brush suppression choice to select Let-Burn. (2) Expectation: date is a determinant of suppression choice. (3) Buggy result: date does not determine suppression choice.
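The brushing test above can be approximated numerically: select the events where the policy chose Let-Burn and check whether their dates differ from the dates of all events. The sketch is a rough stand-in for the visual check, with hypothetical names, a hypothetical record layout of (day of year, action), and made-up data.

```python
def brush(records, action):
    """Keep the dates of records whose action matches, like brushing a histogram."""
    return [date for date, chosen in records if chosen == action]


def date_shift(records, action="let-burn"):
    """Mean date of the brushed records minus the mean date of all records."""
    selected = brush(records, action)
    if not selected:
        raise ValueError("no records with action " + repr(action))
    all_dates = [date for date, _ in records]
    return sum(selected) / len(selected) - sum(all_dates) / len(all_dates)


# Hypothetical data: the expected policy lets late-season fires burn,
# while the buggy policy ignores the date.
expected = [(10, "suppress"), (20, "suppress"), (200, "let-burn"), (250, "let-burn")]
buggy = [(10, "let-burn"), (250, "let-burn"), (110, "suppress"), (150, "suppress")]
print(date_shift(expected), date_shift(buggy))
```

A shift near zero, as in the buggy data, is the numeric counterpart of brushing Let-Burn and seeing the date distribution stay put; a mean comparison is a coarse proxy and can miss differences in shape.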
