  1. Statistical Filtering and Control for AI and Robotics Planning and Control: Markov Decision Processes Alessandro Farinelli

  2. Outline • Uncertainty: localization for mobile robots – State estimation based on Bayesian filters [recall] • Acting Under Uncertainty – Markov Decision Problem – Solution approaches • Motion planning – Markov Decision Processes for path planning • Acknowledgment: material based on – Russell and Norvig; Artificial Intelligence: A Modern Approach – Thrun, Burgard, Fox; Probabilistic Robotics

  3. Mobile robots

  4. Sensors

  5. Uncertainty Let open = “open a door”. Will executing open actually open the door? Problems: • 1) partial observability and noisy sensors • 2) uncertainty in action outcomes • 3) immense complexity of modelling and predicting the environment

  6. Probability Probabilistic assertions summarize the effects of • laziness (failure to enumerate all relevant facts) • ignorance (lack of relevant facts) Subjective or Bayesian probability: • Probabilities relate propositions to one's own state of knowledge – P(open | I am in front of the door) = 0.6 – P(open | I am in front of the door, door is not locked) = 0.8

  7. Simple Example of State Estimation Suppose a robot obtains a measurement z. What is P(open | z)?

  8. Causal vs. Diagnostic Reasoning P(open | z) is diagnostic, P(z | open) is causal. Often causal knowledge is easier to obtain (count frequencies!). Bayes rule allows us to use causal knowledge: P(open | z) = P(z | open) P(open) / P(z)

  9. Example P(z | open) = 0.6, P(z | ¬open) = 0.3, P(open) = P(¬open) = 0.5. P(open | z) = P(z | open) P(open) / (P(z | open) P(open) + P(z | ¬open) P(¬open)) = (0.6 · 0.5) / (0.6 · 0.5 + 0.3 · 0.5) = 2/3 ≈ 0.67. The measurement z raises the probability that the door is open.
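To make the arithmetic concrete, here is a minimal Python sketch of this Bayes-rule calculation; the variable names are illustrative, the numbers are the ones from the slide.

```python
# Illustrative sketch of the slide's Bayes-rule calculation (hypothetical names).
p_z_given_open = 0.6      # causal sensor model P(z | open)
p_z_given_not_open = 0.3  # P(z | not open)
p_open = 0.5              # prior P(open)

# Bayes rule: P(open | z) = P(z | open) P(open) / P(z)
numerator = p_z_given_open * p_open
evidence = numerator + p_z_given_not_open * (1.0 - p_open)
p_open_given_z = numerator / evidence
print(p_open_given_z)  # 0.666..., i.e. 2/3
```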

  10. Combining Evidence Suppose our robot obtains another observation z2. How can we integrate this new information? More generally, how can we estimate P(x | z1, ..., zn)?

  11. Recursive Bayesian Updating P(x | z1, ..., zn) = P(zn | x, z1, ..., zn-1) P(x | z1, ..., zn-1) / P(zn | z1, ..., zn-1). Markov assumption: zn is independent of z1, ..., zn-1 given x, so P(x | z1, ..., zn) = P(zn | x) P(x | z1, ..., zn-1) / P(zn | z1, ..., zn-1) = η P(zn | x) P(x | z1, ..., zn-1) = η1...n ∏i=1..n P(zi | x) P(x)

  12. Example: Second Measurement P(z2 | open) = 0.5, P(z2 | ¬open) = 0.6, P(open | z1) = 2/3. P(open | z2, z1) = P(z2 | open) P(open | z1) / (P(z2 | open) P(open | z1) + P(z2 | ¬open) P(¬open | z1)) = (1/2 · 2/3) / (1/2 · 2/3 + 3/5 · 1/3) = 5/8 = 0.625. z2 lowers the probability that the door is open.
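Continuing the same illustrative sketch, the recursive update simply reuses P(open | z1) as the prior when the second measurement arrives.

```python
# Illustrative continuation: P(open | z1) = 2/3 becomes the prior for the z2 update.
p_z2_given_open = 0.5
p_z2_given_not_open = 0.6
p_open_given_z1 = 2.0 / 3.0

numerator = p_z2_given_open * p_open_given_z1
evidence = numerator + p_z2_given_not_open * (1.0 - p_open_given_z1)
p_open_given_z1_z2 = numerator / evidence
print(p_open_given_z1_z2)  # 0.625 = 5/8: z2 lowers the belief that the door is open
```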

  13. Actions Often the world is dynamic: – actions carried out by the robot, – actions carried out by other agents, – time passing by. How can we incorporate such actions?

  14. Typical Actions The robot moves, the robot moves objects, people move around the robot. Actions are never carried out with absolute certainty. In contrast to measurements, actions generally increase the uncertainty.

  15. Modeling Actions To incorporate the outcome of an action u into the current “belief”, we use the conditional pdf P(x' | u, x). This term specifies the probability that executing u changes the state from x to x'.

  16. Example: Closing the door

  17. State Transitions • P(x' | u, x) for u = “close door”: from state open the door ends up closed with probability 0.9 and stays open with probability 0.1; from state closed it stays closed with probability 1 (and becomes open with probability 0). • If the door is open, the action “close door” succeeds in 90% of all cases.

  18. Integrating the Outcome of Actions Continuous case: P(x' | u) = ∫ P(x' | u, x) P(x) dx. Discrete case: P(x' | u) = Σx P(x' | u, x) P(x).

  19. Example: The Resulting Belief P(closed | u) = Σx P(closed | u, x) P(x) = P(closed | u, open) P(open) + P(closed | u, closed) P(closed) = 9/10 · 5/8 + 1 · 3/8 = 15/16. P(open | u) = Σx P(open | u, x) P(x) = P(open | u, open) P(open) + P(open | u, closed) P(closed) = 1/10 · 5/8 + 0 · 3/8 = 1/16 = 1 − P(closed | u).
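The discrete prediction step and the numbers above can be checked with a short sketch, assuming the “close door” transition model from the previous slides (variable names are illustrative).

```python
# Illustrative check of the discrete prediction step P(x'|u) = sum_x P(x'|u,x) P(x)
# for u = "close door", using the belief P(open) = 5/8, P(closed) = 3/8 from the slide.
p_open, p_closed = 5.0 / 8.0, 3.0 / 8.0

p_closed_given_u_open = 0.9    # P(closed | close door, open)
p_closed_given_u_closed = 1.0  # P(closed | close door, closed)

p_closed_after = p_closed_given_u_open * p_open + p_closed_given_u_closed * p_closed
p_open_after = (1.0 - p_closed_given_u_open) * p_open + 0.0 * p_closed
print(p_closed_after, p_open_after)  # 0.9375 = 15/16 and 0.0625 = 1/16
```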

  20. Bayes Filters: Framework • Given: – Stream of observations z and action data u: d_t = {u_1, z_1, ..., u_t, z_t} – Sensor model P(z | x) – Action model P(x' | u, x) – Prior probability of the system state P(x) • Compute: – Estimate of the state x of a dynamical system – The posterior of the state, also called the belief: Bel(x_t) = P(x_t | u_1, z_1, ..., u_t, z_t)

  21. Markov Assumption p(z_t | x_0:t, z_1:t-1, u_1:t) = p(z_t | x_t) and p(x_t | x_1:t-1, z_1:t-1, u_1:t) = p(x_t | x_t-1, u_t). Underlying assumptions: • Static world (no one else changes the world) • Independent noise (over time) • Perfect model, no approximation errors

  22. Bayes Filters (z = observation, u = action, x = state)
Bel(x_t) = P(x_t | u_1, z_1, ..., u_t, z_t)
(Bayes) = η P(z_t | x_t, u_1, z_1, ..., u_t) P(x_t | u_1, z_1, ..., u_t)
(Markov) = η P(z_t | x_t) P(x_t | u_1, z_1, ..., u_t)
(Total prob.) = η P(z_t | x_t) ∫ P(x_t | u_1, z_1, ..., u_t, x_t-1) P(x_t-1 | u_1, z_1, ..., u_t) dx_t-1
(Markov) = η P(z_t | x_t) ∫ P(x_t | u_t, x_t-1) P(x_t-1 | u_1, z_1, ..., u_t) dx_t-1
(Markov) = η P(z_t | x_t) ∫ P(x_t | u_t, x_t-1) P(x_t-1 | u_1, z_1, ..., z_t-1) dx_t-1
= η P(z_t | x_t) ∫ P(x_t | u_t, x_t-1) Bel(x_t-1) dx_t-1

  23. Bayes Filter Algorithm
Algorithm Bayes_filter(Bel(x), d):
1. η = 0
2. If d is a perceptual data item z then
3.   For all x do
4.     Bel'(x) = P(z | x) Bel(x)
5.     η = η + Bel'(x)
6.   For all x do
7.     Bel'(x) = η^-1 Bel'(x)
8. Else if d is an action data item u then
9.   For all x' do
10.    Bel'(x') = ∫ P(x' | u, x) Bel(x) dx
11. Return Bel'(x)
Bel(x_t) = η P(z_t | x_t) ∫ P(x_t | u_t, x_t-1) Bel(x_t-1) dx_t-1
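A minimal Python rendering of this algorithm for the two-state door example might look as follows; the function names and dictionary layout are illustrative, not from the lecture, and the continuous integral is replaced by the discrete sum.

```python
# A minimal discrete Bayes filter sketch for the two-state door example.

def measurement_update(bel, z, sensor_model):
    """Bel'(x) ∝ P(z | x) Bel(x), then normalize (the perceptual branch)."""
    unnormalized = {x: sensor_model[z][x] * p for x, p in bel.items()}
    eta = sum(unnormalized.values())
    return {x: p / eta for x, p in unnormalized.items()}

def action_update(bel, u, transition_model):
    """Bel'(x') = sum_x P(x' | u, x) Bel(x) (the action branch, discrete case)."""
    return {x_new: sum(transition_model[u][x_new][x] * p for x, p in bel.items())
            for x_new in bel}

# Models taken from the slides: P(z | x) and P(x' | u = "close", x)
sensor_model = {"z": {"open": 0.6, "closed": 0.3}}
transition_model = {"close": {"closed": {"open": 0.9, "closed": 1.0},
                              "open":   {"open": 0.1, "closed": 0.0}}}

bel = {"open": 0.5, "closed": 0.5}                   # prior belief
bel = measurement_update(bel, "z", sensor_model)     # -> open: 2/3, closed: 1/3
bel = action_update(bel, "close", transition_model)  # prediction after "close door"
print(bel)
```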

  24. Bayes Filters are Familiar! Bel(x_t) = η P(z_t | x_t) ∫ P(x_t | u_t, x_t-1) Bel(x_t-1) dx_t-1 The same update underlies Kalman filters, particle filters, hidden Markov models, dynamic Bayesian networks, and Partially Observable Markov Decision Processes (POMDPs).

  25. Bayesian filters for localization How do I know whether I am in front of the door? Localization as a state estimation process (filtering), alternating state update and sensor reading steps.

  26. Kalman Filter for Localization Gaussian pdf for the belief. • Pros: closed-form representation, very fast update • Cons: works only for linear action and sensor models (the EKF can be used to overcome this); works well only for unimodal beliefs
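As a hedged illustration of the idea (not the lecture's implementation), a one-dimensional Kalman filter with linear motion and sensor models reduces to two small update rules.

```python
# Illustrative one-dimensional Kalman filter: the belief is a Gaussian
# with mean mu and variance sigma2; all parameter values are assumptions.

def kf_predict(mu, sigma2, u, motion_noise):
    """Action update: shift the mean by the commanded motion u and add motion noise."""
    return mu + u, sigma2 + motion_noise

def kf_correct(mu, sigma2, z, sensor_noise):
    """Measurement update: blend prediction and measurement z via the Kalman gain."""
    k = sigma2 / (sigma2 + sensor_noise)   # Kalman gain
    return mu + k * (z - mu), (1.0 - k) * sigma2

mu, sigma2 = 0.0, 1.0                                         # prior belief
mu, sigma2 = kf_predict(mu, sigma2, u=1.0, motion_noise=0.5)  # after moving
mu, sigma2 = kf_correct(mu, sigma2, z=1.2, sensor_noise=0.3)  # after sensing
print(mu, sigma2)
```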

  27. Particle filters Particles are used to represent the belief. • Pros: no assumptions on the belief, action, or sensor models • Cons: the update can be computationally demanding
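A sketch of one particle-filter step for one-dimensional localization, under assumed Gaussian motion and sensor noise (all parameters illustrative).

```python
# Illustrative particle-filter step: the belief is a weighted sample set,
# so no unimodal/Gaussian assumption is needed.
import math
import random

def pf_step(particles, u, z, motion_noise=0.1, sensor_noise=0.2):
    # Prediction: push each particle through a noisy motion model
    moved = [x + u + random.gauss(0.0, motion_noise) for x in particles]
    # Correction: weight each particle by the measurement likelihood P(z | x)
    weights = [math.exp(-0.5 * ((z - x) / sensor_noise) ** 2) for x in moved]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resampling: draw a new particle set with probability proportional to the weights
    return random.choices(moved, weights=weights, k=len(particles))

particles = [random.uniform(0.0, 5.0) for _ in range(1000)]  # prior belief
particles = pf_step(particles, u=1.0, z=2.4)
print(sum(particles) / len(particles))  # posterior mean estimate
```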

  28. Particle Filters: prior

  29. Particle Filters: bimodal belief

  30. Particle Filters: Unimodal beliefs

  31. Mapping and SLAM Localization: given the map and observations, update the pose estimate. Mapping: given the pose and observations, update the map. SLAM: given only observations, update both map and pose. New observations increase uncertainty; loop closures reduce uncertainty.

  32. SLAM in action Courtesy of Sebastian Thrun and Dirk Haehnel (link to the video)

  33. Markov Decision Process • Mathematical model to plan sequences of actions in the face of uncertainty

  34. Example MDP

  35. Solving MDPs

  36. Risk and Reward

  37. Utility of State Sequences

  38. Utility of States

  39. MDPs for mobile robots The optimal path is the shortest one if actions are deterministic; a safer path is optimal if actions are NOT deterministic.

  40. MDPs for mobile robots: formalization Input: • States x (assume the state is known) • Actions u • Transition probabilities p(x' | u, x) • Reward / payoff function r(x, u) • Note: now the reward depends on state and action. This is a different notation, but the core concepts do not change. Output: • Policy π(x) that maximizes the future expected reward
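As an illustration of how such an MDP can be solved for path planning, here is a small value-iteration sketch on an assumed 4×3 grid with a 0.8 success probability, a step cost of -0.04, and a discount of 0.95; none of these figures come from the lecture.

```python
# Illustrative value-iteration sketch for a small grid MDP; grid size, success
# probability, step cost, and discount are assumptions, not lecture figures.

GRID_W, GRID_H, GOAL = 4, 3, (3, 2)
ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
GAMMA, P_SUCCESS, STEP_COST, GOAL_REWARD = 0.95, 0.8, -0.04, 1.0

def move(state, action):
    """Intended successor cell; bumping into the border leaves the robot in place."""
    x, y = state
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    return (nx, ny) if 0 <= nx < GRID_W and 0 <= ny < GRID_H else state

def value_iteration(iterations=100):
    V = {(x, y): 0.0 for x in range(GRID_W) for y in range(GRID_H)}
    for _ in range(iterations):
        new_V = {}
        for s in V:
            if s == GOAL:
                new_V[s] = GOAL_REWARD     # terminal goal state
                continue
            # Bellman backup: V(s) = max_a [ r(s,a) + gamma * sum_x' P(x'|u,x) V(x') ]
            new_V[s] = max(
                STEP_COST + GAMMA * (P_SUCCESS * V[move(s, a)] + (1 - P_SUCCESS) * V[s])
                for a in ACTIONS)
        V = new_V
    # Greedy policy: for each state pick the action with the best one-step lookahead
    policy = {s: max(ACTIONS, key=lambda a: P_SUCCESS * V[move(s, a)] + (1 - P_SUCCESS) * V[s])
              for s in V if s != GOAL}
    return V, policy

V, policy = value_iteration()
print(policy[(0, 0)])  # e.g. "E" or "N", heading towards the goal cell
```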
