Statistical Model Checking for Markov Decision Processes
David Henriques
Joint work with João Martins, Paolo Zuliani, André Platzer and Edmund M. Clarke
QEST, September 18th, 2012
David Henriques (CMU) SMC for MDPs QEST’12 1 / 37


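Slide 10/37 | Probabilistic MC and Statistical MC | Bounded LTL
Syntax of BLTL: $\varphi := \lambda \mid \neg\varphi \mid \varphi \vee \varphi \mid \mathrm{F}^{\le n}\varphi \mid \mathrm{G}^{\le n}\varphi \mid \varphi\,\mathrm{U}^{\le n}\varphi$, where $\lambda \in \Lambda$.
Semantics of BLTL ($\pi|_i$ is the suffix of $\pi$ starting at position $i$):
$\pi \models \lambda$ if $\lambda \in L(\pi_0)$
$\pi \models \neg\varphi$ if $\pi \not\models \varphi$
$\pi \models \varphi_1 \vee \varphi_2$ if $\pi \models \varphi_1$ or $\pi \models \varphi_2$
$\pi \models \mathrm{F}^{\le n}\varphi$ if $\exists i \le n : \pi|_i \models \varphi$
$\pi \models \mathrm{G}^{\le n}\varphi$ if $\forall i \le n : \pi|_i \models \varphi$
$\pi \models \varphi_1\,\mathrm{U}^{\le n}\varphi_2$ if $\exists i \le n : \pi|_i \models \varphi_2$ and $\forall k < i : \pi|_k \models \varphi_1$
(Figures: example traces illustrating $\mathrm{F}^{\le n} a$, $\mathrm{G}^{\le n} a$ and $a\,\mathrm{U}^{\le n} b$.)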
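The semantics above translate directly into a recursive check on one finite trace. The following is a minimal sketch, not the authors' implementation. It assumes that a trace is a list of sets of atomic propositions, one per step, and it uses a hypothetical tuple encoding of formulas.

```python
# Minimal BLTL checker over a finite trace (a sketch; the tuple encoding of
# formulas is an assumption made for this example, not the authors' format).
# A trace is a list of sets of atomic propositions, one set per time step.

def holds(trace, f, pos=0):
    """Return True if the suffix of `trace` starting at `pos` satisfies `f`."""
    kind = f[0]
    if kind == "ap":                          # ("ap", "a")
        return f[1] in trace[pos]
    if kind == "not":                         # ("not", g)
        return not holds(trace, f[1], pos)
    if kind == "or":                          # ("or", g, h)
        return holds(trace, f[1], pos) or holds(trace, f[2], pos)

    n = f[1]                                  # time bound of F, G or U
    if pos + n >= len(trace):
        raise ValueError("trace is too short for the time bound")
    if kind == "F":                           # ("F", n, g): some i <= n
        return any(holds(trace, f[2], pos + i) for i in range(n + 1))
    if kind == "G":                           # ("G", n, g): every i <= n
        return all(holds(trace, f[2], pos + i) for i in range(n + 1))
    if kind == "U":                           # ("U", n, g, h): g until h
        for i in range(n + 1):
            if holds(trace, f[3], pos + i):       # h holds at step i
                return True
            if not holds(trace, f[2], pos + i):   # g must hold at every k < i
                return False
        return False
    raise ValueError(f"unknown operator {kind!r}")


trace = [{"a"}, {"a"}, {"a", "b"}, set()]
print(holds(trace, ("U", 3, ("ap", "a"), ("ap", "b"))))  # True
print(holds(trace, ("G", 3, ("ap", "a"))))               # False
print(holds(trace, ("F", 1, ("ap", "b"))))               # False
```

The until case returns as soon as $\varphi_2$ holds. This means $\varphi_1$ is only required at the earlier positions $k < i$.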

Slide 11/37 | Probabilistic MC and Statistical MC | Probabilistic BLTL
In a fully probabilistic setting, the model checking decision problem is to decide, for a given threshold $\theta$, whether $P_\sigma(\{\pi : \pi \models \varphi\}) \le \theta$.
Proposition: this is a well-posed problem.

Slide 12/37 | Probabilistic MC and Statistical MC | We should be so lucky...
We may not have a scheduler, but we still want to guarantee properties. We therefore make claims that hold for all schedulers, no matter how adversarial.
For MDPs, the model checking decision problem is to decide, for a given threshold $\theta$, whether $P_\sigma(\{\pi : \pi \models \varphi\}) \le \theta$ for all schedulers $\sigma$.
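Here is a minimal sketch that makes the decision problem concrete. It uses a hypothetical three-state MDP, which is not one of the talk's models, and assumes memoryless randomized schedulers given as `sched[state][action]` probabilities. Fixing a scheduler turns the MDP into a Markov chain that can be sampled. In this toy, $P_\sigma(\mathrm{F}^{\le 1}\,\mathit{goal}) = 0.5\,\sigma(s_0,a)$, so the most adversarial scheduler always picks $a$. The property therefore holds for all $\sigma$ exactly when $\theta \ge 0.5$.

```python
import random

# Hypothetical toy MDP (not one of the talk's models):
# MDP[state][action] is a list of (probability, next_state) pairs.
MDP = {
    "s0": {"a": [(0.5, "s1"), (0.5, "s2")], "b": [(1.0, "s2")]},
    "s1": {"stay": [(1.0, "s1")]},
    "s2": {"stay": [(1.0, "s2")]},
}
LABELS = {"s0": set(), "s1": {"goal"}, "s2": set()}


def sample_path(sched, start, steps, rng):
    """Sample one path of the Markov chain obtained by resolving MDP with sched.

    sched[state][action] is the probability of picking action in state
    (a memoryless randomized scheduler, assumed for simplicity).
    Returns the visited states and the (state, action) pairs taken."""
    states, pairs = [start], []
    for _ in range(steps):
        s = states[-1]
        actions = list(sched[s])
        a = rng.choices(actions, weights=[sched[s][x] for x in actions])[0]
        probs, succs = zip(*MDP[s][a])
        states.append(rng.choices(succs, weights=probs)[0])
        pairs.append((s, a))
    return states, pairs


def estimate(sched, samples=20_000, seed=0):
    """Monte Carlo estimate of P_sigma(F<=1 goal) from s0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        states, _pairs = sample_path(sched, "s0", 1, rng)
        hits += any("goal" in LABELS[s] for s in states)
    return hits / samples


uniform = {"s0": {"a": 0.5, "b": 0.5}, "s1": {"stay": 1.0}, "s2": {"stay": 1.0}}
always_a = {"s0": {"a": 1.0, "b": 0.0}, "s1": {"stay": 1.0}, "s2": {"stay": 1.0}}
print(estimate(uniform))   # close to the exact value 0.25
print(estimate(always_a))  # close to 0.5, the worst case over all schedulers
```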

Slide 13/37 | SMC for MDPs | Summary
1. Markov Decision Processes
2. Probabilistic MC and Statistical MC
3. SMC for MDPs
4. Why does it work?
5. Experimental Validation

Slide 14/37 | SMC for MDPs | Basic idea
“Learn the most adversarial scheduler (or a good-enough approximation) by successively refining an initial guess.”

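Slide 15/37 | SMC for MDPs | Scheduler Evaluation
This step uses the same ideas as classical Statistical Model Checking. The MDP together with the scheduler $\sigma$ forms a fully probabilistic system. Traces are sampled from this system and evaluated against a BLTL formula, e.g. $\varphi \equiv p_1\,\mathrm{U}^{<12}\,(\mathrm{G}^{<10}\,\neg p_3)$. Hypothesis testing against the probability threshold $\theta$ then decides whether there is sufficient statistical evidence to answer; otherwise, more traces are sampled.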
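The hypothesis-testing step can be illustrated with Wald's sequential probability ratio test (SPRT), one standard test in statistical model checking. The slide does not say which test the authors use, so treat this choice as an assumption. The indifference half-width `delta` and the error bounds `alpha` and `beta` are illustrative parameters. `sample()` stands for drawing one trace from the MDP under $\sigma$ and checking $\varphi$ on it.

```python
import math
import random


def sprt(sample, theta, delta=0.01, alpha=0.01, beta=0.01, max_samples=100_000):
    """Wald's SPRT for the claim P(pi |= phi) <= theta (a sketch).

    `sample()` draws one trace under the fixed scheduler and returns True if
    it satisfies phi. Tests H0: p <= theta - delta (claim holds) against
    H1: p >= theta + delta (claim fails); alpha and beta bound the two error
    probabilities. Returns True/False, or None if max_samples is reached."""
    if not delta < theta < 1 - delta:
        raise ValueError("need delta < theta < 1 - delta")
    p0, p1 = theta - delta, theta + delta
    upper = math.log((1 - beta) / alpha)   # crossing it accepts H1
    lower = math.log(beta / (1 - alpha))   # crossing it accepts H0
    llr = 0.0                              # log-likelihood ratio of H1 vs H0
    for _ in range(max_samples):
        if sample():
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return False
        if llr <= lower:
            return True
    return None


rng = random.Random(1)
true_p = 0.3  # hypothetical satisfaction probability under sigma
print(sprt(lambda: rng.random() < true_p, theta=0.5))  # 0.3 lies below 0.5
print(sprt(lambda: rng.random() < true_p, theta=0.2))  # 0.3 lies above 0.2
```

The number of samples adapts to the data. Runs where the true probability is far from $\theta$ stop early, while probabilities close to $\theta$ need more traces.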

Slide 16/37 | SMC for MDPs | Scheduler Evaluation
Record whether the state-action pairs crossed by the samples satisfied $\varphi$. The empirical quality of a visited pair $(s,a)$ is
$\hat{Q}_\sigma(s,a) = \dfrac{\#\,\text{times } (s,a) \text{ was seen in satisfying traces}}{\#\,\text{times } (s,a) \text{ was seen}}$
As the number of samples goes to infinity, $\hat{Q}_\sigma(s,a) \to Q_\sigma(s,a) \equiv P(\pi \models \varphi \mid (s,a) \in \pi)$.
Example: from state $s$, action $a$ was tried 500 times with 500 successes, $b$ 1000 times with 0 successes, and $c$ 700 times with 525 successes. This gives $\hat{Q}(s,a) = 1$, $\hat{Q}(s,b) = 0$ and $\hat{Q}(s,c) = 3/4$.
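Computing $\hat{Q}_\sigma$ is a matter of counting. The sketch below assumes that each sample is recorded as the (state, action) pairs its trace crossed plus a satisfied/violated flag. It counts a pair at most once per trace, which is one reading of "# times $(s,a)$ was seen". The input is rebuilt from the slide's example counts.

```python
from collections import Counter
from fractions import Fraction


def empirical_quality(samples):
    """samples: iterable of (pairs, satisfied), where `pairs` are the
    (state, action) pairs crossed by one trace and `satisfied` says whether
    that trace satisfied phi. Each pair is counted at most once per trace."""
    seen, good = Counter(), Counter()
    for pairs, satisfied in samples:
        for pair in set(pairs):
            seen[pair] += 1
            if satisfied:
                good[pair] += 1
    return {pair: Fraction(good[pair], seen[pair]) for pair in seen}


# Rebuild the slide's counts: (s,a) 500/500, (s,b) 0/1000, (s,c) 525/700.
samples = ([([("s", "a")], True)] * 500
           + [([("s", "b")], False)] * 1000
           + [([("s", "c")], True)] * 525
           + [([("s", "c")], False)] * 175)
q = empirical_quality(samples)
print(q[("s", "a")], q[("s", "b")], q[("s", "c")])  # 1 0 3/4
```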

Slide 17/37 | SMC for MDPs | Scheduler Improvement
The new scheduler $\sigma'$ is obtained from $\sigma$ by giving higher probability to transitions with higher quality.
Update rule: $\sigma'(s,a) = \dfrac{\hat{Q}_\sigma(s,a)}{\sum_{b \in \mathcal{A}} \hat{Q}_\sigma(s,b)}$
Example: with $\hat{Q}(s,a) = 1$, $\hat{Q}(s,b) = 0$ and $\hat{Q}(s,c) = 3/4$, we get $\sigma'(s,a) = 1/(1 + 3/4 + 0) = 4/7$, $\sigma'(s,b) = 0$ and $\sigma'(s,c) = 3/7$.
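The update rule in Python, applied to the slide's example; `Fraction` keeps the arithmetic exact. The slide does not cover the case where every quality at a state is zero. Raising an error in that case is an assumption of this sketch.

```python
from fractions import Fraction


def proportional_update(q):
    """Update rule: sigma'(s, a) = Q(s, a) / sum over b of Q(s, b), for one state s.

    `q` maps each action available in s to its empirical quality."""
    total = sum(q.values())
    if total == 0:
        # Not covered by the slide: every action at s has quality 0.
        raise ValueError("all qualities are zero; the update is undefined")
    return {a: value / total for a, value in q.items()}


q = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(3, 4)}
new_sigma = proportional_update(q)
print({a: str(p) for a, p in new_sigma.items()})  # {'a': '4/7', 'b': '0', 'c': '3/7'}
```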

Slide 18/37 | SMC for MDPs | History and Greediness
What if we explore too little? If some state-action pairs have $\hat{Q}(s,a) = 0$, the update rule gives them probability 0, and they are never sampled again. To avoid such “blocking” of transitions, keep a history parameter $h$ and update instead
$\sigma'(s,a) = h\,\sigma(s,a) + (1-h)\dfrac{\hat{Q}_\sigma(s,a)}{\sum_{b \in \mathcal{A}} \hat{Q}_\sigma(s,b)}$
Example: starting from the uniform scheduler $\sigma(s,a) = \sigma(s,b) = \sigma(s,c) = 1/3$, we get $\sigma'(s,a) = h/3 + (4/7)(1-h)$, $\sigma'(s,b) = h/3 + 0\cdot(1-h)$, which is $> 0$ for $h > 0$, and $\sigma'(s,c) = h/3 + (3/7)(1-h)$.
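Here is a sketch of the history-weighted update, using the uniform starting scheduler from the example. The slide leaves $h$ unspecified, so $h = 1/2$ is only an illustrative value. The previously blocked action $b$ keeps probability $h/3 = 1/6$.

```python
from fractions import Fraction


def history_update(sigma, q, h):
    """sigma'(s,a) = h*sigma(s,a) + (1-h)*Q(s,a)/sum_b Q(s,b) for one state s.

    Assumes at least one action at s has positive quality."""
    total = sum(q.values())
    return {a: h * sigma[a] + (1 - h) * q[a] / total for a in sigma}


sigma = {a: Fraction(1, 3) for a in "abc"}      # uniform initial scheduler
q = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(3, 4)}
h = Fraction(1, 2)                              # illustrative history parameter
new_sigma = history_update(sigma, q, h)
print({a: str(p) for a, p in new_sigma.items()})  # {'a': '19/42', 'b': '1/6', 'c': '8/21'}
```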

Slide 19/37 | SMC for MDPs | History and Greediness
What if we explore too much? Keep a greediness parameter $\varepsilon$. The best action receives an extra probability $\varepsilon$, and the remaining $1-\varepsilon$ is distributed according to the update rule. This avoids slow updates.
Example: $\sigma'(s,a) = \varepsilon + (4/7)(1-\varepsilon)$, $\sigma'(s,b) = 0\cdot(1-\varepsilon)$ and $\sigma'(s,c) = (3/7)(1-\varepsilon)$.
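Here is a sketch of the greedy variant that follows the example formulas on the slide. The best action, meaning the one with the highest $\hat{Q}$, receives an extra $\varepsilon$. Every action keeps a $(1-\varepsilon)$ share of its update-rule probability. $\varepsilon = 1/10$ is an illustrative value. Breaking ties by taking the first maximal action is an assumption.

```python
from fractions import Fraction


def greedy_update(q, eps):
    """sigma''(s,a) = (1-eps)*Q(s,a)/sum_b Q(s,b), plus eps for the best action."""
    total = sum(q.values())
    best = max(q, key=q.get)          # first action with maximal quality
    return {a: (1 - eps) * q[a] / total + (eps if a == best else 0)
            for a in q}


q = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(3, 4)}
eps = Fraction(1, 10)                 # illustrative greediness parameter
new_sigma = greedy_update(q, eps)
print({a: str(p) for a, p in new_sigma.items()})  # {'a': '43/70', 'b': '0', 'c': '27/70'}
```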

Slide 20/37 | SMC for MDPs | If at first you don’t succeed...
If $\sigma$ makes $P_\sigma(\{\pi : \pi \models \varphi\}) > \theta$, the property is surely false, and $\sigma$ is a counterexample.
If not, we may be converging towards a local optimum, or the property may be true.

Slide 21/37 | SMC for MDPs | If at first you don’t succeed...
Algorithms like this are called “false-biased Monte Carlo algorithms”. When the algorithm answers False, we can trust the answer. When it answers True, we have to reconsider by running it again a couple of times.
Theorem: confidence increases exponentially with the number of times we restart.
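One way to quantify the theorem rests on an assumption: whenever the property is false, each independent run finds a counterexample with probability at least $p$. This sketch ignores the error of the per-run statistical test. Under that assumption, after $k$ runs that all answer True, the probability that the property is actually false is at most $(1-p)^k$. The helper below, with a hypothetical name, returns the smallest such $k$ for a target error.

```python
import math


def restarts_needed(p_find, max_error):
    """Smallest k with (1 - p_find) ** k <= max_error.

    p_find: assumed lower bound on the chance that one run exposes a false
    property; max_error: acceptable probability of wrongly answering True."""
    if not (0 < p_find <= 1 and 0 < max_error < 1):
        raise ValueError("need 0 < p_find <= 1 and 0 < max_error < 1")
    if p_find == 1:
        return 1
    return math.ceil(math.log(max_error) / math.log(1 - p_find))


print(restarts_needed(0.5, 0.001))  # 10
print(restarts_needed(0.2, 0.001))  # 31
```

The required number of restarts grows only logarithmically in $1/\text{max\_error}$, which is the exponential gain in confidence the theorem refers to.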

Slide 23/37 | Why does it work? | Value
Definition [Value]. The value of a state $s$ under a scheduler $\sigma$ is defined as $V_\sigma(s) = P(\pi \models \varphi \mid (s,a) \in \pi,\ a \in \mathcal{A}(s))$.
Notice that the MC problem can be reduced to finding $V_\sigma(s_i)$, where $s_i$ is the initial state, and that
$V_\sigma(s) = \sum_{a \in \mathcal{A}(s)} \sigma(s,a)\,Q_\sigma(s,a)$
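The identity $V_\sigma(s) = \sum_a \sigma(s,a)\,Q_\sigma(s,a)$ can be evaluated on the running example. Reusing the same $Q$ values for $\sigma'$ has a caveat. It gives the value of the local update at $s$ only if changing the choice at $s$ does not change those $Q$ values, for instance when $s$ is visited at most once per trace. This is an assumption of the sketch.

```python
from fractions import Fraction


def value(sigma, q):
    """V_sigma(s) = sum over a in A(s) of sigma(s, a) * Q_sigma(s, a)."""
    return sum(sigma[a] * q[a] for a in sigma)


q = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(3, 4)}
uniform = {a: Fraction(1, 3) for a in "abc"}
# Assumes Q at s is unchanged by the local update (see the lead-in).
updated = {"a": Fraction(4, 7), "b": Fraction(0), "c": Fraction(3, 7)}
print(value(uniform, q), value(updated, q))  # 7/12 25/28
```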

Slide 24/37 | Why does it work? | Value
Definition [Local Update]. Let $\sigma$ and $\sigma'$ be two schedulers. The local update of $\sigma$ by $\sigma'$ in $s$, written $\sigma[\sigma(s) \to \sigma'(s)]$, is the scheduler that behaves like $\sigma$ everywhere except in $s$, where it behaves like $\sigma'$.
(Figure: $\sigma$, $\sigma'$ and the locally updated scheduler $\sigma[\sigma(s) \to \sigma'(s)]$ around state $s$.)

Slide 25/37 | Why does it work? | Value
Theorem [SB]. Let $\sigma$ and $\sigma'$ be two schedulers such that $\forall s \in S : V_{\sigma[\sigma(s) \to \sigma'(s)]}(s) \ge V_\sigma(s)$. Then $\forall s \in S : V_{\sigma'}(s) \ge V_\sigma(s)$.
Corollary. Let $\sigma$ be the input scheduler and $\sigma'$ the output of Scheduler Improvement. Then $\forall s \in S : V_{\sigma'}(s) \ge V_\sigma(s)$ and, in particular, $V_{\sigma'}(s_i) \ge V_\sigma(s_i)$.

Slide 27/37 | Experimental Validation | Experimental Validation
We divided the models into three categories: heavily structured models, structured models and unstructured models.
Comparisons were made against PRISM, a state-of-the-art probabilistic model checker.

Slide 28/37 | Experimental Validation | Highly Structured Models
CSMA: Carrier Sense, Multiple Access protocol
WLAN: IEEE 802.11 wireless LAN protocol

Slide 29/37 | Experimental Validation | Highly Structured Models
For each model, the table gives the SMC answer (out) and running time (t) for several thresholds θ, alongside PRISM's computed probability and running time:

CSMA 3 4   θ      0.5    0.8    0.85   0.9     0.95    | PRISM
           out    F      F      F      T       T       | 0.86
           t      1.7    11.5   35.9   115.7   111.9   | 136

CSMA 3 6   θ      0.3    0.4    0.45   0.5     0.8     | PRISM
           out    F      F      F      T       T       | 0.48
           t      2.5    9.4    18.8   133.9   119.3   | 2995

CSMA 4 4   θ      0.5    0.7    0.8    0.9     0.95    | PRISM
           out    F      F      F      F       T       | 0.93
           t      3.5    3.7    17.5   69.0    232.8   | 16244

CSMA 4 6   θ      0.5    0.7    0.8    0.9     0.95    | PRISM
           out    F      F      F      F       F       | timeout
           t      3.7    4.1    4.2    26.2    258.9   | timeout

WLAN 5     θ      0.1    0.15   0.2    0.25    0.5     | PRISM
           out    F      F      T      T       T       | 0.18
           t      4.9    11.1   124.7  104.7   103.2   | 1.6

WLAN 6     θ      0.1    0.15   0.2    0.25    0.5     | PRISM
           out    F      F      T      T       T       | 0.18
           t      5.0    11.3   127.0  104.9   102.9   | 1.6
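The property is $P_\sigma(\ldots) \le \theta$ for all $\sigma$, so the expected answer is T exactly when $\theta$ is at least the maximal probability. The check below assumes the PRISM column gives that maximal probability, rounded. It confirms that every row without a timeout is consistent with it. The data are copied from the table above.

```python
# (PRISM probability, thresholds, SMC answers) for the rows without a timeout.
rows = {
    "CSMA 3 4": (0.86, [0.5, 0.8, 0.85, 0.9, 0.95], "FFFTT"),
    "CSMA 3 6": (0.48, [0.3, 0.4, 0.45, 0.5, 0.8], "FFFTT"),
    "CSMA 4 4": (0.93, [0.5, 0.7, 0.8, 0.9, 0.95], "FFFFT"),
    "WLAN 5": (0.18, [0.1, 0.15, 0.2, 0.25, 0.5], "FFTTT"),
    "WLAN 6": (0.18, [0.1, 0.15, 0.2, 0.25, 0.5], "FFTTT"),
}

consistent = {}
for name, (p_max, thetas, answers) in rows.items():
    # "P <= theta for all schedulers" is true iff the maximum is <= theta.
    expected = "".join("T" if p_max <= t else "F" for t in thetas)
    consistent[name] = expected == answers
    print(name, consistent[name])  # True for every row
```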

Slide 30/37 | Experimental Validation | Highly Structured Models
Takeaways:
Symmetry makes the number of “meaningful” actions relatively small;
SMC works well in highly structured systems;
Exact methods still work best in most cases.

Slide 31/37 | Experimental Validation | Structured Models
Motion planning: two robots move around an $n \times n$ plant. Property:
$P_{\le\theta}\Big(\mathit{Safe}_1\,\mathrm{U}^{\le 30}\big(\mathit{pickup}_1 \wedge (\mathit{Safe}'_1\,\mathrm{U}^{\le 30}\,\mathit{RendezVous})\big) \wedge \mathit{Safe}_2\,\mathrm{U}^{\le 30}\big(\mathit{pickup}_2 \wedge (\mathit{Safe}'_2\,\mathrm{U}^{\le 30}\,\mathit{RendezVous})\big)\Big)$
