  1. Statistical Model Checking and Rare Events
     Paolo Zuliani
     Joint work with Edmund M. Clarke
     Computer Science Department, CMU

  2. Probabilistic Verification
     - Verification of stochastic system models via statistical model checking
     - Temporal logic specification: "the amount of p53 exceeds 10^5 within 20 minutes"
     - If Φ = "p53 exceeds 10^5 within 20 minutes", then Probability(Φ) = ?

  3. Equivalently
     - A biased coin (Bernoulli random variable): Prob(Heads) = p, Prob(Tails) = 1 - p, with p unknown
     - Question: what is p?
     - A solution: flip the coin a number of times, collect the outcomes, and use statistical estimation

  4. Statistical Model Checking
     - Key idea (Håkan Younes, 2001): system behavior w.r.t. property Φ can be modeled by a Bernoulli random variable of parameter p
     - The system satisfies Φ with (unknown) probability p
     - Question: what is p?
     - Draw a sample of system simulations and use statistical estimation, which returns "p lies in the interval (a, b)" with high probability (see the sketch below)
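A minimal sketch of this estimation step (my own illustration, not from the slides): each simulation is treated as a Bernoulli trial, and a normal-approximation interval is reported. The function simulate_and_check is a hypothetical stand-in for running one system simulation and checking Φ on the resulting trace.

```python
import random

def simulate_and_check(p_true: float = 0.3) -> bool:
    """Hypothetical stand-in: run one system simulation and report whether
    the property Phi held on that trace (a Bernoulli trial with unknown p)."""
    return random.random() < p_true

def estimate(num_samples: int = 10_000, z: float = 2.58):
    """Estimate p and return a normal-approximation confidence interval
    (z = 2.58 corresponds to roughly 99% coverage)."""
    successes = sum(simulate_and_check() for _ in range(num_samples))
    p_hat = successes / num_samples
    half_width = z * (p_hat * (1 - p_hat) / num_samples) ** 0.5
    return p_hat, (p_hat - half_width, p_hat + half_width)

if __name__ == "__main__":
    p_hat, (a, b) = estimate()
    print(f"estimate {p_hat:.4f}, ~99% interval ({a:.4f}, {b:.4f})")
```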

  5. Statistical Model Checking
     - Statistical model checking is a Monte Carlo method
     - Problems arise when p is very small (a rare event)
     - The number of simulations (coin flips) needed to estimate p accurately grows too large
     - Need to deal with this ...

  6. Rare events
     - Estimate Prob(X ≥ t) = p_t, when p_t is small (say 10^-9)

  7. Rare events
     - Estimate Prob(X ≥ t) = p_t, when p_t is small (say 10^-9)
     - Standard (crude) Monte Carlo: generate K i.i.d. samples X_1, ..., X_K of X and return the estimator
           e_K = (1/K) Σ_{i=1}^K I(X_i ≥ t)
     - Prob(e_K → p_t as K → ∞) = 1 (strong law of large numbers)

  8. Rare events
     - E[e_K] = p_t
     - Var[e_K] = p_t (1 - p_t) / K

  9. Rare events
     - E[e_K] = p_t
     - Var[e_K] = p_t (1 - p_t) / K
     - By the Central Limit Theorem (CLT), the distribution of e_K converges to a normal distribution with mean p_t and variance p_t (1 - p_t) / K
     - Relative Error: RE = sqrt(Var[e_K]) / E[e_K] = sqrt(p_t (1 - p_t) / K) / p_t   (a numeric check follows below)
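A small numeric check of the crude estimator and its relative error (my own illustration, not from the slides), assuming for concreteness that X is standard normal and the threshold t is mild enough that the event is not yet rare:

```python
import math
import random

def crude_monte_carlo(t: float = 2.0, K: int = 100_000, seed: int = 0):
    """Crude Monte Carlo estimate of p_t = Prob(X >= t) for X ~ N(0, 1),
    together with the relative error sqrt(p_t (1 - p_t) / K) / p_t,
    evaluated at the estimate itself."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(K) if rng.gauss(0.0, 1.0) >= t)
    e_K = hits / K
    re = math.sqrt(e_K * (1 - e_K) / K) / e_K if e_K > 0 else float("inf")
    return e_K, re

if __name__ == "__main__":
    e_K, re = crude_monte_carlo()
    print(f"e_K = {e_K:.5f}, relative error ≈ {re:.2%}")
```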

  10. Rare events
     - RE = sqrt(p_t (1 - p_t) / K) / p_t
     - Fix K: then RE is unbounded as p_t → 0
     - More accuracy requires more samples
     - We want a confidence interval of relative accuracy δ and coverage probability c, i.e., the estimate e_K must satisfy Prob(|e_K - p_t| < δ·p_t) ≥ c
     - How many samples do we need?

  11. Rare events
     - From the CLT, a 99% (approximate) confidence interval of relative accuracy δ needs about
           K ≈ (1 - p_t) / (δ^2 p_t)   samples,
       so that Prob(|e_K - p_t| < δ·p_t) ≈ 0.99

  12. Rare events
     - From the CLT, a 99% (approximate) confidence interval of relative accuracy δ needs about
           K ≈ (1 - p_t) / (δ^2 p_t)   samples,
       so that Prob(|e_K - p_t| < δ·p_t) ≈ 0.99
     - Examples (checked numerically below):
       - For p_t = 10^-9 and δ = 10^-2 (i.e., 1% relative accuracy) we need about 10^13 samples!
       - Bayesian estimation requires about 6 x 10^6 samples with p_t = 10^-4 and δ = 10^-1
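A quick numeric check of these sample sizes (my addition, using the K ≈ (1 - p_t) / (δ^2 p_t) reading of the formula above):

```python
def required_samples(p_t: float, delta: float) -> float:
    """Approximate sample size for relative accuracy delta at ~99% confidence,
    using K ≈ (1 - p_t) / (delta^2 * p_t) as read off the slide."""
    return (1 - p_t) / (delta ** 2 * p_t)

if __name__ == "__main__":
    print(f"p_t = 1e-9, delta = 1e-2  ->  K ≈ {required_samples(1e-9, 1e-2):.1e}")  # about 1e13
    print(f"p_t = 1e-4, delta = 1e-1  ->  K ≈ {required_samples(1e-4, 1e-1):.1e}")  # about 1e6
```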

  13. A solution
     - Importance Sampling (1940s)
     - A variance-reduction technique
     - Can result in a dramatic reduction in sample size

  14. Importance Sampling
     - The fundamental importance sampling identity, where f is the density of X (and f* is any density that is positive wherever I(x ≥ t) f(x) is):
           p_t = E_f[I(X ≥ t)] = ∫ I(x ≥ t) f(x) dx = ∫ I(x ≥ t) (f(x) / f*(x)) f*(x) dx = E_{f*}[I(X ≥ t) f(X) / f*(X)]

  15. Importance Sampling
     - The fundamental importance sampling identity, where f is the density of X:
           p_t = E_f[I(X ≥ t)] = E_{f*}[I(X ≥ t) f(X) / f*(X)]
       where f(x) / f*(x) is the likelihood ratio

  16. Importance Sampling
     - Estimate p_t = E[I(X ≥ t)] = Prob(X ≥ t)
     - Take a sample X_1, ..., X_K i.i.d. with density f
     - The crude Monte Carlo estimator is
           e_K = (1/K) Σ_{i=1}^K I(X_i ≥ t)

  17. Importance Sampling
     - Estimate p_t = E[I(X ≥ t)] = Prob(X ≥ t)
     - Take a sample X_1, ..., X_K i.i.d. with density f
     - The crude Monte Carlo estimator is
           e_K = (1/K) Σ_{i=1}^K I(X_i ≥ t)      (sampling from f)

  18. Importance Sampling
     - Define a biasing density f*
     - Compute the IS estimator
           e_K = (1/K) Σ_{i=1}^K I(X_i ≥ t) W(X_i)
       where W(x) = f(x) / f*(x) is the likelihood ratio

  19. Importance Sampling
     - Define a biasing density f*
     - Compute the IS estimator
           e_K = (1/K) Σ_{i=1}^K I(X_i ≥ t) W(X_i)      (sampling from f*!)
       where W(x) = f(x) / f*(x) is the likelihood ratio (a worked sketch follows below)
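A hedged worked example of this estimator (my own toy setup, not from the slides): p_t = Prob(X ≥ t) for a standard normal X, with the mean-shifted normal N(t, 1) as the biasing density f*, for which the likelihood ratio W(x) = f(x)/f*(x) simplifies to exp(t^2/2 - t·x).

```python
import math
import random

def is_estimate(t: float = 5.0, K: int = 100_000, seed: int = 0) -> float:
    """Importance sampling estimate of p_t = Prob(X >= t) for X ~ N(0, 1),
    drawing from the biasing density f* = N(t, 1) and reweighting by
    the likelihood ratio W(x) = f(x) / f*(x) = exp(t^2/2 - t*x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(K):
        x = rng.gauss(t, 1.0)                       # sampling from f*, not from f
        if x >= t:                                  # indicator I(x >= t)
            total += math.exp(t * t / 2.0 - t * x)  # times the likelihood ratio
    return total / K

if __name__ == "__main__":
    est = is_estimate()
    exact = 0.5 * math.erfc(5.0 / math.sqrt(2.0))  # Prob(N(0,1) >= 5) ≈ 2.9e-7
    print(f"IS estimate {est:.3e}  vs exact {exact:.3e}")
```

Crude Monte Carlo with the same K = 100,000 samples would almost never hit an event of this order at all.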

  20. Importance Sampling
     - Need to choose a "good" biasing density (low variance)
     - Optimal density: f*(x) = I(x ≥ t) f(x) / p_t
     - With this choice,
           e_K = (1/K) Σ_{i=1}^K I(X_i ≥ t) W(X_i)
               = (1/K) Σ_{i=1}^K I(X_i ≥ t) f(X_i) / f*(X_i)
               = (1/K) Σ_{i=1}^K I(X_i ≥ t) f(X_i) p_t / (I(X_i ≥ t) f(X_i))
               = p_t
     - Zero variance! (But ...)

  21. Importance Sampling
     - Need to choose a "good" biasing density (low variance)
     - Optimal density: f*(x) = I(x ≥ t) f(x) / p_t      (but p_t is unknown)
     - With this choice,
           e_K = (1/K) Σ_{i=1}^K I(X_i ≥ t) W(X_i) = (1/K) Σ_{i=1}^K I(X_i ≥ t) f(X_i) / f*(X_i) = p_t
     - Zero variance! (But ...)

  22. Cross-Entropy Method (R. Rubinstein)
     - Suppose the density of X lies in a family of densities {f(·; v)}; the "nominal" density f is f(x; u)
     - Key idea: choose a parameter v such that the distance between f* and f(·; v) is minimal
     - The Kullback-Leibler divergence (cross-entropy) is a measure of "distance" between two densities
     - First used for rare-event simulation by Rubinstein (1997)

  23. Cross-Entropy Method
     - The KL divergence (cross-entropy) of densities g, h is
           D(g, h) = E_g[ln(g(X) / h(X))] = ∫ g(x) ln g(x) dx - ∫ g(x) ln h(x) dx
     - D(g, h) ≥ 0   (= 0 iff g = h)
     - D(g, h) ≠ D(h, g)

  24. Cross-Entropy Method
     - The KL divergence (cross-entropy) of densities g, h is
           D(g, h) = E_g[ln(g(X) / h(X))] = ∫ g(x) ln g(x) dx - ∫ g(x) ln h(x) dx
     - D(g, h) ≥ 0   (= 0 iff g = h)
     - D(g, h) ≠ D(h, g)
     - Consider the family {f(·; v)}

  25. Cross-Entropy Method
     - The KL divergence (cross-entropy) of densities g, h is
           D(g, h) = E_g[ln(g(X) / h(X))] = ∫ g(x) ln g(x) dx - ∫ g(x) ln h(x) dx
     - D(g, h) ≥ 0   (= 0 iff g = h)
     - D(g, h) ≠ D(h, g)
     - Consider the family {f(·; v)} and the optimal density f*

  26. Cross-Entropy Method
     - The KL divergence (cross-entropy) of densities g, h is
           D(g, h) = E_g[ln(g(X) / h(X))] = ∫ g(x) ln g(x) dx - ∫ g(x) ln h(x) dx
     - D(g, h) ≥ 0   (= 0 iff g = h)
     - D(g, h) ≠ D(h, g)
     - Consider the family {f(·; v)} and the optimal density f*: minimize D(f*, f(·; v)) over v   (a numeric aside on D follows below)
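As a small aside (my addition, not from the slides), the KL divergence between two univariate normal densities has a closed form, which makes its non-negativity and asymmetry easy to check numerically:

```python
import math

def kl_normal(mu1: float, s1: float, mu2: float, s2: float) -> float:
    """Closed-form KL divergence D(N(mu1, s1^2) || N(mu2, s2^2))."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5

if __name__ == "__main__":
    print(kl_normal(0.0, 1.0, 0.0, 1.0))  # 0.0: identical densities
    print(kl_normal(0.0, 1.0, 1.0, 2.0))  # D(g, h) ≈ 0.443
    print(kl_normal(1.0, 2.0, 0.0, 1.0))  # D(h, g) ≈ 1.307, so D(g, h) != D(h, g)
```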

  27. Cross-Entropy Method
     - The Cross-Entropy Method has two basic steps

  28. Cross-Entropy Method
     - The Cross-Entropy Method has two basic steps:
       1. find v* = argmin_v D(f*(·), f(·; v))

  29. Cross-Entropy Method
     - The Cross-Entropy Method has two basic steps:
       1. find v* = argmin_v D(f*(·), f(·; v))
       2. run importance sampling with the biasing density f(·; v*)

  31. Cross-Entropy Method
     - The Cross-Entropy Method has two basic steps:
       1. find v* = argmin_v D(f*(·), f(·; v))
       2. run importance sampling with the biasing density f(·; v*)
     - Step 2 is "easy"
     - Step 1 is not so easy

  32. Cross-Entropy Method
     - Step 1: v* = ...

  33. Cross-Entropy Method
     - Step 1:
           v* = argmin_v D(f*, f(·; v))
              = argmin_v E_{f*}[ln(f*(X) / f(X; v))]
              = argmin_v [ ∫ f*(x) ln f*(x) dx - ∫ f*(x) ln f(x; v) dx ]

  34. Cross-Entropy Method
     - Step 1:
           v* = argmin_v D(f*, f(·; v))      (the KL divergence is always ≥ 0)
              = argmin_v E_{f*}[ln(f*(X) / f(X; v))]
              = argmin_v [ ∫ f*(x) ln f*(x) dx - ∫ f*(x) ln f(x; v) dx ]

  35. Cross-Entropy Method
     - Step 1:
           v* = argmin_v E_{f*}[ln(f*(X) / f(X; v))]
              = argmin_v [ ∫ f*(x) ln f*(x) dx - ∫ f*(x) ln f(x; v) dx ]
              = argmax_v ∫ f*(x) ln f(x; v) dx
       (the first integral does not depend on v)

  36. Cross-Entropy Method
     - Step 1:
           v* = argmin_v E_{f*}[ln(f*(X) / f(X; v))]
              = argmax_v ∫ f*(x) ln f(x; v) dx
              = argmax_v ∫ [ I(x ≥ t) f(x; u) / p_t ] ln f(x; v) dx
       (substituting the optimal density f*(x) = I(x ≥ t) f(x; u) / p_t)

  37. Cross-Entropy Method
     - Step 1:
           v* = argmin_v E_{f*}[ln(f*(X) / f(X; v))]
              = argmax_v ∫ f*(x) ln f(x; v) dx
              = argmax_v ∫ [ I(x ≥ t) f(x; u) / p_t ] ln f(x; v) dx
              = argmax_v ∫ I(x ≥ t) f(x; u) ln f(x; v) dx
       (the constant factor 1/p_t does not affect the argmax; an end-to-end sketch of both steps follows below)
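To make the two steps concrete, here is a hedged end-to-end sketch (my own toy example, not taken from the deck): X is exponential with mean u, the event is X ≥ t, and the biasing family {f(·; v)} consists of exponentials parametrized by their mean. For this family the argmax above reduces to a weighted sample mean, and the threshold is kept mild enough that a crude pilot sample actually sees the event; truly rare thresholds need the multilevel version of the cross-entropy method, which is beyond this sketch.

```python
import math
import random

def exp_pdf(x: float, mean: float) -> float:
    """Density of the exponential distribution with the given mean."""
    return math.exp(-x / mean) / mean

def cross_entropy_is(u: float = 1.0, t: float = 7.0,
                     pilot: int = 100_000, K: int = 100_000, seed: int = 0):
    """Two-step sketch for p_t = Prob(X >= t), X ~ Exp(mean u).
    Step 1: from a crude pilot sample of f(.; u), take
            v* = argmax_v sum_i I(X_i >= t) ln f(X_i; v),
            which for the exponential family is the mean of the pilot samples above t.
    Step 2: importance sampling from f(.; v*) with likelihood ratio f(x; u) / f(x; v*)."""
    rng = random.Random(seed)

    # Step 1: cross-entropy parameter from a crude pilot sample.
    hits = [x for x in (rng.expovariate(1.0 / u) for _ in range(pilot)) if x >= t]
    v_star = sum(hits) / len(hits)

    # Step 2: importance sampling with the biasing density f(.; v*).
    total = 0.0
    for _ in range(K):
        x = rng.expovariate(1.0 / v_star)
        if x >= t:
            total += exp_pdf(x, u) / exp_pdf(x, v_star)  # indicator times likelihood ratio
    return total / K, v_star

if __name__ == "__main__":
    est, v_star = cross_entropy_is()
    print(f"v* ≈ {v_star:.2f}, IS estimate {est:.3e}, exact exp(-t/u) = {math.exp(-7.0):.3e}")
```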
