

1. Winsorized Importance Sampling
   Paulo Orenstein, Stanford University
   February 8, 2019

   Outline: Introduction; Winsorized IS; Theoretical Guarantees; Empirical Performance; Conclusion

2. Introduction

◮ Let $f(x)$ be an arbitrary function and $p(x)$, $q(x)$ probability densities. Suppose we are interested in
\[
\theta = \mathbb{E}_p[f(X)] = \int_{\mathbb{R}} f(x)\, p(x)\, dx.
\]
◮ Assume we can only sample from $q$, which is called the sampling distribution; $p$ is the target distribution.
◮ The importance sampling estimator for $\theta$ is
\[
\hat{\theta}_n = \frac{1}{n} \sum_{i=1}^n \frac{f(X_i)\, p(X_i)}{q(X_i)}, \qquad X_i \sim q.
\]
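To make the estimator concrete, here is a minimal sketch in Python (not from the talk; the example densities and the NumPy/SciPy choices are my own):

```python
import numpy as np
from scipy.stats import norm

def importance_sampling(f, p_pdf, q_pdf, q_sample, n, rng):
    """Plain IS: average the terms Y_i = f(X_i) p(X_i) / q(X_i), X_i ~ q."""
    x = q_sample(n, rng)
    y = f(x) * p_pdf(x) / q_pdf(x)
    return y.mean(), y

# Example: estimate theta = E_p[X^2] = 1 for p = N(0,1), sampling from q = N(0, 2^2).
rng = np.random.default_rng(0)
est, y = importance_sampling(
    f=lambda x: x**2,
    p_pdf=norm(0, 1).pdf,
    q_pdf=norm(0, 2).pdf,
    q_sample=lambda n, rng: rng.normal(0, 2, size=n),
    n=10_000,
    rng=rng,
)
print(est)  # should be close to 1
```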

3. Introduction

◮ The importance sampling (IS) estimator is unbiased and, by the law of large numbers, consistent:
\[
\hat{\theta}_n \xrightarrow{\;n \to \infty\;} \mathbb{E}\left[ \frac{f(X)\, p(X)}{q(X)} \right] = \int \frac{f(x)\, p(x)}{q(x)}\, q(x)\, dx = \int f(x)\, p(x)\, dx = \theta,
\]
as long as $q(x) > 0$ whenever $f(x)\, p(x) \neq 0$.
◮ But it can have huge or even infinite variance, leading to terrible estimates (a quick numerical illustration follows below).
◮ Can we control the variance of the terms $Y_i = f(X_i)\, p(X_i) / q(X_i)$ by sacrificing some small amount of bias?
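A quick illustration of the variance problem (my own example, not from the slides): with target $p = N(0,1)$ and proposal $q = N(0, 0.6^2)$, the second moment of the IS terms is infinite (for centered Gaussians it is finite only when the proposal variance exceeds half the target's), so repeated estimates of $\mathbb{E}_p[X^2] = 1$ swing wildly:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
for trial in range(3):
    x = rng.normal(0, 0.6, size=100_000)            # X_i ~ q = N(0, 0.6^2)
    y = x**2 * norm(0, 1).pdf(x) / norm(0, 0.6).pdf(x)
    print(f"trial {trial}: {y.mean():.3f}")          # true value is 1
```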

4. Winsorizing

◮ Can we improve on the IS estimator by winsorizing, or capping, the weights?
◮ Denote the random variables winsorized at levels $-M$ and $M$ by
\[
Y_i^M = \max(-M, \min(Y_i, M)).
\]
◮ Define the winsorized importance sampling estimator at level $M$ as
\[
\hat{\theta}_n^M = \frac{1}{n} \sum_{i=1}^n Y_i^M
\]
(a one-line code sketch follows this slide).
◮ Picking the right threshold level $M$ is crucial.
◮ Bias-variance trade-off: smaller $M$ implies less variance but more bias.
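The estimator itself is one line; a sketch (my code, reusing the IS terms `y` from the earlier example):

```python
import numpy as np

def winsorized_is(y, M):
    """Winsorized IS estimate at level M: clip each Y_i to [-M, M], then average."""
    return np.clip(y, -M, M).mean()

# Evaluating winsorized_is(y, M) over a range of M traces out the bias-variance trade-off.
```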

5. How to pick M?

◮ Let $\{Y_i\}_{i=1}^n$ be iid random variables with mean $\theta$.
◮ Consider winsorizing the $Y_i$ at different threshold levels in a pre-chosen set $\Lambda = \{M_1, \ldots, M_k\}$ to obtain winsorized samples $\{Y_i^{M_j}\}_{i=1}^n$, $j = 1, \ldots, k$.
◮ Pick the threshold level according to the rule
\[
M^* = \min\left\{ M \in \Lambda : \forall\, M', M'' \geq M, \;\; |\bar{Y}^{M'} - \bar{Y}^{M''}| \leq \alpha \cdot \frac{\hat{\sigma}^{M'} + \hat{\sigma}^{M''}}{2} \right\},
\]
where:
   $\alpha = c \cdot \frac{t}{\sqrt{n}}$, with $c$, $t$ chosen constants;
   $\bar{Y}^M = \frac{1}{n} \sum_{i=1}^n Y_i^M$;
   $\hat{\sigma}^M = \sqrt{\frac{1}{n} \sum_{i=1}^n (Y_i^M - \bar{Y}^M)^2}$.
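Here is one way to code the rule (a sketch under my reading of the slides: the grid $\Lambda$, the defaults for $c$ and $t$, and the reconstruction $\alpha = c \cdot t / \sqrt{n}$ are assumptions, not taken from the talk):

```python
import numpy as np

def select_threshold(y, Lambda, c=1.0, t=1.0):
    """Return M* = min{M in Lambda : for all M', M'' >= M,
    |mean^{M'} - mean^{M''}| <= alpha * (sd^{M'} + sd^{M''}) / 2}."""
    alpha = c * t / np.sqrt(len(y))
    Lambda = np.sort(np.asarray(Lambda))
    means = np.array([np.clip(y, -M, M).mean() for M in Lambda])
    sds = np.array([np.clip(y, -M, M).std() for M in Lambda])
    for j in range(len(Lambda)):                   # smallest candidate first
        ok = all(
            abs(means[a] - means[b]) <= alpha * (sds[a] + sds[b]) / 2
            for a in range(j, len(Lambda))
            for b in range(j, len(Lambda))
        )
        if ok:
            return Lambda[j]
    return Lambda[-1]  # unreachable: the largest level always passes its own check

# Hypothetical usage: build Lambda from quantiles of |Y_i|, then winsorize at M*.
# Lambda = np.quantile(np.abs(y), [0.9, 0.95, 0.99, 0.999, 1.0])
# M_star = select_threshold(y, Lambda)
# theta_hat = np.clip(y, -M_star, M_star).mean()
```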

6. Why?

◮ Why is this rule sensible?
◮ Intuitively, if we have truncation levels $M' > M''$, we are willing to truncate further to $M''$ if the increase in bias $\left|\frac{1}{n}\sum_{i=1}^n Y_i^{M'} - \frac{1}{n}\sum_{i=1}^n Y_i^{M''}\right|$ is small relative to the standard deviation.
◮ The actual rule can be thought of as a concrete version of the Balancing Principle (or Lepski's Method), which is reminiscent of oracle inequalities.
◮ With high probability, the mean-squared error using $M^*$ is at most roughly 5 times the error incurred by the best threshold level in the set.
