Rare event simulation for a static distribution


  1. 1.1 Rare event simulation for a static distribution. F. Cérou, P. Del Moral, T. Furon, A. Guyader. RESIM 2008, Rennes. This work was partially supported by the French Agence Nationale de la Recherche (ANR), project Nebbiano, number ANR-06-SETI-009.

  2. 1.2 Introduction
  $X \sim \mu$, with $\mu$ a probability measure on $\mathcal{X}$ ($\mathbb{R}^d$, or a discrete space). We know how to draw samples from $\mu$. Given a function $S : \mathcal{X} \to \mathbb{R}$, we look at the rare event $R = \{S(X) > \tau\}$. We want to compute $\mu(R) = \mathbb{P}(X \in R)$, and draw samples from $\mu_R(dx) = \frac{1}{\mu(R)}\,\mathbf{1}_R(x)\,\mu(dx)$.
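  To fix ideas, here is why the direct approach fails: a crude Monte Carlo estimate of $\mu(R)$ needs on the order of $1/\mu(R)$ samples before it sees a single hit. A minimal sketch in Python (names and the sanity-check event are ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def crude_monte_carlo(sample_mu, S, tau, N):
    """Estimate mu(R) = P(S(X) > tau) from N i.i.d. draws.

    The relative error scales like 1/sqrt(N * mu(R)): fine for moderate
    probabilities, hopeless when mu(R) is, say, 1e-11.
    """
    return np.mean(S(sample_mu(N)) > tau)

# Sanity check on a non-rare event: P(Z > 2) ~ 0.0228 for Z ~ N(0, 1).
print(crude_monte_carlo(rng.standard_normal, lambda x: x, 2.0, 10**6))
```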

  3. 1.3 Motivation and examples
  Watermarking of digital contents: embedding/hiding information in a digital file (typically audio or video), such that the change is not noticed and is very hard to remove (robust to any kind of transformation, coding, compression...). Used for: copy protection or fingerprinting.
  Figure 1: Watermarking (a watermark $W \in \mathcal{X}$ is embedded into an image $I \in \mathcal{I}$; the detection box answers yes or no).
  Our rare event occurs when the detection box answers "yes" but the content is not watermarked.

  4. 1.4 Zero-bit watermarking
  Figure 2: Zero-bit watermarking (detection region around the secret direction $u$).
  • $u \in \mathbb{R}^d$ is a fixed and normalized secret vector.
  • A content $X$ is deemed watermarked if $S(X) = \frac{\langle X, u\rangle}{\|X\|} > \tau$.
  • Classic assumption: an unwatermarked content $X$ has a radially symmetric pdf. As $S$ is also radially symmetric, we choose $X \sim \mathcal{N}(0, I)$.
  • False detection: $P_{fd} = \mathbb{P}(S(X) > \tau \mid X \text{ unwatermarked})$.
  Toy example used to validate the algorithm.
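  A short sketch of this toy score; the dimension and the particular secret direction are chosen arbitrarily here, just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 20
u = rng.standard_normal(d)
u /= np.linalg.norm(u)            # fixed, normalized secret vector

def score(X):
    """S(X) = <X, u> / ||X||, the cosine of the angle between X and u."""
    return X @ u / np.linalg.norm(X, axis=-1)

# Under the radial-symmetry assumption, take unwatermarked contents X ~ N(0, I):
X = rng.standard_normal((5, d))
print(score(X))                   # in [-1, 1]; values above tau trigger "yes"
```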

  5. 1.5 Probabilistic fingerprinting codes
  Fingerprinting:
  • Principle: some personal identification sequence $F_i \in \{0,1\}^m$ is hidden in the copy of each user.
  • Benefit: find a dishonest user via his fingerprint.
  • False detections: accusing an innocent (false alarm) or accusing none of the colluders (false negative).
  Tardos probabilistic codes:
  • Fingerprint: $X = [X_1, \ldots, X_m]$, $X_i \sim \mathcal{B}(p_i)$ and $p_i \sim f(p)$ (same $p_i$'s for all users).
  • Pirated copy: $y = [y_1, \ldots, y_m] \in \{0,1\}^m$.
  • Accusation procedure: $S(X) = \sum_{i=1}^m y_i\, g_i(X_i) \gtrless \tau$.
  The choice of $f$ and the $g_i$'s is crucial (but not discussed here; a sketch with assumed standard choices follows).
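  For concreteness, a sketch of fingerprint generation and scoring. The talk deliberately leaves $f$ and the $g_i$'s unspecified; the arcsine density and symmetric score functions below are the standard Tardos choices, used here purely as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

m = 1000                                    # code length
p = np.sin(0.5 * np.pi * rng.uniform(size=m)) ** 2   # p_i ~ arcsine law on (0,1)

def fingerprint():
    """One user's fingerprint: X_i ~ Bernoulli(p_i), same p_i's for all users."""
    return (rng.uniform(size=m) < p).astype(int)

def accusation_score(X, y):
    """S(X) = sum over the positions where the pirated copy y has a 1 of g(X_i):
    reward users whose bits agree with y, penalize those who differ."""
    g = np.where(X == 1, np.sqrt((1 - p) / p), -np.sqrt(p / (1 - p)))
    return float(np.sum(y * g))

X, y = fingerprint(), fingerprint()         # y stands in for a pirated copy
print(accusation_score(X, y))               # accuse the user if S(X) > tau
```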

  6. 1.6 Collusions
  Figure 3: Collusion (users $1, \ldots, N$, each holding a copy of image $I$ with fingerprint $F_i$, produce a pirated copy $I'$ that is fed to the accusation procedure).
  Several users compare their digital contents: they are not exactly the same... Strategies to build up a new file, different from all the users' ones (a sketch of the first one follows the list):
  • majority vote
  • random choice on parts
  • set the detected bits equal to 0
  • ...
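  A minimal majority-vote collusion, with random 0/1 fingerprints standing in for real ones to keep the sketch self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)

def majority_vote(F):
    """Pirated copy by majority vote: y_i is the bit held by most colluders
    (ties, possible with an even number of colluders, broken toward 1 here)."""
    F = np.asarray(F)                       # shape: (number of colluders, m)
    return (2 * F.sum(axis=0) >= F.shape[0]).astype(int)

F = rng.integers(0, 2, size=(3, 10))        # three colluders, m = 10
print(majority_vote(F))
```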

  7. 1.7 Multilevel approach
  Figure 4: Multilevel (pdf of $S(X)$ with nested levels $L_1 < L_2 < \cdots < L_i < \cdots < \tau = L_M$; e.g. $p_2 = \mathbb{P}(S(X) > L_2 \mid S(X) > L_1)$).
  • Ingredients: fix $M$ and $L_1 < \cdots < L_M = \tau$ so that each $p_i = \mathbb{P}(S(X) > L_i \mid S(X) > L_{i-1})$ is not too small.
  • Bayes decomposition: $\alpha = p_1 p_2 \cdots p_M$.
  • Unrealistic case: suppose you can estimate each $p_i$ independently with classic Monte Carlo: $p_i \approx \hat p_i = N_i / N$.
  • Multilevel estimator: $\hat\alpha_N = \hat p_1 \hat p_2 \cdots \hat p_M$.
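  The decomposition is easy to check in a case where every conditional probability is available in closed form, e.g. a one-dimensional Gaussian score (the level grid below is an arbitrary choice):

```python
import numpy as np
from scipy.stats import norm

tau = 5.0
levels = np.linspace(1.0, tau, 5)           # L_1 < ... < L_M = tau

sf = norm.sf                                # sf(x) = P(S > x) for S ~ N(0, 1)
p = [sf(levels[0])] + [sf(levels[i]) / sf(levels[i - 1])
                       for i in range(1, len(levels))]
# The product of conditional probabilities telescopes back to P(S > tau):
print(np.prod(p), sf(tau))                  # both ~2.87e-7
```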

  8. 1.8 The Shaker
  • Recall: $X \sim \mu$ on $\mathcal{X}$.
  • Ingredient: a $\mu$-reversible transition kernel $K(x, dx')$ on $\mathcal{X}$: $\forall (x, x') \in \mathcal{X}^2$, $\mu(dx)\,K(x, dx') = \mu(dx')\,K(x', dx)$.
  • Consequence: $\mu K = \mu$.
  • Example: if $X \sim \mathcal{N}(0,1)$ then $X' = \frac{X + \sigma W}{\sqrt{1+\sigma^2}} \sim \mathcal{N}(0,1)$, i.e. $K(x, dx') = \mathcal{N}\!\left(\frac{x}{\sqrt{1+\sigma^2}}, \frac{\sigma^2}{1+\sigma^2}\right)(dx')$ is a "good shaker".
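  A quick empirical check of the Gaussian example; the kernel leaves $\mathcal{N}(0,1)$ invariant whatever $\sigma$ we pick ($\sigma = 0.5$ below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def shake(x, sigma):
    """Gaussian 'shaker': X' = (X + sigma * W) / sqrt(1 + sigma^2), W ~ N(0,1).
    If X ~ N(0, 1) then X' ~ N(0, 1) again, and the kernel is mu-reversible."""
    return (x + sigma * rng.standard_normal(x.shape)) / np.sqrt(1 + sigma**2)

x = rng.standard_normal(10**6)              # a large N(0,1) sample
y = shake(x, sigma=0.5)
print(y.mean(), y.var())                    # ~0 and ~1: mu K = mu
```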

  9. 1.9 Feynman-Kac representation
  $A_k = S^{-1}(]L_k, +\infty[)$
  $M_k^K(x, dy) = K(x, dy)\,\mathbf{1}_{A_k}(y) + K(x, A_k^c)\,\delta_x(dy)$
  $\mu_k(dx) = \frac{1}{\mu(A_k)}\,\mathbf{1}_{A_k}(x)\,\mu(dx)$, the normalized restriction of $\mu$ to $A_k$.
  $\mu_k$ is invariant by $M_k^K$.
  Let $X_k$ be the Markov chain with initial distribution $\mu$ and transitions $M_k^K$. For every test function $\varphi$, for $k \in \{0, \ldots, n\}$, we have the following Feynman-Kac representation:
  $\mu_{k+1}(\varphi) = \dfrac{\mathbb{E}\left[\varphi(X_k) \prod_{j=0}^{k} \mathbf{1}_{A_{j+1}}(X_j)\right]}{\mathbb{E}\left[\prod_{j=0}^{k} \mathbf{1}_{A_{j+1}}(X_j)\right]}.$
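  In words, $M_k^K$ proposes a move with $K$ and rejects it whenever the proposal leaves $A_k$. One step, sketched (names are ours):

```python
def M_k(x, K, S, L_k):
    """One step of M_k^K: propose y ~ K(x, .); accept if S(y) > L_k,
    otherwise stay at x. This leaves mu_k, the normalized restriction
    of mu to A_k = {S > L_k}, invariant."""
    y = K(x)
    return y if S(y) > L_k else x
```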

  10. 1.10 Algorithm
  • Initialization: simulate an i.i.d. sample $\xi_0^1, \ldots, \xi_0^N \sim \mu$.
  • Estimate $\hat p_1 = \frac{1}{N} \sum_{i=1}^N \mathbf{1}_{A_1}(\xi_0^i)$.
  • Selection: $\hat\xi_0^i = \xi_0^i$ if $S(\xi_0^i) > L_1$, else pick at random among the $N_1$ selected particles.
  • Mutation: $\tilde\xi_1^i \sim M(\hat\xi_0^i, dx')$ and, for all $i \in \{1, \ldots, N\}$, $\xi_1^i = \tilde\xi_1^i$ if $S(\tilde\xi_1^i) > L_1$, else $\xi_1^i = \hat\xi_0^i$.
  • Consider the next level and iterate until the rare event is reached (a sketch in code follows).
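  Putting the pieces together, a compact sketch of the whole algorithm with fixed levels. Everything here (names, the particular shaker, the level grid in the demo) is our choice for illustration, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def multilevel_splitting(sample_mu, S, shake, levels, N):
    """Estimate alpha = P(S(X) > tau) for X ~ mu, with tau = levels[-1].

    sample_mu(N) -> (N, d) i.i.d. sample; S maps (N, d) -> (N,);
    shake applies a mu-reversible kernel to each row.
    """
    xi = sample_mu(N)
    alpha_hat = 1.0
    for L in levels:
        above = S(xi) > L
        if not above.any():
            return 0.0                       # every particle died: estimate lost
        alpha_hat *= above.mean()            # running product of the p_hat's
        # Selection: survivors stay put, dead particles copy a random survivor.
        survivors = xi[above]
        xi = xi.copy()
        xi[~above] = survivors[rng.integers(0, len(survivors), size=(~above).sum())]
        # Mutation: shake every particle, keep only moves that stay above L.
        prop = shake(xi)
        ok = S(prop) > L
        xi[ok] = prop[ok]
    return alpha_hat

# Demo on the zero-bit watermarking toy model (see slide 19):
d = 20
u = np.zeros(d); u[0] = 1.0                  # any fixed unit vector will do
S = lambda X: X @ u / np.linalg.norm(X, axis=1)
shake = lambda X: (X + 0.3 * rng.standard_normal(X.shape)) / np.sqrt(1 + 0.3**2)
levels = np.linspace(0.3, 0.95, 80)          # fixed grid ending at tau = 0.95
print(multilevel_splitting(lambda n: rng.standard_normal((n, d)),
                           S, shake, levels, N=2000))
```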

  11.–16. 1.11–1.16 Algorithm (illustrated step by step)
  [Animation frames showing the particle cloud with $N = 8$ particles: selection at level $L_1$ with $\hat p_1 = 4/8$, mutation, then selection at level $L_2$ with $\hat p_2 = 3/8$.]

  17. 1.17 Implementation issues
  Choice of K. Depends on the model: Metropolis-Hastings, Gibbs sampler, Gaussian case (zero-bit watermarking), i.i.d. redraws on some random sites (Tardos codes). Trade-off between two drawbacks:
  • "shaking effect" too large: most proposed mutations are refused.
  • "shaking effect" too small: particles almost don't move.
  Levels $L_k$. Adaptive levels with a fixed rate of success: use the $p_0$-quantile of the scores $S(\xi_k^j)$ to set $L_{k+1}$; $p_0 = 0.75$ or $0.8$ is a good choice.
  Less dependent sample. We can iterate the kernel $M_k^K$ several times to improve the variability of the particles. By well-known results on Metropolis-Hastings, the sample becomes less and less dependent. Rule: iterate until 90 or 95% of the particles have actually moved through an accepted transition. (Both tricks are sketched below.)
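  Two of these tricks are simple enough to sketch; the names and the exact stopping rule are our reading of the slide:

```python
import numpy as np

def next_level(scores, p0=0.75):
    """Adaptive level: L_{k+1} is the (1 - p0)-quantile of the current scores,
    so that a fraction ~p0 of the particles survives the next selection."""
    return np.quantile(scores, 1.0 - p0)

def shake_until_moved(xi, shake, S, L, target=0.90, max_iter=100):
    """Iterate the mutation kernel until ~target of the particles have accepted
    at least one move above L, to reduce dependence among the particles."""
    moved = np.zeros(len(xi), dtype=bool)
    for _ in range(max_iter):
        prop = shake(xi)
        ok = S(prop) > L
        xi[ok] = prop[ok]
        moved |= ok
        if moved.mean() >= target:
            break
    return xi
```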

  18. 1.18 Asymptotic variance
  Best achievable asymptotic variance:
  • Multilevel estimator: $\hat\alpha_N = \hat p_1 \hat p_2 \cdots \hat p_M$.
  • Fluctuations: if the $\hat p_i$'s are independent, then $\sqrt{N}\,\dfrac{\hat\alpha_N - \alpha}{\alpha} \xrightarrow[N\to\infty]{\mathcal{L}} \mathcal{N}\!\left(0, \sum_{i=1}^M \frac{1-p_i}{p_i}\right)$.
  • Constrained minimization: $\arg\min_{p_1,\ldots,p_M} \sum_{i=1}^M \frac{1-p_i}{p_i}$ s.t. $\prod_{i=1}^M p_i = \alpha$.
  • Optimum: $p_1 = \cdots = p_M = \alpha^{1/M}$.
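  The optimum is a short Lagrange-multiplier computation; it can also be checked numerically (arbitrary $\alpha$ and $M$ below, with an asymmetric starting point so the equalization is not baked in):

```python
import numpy as np
from scipy.optimize import minimize

alpha, M = 1e-6, 5

# Minimize sum_i (1 - p_i)/p_i subject to prod_i p_i = alpha, working in
# log space so the product constraint becomes linear.
res = minimize(
    lambda logp: np.sum((1.0 - np.exp(logp)) / np.exp(logp)),
    x0=np.full(M, np.log(alpha) / M) + 0.1 * np.arange(M),
    constraints={"type": "eq", "fun": lambda logp: logp.sum() - np.log(alpha)},
)
print(np.exp(res.x))            # all components ~0.0631
print(alpha ** (1 / M))         # = 0.0631..., the claimed optimum
```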

  19. 1.19 Simulations: the model
  • The model: $X \sim \mathcal{N}(0, I_{20})$.
  • Rare event: $\alpha = \mathbb{P}\!\left(\frac{\langle X, u\rangle}{\|X\|} > 0.95\right)$.
  • Numerical computation: $\alpha = 4.704 \cdot 10^{-11}$.
  • Parameter: $p = 3/4$, so $\alpha = r \times p^M = 0.83 \times (3/4)^{82}$.
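  This value can be cross-checked in closed form: for $X \sim \mathcal{N}(0, I_d)$, the squared score $U^2 = \langle X, u\rangle^2/\|X\|^2$ follows a Beta$(1/2, (d-1)/2)$ law. A sketch, noting as our own observation that the slide's number coincides with the two-sided probability $\mathbb{P}(|S(X)| > \tau)$ (the one-sided probability is half of it):

```python
from scipy.stats import beta

# For X ~ N(0, I_d), U = <X, u>/||X|| satisfies U^2 ~ Beta(1/2, (d-1)/2),
# since U^2 = chi2_1 / (chi2_1 + chi2_{d-1}) with independent chi-squares.
d, tau = 20, 0.95
two_sided = beta.sf(tau**2, 0.5, (d - 1) / 2)   # P(|U| > tau)
print(two_sided)                                 # ~4.704e-11, the slide's value
print(0.5 * two_sided)                           # P(U > tau), one-sided
```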

  20. 1.20 Numerical results (credit: V. Bahuon)
  [Figure 5: Relative standard deviation vs. the number $N$ of particles, $N = 10^2$ to $10^5$, log-log axes.]

  21. 1.21 Numerical results (continued)
  [Figure 6: Relative bias vs. the number $N$ of particles, $N = 10^2$ to $10^5$, log-log axes.]

  22. 1.22 Perspectives
  Find other similar applications (e.g. probabilistic counting algorithms in a large discrete set).
  Work in progress: non-asymptotic variance estimates.
