1.1 Rare event simulation for a static distribution
F. Cérou, P. Del Moral, T. Furon, A. Guyader
RESIM 2008, Rennes
This work was partially supported by the French Agence Nationale de la Recherche (ANR), project Nebbiano, number ANR-06-SETI-009.
1.2 Introduction
X ∼ µ, with µ a probability measure on 𝒳 (ℝ^d, or a discrete space). We know how to draw samples from µ.
Given a function S : 𝒳 → ℝ, we look at the rare event R = {S(X) > τ}.
We want to compute µ(R) = ℙ(X ∈ R), and to draw samples from the normalized restriction
µ_R(dx) = 𝟙_R(x) µ(dx) / µ(R).
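A quick side calculation (not on the slide) shows why crude Monte Carlo is hopeless for such events: the estimator
$$\hat\alpha_{\mathrm{MC}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}_R(X_i), \qquad
\frac{\sqrt{\operatorname{Var}(\hat\alpha_{\mathrm{MC}})}}{\mu(R)}
= \sqrt{\frac{1-\mu(R)}{N\,\mu(R)}} \approx \frac{1}{\sqrt{N\,\mu(R)}},$$
so for µ(R) ≈ 10⁻¹¹ a 10% relative error already requires N ≈ 10¹³ samples, hence the splitting approach presented in this talk.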
1.3 Motivation and examples
Watermarking of digital contents: embedding/hiding information in a digital file (typically audio or video), such that the change is not noticed and is very hard to remove (robust to any kind of transformation, coding, compression...). Used for copy protection or fingerprinting.
Figure 1: Watermarking (a watermark W and an image I go through the encoding box; the detection box answers yes/no).
Our rare event occurs when the detection box answers "yes" but the content is not watermarked.
1.4 Zero-bit watermarking
Figure 2: Zero-bit watermarking (detection region: a cone around the secret direction u).
• u ∈ ℝ^d is a fixed and normalized secret vector.
• A content X is deemed watermarked if S(X) = ⟨X, u⟩ / ‖X‖ > τ.
• Classic assumption: an unwatermarked content X has a radially symmetric pdf. Since S depends only on the direction of X, the law of S(X) is the same for all such X, so we choose X ∼ N(0, I).
• False detection: P_fd = ℙ(S(X) > τ | X unwatermarked).
Toy example used to validate the algorithm.
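As a concrete illustration (not from the slides), a minimal NumPy sketch of the detection score with a crude Monte Carlo check at a moderate threshold, where crude Monte Carlo is still feasible; the dimension, threshold and sample size below are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
d, tau, N = 20, 0.5, 100_000         # tau = 0.5 is moderate; the real threshold 0.95 is far too rare for crude MC
u = np.zeros(d); u[0] = 1.0          # any fixed unit vector; by radial symmetry the law of S(X) does not depend on it

X = rng.standard_normal((N, d))                      # unwatermarked contents, X ~ N(0, I_d)
S = X @ u / np.linalg.norm(X, axis=1)                # detection score S(X) = <X,u> / ||X||
print("crude MC estimate of P(S > tau):", (S > tau).mean())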
1.5 Probabilistic fingerprinting codes
Fingerprinting:
• Principle: a personal identification sequence F_i ∈ {0,1}^m is hidden in the copy of each user.
• Benefit: find a dishonest user via his fingerprint.
• False detections: accusing an innocent user (false alarm) or accusing none of the colluders (false negative).
Tardos probabilistic codes:
• Fingerprint: X = [X_1, ..., X_m], with X_i ∼ B(p_i) and p_i ∼ f(p) (the same p_i's for all users).
• Pirated copy: y = [y_1, ..., y_m] ∈ {0,1}^m.
• Accusation procedure: S(X) = Σ_{i=1}^m y_i g_i(X_i) ≷ τ.
The choice of f and of the g_i's is crucial (but not discussed here).
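The slide deliberately leaves f and the g_i unspecified; as a purely illustrative stand-in, here is a sketch using the standard choices from Tardos' original construction (arcsine density for f, g_i(1) = √((1−p_i)/p_i), g_i(0) = −√(p_i/(1−p_i))). The cutoff keeping p_i away from 0 and 1, used in practice, is omitted for brevity.

import numpy as np

rng = np.random.default_rng(1)
m = 1024                                             # code length (illustrative)

# Assumed standard Tardos choices (the talk does not fix f and the g_i):
p = np.sin(rng.uniform(0.0, np.pi / 2, m)) ** 2      # p_i ~ arcsine density f(p) on (0, 1)
X = (rng.random(m) < p).astype(int)                  # one user's fingerprint, X_i ~ Bernoulli(p_i)
y = (rng.random(m) < p).astype(int)                  # pirated copy (here just an innocent-looking sequence)

g1 = np.sqrt((1.0 - p) / p)                          # g_i(1)
g0 = -np.sqrt(p / (1.0 - p))                         # g_i(0)
S = np.sum(y * np.where(X == 1, g1, g0))             # accusation score; accuse if S > tau
print("accusation score S =", S)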
1.6 Collusions
Figure 3: Collusion (each user i receives a copy I_i of image I carrying his fingerprint F_i; the colluders forge a new copy I′, which the accusation procedure then analyses).
Several users compare their digital contents: the copies are not exactly the same. Strategies to build a new file, different from all the users' copies:
• majority vote
• random choice on parts
• set the detected bits to 0
• ...
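To make the first two strategies concrete, a small sketch (names and interface are mine), assuming c colluders each holding a binary sequence of length m:

import numpy as np

def collude(F, strategy, rng):
    """Forge a pirated sequence from F, a (c, m) array of the colluders' binary fingerprints."""
    c, m = F.shape
    if strategy == "majority":
        return (2 * F.sum(axis=0) > c).astype(int)   # majority vote, position by position
    if strategy == "random":
        who = rng.integers(0, c, size=m)             # pick one colluder at random per position
        return F[who, np.arange(m)]
    raise ValueError("unknown strategy: " + strategy)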
1.7 Multilevel approach
Figure 4: Multilevel decomposition of the pdf of S(X) by levels L_1 < L_2 < ... < L_i < ... < τ = L_M (for instance p_2 = ℙ(S(X) > L_2 | S(X) > L_1)).
• Ingredients: fix M and L_1 < ... < L_M = τ so that each p_i = ℙ(S(X) > L_i | S(X) > L_{i−1}) is not too small.
• Bayes decomposition: α = p_1 p_2 ... p_M.
• Unrealistic case: suppose you can estimate each p_i independently with classic Monte Carlo: p_i ≈ p̂_i = N_i / N.
• Multilevel estimator: α̂_N = p̂_1 p̂_2 ... p̂_M.
1.8 The Shaker
• Recall: X ∼ µ on 𝒳.
• Ingredient: a µ-reversible transition kernel K(x, dx′) on 𝒳:
  ∀(x, x′) ∈ 𝒳², µ(dx) K(x, dx′) = µ(dx′) K(x′, dx).
• Consequence: µK = µ.
• Example: if X ∼ N(0, 1) then X′ = (X + σW) / √(1 + σ²) ∼ N(0, 1), i.e.
  K(x, ·) = N(x / √(1 + σ²), σ² / (1 + σ²))
  is a "good shaker".
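A one-line sketch of this Gaussian shaker in NumPy (it acts coordinate-wise, so the same move works for µ = N(0, I_d)):

import numpy as np

def gaussian_shaker(x, sigma, rng):
    """mu-reversible move for mu = N(0, I_d): x' = (x + sigma*W) / sqrt(1 + sigma^2), W ~ N(0, I_d)."""
    return (x + sigma * rng.standard_normal(x.shape)) / np.sqrt(1.0 + sigma**2)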
1.9 Feynman-Kac representation
A_k = S⁻¹(]L_k, +∞[)
M_k^K(x, dy) = K(x, dy) 𝟙_{A_k}(y) + K(x, A_k^c) δ_x(dy)
µ_k(dx) = 𝟙_{A_k}(x) µ(dx) / µ(A_k), the normalized restriction of µ to A_k
µ_k is invariant under M_k^K.
Let (X_k) be a Markov chain with initial distribution µ and transitions M_k^K. For every test function ϕ and every k ∈ {0, ..., n}, we have the following Feynman-Kac representation:
µ_{k+1}(ϕ) = E[ϕ(X_k) ∏_{j=0}^{k} 𝟙_{A_{j+1}}(X_j)] / E[∏_{j=0}^{k} 𝟙_{A_{j+1}}(X_j)].
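In code, one step of M_k^K amounts to: propose with K and keep the proposal only if it stays in A_k = {S > L_k}, otherwise stay put. A sketch (the kernel argument is assumed to be a function such as the gaussian_shaker above):

def metropolised_move(x, level, score, kernel, rng):
    """One step of M_k^K: propose x' ~ K(x, .) and keep it only if S(x') > L_k."""
    # e.g. kernel = lambda x, rng: gaussian_shaker(x, 0.3, rng)
    x_new = kernel(x, rng)
    return x_new if score(x_new) > level else x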
1.10 Algorithm
• Initialization: simulate an i.i.d. sample ξ_0^1, ..., ξ_0^N ∼ µ.
• Estimate p̂_1 = (1/N) Σ_{i=1}^N 𝟙_{A_1}(ξ_0^i).
• Selection: ξ̂_0^i = ξ_0^i if S(ξ_0^i) > L_1; otherwise pick ξ̂_0^i at random among the N_1 selected particles.
• Mutation: for all i ∈ {1, ..., N}, draw ξ̃_1^i ∼ K(ξ̂_0^i, dx′) and set
  ξ_1^i = ξ̃_1^i if S(ξ̃_1^i) > L_1, and ξ_1^i = ξ̂_0^i if S(ξ̃_1^i) ≤ L_1.
• Consider the next level and iterate until the rare event is reached.
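A sketch of the whole fixed-level algorithm in NumPy, under the notation above; function and parameter names are mine, and the adaptive choice of levels discussed later is not included here.

import numpy as np

def multilevel_splitting(sample_mu, score, kernel, levels, N, rng, n_moves=5):
    """Estimate alpha = P(S(X) > levels[-1]) for X ~ mu by multilevel splitting.

    sample_mu(n, rng) -> (n, d) i.i.d. sample from mu
    score(x)          -> S(x) for one particle x
    kernel(x, rng)    -> one mu-reversible move from x
    levels            -> L_1 < ... < L_M = tau
    """
    xs = sample_mu(N, rng)
    p_hats = []
    for L in levels:
        s = np.array([score(x) for x in xs])
        alive = s > L
        p_hats.append(alive.mean())
        if not alive.any():                                  # extinction: the estimate is 0
            return 0.0, p_hats
        # Selection: dead particles are replaced by copies of randomly chosen survivors
        survivors = np.where(alive)[0]
        dead = np.where(~alive)[0]
        xs[dead] = xs[rng.choice(survivors, size=dead.size)]
        # Mutation: shake within {S > L} with the metropolised kernel M_k^K
        for _ in range(n_moves):
            for i in range(N):
                prop = kernel(xs[i], rng)
                if score(prop) > L:
                    xs[i] = prop
    return float(np.prod(p_hats)), p_hats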
1.11–1.16 Algorithm (illustration)
Figures: step-by-step evolution of a cloud of 8 particles — p̂_1 = 4/8 of them exceed L_1; after selection and mutation, p̂_2 = 3/8 exceed the next level L_2.
1.17 Implementation issues
Choice of K
Depends on the model: Metropolis-Hastings, Gibbs sampler, the Gaussian kernel above (zero-bit watermarking), i.i.d. redraws on some random sites (Tardos codes).
Trade-off between two drawbacks:
• "shaking effect" too large: most proposed mutations are refused;
• "shaking effect" too small: the particles almost don't move.
Levels L_k
Adaptive levels with a fixed rate of success: set L_{k+1} to the empirical quantile of the S(ξ_k^j) that lets a fraction p_0 of the particles survive; p_0 = 0.75 or 0.8 is a good choice.
Less dependent sample
We can iterate the kernel M_k^K several times to improve the variability of the particles. By well-known results on Metropolis-Hastings, the sample becomes less and less dependent. Rule: iterate until 90 or 95% of the particles have actually moved (accepted at least one transition).
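Two of these heuristics in code (a sketch; the values p_0 = 0.75 and the 90% move target follow the slide, the rest of the naming is mine):

import numpy as np

def next_level(scores, p0=0.75):
    """Adaptive level: the (1 - p0)-quantile of the current scores,
    so that a fraction ~p0 of the particles survives the next selection."""
    return np.quantile(scores, 1.0 - p0)

def shake_until_moved(xs, level, score, kernel, rng, target=0.9, max_iter=100):
    """Iterate the metropolised kernel until at least `target` of the particles
    have accepted at least one move (reduces dependence after resampling)."""
    moved = np.zeros(len(xs), dtype=bool)
    for _ in range(max_iter):
        for i in range(len(xs)):
            prop = kernel(xs[i], rng)
            if score(prop) > level:
                xs[i] = prop
                moved[i] = True
        if moved.mean() >= target:
            break
    return xs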
1.18 Asymptotic variance
Best achievable asymptotic variance:
• Multilevel estimator: α̂_N = p̂_1 p̂_2 ... p̂_M.
• Fluctuations: if the p̂_i's are independent, then
  √N (α̂_N − α) / α  →  N(0, Σ_{i=1}^M (1 − p_i)/p_i)  in distribution as N → ∞.
• Constrained minimization:
  argmin_{p_1,...,p_M} Σ_{i=1}^M (1 − p_i)/p_i  subject to  ∏_{i=1}^M p_i = α.
• Optimum: p_1 = ... = p_M = α^{1/M}.
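For completeness, the small Lagrangian computation behind the optimum (not spelled out on the slide). Since $\sum_i (1-p_i)/p_i = \sum_i 1/p_i - M$, it suffices to minimise $\sum_i 1/p_i$ under $\sum_i \log p_i = \log\alpha$:
$$\frac{\partial}{\partial p_i}\Big(\sum_{j=1}^{M} \frac{1}{p_j} - \lambda \sum_{j=1}^{M} \log p_j\Big)
  = -\frac{1}{p_i^{2}} - \frac{\lambda}{p_i} = 0
  \quad\Longrightarrow\quad p_i = -\frac{1}{\lambda}\ \text{ for every } i,$$
so all the $p_i$ are equal, and the constraint then forces $p_1 = \dots = p_M = \alpha^{1/M}$.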
1.19 Simulations: The Model
Figure: detection region (cone around the secret direction u).
• The model: X ∼ N(0, I_20).
• Rare event: α = ℙ(⟨X, u⟩ / ‖X‖ > 0.95).
• Numerical computation: α = 4.704 · 10⁻¹¹.
• Parameter: p = 3/4, so that α = r × p^M = 0.83 × (3/4)^82.
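A usage sketch on this toy model, combining the pieces above into a self-contained adaptive run; N, σ and the number of shaking iterations are arbitrary illustrative values, so the output is only expected to be of the right order of magnitude, not a reproduction of the slide's figures.

import numpy as np

rng = np.random.default_rng(3)
d, tau, N, p0, sigma = 20, 0.95, 5000, 0.75, 0.3
u = np.zeros(d); u[0] = 1.0

def score(X):                                         # vectorised S(X) = <X,u>/||X|| on an (N, d) array
    return X @ u / np.linalg.norm(X, axis=1)

X = rng.standard_normal((N, d))
log_alpha = 0.0
for _ in range(200):                                  # safety cap on the number of levels
    s = score(X)
    L = np.quantile(s, 1.0 - p0)                      # adaptive level (see previous slides)
    if L >= tau:                                      # last level reached
        log_alpha += np.log((s > tau).mean())
        break
    alive = s > L
    log_alpha += np.log(alive.mean())
    dead = np.where(~alive)[0]
    X[dead] = X[rng.choice(np.where(alive)[0], size=dead.size)]
    for _ in range(10):                               # shake within {S > L}
        prop = (X + sigma * rng.standard_normal((N, d))) / np.sqrt(1.0 + sigma**2)
        accept = score(prop) > L
        X[accept] = prop[accept]

print("estimated alpha ~", np.exp(log_alpha))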
1.20 Numerical results (credit: V. Bahuon)
Figure 5: Relative standard deviation as a function of the number N of particles (log-log plot, N from 10² to 10⁵, relative standard deviation between about 10⁻² and 10⁰).
1.21 Numerical results (continued)
Figure 6: Relative bias as a function of the number N of particles (log-log plot, N from 10² to 10⁵, relative bias between about 10⁻⁴ and 10⁰).
1.22 Perspectives
Find other similar applications (e.g. probabilistic counting algorithms in a large discrete set).
Work in progress: non-asymptotic variance estimates.