Being Bayesian when you have too much data
Preceded by an introduction to the CRIStAL mille-feuille
Rémi Bardenet¹
¹ CNRS & CRIStAL, Univ. Lille, France
Rémi Bardenet, CRIStAL, DatIng, SigMA: Bayes and tall data
Rémi ∈ SigMA ⊂ DatIng ⊂ CRIStAL ⊂ Univ. Lille
◮ CRIStAL: 222 permanent staff, including 22 CNRS and 27 Inria.
◮ DatIng: ∼ 40 permanent staff.
◮ SigMA: 13 permanent staff.
Bayesian inference
◮ A biologist decides on
  ◮ a likelihood $p(x \mid \theta)$,
  ◮ a prior $p(\theta)$.
◮ They have then implicitly decided on
  ◮ a posterior $p(\theta \mid x) = \dfrac{p(x \mid \theta)\, p(\theta)}{Z}$, where $Z = \int p(x \mid \theta)\, p(\theta)\, d\theta$ is the normalizing constant.
◮ Bayesian inference is all about computing integrals $\int h(\theta)\, p(\theta \mid x)\, d\theta$.
◮ MCMC samples an ergodic Markov chain $(\theta_t)_{t=1,\dots,T}$ with stationary distribution $p(\cdot \mid x)$, so that
$$\sqrt{T}\left[\frac{1}{T}\sum_{t=1}^{T} h(\theta_t) - \int h(\theta)\, p(\theta \mid x)\, d\theta\right] \xrightarrow[T\to\infty]{d} \mathcal{N}(0, \sigma^2).$$
◮ Sampling $(\theta_t)_{t=1,\dots,T}$ requires $T$ likelihood evaluations.
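To make the "computing integrals" point concrete, here is a minimal sketch on a toy model that is not from the slides: Bernoulli data with a uniform prior on $\theta \in (0,1)$, where the posterior mean has the closed form $(s+1)/(n+2)$. The function name `posterior_mean_on_grid` and the counts used are hypothetical. The sketch computes $Z$ and $\int \theta\, p(\theta \mid x)\, d\theta$ by a midpoint rule, which is only feasible because $\theta$ is one-dimensional; MCMC targets the same integrals when such quadrature is out of reach.

```python
def posterior_mean_on_grid(successes, trials, grid_size=10_000):
    """Midpoint-rule approximation of Z and of the posterior mean of theta.

    Assumed toy model: Bernoulli likelihood, uniform prior on (0, 1).
    """
    step = 1.0 / grid_size
    thetas = [(j + 0.5) * step for j in range(grid_size)]
    # Unnormalized posterior p(x | theta) p(theta), with p(theta) = 1.
    unnorm = [t ** successes * (1.0 - t) ** (trials - successes) for t in thetas]
    z = sum(unnorm) * step  # normalizing constant Z
    mean = sum(t * w for t, w in zip(thetas, unnorm)) * step / z
    return z, mean


z, mean = posterior_mean_on_grid(successes=7, trials=10)
closed_form = (7 + 1) / (10 + 2)  # mean of the Beta(8, 4) posterior
```

Here $h(\theta) = \theta$; any other integrable $h$ could be plugged into the same sum.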
Tall data
◮ Assume the data are independent conditional on $\theta$:
$$p(x \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta).$$
◮ Can you get the same central limit theorem while never evaluating all the terms in the product?
◮ Yes [1], sometimes using $o(n)$ data points per iteration! [2]
◮ Still unanswered: what is the equivalent of stochastic gradient for integration?
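As a small illustration of why the product above is costly and delicate, the sketch below assumes a unit-variance Gaussian model (not specified in the slides) and hypothetical names. It shows that one likelihood evaluation is a full pass over the $n$ data points, and that in floating point the product itself underflows, so implementations usually sum log-densities instead.

```python
import math
import random


def gaussian_log_density(x, theta):
    """Log density of N(theta, 1) at x (assumed toy model)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - theta) ** 2


rng = random.Random(0)
n = 10_000
data = [rng.gauss(0.0, 1.0) for _ in range(n)]
theta = 0.0

# Direct product of n densities: each factor is below 0.4, so it underflows.
likelihood = 1.0
for x in data:
    likelihood *= math.exp(gaussian_log_density(x, theta))

# Sum of n log-densities: same cost, O(n) per evaluation, but numerically stable.
log_likelihood = sum(gaussian_log_density(x, theta) for x in data)
```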
Metropolis-Hastings
MH$\big(p(x \mid \theta),\, p(\theta),\, q(\theta' \mid \theta),\, \theta_0,\, N_{\text{iter}},\, X\big)$
  for $k \leftarrow 1$ to $N_{\text{iter}}$
    $\theta \leftarrow \theta_{k-1}$
    $\theta' \sim q(\cdot \mid \theta)$, $u \sim \mathcal{U}(0,1)$
    $\alpha = \dfrac{\prod_{i=1}^{n} p(x_i \mid \theta')}{\prod_{i=1}^{n} p(x_i \mid \theta)} \cdot \dfrac{p(\theta')\, q(\theta \mid \theta')}{p(\theta)\, q(\theta' \mid \theta)}$
    if $u < \alpha$
      $\theta_k \leftarrow \theta'$  ⊲ Accept
    else $\theta_k \leftarrow \theta$  ⊲ Reject
  return $(\theta_k)_{k=1,\dots,N_{\text{iter}}}$
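The pseudocode can be sketched in Python as follows. This is an illustration under assumptions that are not in the slides: a $\mathcal{N}(\theta, 1)$ likelihood, a $\mathcal{N}(0, 10^2)$ prior, and a symmetric Gaussian random-walk proposal, so that the $q$ terms cancel in $\alpha$. The test $u < \alpha$ is carried out in the log domain to avoid underflow; all function and variable names are hypothetical.

```python
import math
import random


def log_posterior(theta, data, prior_std=10.0):
    """Unnormalized log posterior for the assumed Gaussian toy model."""
    log_prior = -0.5 * (theta / prior_std) ** 2
    log_lik = sum(-0.5 * (x - theta) ** 2 for x in data)  # one full pass over the data
    return log_prior + log_lik


def metropolis_hastings(data, theta0, n_iter, step, rng):
    theta = theta0
    current = log_posterior(theta, data)
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric q, so q cancels in alpha
        candidate = log_posterior(proposal, data)
        log_u = math.log(1.0 - rng.random())  # 1 - U lies in (0, 1], avoids log(0)
        if log_u < candidate - current:  # same test as u < alpha
            theta, current = proposal, candidate  # accept
        chain.append(theta)  # on reject, the current state is repeated
    return chain


rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(200)]
chain = metropolis_hastings(data, theta0=0.0, n_iter=5000, step=0.15, rng=rng)
kept = chain[500:]  # discard a burn-in period
estimate = sum(kept) / len(kept)
exact = sum(data) / (len(data) + 1.0 / 10.0 ** 2)  # conjugate posterior mean
```

Caching `current` means each iteration costs one new pass over the data, which is the per-iteration cost the subsampling approaches try to reduce.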
Subsampling approaches
Taking logarithms and dividing by $n$, the acceptance test $u < \alpha$ is equivalent to $\Lambda_n(\theta, \theta') > \psi(u, \theta, \theta')$:
MH$\big(p(x \mid \theta),\, p(\theta),\, q(\theta' \mid \theta),\, \theta_0,\, N_{\text{iter}},\, X\big)$
  for $k \leftarrow 1$ to $N_{\text{iter}}$
    $\theta \leftarrow \theta_{k-1}$
    $\theta' \sim q(\cdot \mid \theta)$, $u \sim \mathcal{U}(0,1)$
    $\psi(u, \theta, \theta') \leftarrow \dfrac{1}{n} \log\left[\dfrac{u\, p(\theta)\, q(\theta' \mid \theta)}{p(\theta')\, q(\theta \mid \theta')}\right]$
    $\Lambda_n(\theta, \theta') \leftarrow \dfrac{1}{n} \sum_{i=1}^{n} \log\left[\dfrac{p(x_i \mid \theta')}{p(x_i \mid \theta)}\right]$
    if $\Lambda_n(\theta, \theta') > \psi(u, \theta, \theta')$
      $\theta_k \leftarrow \theta'$  ⊲ Accept
    else $\theta_k \leftarrow \theta$  ⊲ Reject
  return $(\theta_k)_{k=1,\dots,N_{\text{iter}}}$
◮ Can we use instead, for a subsample $x_1^*, \dots, x_t^*$ of the data,
$$\Lambda_t^*(\theta, \theta') = \frac{1}{t} \sum_{i=1}^{t} \log\left[\frac{p(x_i^* \mid \theta')}{p(x_i^* \mid \theta)}\right]?$$
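The quantities on this slide can be computed as in the sketch below, which again assumes a unit-variance Gaussian likelihood, a $\mathcal{N}(0, 10^2)$ prior, and a symmetric proposal (so the $q$ terms drop out of $\psi$); none of these choices, nor the function names, come from the slides. The subsample is drawn without replacement, which is also an assumption. The sketch only evaluates $\Lambda_n$, $\psi$ and $\Lambda_t^*$; whether plugging $\Lambda_t^*$ into the test still yields a valid sampler is precisely the question the slide raises.

```python
import math
import random


def log_lik(x, theta):
    """Log density of N(theta, 1) at x, up to an additive constant."""
    return -0.5 * (x - theta) ** 2


def log_prior(theta, prior_std=10.0):
    return -0.5 * (theta / prior_std) ** 2


def psi(u, theta, theta_new, n):
    """Threshold psi(u, theta, theta'), with the symmetric proposal cancelled."""
    return (math.log(u) + log_prior(theta) - log_prior(theta_new)) / n


def full_log_ratio(data, theta, theta_new):
    """Lambda_n: average log likelihood ratio over all n points."""
    return sum(log_lik(x, theta_new) - log_lik(x, theta) for x in data) / len(data)


def subsampled_log_ratio(data, theta, theta_new, t, rng):
    """Lambda*_t: the same average over t points drawn without replacement."""
    subsample = rng.sample(data, t)
    return sum(log_lik(x, theta_new) - log_lik(x, theta) for x in subsample) / t


rng = random.Random(2)
data = [rng.gauss(1.0, 1.0) for _ in range(50)]
theta, theta_new = 0.9, 1.1

lam_n = full_log_ratio(data, theta, theta_new)
lam_t = subsampled_log_ratio(data, theta, theta_new, t=10, rng=rng)

# Acceptance ratio alpha computed directly as a product, for comparison.
alpha = math.exp(log_prior(theta_new)) / math.exp(log_prior(theta))
for x in data:
    alpha *= math.exp(log_lik(x, theta_new)) / math.exp(log_lik(x, theta))
```

With `t` equal to the full sample size, `subsampled_log_ratio` reduces to `full_log_ratio`, up to floating-point summation order.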