

  1. Large Sample Robustness of Bayes Nets with Incomplete Information. Jim Smith and Ali Daneshkhah, Universities of Warwick and Strathclyde. PGM, Denmark, September 2008.

  2. Motivation. We often worry about convergence of samplers etc. in a Bayesian analysis. But how precise does the prior on a BN have to be? In particular, what is the overall effect of local and global independence assumptions on a given model? What are the overall inferential implications of using standard priors like product Dirichlets or product logistics? In general, how hard do I need to think about these issues a priori when I know I will collect a large sample?

  3. Messy Analyses. A large BN with some expert knowledge incorporated. Nodes in our graph are systematically missing / the sample is not random. Possible unidentifiability, even taking account of aliasing, as $n \to \infty$. [Figure: an example BN on parameter vectors $\theta_1, \ldots, \theta_{11}$.]

  4. The Problems. For a given prior we have only a numerical or algebraic approximation of the posterior density, and just approximate summary statistics (e.g. means, variances, sampled low-dimensional margins, ...). Robustness issues arise even for complete sampling: the variation distance $d_V(f, g) = \int |f - g|$ between two posteriors can diverge quickly as the sample size increases, especially when the parameter space is large, with outliers (Dawid, 1973) and more generally (Gustafson and Wasserman, 1995). So when and how are posterior inferences strongly influenced by the prior? Local De Robertis separations are the key to addressing this issue!
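
To fix ideas, here is a minimal numerical sketch of this variation distance; the Beta priors, the binomial data, and all numbers are illustrative assumptions, not from the talk. In this benign one-dimensional case the distance shrinks with $n$; the divergence phenomena of Dawid (1973) and Gustafson and Wasserman (1995) need large parameter spaces or outliers.

```python
# A numerical sketch of the variation distance d_V(f, g) = \int |f - g|
# between the posteriors arising from two different Beta priors.  All
# priors and data here are illustrative assumptions.
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

theta = np.linspace(1e-6, 1 - 1e-6, 20001)

def d_V(f, g):
    """Trapezoidal approximation of the variation distance between f and g."""
    return trapezoid(np.abs(f - g), theta)

# Functioning prior Beta(1, 1) versus genuine prior Beta(5, 1); the data are
# a hypothetical binomial sample with one third successes.
for n in (0, 10, 100, 1000):
    s = n // 3
    f_n = stats.beta.pdf(theta, 1 + s, 1 + n - s)
    g_n = stats.beta.pdf(theta, 5 + s, 1 + n - s)
    print(n, d_V(f_n, g_n))   # shrinks as n grows in this well-behaved case
```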

  5. About LDR. Local De Robertis (LDR) separations are easy to calculate and extend natural parametrizations in exponential families. They have an intriguing prior-to-posterior invariance property. A BN factorization of a density implies linear relationships between the clique marginal separations and the joint. Bounds on the variation distance between the posterior distributions associated with different priors can be calculated explicitly as a function of prior LDR bounds and posterior statistics associated with the functioning prior. The bounds apply posterior to an observed likelihood, even when the sample density is misspecified.

  6. Contents. De Robertis local separations. Some properties of local De Robertis separations. Some useful theorems concerning LDR and BNs. What this means for the robustness of BNs.

  7. The Setting. Let $g_0$ ($g_n$) be our genuine prior (posterior) density and $f_0$ ($f_n$) our functioning prior (posterior) density; the default $f_0$ in Bayesian practice is often a product of Dirichlets. We observe $x^n = (x_1, x_2, \ldots, x_n)$, $n \ge 1$, with observed sample densities $\{p_n(x^n \mid \theta)\}_{n \ge 1}$. With missing data these sample densities (and hence $f_n$ and $g_n$) are typically intractable, so $f_n$ must be approximated, either by drawing samples or algebraically.

  8. A Bayes Rule Identity. Let $\Theta(n) = \{\theta \in \Theta : p(x^n \mid \theta) > 0\}$. For all $\theta \in \Theta(n)$,
     $$\log g_n(\theta) = \log g_0(\theta) + \log p_n(x^n \mid \theta) - \log p_g(x^n),$$
     $$\log f_n(\theta) = \log f_0(\theta) + \log p_n(x^n \mid \theta) - \log p_f(x^n),$$
     where $p_g(x^n) = \int_{\Theta(n)} p(x^n \mid \theta)\, g_0(\theta)\, d\theta$ and $p_f(x^n) = \int_{\Theta(n)} p(x^n \mid \theta)\, f_0(\theta)\, d\theta$. (When $\theta \in \Theta \setminus \Theta(n)$, set $g_n(\theta) = f_n(\theta) = 0$.) So
     $$\log f_n(\theta) - \log g_n(\theta) = \log f_0(\theta) - \log g_0(\theta) + \log p_g(x^n) - \log p_f(x^n).$$
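
A minimal numerical check of this identity, under an assumed Beta/binomial setup chosen purely for illustration: the difference of the log-density differences should be constant in $\theta$.

```python
# Check that log f_n - log g_n differs from log f_0 - log g_0 only by the
# theta-free constant log p_g(x^n) - log p_f(x^n).  The Beta priors and
# binomial likelihood below are illustrative assumptions.
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

theta = np.linspace(1e-6, 1 - 1e-6, 10001)
f0 = stats.beta.pdf(theta, 2.0, 2.0)      # functioning prior
g0 = stats.beta.pdf(theta, 3.0, 1.5)      # genuine prior
lik = theta**7 * (1 - theta)**3           # likelihood: 7 successes in 10 trials

fn = f0 * lik / trapezoid(f0 * lik, theta)   # posterior under f0
gn = g0 * lik / trapezoid(g0 * lik, theta)   # posterior under g0

diff = (np.log(fn) - np.log(gn)) - (np.log(f0) - np.log(g0))
print(diff.max() - diff.min())            # ~0: the difference is constant in theta
```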

  9. From Bayes Rule to LDR. For any subset $A \subseteq \Theta(n)$ let
     $$d^L_A(f, g) \triangleq \sup_{\theta \in A} \left(\log f(\theta) - \log g(\theta)\right) - \inf_{\phi \in A} \left(\log f(\phi) - \log g(\phi)\right).$$
     Then, since $\log f_n(\theta) - \log g_n(\theta) = \log f_0(\theta) - \log g_0(\theta) + \log p_g(x^n) - \log p_f(x^n)$, for any sequence $\{p(x^n \mid \theta)\}_{n \ge 1}$, however complicated,
     $$d^L_A(f_n, g_n) = d^L_A(f_0, g_0).$$

  10. Isoseparation. $d^L_A(f_n, g_n) = d^L_A(f_0, g_0)$. So for $A \subseteq \Theta(n)$ the posterior approximation of $f_n$ to $g_n$ is identical in quality to that of $f_0$ to $g_0$. When $A = \Theta(n)$ this property (De Robertis, 1978) is used for density ratio metrics and the specification of neighbourhoods. Trivially, posterior distances between densities can be calculated effortlessly from their priors. The separation of two priors lying in standard families can usually be expressed explicitly, and can always be explicitly bounded.
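
A sketch of the isoseparation property in action, again under an assumed Beta/binomial setup: the LDR separation over a set $A$ is unchanged by conditioning on the data, whatever the likelihood.

```python
# Numerical check that d^L_A(f_n, g_n) = d^L_A(f_0, g_0) for a set A.
# The Beta/binomial setup is an illustrative assumption.
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

theta = np.linspace(1e-6, 1 - 1e-6, 10001)

def d_L(f, g, A):
    """sup_A (log f - log g) - inf_A (log f - log g) for a boolean mask A."""
    h = np.log(f[A]) - np.log(g[A])
    return h.max() - h.min()

f0 = stats.beta.pdf(theta, 2.0, 2.0)
g0 = stats.beta.pdf(theta, 3.0, 1.5)
lik = theta**12 * (1 - theta)**8          # likelihood: 12 successes in 20 trials
fn = f0 * lik / trapezoid(f0 * lik, theta)
gn = g0 * lik / trapezoid(g0 * lik, theta)

A = (theta > 0.4) & (theta < 0.7)         # any subset of Theta(n) will do
print(d_L(f0, g0, A), d_L(fn, gn, A))     # identical separations
```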

  11. Some notation. We will be especially interested in small sets $A$. Let $B(\mu; \rho)$ denote the open ball centred at $\mu = (\mu_1, \mu_2, \ldots, \mu_k)$ and of radius $\rho$, and let
     $$d^L_{\mu; \rho}(f, g) \triangleq d^L_{B(\mu; \rho)}(f, g).$$
     For any subset $\Theta_0 \subseteq \Theta$, let
     $$d^L_{\Theta_0; \rho}(f, g) = \sup_{\mu \in \Theta_0} d^L_{\mu; \rho}(f, g).$$
     Obviously, for any $A \subseteq B(\mu; \rho)$ with $\mu \in \Theta_0 \subseteq \Theta$,
     $$d^L_A(f, g) \le d^L_{\Theta_0; \rho}(f, g).$$
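
A small illustration of these localised separations in one dimension, where the ball $B(\mu; \rho)$ is just an interval; the priors and the region $\Theta_0$ are assumptions carried over from the earlier sketches.

```python
# Compute d^L_{mu; rho} over interval balls and its supremum over a grid
# approximating the region Theta_0.  All numbers are assumptions.
import numpy as np
from scipy import stats

theta = np.linspace(1e-6, 1 - 1e-6, 10001)
logdiff = (stats.beta.logpdf(theta, 2.0, 2.0)
           - stats.beta.logpdf(theta, 3.0, 1.5))   # log f0 - log g0

def d_L_ball(mu, rho):
    """d^L_{mu; rho}: separation over the ball (interval) B(mu; rho)."""
    inside = np.abs(theta - mu) < rho
    return logdiff[inside].max() - logdiff[inside].min()

rho = 0.05
Theta0 = np.linspace(0.1, 0.9, 81)                 # grid over the region Theta_0
print(max(d_L_ball(mu, rho) for mu in Theta0))     # d^L_{Theta_0; rho}(f0, g0)
```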

  12. Separation of two Dirichlets. Let $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)$ with $\theta_i, \alpha_i > 0$ and $\sum_{i=1}^k \theta_i = 1$. Let $f_0(\theta \mid \alpha_f)$ and $g_0(\theta \mid \alpha_g)$ be Dirichlet, so that
     $$f_0(\theta \mid \alpha_f) \propto \prod_{i=1}^k \theta_i^{\alpha_{i,f} - 1}, \qquad g_0(\theta \mid \alpha_g) \propto \prod_{i=1}^k \theta_i^{\alpha_{i,g} - 1}.$$
     Let $\mu_n = (\mu_{1,n}, \mu_{2,n}, \ldots, \mu_{k,n})$ be the mean of $f_n$. If $\rho_n < \mu^0_n = \min\{\mu_{i,n} : 1 \le i \le k\}$ then
     $$d^L_{\mu; \rho_n}(f_0, g_0) \le 2 k \rho_n \left(\mu^0_n - \rho_n\right)^{-1} \alpha(f_0, g_0),$$
     where $\alpha(f_0, g_0) = k^{-1} \sum_{i=1}^k |\alpha_{i,f} - \alpha_{i,g}|$ is the average distance between the hyperparameters of $f_0$ and $g_0$.
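
A Monte Carlo sanity check of this bound, with assumed hyperparameters: sampling simplex points inside the ball and computing the realised separation directly, the estimate should sit below $2 k \rho (\mu^0 - \rho)^{-1} \alpha(f_0, g_0)$.

```python
# Monte Carlo check of the Dirichlet LDR bound above.  The hyperparameters,
# ball centre, and radius are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
a_f = np.array([4.0, 3.0, 5.0])                 # hyperparameters of f0
a_g = np.array([3.5, 3.5, 4.5])                 # hyperparameters of g0
k = len(a_f)

mu = a_f / a_f.sum()                            # centre the ball at the mean of f0
rho = 0.05
mu0 = mu.min()

def logdiff(th):
    """log f0 - log g0 up to a constant (the constant cancels in sup - inf)."""
    return (np.log(th) * (a_f - a_g)).sum(axis=-1)

th = rng.dirichlet(np.ones(k), size=200_000)    # uniform draws on the simplex
th = th[np.linalg.norm(th - mu, axis=1) < rho]  # keep those inside the ball
sep = logdiff(th).max() - logdiff(th).min()     # Monte Carlo estimate of d^L_{mu; rho}

alpha = np.abs(a_f - a_g).mean()                # average hyperparameter distance
bound = 2 * k * rho * alpha / (mu0 - rho)
print(sep, bound)                               # the estimate sits below the bound
```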

  13. Where Separations might be large.
     $$d^L_{\mu; \rho_n}(f_0, g_0) \le 2 \rho_n \left(\mu^0_n - \rho_n\right)^{-1} \sum_{i=1}^k |\alpha_{i,f} - \alpha_{i,g}|.$$
     So $d^L_{\mu; \rho_n}(f_0, g_0)$ is uniformly bounded whenever the components of $\mu_n$ all stay away from 0 and converge approximately linearly in $n$. On the other hand, if $f_n$ tends to put its mass near a zero probability then, even when $\alpha(f, g)$ is small, it can be shown that at least some likelihoods will force the variation distance between the posterior densities to stay large for increasing $n$ (Smith, 2007). The smaller the smallest probability tended to, the slower any convergence.
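
A simulation illustrating the boundary phenomenon under an assumed Beta/binomial setup: with zero successes in $n$ trials the posteriors pile up near $\theta = 0$, and the variation distance between the posteriors from two nearby priors does not vanish as $n$ grows, in contrast to the interior case sketched earlier.

```python
# When the data drive the posterior mass towards a zero probability, two
# priors with nearby hyperparameters give posteriors whose variation distance
# does not vanish.  The setup and all numbers are illustrative assumptions.
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

theta = np.linspace(1e-9, 1 - 1e-9, 200001)

def d_V(f, g):
    return trapezoid(np.abs(f - g), theta)

for n in (10, 100, 1000, 10000):
    # zero successes in n trials: all the posterior mass piles up near 0
    f_n = stats.beta.pdf(theta, 1.0, 1.0 + n)   # from prior Beta(1, 1)
    g_n = stats.beta.pdf(theta, 1.5, 1.0 + n)   # from prior Beta(1.5, 1)
    print(n, d_V(f_n, g_n))                     # stays bounded away from zero
```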

  14. BNs with local and global independence. If the functioning prior $f(\theta)$ and the genuine prior $g(\theta)$ factorize over subvectors $\{\theta_1, \theta_2, \ldots, \theta_k\}$ so that
     $$f(\theta) = \prod_{i=1}^k f_i(\theta_i), \qquad g(\theta) = \prod_{i=1}^k g_i(\theta_i),$$
     where $f_i(\theta_i)$ ($g_i(\theta_i)$) is the functioning (genuine) margin on $\theta_i$, $1 \le i \le k$, then, for product sets $A = A_1 \times \cdots \times A_k$ (and like K-L separations),
     $$d^L_A(f, g) = \sum_{i=1}^k d^L_{A_i}(f_i, g_i).$$
     So local prior distances grow linearly with the number of defining conditional probability vectors.
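
A numerical check of this additivity for two parameter blocks and a product set $A = A_1 \times A_2$, with assumed Beta margins:

```python
# Verify d^L_A(f, g) = sum_i d^L_{A_i}(f_i, g_i) when both priors factorise
# and A = A_1 x A_2 is a product set.  The Beta margins are assumptions.
import numpy as np
from scipy import stats

t = np.linspace(0.05, 0.95, 401)

def d_L(logf, logg, A):
    h = logf[A] - logg[A]
    return h.max() - h.min()

# Functioning and genuine margins on the two blocks.
lf1, lg1 = stats.beta.logpdf(t, 2, 2), stats.beta.logpdf(t, 3, 1.5)
lf2, lg2 = stats.beta.logpdf(t, 4, 1), stats.beta.logpdf(t, 2, 3)

A1 = (t > 0.2) & (t < 0.6)
A2 = (t > 0.3) & (t < 0.8)

# Joint separation over A_1 x A_2: log f - log g = (lf1 - lg1) + (lf2 - lg2),
# so take sup - inf of the sum over the product grid.
h = (lf1 - lg1)[A1][:, None] + (lf2 - lg2)[A2][None, :]
joint = h.max() - h.min()
print(joint, d_L(lf1, lg1, A1) + d_L(lf2, lg2, A2))   # equal
```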

  15. Some conclusions. BNs with larger numbers of edges are intrinsically less stable. However, as with K-L, marginal densities are never more separated than their joint densities, so if a utility function depends only on a particular margin these distances may be much smaller. Bayes factors automatically select simpler models, but note also that the inferences of a more complex model tend to be more sensitive to wrongly specified priors.
