Large Sample Robustness of Bayes Nets with Incomplete Information

Jim Smith and Ali Daneshkhah
Universities of Warwick and Strathclyde

PGM, Denmark, September 2008
Motivation

We often worry about the convergence of samplers and the like in a Bayesian analysis. But how precise does the prior on a BN have to be? In particular, what is the overall effect of local and global independence assumptions on a given model? What are the overall inferential implications of using standard priors like product Dirichlets or product logistics? In general, how hard do I need to think about these issues a priori when I know I will collect a large sample?
Messy Analyses

A large BN, with some expert knowledge incorporated. Nodes in our graph are systematically missing / the sample is not random. Possible unidentifiability, even taking account of aliasing, as $n \to \infty$.

[Figure: a directed graph on parameters $\theta_1, \ldots, \theta_{11}$ illustrating such a network; the edge structure is not recoverable from the extraction.]
The Problems

For a given prior, only a numerical or algebraic approximation of the posterior density is available. We have only approximate summary statistics (e.g. means, variances, sampled low-dimensional margins, ...). Robustness issues arise even for complete sampling: the variation distance $d_V(f, g) = \int |f - g|$ between two posteriors can diverge quickly as the sample size increases, especially when the parameter space is large, with outliers (Dawid, 1973) and more generally (Gustafson and Wasserman, 1995). So when and how are posterior inferences strongly influenced by the prior? Local De Robertis separations are the key to addressing this issue!
About LDR

Local De Robertis (LDR) separations are easy to calculate and extend natural parametrizations in exponential families. They have an intriguing prior-to-posterior invariance property. A BN factorization of a density implies linear relationships between the clique-marginal separations and the joint separation. Bounds on the variation distance between two posterior distributions associated with different priors can be calculated explicitly as a function of prior LDR bounds and posterior statistics associated with the functioning prior. The bounds apply to the posterior for an observed likelihood, even when the sample density is misspecified.
Contents

De Robertis local separations
Some properties of local De Robertis separations
Some useful theorems concerning LDR and BNs
What this means for the robustness of BNs
The Setting

Let $g_0$ ($g_n$) be our genuine prior (posterior) density and $f_0$ ($f_n$) our functioning prior (posterior) density. The default functioning prior $f_0$ in Bayesian practice is often a product of Dirichlets. Write $x^n = (x_1, x_2, \ldots, x_n)$, $n \ge 1$, with observed sample densities $\{p_n(x^n \mid \theta)\}_{n \ge 1}$. With missing data, these sample densities (and hence $f_n$ and $g_n$) are typically intractable, so $f_n$ is approximated either by drawing samples or algebraically.
A Bayes Rule Identity

Let $\Theta(n) = \{\theta \in \Theta : p(x^n \mid \theta) > 0\}$. For all $\theta \in \Theta(n)$,
$$\log g_n(\theta) = \log g_0(\theta) + \log p_n(x^n \mid \theta) - \log p_g(x^n)$$
$$\log f_n(\theta) = \log f_0(\theta) + \log p_n(x^n \mid \theta) - \log p_f(x^n)$$
where
$$p_g(x^n) = \int_{\theta \in \Theta(n)} p(x^n \mid \theta)\, g_0(\theta)\, d\theta, \qquad p_f(x^n) = \int_{\theta \in \Theta(n)} p(x^n \mid \theta)\, f_0(\theta)\, d\theta.$$
(When $\theta \in \Theta \setminus \Theta(n)$, set $g_n(\theta) = f_n(\theta) = 0$.) So
$$\log f_n(\theta) - \log g_n(\theta) = \log f_0(\theta) - \log g_0(\theta) + \log p_g(x^n) - \log p_f(x^n).$$
From Bayes Rule to LDR

For any subset $A \subseteq \Theta(n)$, let
$$d^L_A(f, g) \triangleq \sup_{\theta \in A} \left( \log f(\theta) - \log g(\theta) \right) - \inf_{\phi \in A} \left( \log f(\phi) - \log g(\phi) \right).$$
Then, since
$$\log f_n(\theta) - \log g_n(\theta) = \log f_0(\theta) - \log g_0(\theta) + \log p_g(x^n) - \log p_f(x^n),$$
for any sequence $\{p(x^n \mid \theta)\}_{n \ge 1}$, however complicated,
$$d^L_A(f_n, g_n) = d^L_A(f_0, g_0).$$
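A quick numerical check of this invariance, using a hypothetical conjugate Beta-Binomial pair (all hyperparameter and data values below are illustrative, not from the talk): the likelihood term cancels in $\log f - \log g$, so the separation over any fixed $A$ is the same before and after updating.

```python
import numpy as np
from scipy.stats import beta

# Hypothetical functioning and genuine priors on a single probability theta
a_f, b_f = 2.0, 2.0   # functioning prior Beta(2, 2)
a_g, b_g = 3.0, 1.0   # genuine prior Beta(3, 1)
s, n = 7, 20          # s successes in n Bernoulli trials

grid = np.linspace(0.3, 0.6, 201)  # a grid over a subset A of Theta(n)

def sep(log_f, log_g):
    """sup - inf of (log f - log g) over the grid: the LDR separation on A."""
    h = log_f - log_g
    return h.max() - h.min()

prior_sep = sep(beta.logpdf(grid, a_f, b_f), beta.logpdf(grid, a_g, b_g))
post_sep = sep(beta.logpdf(grid, a_f + s, b_f + n - s),
               beta.logpdf(grid, a_g + s, b_g + n - s))
print(prior_sep, post_sep)  # agree to machine precision
```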
Isoseparation

$$d^L_A(f_n, g_n) = d^L_A(f_0, g_0)$$

So for $A \subseteq \Theta(n)$ the posterior approximation of $f_n$ to $g_n$ is identical in quality to that of $f_0$ to $g_0$. When $A = \Theta(n)$ this property (De Robertis, 1978) was used for density ratio metrics and the specification of neighbourhoods. Trivially, posterior distances between densities can be calculated effortlessly from their priors. The separation between two priors lying in standard families can usually be expressed explicitly, and can always be explicitly bounded.
Some notation

We will be especially interested in small sets $A$. Let $B(\mu; \rho)$ denote the open ball centred at $\mu = (\mu_1, \mu_2, \ldots, \mu_k)$ and of radius $\rho$, and let
$$d^L_{\mu;\rho}(f, g) \triangleq d^L_{B(\mu;\rho)}(f, g).$$
For any subset $\Theta_0 \subseteq \Theta$, let
$$d^L_{\Theta_0;\rho}(f, g) = \sup_{\mu \in \Theta_0} d^L_{\mu;\rho}(f, g).$$
Obviously, for any $A \subseteq B(\mu;\rho)$ with $\mu \in \Theta_0 \subseteq \Theta$,
$$d^L_A(f, g) \le d^L_{\Theta_0;\rho}(f, g).$$
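A minimal sketch of generic estimators for these quantities (the names `ldr_sep` and `ldr_sep_ball` are mine, not from the talk). Evaluating $\log f - \log g$ on a finite sample of points inside $A$ can only lower-bound the true $\sup - \inf$, which is the safe direction for checking the upper bounds on the following slides.

```python
import numpy as np

def ldr_sep(log_f, log_g, points):
    """Monte Carlo lower bound on d^L_A(f, g): the range of log f - log g
    over a finite set of points lying in A."""
    h = np.array([log_f(t) - log_g(t) for t in points])
    return float(h.max() - h.min())

def ldr_sep_ball(log_f, log_g, mu, rho, n=50_000, rng=None):
    """Estimate d^L_{mu; rho}(f, g) by sampling the open ball B(mu; rho)."""
    rng = rng if rng is not None else np.random.default_rng()
    mu = np.asarray(mu, dtype=float)
    z = rng.standard_normal((n, mu.size))
    z /= np.linalg.norm(z, axis=1, keepdims=True)      # random directions
    z *= rho * rng.random((n, 1)) ** (1.0 / mu.size)   # uniform radii in the ball
    return ldr_sep(log_f, log_g, mu + z)

# d^L_{Theta_0; rho} is then just a supremum of ball separations:
#   max(ldr_sep_ball(log_f, log_g, mu, rho) for mu in theta0_points)
```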
Separation of two Dirichlets

Let $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)$ with $\theta_i, \alpha_i > 0$ and $\sum_{i=1}^k \theta_i = 1$. Let $f_0(\theta \mid \alpha_f)$ and $g_0(\theta \mid \alpha_g)$ be Dirichlet, so that
$$f_0(\theta \mid \alpha_f) \propto \prod_{i=1}^k \theta_i^{\alpha_{i,f} - 1}, \qquad g_0(\theta \mid \alpha_g) \propto \prod_{i=1}^k \theta_i^{\alpha_{i,g} - 1}.$$
Let $\mu_n = (\mu_{1,n}, \mu_{2,n}, \ldots, \mu_{k,n})$ be the mean of $f_n$. If $\rho_n < \mu_n^0 = \min\{\mu_{i,n} : 1 \le i \le k\}$, then
$$d^L_{\mu;\rho_n}(f_0, g_0) \le 2\, k\, \bar{\alpha}(f_0, g_0)\, \rho_n \left( \mu_n^0 - \rho_n \right)^{-1}$$
where
$$\bar{\alpha}(f_0, g_0) = k^{-1} \sum_{i=1}^k |\alpha_{i,f} - \alpha_{i,g}|$$
is the average distance between the hyperparameters of $f_0$ and $g_0$.
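A sketch checking this bound numerically, with invented hyperparameters. On the simplex, $\log f_0 - \log g_0$ reduces to $\sum_i (\alpha_{i,f} - \alpha_{i,g}) \log \theta_i$ plus a constant, so the separation is the range of that function over the ball. For illustration the ball is centred at the prior mean of $f_0$ rather than a posterior mean (an assumption of this sketch; the algebra behind the bound only appears to need $\rho < \mu^0$).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical hyperparameters for the two Dirichlet priors
alpha_f = np.array([3.0, 4.0, 5.0])
alpha_g = np.array([3.5, 3.5, 5.5])
delta = alpha_f - alpha_g
k = alpha_f.size

mu = alpha_f / alpha_f.sum()     # ball centre: prior mean of f0 (illustrative)
mu0 = mu.min()
rho = 0.5 * mu0                  # any rho < mu0 keeps the ball off the boundary

# Sample the ball B(mu; rho) intersected with the simplex plane
z = rng.standard_normal((200_000, k))
z -= z.mean(axis=1, keepdims=True)                      # zero-sum directions
z /= np.linalg.norm(z, axis=1, keepdims=True)
z *= rho * rng.random((200_000, 1)) ** (1.0 / (k - 1))  # uniform radii, (k-1)-dim
theta = mu + z
theta = theta[(theta > 0).all(axis=1)]

h = np.log(theta) @ delta        # log f0 - log g0 up to an additive constant
estimate = h.max() - h.min()     # Monte Carlo lower bound on the separation
bound = 2.0 * rho / (mu0 - rho) * np.abs(delta).sum()   # = 2 k abar rho/(mu0 - rho)
print(estimate, bound)           # estimate <= bound
```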
Where separations might be large

$$d^L_{\mu;\rho_n}(f_0, g_0) \le 2 \rho_n \left( \mu_n^0 - \rho_n \right)^{-1} \sum_{i=1}^k |\alpha_{i,f} - \alpha_{i,g}|$$

So $d^L_{\mu;\rho_n}(f_0, g_0)$ is uniformly bounded whenever the components of $\mu_n$ all stay away from 0, and converges approximately linearly in $n$. On the other hand, if $f_n$ tends to put its mass near a zero probability, then even when $\bar{\alpha}(f, g)$ is small, it can be shown that at least some likelihoods will force the variation distance between the posterior densities to stay large for increasing $n$ (Smith, 2007). The smaller the smallest probability tended to, the slower any convergence.
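A small illustration of this concentration-near-zero pathology (all values invented): with two close Beta priors and data consisting entirely of failures, both posteriors pile their mass near $\theta = 0$, and the variation distance between them settles at a nonzero constant rather than shrinking with $n$.

```python
import numpy as np
from scipy.stats import beta

a_f, a_g, b0 = 1.0, 1.5, 1.0   # close hyperparameters; flat-ish priors (invented)

for n in [10, 100, 1_000, 10_000]:
    # all-zero data: the posteriors are Beta(a, b0 + n), concentrating near 0
    grid = np.linspace(1e-9, 1.0, 2_000_001)
    diff = np.abs(beta.pdf(grid, a_f, b0 + n) - beta.pdf(grid, a_g, b0 + n))
    d_v = np.sum(diff) * (grid[1] - grid[0])   # Riemann sum for int |f - g|
    print(n, round(d_v, 3))                    # stays bounded away from zero
```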
BNs with local and global independence

If the functioning prior $f(\theta)$ and the genuine prior $g(\theta)$ factorize on subvectors $\{\theta_1, \theta_2, \ldots, \theta_k\}$ so that
$$f(\theta) = \prod_{i=1}^k f_i(\theta_i), \qquad g(\theta) = \prod_{i=1}^k g_i(\theta_i)$$
where $f_i(\theta_i)$ ($g_i(\theta_i)$) is the functioning (genuine) margin on $\theta_i$, $1 \le i \le k$, then (like K-L separations)
$$d^L_A(f, g) = \sum_{i=1}^k d^L_{A_i}(f_i, g_i).$$
So local prior distances grow linearly with the number of defining conditional probability vectors.
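A sketch of this additivity on a product set $A = A_1 \times A_2$, with two independent Beta margins standing in for conditional probability vectors (all values hypothetical): because $\log f - \log g$ splits into separate terms in $\theta_1$ and $\theta_2$, the sup and inf over a product region both decompose, and the joint separation equals the sum of the marginal ones.

```python
import numpy as np
from scipy.stats import beta

# Hypothetical independent Beta margins for two parameter blocks
params_f = [(2.0, 2.0), (1.0, 3.0)]   # functioning margins f1, f2
params_g = [(2.5, 1.5), (1.5, 2.5)]   # genuine margins g1, g2

A1 = np.linspace(0.2, 0.5, 101)       # A = A1 x A2, a product region
A2 = np.linspace(0.4, 0.7, 101)

def sep(xs, pf, pg):
    """LDR separation of two Beta densities over the points xs."""
    h = beta.logpdf(xs, *pf) - beta.logpdf(xs, *pg)
    return h.max() - h.min()

marginal_sum = (sep(A1, params_f[0], params_g[0])
                + sep(A2, params_f[1], params_g[1]))

# Joint separation over the product grid A1 x A2
T1, T2 = np.meshgrid(A1, A2)
H = (beta.logpdf(T1, *params_f[0]) - beta.logpdf(T1, *params_g[0])
     + beta.logpdf(T2, *params_f[1]) - beta.logpdf(T2, *params_g[1]))
joint_sep = H.max() - H.min()
print(marginal_sum, joint_sep)  # equal: separations add across independent blocks
```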
Some conclusions

BNs with larger numbers of edges are intrinsically less stable. However, as with K-L, marginal densities are never more separated than their joint densities, so if a utility depends only on a particular margin then these distances may be much smaller. Bayes factors automatically select simpler models, but note also that the inferences of a more complex model tend to be more sensitive to wrongly specified priors.