1. Bayesian non parametric inference of discrete valued networks
L. Nouedoui, P. Latouche
Université Paris 1 Panthéon-Sorbonne, Laboratoire SAMM
ESANN 2013

2. Contents
- Introduction: real networks, graph clustering, stochastic block models
- The model: Poisson mixture model, infinite Poisson mixture model, Chinese restaurant process
- Inference
- Experiments

3. Real networks
[Figure: subset of the yeast transcriptional regulatory network (Milo et al., 2002).]

4. Graph clustering
Existing methods look for:
- Community structure
- Disassortative mixing
- Heterogeneous structure


5. Stochastic Block Model (SBM)
- Nowicki and Snijders (2001); earlier work: Govaert et al. (1977)
- $Z_i$ independent hidden variables: $Z_i \sim \mathcal{M}(1, \alpha)$, with $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_K)$
- $Z_{ik} = 1$: vertex $i$ belongs to class $k$
- $X \mid Z$: edges drawn independently, $X_{ij} \mid \{Z_{ik} Z_{jl} = 1\} \sim \mathcal{B}(\pi_{kl})$
- A mixture model for graphs: $X_{ij} \sim \sum_{k=1}^{K} \sum_{l=1}^{K} \alpha_k \alpha_l \, \mathcal{B}(\pi_{kl})$
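To make the generative process above concrete, here is a minimal simulation sketch of a binary graph drawn from an SBM. All numerical values (N, K, alpha, pi) are illustrative assumptions, not taken from the talk.

```python
# Minimal SBM simulation sketch (illustrative parameters, not the authors' code).
import numpy as np

rng = np.random.default_rng(0)

N, K = 60, 3                       # number of vertices and classes (assumed)
alpha = np.array([0.5, 0.3, 0.2])  # class proportions (assumed)
pi = np.array([[0.8, 0.1, 0.1],    # pi[k, l]: edge probability between classes k and l
               [0.1, 0.7, 0.1],
               [0.1, 0.1, 0.6]])

Z = rng.choice(K, size=N, p=alpha)  # Z_i ~ M(1, alpha), stored as class labels
X = rng.binomial(1, pi[Z][:, Z])    # X_ij | {Z_ik Z_jl = 1} ~ B(pi_kl)
np.fill_diagonal(X, 0)              # no self-loops
```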

6. [Figure: a 10-vertex example network partitioned into blocks, with connection probabilities $\pi_{kl}$ between blocks.]
Approximations:
- Gibbs: Nowicki and Snijders (2001)
- VEM: Daudin et al. (2008)
- VBEM: Latouche et al. (2012)


7. Poisson mixture model for networks
- Many networks have discrete (count-valued) edges
- Extension of SBM to discrete edges: $X_{ij} \in \mathbb{N}$
- $X_{ij} \mid \{Z_{ik} Z_{jl} = 1\} \sim \mathcal{P}(\lambda_{kl})$
- Poisson mixture model (PM) (Mariadassou et al., 2010)
- Inference: VEM; model selection via ICL
- ICL relies on a Laplace (asymptotic) approximation, which is problematic for small sample sizes

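The SBM sketch above adapts directly to the Poisson mixture model: only the edge distribution changes, from Bernoulli to Poisson. The intensity matrix below is an assumed example, not a value from the talk.

```python
# Minimal Poisson mixture (PM) simulation sketch (illustrative parameters).
import numpy as np

rng = np.random.default_rng(1)

N, K = 60, 3
alpha = np.array([0.5, 0.3, 0.2])   # class proportions (assumed)
lam = np.array([[4.0, 0.5, 0.5],    # lam[k, l]: Poisson intensity lambda_kl (assumed)
                [0.5, 3.0, 0.5],
                [0.5, 0.5, 2.0]])

Z = rng.choice(K, size=N, p=alpha)
X = rng.poisson(lam[Z][:, Z])       # X_ij | {Z_ik Z_jl = 1} ~ P(lambda_kl)
np.fill_diagonal(X, 0)              # no self-loops
```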

8. Chinese restaurant process
- A non parametric prior for PM
- Each class attracts new data points in proportion to its current size
- Assume the first $m - 1$ observations have been classified; a new data point is assigned to:
  - class $k$ with probability $\propto n_k$
  - a new class with probability $\propto \eta_0$
- The resulting distribution is exchangeable
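A minimal sketch of one CRP assignment step under the rule just stated; `crp_assign` and its arguments are illustrative names, not from the talk.

```python
# One Chinese restaurant process step: assign the m-th point given the
# class sizes n_k of the first m-1 points (eta0 is the concentration parameter).
import numpy as np

def crp_assign(counts, eta0, rng):
    """counts[k] = n_k, the size of class k among the m-1 classified points."""
    weights = np.append(counts, eta0)   # existing class k: prop. to n_k; new class: prop. to eta0
    k = rng.choice(len(weights), p=weights / weights.sum())
    return k                            # k == len(counts) means "open a new class"

rng = np.random.default_rng(2)
print(crp_assign(np.array([5.0, 2.0, 1.0]), eta0=1.0, rng=rng))
```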

9. Chinese restaurant process
Stick-breaking prior:
- $\beta_k \sim \mathrm{Beta}(1, \eta_0)$, $\forall k$
- $\alpha_1 = \beta_1$
- $\alpha_k = \beta_k \prod_{l=1}^{k-1} (1 - \beta_l)$
- $Z_i \mid \alpha \sim \mathcal{M}(1, \alpha)$
Conjugate prior:
- $\lambda_{kl} \mid a, b \sim \mathrm{Gamma}(a, b)$
- Choice of the hyperparameters $a$ and $b$
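The stick-breaking construction can be sampled in a few lines. Here is a sketch with an assumed truncation level `K_trunc` (the exact construction is infinite; the truncation is only for illustration).

```python
# Truncated stick-breaking sketch: beta_k ~ Beta(1, eta0),
# alpha_k = beta_k * prod_{l<k} (1 - beta_l).
import numpy as np

def stick_breaking(eta0, K_trunc, rng):
    beta = rng.beta(1.0, eta0, size=K_trunc)                          # beta_k ~ Beta(1, eta0)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - beta)[:-1]))  # prod_{l<k} (1 - beta_l)
    return beta * remaining                                           # alpha_k

rng = np.random.default_rng(3)
alpha = stick_breaking(eta0=1.0, K_trunc=20, rng=rng)
print(alpha[:5], alpha.sum())   # weights decay quickly; mass < 1 remains on the truncated tail
```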

10. Gibbs sampling
- $p(Z, \alpha, \lambda \mid X)$ is not tractable
- Gibbs sampling procedure:
  - $\beta \sim p(\beta \mid X, Z, \lambda)$, then compute $\alpha$
  - $Z_i \sim p(Z_i \mid X, Z_{\setminus i}, \alpha, \lambda)$
  - $\lambda \sim p(\lambda \mid X, Z, \alpha)$
- Start with $K = K_{\mathrm{up}}$ classes
- Some classes become empty during the algorithm
- The number of non-empty classes gives an estimate of $K$
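Thanks to Gamma-Poisson conjugacy, the $\lambda$ step of the sampler has a closed-form full conditional. The sketch below shows that one step, assuming a directed graph without self-loops; `sample_lambda` is an illustrative name, not the authors' code.

```python
# Conjugate Gibbs step for lambda: lambda_kl | X, Z ~ Gamma(a + S_kl, b + n_kl),
# where S_kl is the total edge count between classes k and l, and n_kl is the
# number of ordered dyads (i, j) with Z_i = k and Z_j = l.
import numpy as np

def sample_lambda(X, Z, K, a, b, rng):
    lam = np.empty((K, K))
    for k in range(K):
        for l in range(K):
            mask = np.outer(Z == k, Z == l)      # dyads going from class k to class l
            if k == l:
                np.fill_diagonal(mask, False)    # exclude self-loops
            S_kl = X[mask].sum()
            n_kl = mask.sum()
            lam[k, l] = rng.gamma(a + S_kl, 1.0 / (b + n_kl))  # numpy gamma takes shape, scale = 1/rate
    return lam
```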

11. Experiments
- Simulated networks
- $N = 50, 100, 500, 1000$
- $K = 3$
- Unbalanced proportions: $\alpha = (80.6, 16.1, 3.3)'$ (in %), i.e. $\alpha_k \propto (1/5)^k$
- Intensities: $\lambda_{kk} = \lambda$ within classes, $\lambda_{kl} = (1/2)\lambda$ between classes
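A sketch of this simulation setup. The within/between intensity pattern ($\lambda$ on the diagonal, $\lambda/2$ off it) follows the reconstruction above and should be read as an assumption, as should the value of $\lambda$ itself.

```python
# Simulation setup sketch: K = 3 classes, unbalanced proportions alpha_k ∝ (1/5)^k.
import numpy as np

rng = np.random.default_rng(4)

K, N, lam0 = 3, 100, 3.0             # lam0 is an assumed base intensity
alpha = (1.0 / 5.0) ** np.arange(1, K + 1)
alpha /= alpha.sum()                 # approx. (0.806, 0.161, 0.033)
lam = np.full((K, K), lam0 / 2.0)    # between-class intensity lambda/2 (assumed pattern)
np.fill_diagonal(lam, lam0)          # within-class intensity lambda

Z = rng.choice(K, size=N, p=alpha)
X = rng.poisson(lam[Z][:, Z])
np.fill_diagonal(X, 0)
```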

12. Experiments
Proportion of simulated networks for which each number of classes $\hat{K}$ is selected (true $K = 3$):

| Network size | Model | $\hat{K} = 3$ | $\hat{K} = 2$ | $\hat{K} = 4$ |
| N = 50   | IPM | 0.59 | 0.41 | 0.00 |
| N = 50   | PM  | 0.17 | 0.82 | 0.01 |
| N = 100  | IPM | 0.96 | 0.04 | 0.00 |
| N = 100  | PM  | 0.90 | 0.07 | 0.03 |
| N = 500  | IPM | 1.00 | 0.00 | 0.00 |
| N = 500  | PM  | 1.00 | 0.00 | 0.00 |
| N = 1000 | IPM | 1.00 | 0.00 | 0.00 |
| N = 1000 | PM  | 1.00 | 0.00 | 0.00 |

13. Real data: Zachary's karate club network (UCINET)
[Figure: the karate club network, split between the factions of Mr. Hi and John A.]

14. References
- K. Nowicki and T.A.B. Snijders (2001), Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96, 1077-1087
- E.M. Airoldi, D.M. Blei, S.E. Fienberg, E.P. Xing (2008), Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9, 1981-2014
- J.-J. Daudin, F. Picard and S. Robin (2008), A mixture model for random graphs. Statistics and Computing, 18, 2, 151-171
- P. Latouche, E. Birmelé, C. Ambroise (2011), Overlapping stochastic block models with application to the French political blogosphere network. Annals of Applied Statistics, 5, 1, 309-336
- P. Latouche, E. Birmelé, C. Ambroise (2012), Variational Bayesian inference and complexity control for stochastic block models. Statistical Modelling, 12, 1, 93-115

15. Maximum likelihood estimation
- Log-likelihood of the model (observed data): $\log p(X \mid \alpha, \Pi) = \log \left\{ \sum_{Z} p(X, Z \mid \alpha, \Pi) \right\}$ → $K^N$ terms
- The Expectation Maximization (EM) algorithm requires the knowledge of $p(Z \mid X, \alpha, \Pi)$
Problem: $p(Z \mid X, \alpha, \Pi)$ is not tractable (no conditional independence)
Approximations:
- Gibbs: Nowicki and Snijders (2001)
- VEM: Daudin et al. (2008)
- VBEM: Latouche et al. (2012)

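Writing the marginal out makes the $K^N$ blow-up explicit: the complete-data likelihood factorizes over dyads and vertices, but the log of the sum over all assignments does not. This is the standard SBM decomposition, shown here for a directed graph:

```latex
\log p(X \mid \alpha, \Pi)
  = \log \sum_{Z} p(X \mid Z, \Pi)\, p(Z \mid \alpha)
  = \log \sum_{Z} \prod_{i \neq j} \prod_{k,l}
      p(X_{ij} \mid \pi_{kl})^{Z_{ik} Z_{jl}}
      \prod_{i} \prod_{k} \alpha_k^{Z_{ik}},
```

where the outer sum ranges over all $K^N$ possible class assignments $Z$.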

16. Model selection
Criteria: since $\log p(X \mid \alpha, \Pi)$ is not tractable, we cannot rely on:
- $\mathrm{AIC} = \log p(X \mid \hat{\alpha}, \hat{\Pi}) - C$
- $\mathrm{BIC} = \log p(X \mid \hat{\alpha}, \hat{\Pi}) - \frac{C}{2} \log N(N-1)$
Alternatives:
- ICL: Biernacki et al. (2000) → Daudin et al. (2008)
- Variational Bayes EM → ILvb: Latouche et al. (2012)
- Exact ICL → ICLex: Côme and Latouche (2013)

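For reference, the two penalized-likelihood criteria are trivial to evaluate once the maximized log-likelihood is known; the sketch below only restates the slide's formulas (taking $N(N-1)$ ordered dyads as the sample size), which is precisely what is unavailable here since the log-likelihood is intractable.

```python
# Penalized-likelihood criteria from the slide (C = number of free parameters).
import numpy as np

def aic(loglik, C):
    return loglik - C

def bic(loglik, C, N):
    return loglik - 0.5 * C * np.log(N * (N - 1))  # N(N-1) ordered dyads in a directed graph
```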
