Statistical inference of network structure Part 2 Tiago P. Peixoto University of Bath Berlin, August 2017
Weighted graphs
C. Aicher et al., Journal of Complex Networks 3(2), 221-248 (2015); T.P.P, arXiv:1708.01432
Adjacency: $A_{ij} \in \{0, 1\}$ or $\mathbb{N}$. Weights: $x_{ij} \in \mathbb{N}$ or $\mathbb{R}$.
SBMs with edge covariates:
$$P(A, x \mid \theta, \gamma, b) = P(x \mid A, \gamma, b)\, P(A \mid \theta, b)$$
Adjacency:
$$P(A \mid \theta = \{\lambda, \kappa\}, b) = \prod_{i<j} \frac{e^{-\lambda_{b_i,b_j} \kappa_i \kappa_j} \left(\lambda_{b_i,b_j} \kappa_i \kappa_j\right)^{A_{ij}}}{A_{ij}!}$$
Edge covariates:
$$P(x \mid A, \gamma, b) = \prod_{r \le s} P(x_{rs} \mid \gamma_{rs})$$
$P(x \mid \gamma)$ → Exponential, Normal, Geometric, Binomial, Poisson, ...
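As a rough illustration of the adjacency term, the sketch below evaluates the logarithm of the Poisson product over pairs i < j for a small multigraph. The function name, the nested-list data layout and the toy parameter values are assumptions made for this example only; they do not come from the slides or from any particular library.

```python
import math

def dcsbm_log_likelihood(A, b, lam, kappa):
    """Log of prod_{i<j} Poisson(A_ij; lam[b_i][b_j] * kappa_i * kappa_j).

    A     : symmetric adjacency matrix (list of lists of non-negative ints)
    b     : group label of each node
    lam   : symmetric matrix of group-to-group rates (all entries > 0)
    kappa : per-node degree propensities (all entries > 0)
    """
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            rate = lam[b[i]][b[j]] * kappa[i] * kappa[j]
            # log Poisson pmf: -rate + A_ij * log(rate) - log(A_ij!)
            total += -rate + A[i][j] * math.log(rate) - math.lgamma(A[i][j] + 1)
    return total

# Hypothetical toy data: three nodes, two groups.
A = [[0, 1, 0],
     [1, 0, 2],
     [0, 2, 0]]
b = [0, 0, 1]
lam = [[1.0, 0.5],
       [0.5, 2.0]]
kappa = [1.0, 1.0, 1.0]
print(dcsbm_log_likelihood(A, b, lam, kappa))
```

Self-loops are left out here because the product on the slide runs over i < j only.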
Weighted graphs
T.P.P, arXiv:1708.01432
Nonparametric Bayesian approach:
$$P(b \mid A, x) = \frac{P(A, x \mid b)\, P(b)}{P(A, x)}$$
Marginal likelihood:
$$P(A, x \mid b) = \int P(A, x \mid \theta, \gamma, b)\, P(\theta)\, P(\gamma)\, d\theta\, d\gamma = P(A \mid b)\, P(x \mid A, b)$$
Adjacency part (unweighted):
$$P(A \mid b) = \int P(A \mid \theta, b)\, P(\theta)\, d\theta$$
Weights part:
$$P(x \mid A, b) = \int P(x \mid A, \gamma, b)\, P(\gamma)\, d\gamma = \prod_{r \le s} \int P(x_{rs} \mid \gamma_{rs})\, P(\gamma_{rs})\, d\gamma_{rs}$$
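To make the weights integral concrete, here is a minimal sketch for one block pair (r, s), assuming geometric weights $P(x \mid p) = p (1 - p)^x$ on x = 0, 1, 2, ... and a Beta(alpha, beta) prior on p. Both the parametrization and the prior are assumptions chosen because the integral then has a closed form; the slides only name the distribution families. With n weights summing to S, the integral is B(alpha + n, beta + S) / B(alpha, beta).

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal_geometric(weights, alpha=1.0, beta=1.0):
    """Log of the integral over p of prod_x p (1 - p)^x times Beta(p; alpha, beta).

    weights     : non-negative integer edge weights between one pair of groups
    alpha, beta : hyperparameters of the (assumed) Beta prior on p
    """
    n = len(weights)   # number of weighted edges in this block pair
    s = sum(weights)   # total weight in this block pair
    return log_beta(alpha + n, beta + s) - log_beta(alpha, beta)

# Hypothetical weights on the edges between two groups.
print(log_marginal_geometric([0, 3, 1, 7]))
```

Summing this quantity over all pairs r ≤ s would give the logarithm of the weights part under these assumptions.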
UN Migrations
[Figure: probability distribution of the number of migrations on log-log axes (Migrations, 10^0 to 10^6; Probability, 10^-9 to 10^-1), comparing the SBM fit with geometric weights against a geometric distribution fit.]
Votes in congress
[Figure: matrix of vote correlations between deputies (Deputy vs. Deputy), with Government and Opposition blocks; probability density of vote correlation, comparing the SBM fit on original data with the SBM fit on shuffled data.]
Human connectome
[Figure: right and left hemispheres; probability density of electrical connectivity (mm^-1, log-log axes) with SBM fit; probability density of fractional anisotropy (dimensionless) with SBM fit.]
Overlapping groups
[Figure: example of overlapping groups (Palla et al. 2005)]
◮ Number of nonoverlapping partitions: $B^N$
◮ Number of overlapping partitions: $2^{BN}$
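The gap between the two counts grows very quickly, which a short calculation makes visible. The sketch below takes the counts as stated on the slide: assignments of N nodes to B labelled groups, and N × B binary membership matrices. The function names and the example sizes are illustrative.

```python
def count_nonoverlapping(N, B):
    """Each of the N nodes picks exactly one of B labelled groups."""
    return B ** N

def count_overlapping(N, B):
    """Each of the N nodes picks any subset of the B groups (one bit per pair)."""
    return 2 ** (B * N)

# Hypothetical sizes, chosen only to show the growth.
for N, B in [(3, 2), (10, 4), (100, 4)]:
    nonoverlapping = count_nonoverlapping(N, B)
    overlapping = count_overlapping(N, B)
    # Compare the number of decimal digits of the two counts.
    print(N, B, len(str(nonoverlapping)), len(str(overlapping)))
```

Since B^N = 2^(N log2 B), the overlapping space is larger by a factor of 2^(N (B - log2 B)), so the extra freedom has to be paid for in model selection.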
Group overlap
$$P(A \mid \kappa, \lambda) = \prod_{i<j} \frac{e^{-\lambda_{ij}} \lambda_{ij}^{A_{ij}}}{A_{ij}!} \times \prod_i \frac{e^{-\lambda_{ii}/2} (\lambda_{ii}/2)^{A_{ii}/2}}{(A_{ii}/2)!}, \qquad \lambda_{ij} = \sum_{rs} \kappa_{ir} \lambda_{rs} \kappa_{js}$$
Labelled half-edges:
$$A_{ij} = \sum_{rs} G_{ij}^{rs}, \qquad P(A \mid \kappa, \lambda) = \sum_G P(G \mid \kappa, \lambda)$$
$$P(G) = \int P(G \mid \kappa, \lambda)\, P(\kappa)\, P(\lambda \mid \bar\lambda)\, d\kappa\, d\lambda = \frac{\bar\lambda^E}{(\bar\lambda + 1)^{E + B(B+1)/2}} \times \frac{\prod_{r<s} e_{rs}! \prod_r e_{rr}!!}{\prod_{rs} \prod_{i<j} G_{ij}^{rs}! \prod_i G_{ii}^{rs}!!} \times \prod_r \frac{(N-1)!}{(e_r + N - 1)!} \times \prod_{ir} k_i^r!$$
Microcanonical equivalence:
$$P(G) = P(G \mid k, e)\, P(k \mid e)\, P(e)$$
$$P(G \mid k, e) = \frac{\prod_{r<s} e_{rs}! \prod_r e_{rr}!! \prod_{ir} k_i^r!}{\prod_r e_r! \prod_{rs} \prod_{i<j} G_{ij}^{rs}! \prod_i G_{ii}^{rs}!!}$$
$$P(k \mid e) = \prod_r \left(\!\!\binom{N}{e_r}\!\!\right)^{-1}$$
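One step of the microcanonical equivalence can be checked numerically: the factor (N - 1)! / (e_r + N - 1)! in P(G), multiplied by the e_r! that sits in the denominator of P(G | k, e), equals the inverse multiset coefficient in P(k | e). The sketch below verifies this with exact rational arithmetic, reading the multiset coefficient of N and e_r as the binomial coefficient C(N + e_r - 1, e_r). The function names and test values are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def multiset_coefficient(n, m):
    """Number of ways to place m indistinguishable items into n bins."""
    return comb(n + m - 1, m)

def degree_prior_term(N, e_r):
    """e_r! (N - 1)! / (e_r + N - 1)!, as an exact fraction."""
    return Fraction(factorial(e_r) * factorial(N - 1), factorial(e_r + N - 1))

# Hypothetical values of N (nodes) and e_r (half-edges with label r).
for N, e_r in [(5, 3), (10, 0), (7, 12)]:
    lhs = degree_prior_term(N, e_r)
    rhs = Fraction(1, multiset_coefficient(N, e_r))
    print(N, e_r, lhs == rhs)
```

The remaining factor, the ratio of powers of the mean rate, does not depend on the degrees and is what plays the role of P(e).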
Overlap vs. non-overlap
Social “ego” network (from Facebook)
[Figure: two fits of the same network, each with per-group degree histograms ($n_k$ vs. $k$).]
$B = 4$, $\Lambda \simeq 0.053$
$B = 5$, $\Lambda = 1$
Overlap vs. non-overlap
[Figure: $\Sigma/E$ as a function of $\mu$, for $B = 4$ (overlapping) and $B = 15$ (nonoverlapping).]
SBM with layers
T.P.P, Phys. Rev. E 92, 042807 (2015)
◮ Fairly straightforward. Easily combined with degree-correction, overlaps, etc.
◮ Edge probabilities are in general different in each layer.
◮ Node memberships can move or stay the same across layers.
◮ Works as a general model for discrete as well as discretized edge covariates.
◮ Works as a model for temporal networks.
[Figure: layers $l = 1$, $l = 2$, $l = 3$ and the collapsed graph.]
SBM with layers
Edge covariates:
$$P(\{A_l\} \mid \{\theta\}) = P(A_c \mid \{\theta\}) \prod_{r \le s} \frac{\prod_l m_{rs}^l!}{m_{rs}!}$$
Independent layers:
$$P(\{A_l\} \mid \{\{\theta\}_l\}, \{\phi\}, \{z_{il}\}) = \prod_l P(A_l \mid \{\theta\}_l, \{\phi\})$$
Embedded models can be of any type: traditional, degree-corrected, overlapping.
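In the edge-covariate version, the layers enter only through the ratio of factorials multiplying the collapsed-graph likelihood. The sketch below computes the logarithm of that ratio from per-layer edge counts between groups. The dictionary layout and the toy counts are assumptions made for this example.

```python
import math

def log_layer_factor(m_layers):
    """Log of prod_{r<=s} (prod_l m_rs^l!) / m_rs!.

    m_layers maps each group pair (r, s) with r <= s to the list of edge
    counts m_rs^l, one entry per layer l; m_rs is the sum of that list.
    """
    total = 0.0
    for counts in m_layers.values():
        m_rs = sum(counts)
        total += sum(math.lgamma(m + 1) for m in counts) - math.lgamma(m_rs + 1)
    return total

# Hypothetical edge counts for two groups and three layers.
m_layers = {
    (0, 0): [4, 0, 1],
    (0, 1): [1, 1, 1],
    (1, 1): [0, 6, 0],
}
print(log_layer_factor(m_layers))
```

Each term is the log of an inverse multinomial coefficient, so it is zero when all edges between two groups lie in a single layer and most negative when they are spread evenly across layers.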
Layer information can reveal hidden structure
... but it can also hide structure!
[Figure: NMI as a function of $c$ for a network split into $C$ layers; curves for the collapsed graph and for $E/C$ = 500, 100, 40, 20, 15, 12, 10, 5.]
Model selection
Null model: collapsed (aggregated) SBM + fully random layers
$$P(\{G_l\} \mid \{\theta\}, \{E_l\}) = P(G_c \mid \{\theta\}) \times \frac{\prod_l E_l!}{E!}$$
(we can also aggregate layers into larger layers)
Model selection
Example: Social network of physicians
$N = 241$ physicians
Survey questions:
◮ “When you need information or advice about questions of therapy where do you usually turn?”
◮ “And who are the three or four physicians with whom you most often find yourself discussing cases or therapy in the course of an ordinary week – last week for instance?”
◮ “Would you tell me the first names of your three friends whom you see most often socially?”
T.P.P, Phys. Rev. E 92, 042807 (2015)
Model selection
Example: Social network of physicians
[Figure: two fits of the physician network.]
$\Lambda = 1$; $\log_{10} \Lambda \approx -50$
Example: Brazilian chamber of deputies
Voting network between members of congress (1999-2006)
[Figure: inferred groups of deputies labelled by party, with Government, Opposition, Center, Left and Right blocs, shown for the 1999-2002 and 2003-2006 layers.]
$\log_{10} \Lambda \approx -111$; $\Lambda = 1$
Real-valued edges?
Idea: Layers $\{\ell\}$ → bins of edge values!
$$P(\{G_x\} \mid \{\theta\}_{\{\ell\}}, \{\ell\}) = P(\{G_l\} \mid \{\theta\}_{\{\ell\}}, \{\ell\}) \times \prod_l \rho(x_l)$$
Bayesian posterior → Number (and shape) of bins
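The binning step itself can be sketched as follows: each real-valued edge is assigned to the layer whose bin contains its value. The bin boundaries here are fixed by hand and purely hypothetical; choosing their number and shape from the posterior, as the slide proposes, is not implemented in this sketch.

```python
from bisect import bisect_right

def bin_edges_into_layers(edges, boundaries):
    """Assign weighted edges to layers defined by sorted bin boundaries.

    edges      : list of (i, j, x) with real-valued covariate x
    boundaries : sorted values b_0 < b_1 < ... < b_L defining L bins;
                 bin l covers [b_l, b_{l+1}), and the last bin includes b_L
    Returns a dict mapping layer index l to its list of (i, j) edges.
    """
    n_bins = len(boundaries) - 1
    layers = {l: [] for l in range(n_bins)}
    for i, j, x in edges:
        if not boundaries[0] <= x <= boundaries[-1]:
            raise ValueError(f"edge value {x} lies outside the bin range")
        # Index of the bin containing x; clamp so that x == b_L falls in the last bin.
        l = min(bisect_right(boundaries, x) - 1, n_bins - 1)
        layers[l].append((i, j))
    return layers

# Hypothetical edge values (e.g. correlations) and two hand-picked bins.
edges = [(0, 1, 0.1), (1, 2, 0.5), (0, 2, 1.0)]
print(bin_edges_into_layers(edges, [0.0, 0.5, 1.0]))
```

The resulting layers can then be treated like any other layered network, with the density term accounting for where each value falls within its bin.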
Movement between groups...
[Figure: groups of deputies labelled by party.]
Networks with metadata
Many network datasets contain metadata: annotations that go beyond the mere adjacency between nodes.
These are often assumed to be indicators of topological structure and are used to validate community detection methods, a.k.a. “ground truth”.