Statistical Inference for Networks, 4th Lehmann Symposium, Rice

Statistical Inference for Networks
4th Lehmann Symposium, Rice University, May 2011
Peter Bickel, Statistics Dept., UC Berkeley
(Joint work with Aiyou Chen, Google; E. Levina, U. Mich; S. Bhattacharyya, UC Berkeley)

Outline
1 Networks: Examples
2 Descriptive statistics
3 Statistical issues and selected models
4 A nonparametric model for infinite networks and asymptotic theory
5 Statistical fitting approaches
a) "Moments"
b) Pseudo likelihood
c) Estimation of w

Example: Social Networks Figure: Karate Club (Newman, PNAS 2006)

Example: Social Networks Figure: Facebook Network for Caltech with 769 nodes and average degree 43.

References
1. M. E. J. Newman (2010). Networks: An Introduction. Oxford.
2. Fan Chung, Linyuan Lu (2004). Complex Graphs and Networks. CBMS #107, AMS.
3. Eric D. Kolaczyk (2009). Statistical Analysis of Network Data.
4. Béla Bollobás, Svante Janson, Oliver Riordan (2007). The Phase Transition in Random Graphs. Random Structures and Algorithms, 31(1), 3-122.
5. P. Bickel and A. Chen (2009). A nonparametric view of network models and Newman-Girvan and other modularities. PNAS.
6. David Easley and Jon Kleinberg (2010). Networks, Crowds and Markets: Reasoning about a Highly Connected World. Cambridge University Press.

A Mathematical Formulation
• $G = (V, E)$: undirected graph
• $\{1, \dots, n\}$: arbitrarily labeled vertices
• $A$: adjacency matrix
• $A_{ij} = 1$ if there is an edge between $i$ and $j$ (relationship)
• $A_{ij} = 0$ otherwise
• $D_i = \sum_{j=1}^{n} A_{ij}$ = degree of vertex $i$.

Descriptive Statistics (Newman, Networks, 2010)
• Degree of vertex, average degree of graph: $D_i = \sum_j A_{ij}$, $\bar{D}$
• # and size of connected components
• Geodesic distance
• Homophily := $\dfrac{\#\text{ of } \Delta\text{'s}}{\#\text{ of } \Delta\text{'s} + \#\text{ of } V\text{'s}}$
• etc.
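A minimal sketch of these descriptive statistics on a toy graph may help fix the notation. It assumes that a "V" means a connected triple with no third edge (an open two-path); the function name `descriptive_stats` and the 4-vertex example graph are illustrative and not from the talk.

```python
import numpy as np

def descriptive_stats(A):
    """Degrees, average degree, triangle count, open-"V" count, and their ratio."""
    A = np.asarray(A, dtype=np.int64)
    degrees = A.sum(axis=1)                      # D_i = sum_j A_ij
    avg_degree = degrees.mean()                  # D-bar
    triangles = int(np.trace(A @ A @ A)) // 6    # each triangle is counted 6 times in tr(A^3)
    # Two-paths centred at vertex i: C(D_i, 2). Each triangle accounts for three of them.
    two_paths = int((degrees * (degrees - 1) // 2).sum())
    open_vs = two_paths - 3 * triangles          # assumption: "V" = two-path without a closing edge
    denom = triangles + open_vs
    ratio = triangles / denom if denom else float("nan")
    return degrees, avg_degree, triangles, open_vs, ratio

# Hypothetical toy graph: triangle 0-1-2 plus a pendant vertex 3 attached to vertex 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
degrees, avg_degree, triangles, open_vs, ratio = descriptive_stats(A)
print(degrees, avg_degree, triangles, open_vs, ratio)
```

For this graph the only triangle is 0-1-2 and the two open V's are 0-2-3 and 1-2-3, so the ratio is 1/3.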

Implications of Mathematical Description
• Undirected: relations "to" and "from" are not distinguished.
• Arbitrary labels: individual and geographical information is not used, although covariates are touched on later.

Stochastic Models: The Erdős-Rényi Model
• Probability distributions on graphs of $n$ vertices.
• $P$ on {symmetric $n \times n$ matrices of 0's and 1's}.
• E-R (modified): place edges independently with probability $\lambda/n$ ($\binom{n}{2}$ Bernoulli trials). $\lambda \approx E(\text{ave degree})$.
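The modified E-R model can be simulated in a few lines. The sketch below assumes NumPy; the function name `sample_er` and the values of `n` and `lam` are illustrative choices. It draws one Bernoulli trial per unordered pair and checks that the average degree lands near $\lambda$.

```python
import numpy as np

def sample_er(n, lam, rng):
    """Symmetric 0/1 adjacency matrix with independent edges of probability lam / n."""
    # One Bernoulli(lam / n) trial per pair i < j; the strict upper triangle holds the draws.
    upper = np.triu(rng.random((n, n)) < lam / n, k=1)
    return (upper | upper.T).astype(np.int8)

rng = np.random.default_rng(0)
n, lam = 1000, 10.0
A = sample_er(n, lam, rng)
avg_degree = A.sum(axis=1).mean()   # should be close to lam * (n - 1) / n
print(avg_degree)
```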

Nonparametric Asymptotic Model for Unlabeled Graphs
Given: $P$ on $\infty$ graphs.
Aldous/Hoover (1983):
$\mathcal{L}(A_{ij} : i, j \ge 1) = \mathcal{L}(A_{\pi_i \pi_j} : i, j \ge 1)$ for all permutations $\pi$
$\iff$ $\exists\, g : [0,1]^4 \to \{0,1\}$ such that $A_{ij} = g(\alpha, \xi_i, \xi_j, \eta_{ij})$,
where $\alpha$, $\xi_i$, $\eta_{ij}$, all $i$, $j \ge i$, are i.i.d. $U(0,1)$, $g(\alpha, u, v, w) = g(\alpha, v, u, w)$, and $\eta_{ij} = \eta_{ji}$.

Block Models (Holland, Laskey and Leinhardt 1983)
Probability model:
• Community label: $c = (c_1, \dots, c_n)$ i.i.d. multinomial $(\pi_1, \dots, \pi_K)$ $\equiv$ $K$ "communities".
• Relation: $P(A_{ij} = 1 \mid c_i = a, c_j = b) = P_{ab}$.
• $A_{ij}$ conditionally independent; $1 - P(A_{ij} = 0) = \sum_{1 \le a, b \le K} \pi_a \pi_b P_{ab}$.
• $K = 1$: E-R model.
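As an illustration of the block model, the following sketch samples labels and edges and compares the observed edge density with $\sum_{a,b} \pi_a \pi_b P_{ab}$. The function name `sample_block_model` and the particular values of `pi` and `P` are hypothetical.

```python
import numpy as np

def sample_block_model(n, pi, P, rng):
    """Draw community labels c and a symmetric adjacency matrix A from a K-block model."""
    pi = np.asarray(pi, dtype=float)
    P = np.asarray(P, dtype=float)
    c = rng.choice(len(pi), size=n, p=pi)          # c_i i.i.d. multinomial(pi)
    probs = P[np.ix_(c, c)]                        # probs[i, j] = P[c_i, c_j]
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(np.int8)          # edges conditionally independent given c
    return c, A

rng = np.random.default_rng(0)
pi = [0.5, 0.5]                                    # hypothetical community proportions
P = [[0.10, 0.02],
     [0.02, 0.08]]                                 # hypothetical block probabilities
n = 1000
c, A = sample_block_model(n, pi, P, rng)

density = A.sum() / (n * (n - 1))                  # observed fraction of connected pairs
marginal = float(np.asarray(pi) @ np.asarray(P) @ np.asarray(pi))  # sum_ab pi_a pi_b P_ab
print(density, marginal)
```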

Ergodic Models
$\mathcal{L}$ is an ergodic probability iff, for some $g$ with $g(u, v, w) = g(v, u, w)$ for all $(u, v, w)$,
$A_{ij} = g(\xi_i, \xi_j, \eta_{ij})$.
$\mathcal{L}$ is determined by
$h(u, v) \equiv P(A_{ij} = 1 \mid \xi_i = u, \xi_j = v)$, $h(u, v) = h(v, u)$.
Notes:
1. $K$-block models and many other special cases.
2. Model (also referred to as threshold models) also suggested by Diaconis, Janson (2008).
3. More general models (Bollobás, Riordan & Janson (2007)).
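One concrete choice of $g$ is $g(u, v, w) = 1(w \le h(u, v))$, which gives $P(A_{ij} = 1 \mid \xi_i, \xi_j) = h(\xi_i, \xi_j)$. The sketch below samples a graph this way; the particular `h` is a made-up example, and `sample_from_h` is an illustrative name.

```python
import numpy as np

def h(u, v):
    """Hypothetical symmetric edge-probability function with values in [0, 0.5]."""
    return 0.5 * u * v

def sample_from_h(n, h, rng):
    """A_ij = 1(eta_ij <= h(xi_i, xi_j)) with xi_i, eta_ij i.i.d. U(0,1) and eta_ij = eta_ji."""
    xi = rng.random(n)                               # latent positions xi_i
    eta = rng.random((n, n))                         # only entries with i < j are used
    upper = np.triu(eta <= h(xi[:, None], xi[None, :]), k=1)
    return xi, (upper | upper.T).astype(np.int8)

rng = np.random.default_rng(0)
n = 800
xi, A = sample_from_h(n, h, rng)
density = A.sum() / (n * (n - 1))
# For this h, P[Edge] = E h(xi_1, xi_2) = 0.5 * (1/2) * (1/2) = 0.125.
print(density)
```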

“Parametrization” of NP Model
• $h$ is not uniquely defined.
• $h(\varphi(u), \varphi(v))$, where $\varphi$ is measure-preserving, gives the same model.
But $h_{\mathrm{can}}$ = that $h(\cdot, \cdot)$ in the equivalence class such that
$P[A_{ij} = 1 \mid \xi_i = z] = \int_0^1 h_{\mathrm{can}}(z, v)\, dv \equiv \tau(z)$,
with $\tau(\cdot)$ monotone increasing, characterizes the model uniquely.
• $\xi_i$ could be replaced by any continuous variables or vectors, but there is no natural unique representation.

Examples of models
i) Block models: on block of sizes $\pi_a$, $\pi_b$, $h_{\mathrm{CAN}}(u, v) = F_{ab}$
ii) Power law: $w(u, v) = a(u)\, a(v)$, $a(u) \sim (1 - u)^{-\alpha}$ as $u \uparrow 1$
iii) Dynamically defined model (preferential attachment): $w(u, v) = a(u)\, 1(u \le v) + a(v)\, 1(u > v)$
New vertex attaches to random old vertex and neighbors (not Hilbert-Schmidt)
$a_{\mathrm{CAN}}(u) = (1 - u)^{-1} + \tau(u)$, $a_{\mathrm{CAN}}(u) = (1 - u)^{-1} - \log(u(1 - u))$

Questions
i) Community identification and block models.
ii) Checking "nonparametrically" with $p$ "moments" whether 2 graphs are the same (permutation tests used in social science literature for "block models", e.g., Wasserman and Faust, 1994).
iii) Link prediction: predicting relations to unobserved vertices on the basis of an observed graph.
iv) Model selection for hierarchies (block models).
v) Error bars on descriptive statistics.
vi) Linking graph features with covariates.

Asymptotic Approximation
• $h_n(u, v) = \rho_n w_n(u, v)$
• $\rho_n = P[\text{Edge}]$
• $w(u, v)\, du\, dv = P[\xi_1 \in [u, u + du],\ \xi_2 \in [v, v + dv] \mid \text{Edge}]$
• $w_n(u, v) = \min\{w(u, v), \rho_n^{-1}\}$
• Average degree $= E(D_+)/n \equiv \lambda_n \equiv \rho_n (n - 1)$.
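The last identity follows from linearity of expectation. The short derivation below assumes that $D_+$ denotes the total degree $\sum_i D_i$, which the slide does not spell out.

```latex
\[
E(D_+) \;=\; \sum_{i=1}^{n} \sum_{j \ne i} P(A_{ij} = 1) \;=\; n(n-1)\,\rho_n
\quad\Longrightarrow\quad
\lambda_n \;=\; \frac{E(D_+)}{n} \;=\; \rho_n\,(n-1).
\]
```

As a rough illustration, for the Caltech Facebook network quoted earlier ($n = 769$, average degree 43), matching the observed average degree to $\lambda_n$ gives $\rho_n \approx 43/768 \approx 0.056$.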

Nonparametric Theory: The Operator
Corresponding to $w_{\mathrm{can}} \in L_2(0, 1)$ there is an operator $T : L_2(0, 1) \to L_2(0, 1)$,
$Tf(\cdot) = \int_0^1 f(v)\, w(\cdot, v)\, dv$.
$T$ is Hermitian.
Note: $\tau(\cdot) = T(1)(\cdot)$.
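To make the operator concrete, here is a sketch that approximates $T$ by a midpoint rule on a grid and evaluates $T(1)$ and $T^2(1)$. The kernel $w(u, v) = 4uv$ is a hypothetical example chosen because the answers are available in closed form ($T(1)(u) = 2u$, $T^2(1)(u) = 8u/3$); `apply_T` is an illustrative helper, not notation from the talk.

```python
import numpy as np

def w(u, v):
    """Hypothetical symmetric kernel; it integrates to 1 over the unit square."""
    return 4.0 * u * v

def apply_T(w, f_vals, grid):
    """Midpoint-rule approximation of (Tf)(u) = int_0^1 f(v) w(u, v) dv at the grid points."""
    W = w(grid[:, None], grid[None, :])      # W[i, j] = w(u_i, v_j), symmetric since w is
    return W @ f_vals / len(grid)

m = 400
grid = (np.arange(m) + 0.5) / m              # midpoints of m equal cells in (0, 1)
tau = apply_T(w, np.ones(m), grid)           # tau = T(1)
tau2 = apply_T(w, tau, grid)                 # T^2(1) = T(tau)

print(np.max(np.abs(tau - 2.0 * grid)))          # discretization error for T(1)
print(np.max(np.abs(tau2 - 8.0 * grid / 3.0)))   # discretization error for T^2(1)
```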

Nonparametric Theory
Let $F$ and $\hat{F}$ be the distribution and empirical distribution of $\tau(\xi) \equiv T(1)(\xi)$, where $\xi$ has a $U(0, 1)$ distribution. Let $\rho = \lambda / n$.
Theorem 1. If $\lambda \to \infty$, then
$\frac{1}{n} E \sum_{i=1}^{n} \left( D_i / \bar{D} - T(1)(\xi_i) \right)^2 = O(\lambda^{-1})$.
This implies $\hat{F} \Rightarrow F$ in probability.
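A small simulation can illustrate Theorem 1, although it proves nothing. The sketch below assumes the hypothetical kernel $w(u, v) = 4uv$, for which $T(1)(u) = 2u$, and compares normalized degrees $D_i / \bar{D}$ with $T(1)(\xi_i)$; the values of `n` and `rho` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 1000, 0.05                  # lambda = rho * (n - 1), about 50
lam = rho * (n - 1)

def w(u, v):
    return 4.0 * u * v               # hypothetical kernel with T(1)(u) = 2u

xi = rng.random(n)
# h_n = rho_n * w_n with w_n = min(w, 1 / rho_n), as on the Asymptotic Approximation slide.
h = rho * np.minimum(w(xi[:, None], xi[None, :]), 1.0 / rho)
upper = np.triu(rng.random((n, n)) < h, k=1)
A = upper | upper.T

D = A.sum(axis=1)                    # degrees D_i
tau = 2.0 * xi                       # T(1)(xi_i) for this kernel
mse = np.mean((D / D.mean() - tau) ** 2)
print(mse, 1.0 / lam)                # the theorem suggests mse is of order 1 / lambda
```

Rerunning with a larger `rho` (hence larger $\lambda$) should shrink the mean squared error roughly in proportion to $1/\lambda$.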

Identifiability of NP Model
Theorem 2. The joint distribution of $(T(1)(\xi), T^2(1)(\xi), \dots, T^m(1)(\xi), \dots)$, where $\xi \sim U(0, 1)$, determines $P$.
Idea of proof: identify the eigen-structure of $T$.

Theorem 3. If $T$ corresponds to a $K$-block model, then the marginal distributions $\{T^k(1)(\xi) : k = 1, \dots, K\}$ determine $(\pi, W)$ uniquely, provided that the vectors $\pi, W\pi, \dots, W^{K-1}\pi$ are linearly independent.
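The linear-independence condition is easy to check numerically for a given $(\pi, W)$. The sketch below stacks $\pi, W\pi, \dots, W^{K-1}\pi$ as columns and computes the rank; `krylov_rank` and both example matrices are hypothetical.

```python
import numpy as np

def krylov_rank(pi, W):
    """Rank of the matrix with columns pi, W pi, ..., W^(K-1) pi."""
    pi = np.asarray(pi, dtype=float)
    W = np.asarray(W, dtype=float)
    cols = [pi]
    for _ in range(len(pi) - 1):
        cols.append(W @ cols[-1])
    return int(np.linalg.matrix_rank(np.column_stack(cols)))

pi = [0.5, 0.5]
W_ok = [[1.6, 0.4],
        [0.4, 1.2]]      # W pi = (1.0, 0.8): not proportional to pi
W_bad = [[1.5, 0.5],
         [0.5, 1.5]]     # W pi = (1.0, 1.0): proportional to pi, so the condition fails

rank_ok = krylov_rank(pi, W_ok)
rank_bad = krylov_rank(pi, W_bad)
print(rank_ok, rank_bad)
```

In the second example both blocks have the same expected normalized degree, which is exactly the situation where degree-type moments alone cannot separate the communities.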

Methods of Estimation: Method of "Moments"
$(k, \ell)$-wheel
i) A "hub" vertex
ii) $\ell$ spokes from the hub
iii) Each spoke has $k$ connected vertices.
Total # of vertices (order): $k\ell + 1$.
Total # of edges (size): $k\ell$.
E.g.: a (2,3)-wheel
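The wheel definition can be sketched as an edge list. This assumes each spoke is a path of $k$ vertices hanging off the hub, which is the reading consistent with the stated order $k\ell + 1$ and size $k\ell$; `wheel_edges` is an illustrative name.

```python
def wheel_edges(k, l):
    """Edge list of a (k, l)-wheel: hub 0 plus l spokes, each a path of k further vertices."""
    edges = []
    next_vertex = 1
    for _ in range(l):
        prev = 0                          # every spoke starts at the hub
        for _ in range(k):
            edges.append((prev, next_vertex))
            prev = next_vertex
            next_vertex += 1
    return edges

edges = wheel_edges(2, 3)                 # the (2,3)-wheel from the slide
vertices = {v for e in edges for v in e}
print(len(vertices), len(edges))          # order k*l + 1 and size k*l
print(edges)
```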

"Moments"
• For $R \subset \{(i, j) : 1 \le i < j \le n\}$, identify $R$ as a graph with vertex set $V(R) = \{i : (i, j) \text{ or } (j, i) \in R \text{ for some } j\}$ and $E(R) = R$.
• Let $G_n(R)$ be the subgraph induced by $R$ in graph $G_n$.
• Define
$Q(R) = P(A_{ij} = 1, \text{ all } (i, j) \in R)$
$P(R) = P(E(G_n(R)) = R)$
• We can estimate $P(R)$ and $Q(R)$ in a graph $G_n$ by
$\hat{P}(R) = \dfrac{1}{\binom{n}{p} N(R)} \sum 1(G \sim R : G \subset G_n)$, $E \hat{P}(R) \equiv P(R)$
$N(R) \equiv |\{G \subset G_n : G \sim R\}|$
$\hat{Q}(R) = \sum \{\hat{P}(S) : S \supset R\}$, $E \hat{Q}(R) \equiv Q(R)$
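For very small graphs, $\hat{P}(R)$ can be computed by brute force. The sketch below makes several assumptions that the slide leaves implicit: $p = |V(R)|$, $R$ is given on vertices $0, \dots, p-1$ with no isolated vertices, the sum runs over induced subgraphs on $p$-vertex subsets, and $N(R)$ is the number of distinct labeled copies of $R$ on $p$ vertices (chosen to agree with the wheel count quoted in Lemma 1). The names `labeled_copies` and `p_hat` are illustrative, and the enumeration is exponential in $p$, so this is only a check of the definition.

```python
from itertools import combinations, permutations
from math import comb

def labeled_copies(R_edges, p):
    """All distinct edge sets on vertices 0..p-1 that are isomorphic to R."""
    copies = set()
    for perm in permutations(range(p)):
        copies.add(frozenset(frozenset((perm[i], perm[j])) for i, j in R_edges))
    return copies

def p_hat(G_edges, n, R_edges, p):
    """Count induced p-vertex subgraphs of G isomorphic to R, divided by C(n, p) * N(R)."""
    G = {frozenset(e) for e in G_edges}
    copies = labeled_copies(R_edges, p)          # len(copies) plays the role of N(R)
    count = 0
    for subset in combinations(range(n), p):
        relabel = {v: idx for idx, v in enumerate(subset)}
        induced = frozenset(
            frozenset((relabel[a], relabel[b]))
            for a, b in combinations(subset, 2)
            if frozenset((a, b)) in G
        )
        count += induced in copies               # True counts as 1
    return count / (comb(n, p) * len(copies))

triangle = [(0, 1), (1, 2), (0, 2)]
vee = [(0, 1), (1, 2)]                           # open two-path
K4 = list(combinations(range(4), 2))             # complete graph on 4 vertices
path3 = [(0, 1), (1, 2)]                         # path on 3 vertices

print(p_hat(K4, 4, triangle, 3))   # every induced triple of K4 is a triangle
print(p_hat(K4, 4, vee, 3))        # K4 has no induced open two-path
print(p_hat(path3, 3, vee, 3))     # one induced copy, N(R) = 3
```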

Estimates of P and Q
Suppose $|R| = p$ fixed, $\rho_n \to 0$. Let $P(h_n(\xi_1, \xi_2) > \rho) = o(n^{-1})$. Then define
• $\tilde{P}(R) = \rho_n^{-p} P(R) = \tilde{Q}(R) + O(\lambda_n / n)$.
• $\tilde{Q}(R) = \rho_n^{-p} Q(R) \to E \prod_{(i, j) \in R} w_n(\xi_i, \xi_j)$.
• $\hat{\tilde{P}}(R) = \left( \bar{D} / n \right)^{-p} \hat{P}(R)$.
• $\hat{\tilde{Q}}(R) = \left( \bar{D} / n \right)^{-p} \hat{Q}(R)$.

Moment Convergence Theorem ($\lambda \to \infty$ and $\lambda = O(1)$)
Theorem 4.
a) Suppose $R$ is acyclic and $\lambda \to \infty$. Then
$\sqrt{n}\, (\hat{\tilde{P}}(R) - \tilde{P}(R)) \Rightarrow N(0, \sigma^2(R, P))$,
and multivariate normality holds for $R_1, \dots, R_k$ acyclic.
b) If $\lambda = O(1)$, a) continues to hold except that $\sigma^2$ depends on $\lambda$ as well as $R$.
c) Even if $R$ is not acyclic, the same conclusions apply to $\hat{\tilde{P}}$ and $\hat{\tilde{Q}}$ if $\lambda \ge n^{1 - 2/p}$.

Connection With Wheels
Lemma 1. Let $G$ be a random graph generated according to $P$, $|V(G)| = k\ell + 1$. Then if $R$ is a $(k, \ell)$-wheel,
$Q(R) = E\{[T^k(1)(\xi_1)]^{\ell}\}$
$N(R) = \dfrac{(k\ell + 1)!}{\ell!}$
$P(R) = Q(R) + O(\lambda / n)$
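The count $N(R) = (k\ell + 1)!/\ell!$ can be sanity-checked by enumeration for small wheels. The sketch below assumes that $N(R)$ counts distinct labeled copies of the wheel on $k\ell + 1$ vertices and that each spoke is a path of $k$ vertices from the hub; both helper functions are illustrative, and the brute force is only run for a few cases with $\ell \ge 2$.

```python
from itertools import permutations
from math import factorial

def wheel_edges(k, l):
    """Edge list of a (k, l)-wheel with hub 0 and l path-shaped spokes of k vertices each."""
    edges, nxt = [], 1
    for _ in range(l):
        prev = 0
        for _ in range(k):
            edges.append((prev, nxt))
            prev, nxt = nxt, nxt + 1
    return edges

def count_labeled_copies(edges, p):
    """Number of distinct edge sets obtained by relabeling the p vertices in every possible way."""
    seen = set()
    for perm in permutations(range(p)):
        seen.add(frozenset(frozenset((perm[a], perm[b])) for a, b in edges))
    return len(seen)

results = {}
for k, l in [(1, 3), (2, 2), (2, 3)]:
    p = k * l + 1
    brute = count_labeled_copies(wheel_edges(k, l), p)
    formula = factorial(p) // factorial(l)
    results[(k, l)] = (brute, formula)
    print((k, l), brute, formula)
```

The (2,2)-wheel is a path on five vertices with the hub in the middle, so both counts equal the number of labeled 5-vertex paths, $5!/2 = 60$.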

Difficulties
Even for sparse models:
(i) Empirical moments of trees are hard to compute.
(ii) Empirical moments of small size converge reasonably well even in the sparse case, but block model parameters, expressed as nonlinear functions of the moments, do not converge as well.

Extensions: Generalized Wheels
A $(\mathbf{k}, \mathbf{l})$-wheel, where $\mathbf{k} = (k_1, \dots, k_t)$ and $\mathbf{l} = (l_1, \dots, l_t)$ are vectors and the $k_j$'s, $l_j$'s are distinct integers, is the union $R_1 \cup \dots \cup R_t$, where $R_j$ is a $(k_j, l_j)$-wheel; the $R_j$ share a common hub but all their spokes are disjoint.
• Trees are examples of $(\mathbf{k}, \mathbf{l})$-wheels.
• Their limits yield cross-moments of $(T(\xi), T^2(\xi), \dots)$.
• So, in principle, we can estimate parameters of block models using the $(\mathbf{k}, \mathbf{l})$-wheels.
• Using $(\mathbf{k}, \mathbf{l})$-wheels, we can estimate the parameters of models approximating the NP model.

Method of fitting: Pseudo likelihood
(Combining ideas of Besag (1974) and Newman & Leicht (2007))
Partition $\{1, \dots, n\}$ into $K$ communities of equal size: $S_1 = \{1, \dots, m\}$, $S_2 = \{m + 1, \dots, 2m\}$, $\dots$, with $m = n / K$.
