Community Detection with Secondary Latent Variables



  1. Community Detection with Secondary Latent Variables. Mohammad Esmaeili and Aria Nosratinia, The University of Texas at Dallas, {esmaeili, aria}@utdallas.edu. 21-26 June 2020.

  2. Problem and Motivation. Current models: edges are independent conditioned on communities. Reality: communities do not completely explain edge dependencies. Our work: brings modeling and analysis closer to reality.

  3. Examples. Graphs and their latent variables: Social networks: Republican/Democrat communities leave residual edge dependence according to localities. Product co-purchasing networks: products / buyers (men and women) / age. Movie networks: type of movie (action, comedy, romance) / audience (men and women) / age ratings.

  4. Introduction. We consider: a known secondary latent variable as an auxiliary latent variable, and an unknown secondary latent variable as a nuisance latent variable. Related models in the literature: overlapping communities, latent space models, graphs with additional non-graph observations.

  5. Our Contributions. Community detection for the stochastic block model with a secondary latent variable. Applying semidefinite programming to community detection with a secondary latent variable. Calculating the exact recovery thresholds when the secondary latent variable is either known or unknown. Showing that semidefinite programming is asymptotically optimal.

  6. System Model. Primary and secondary latent variables (binary): $x$, $y$. Adjacency matrix: $A$. Edges are drawn independently from Bernoulli distributions, conditioned on both $x$ and $y$. Assumption for the estimator: $x^T \mathbf{1} = 0$ (can be relaxed). Goals: recovering $x$ when $y$ is known and when $y$ is unknown.

  7. Maximum Likelihood Detectors. When $y$ is known:
  $$\hat{x} = \arg\max_{x} \; T_1\, x^T (A \ast yy^T)\, x + T_2\, x^T A x \quad (1)$$
  subject to $x_i \in \{\pm 1\},\ i \in [n]$ and $x^T \mathbf{1} = 0$, where $T_1$ and $T_2$ are constants. When $y$ is unknown:
  $$\hat{x} = \arg\max_{x} \; x^T A x \quad (2)$$
  subject to $x_i \in \{\pm 1\},\ i \in [n]$ and $x^T \mathbf{1} = 0$.
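  For concreteness, here is a minimal brute-force sketch of these detectors (not from the slides; it assumes Python with numpy, and the function name ml_detect and the default values of T1, T2 are illustrative placeholders for the constants above). It only makes sense for very small even n, since it enumerates every balanced labeling:

```python
import itertools
import numpy as np

def ml_detect(A, y=None, T1=1.0, T2=1.0):
    """Brute-force ML search over balanced +/-1 labelings (toy sketch, small even n only)."""
    n = A.shape[0]
    if y is not None:
        B = A * np.outer(y, y)                                 # Hadamard product A * yy^T
        score = lambda x: T1 * (x @ B @ x) + T2 * (x @ A @ x)  # objective (1)
    else:
        score = lambda x: x @ A @ x                            # objective (2)
    best_x, best_val = None, -np.inf
    for plus in itertools.combinations(range(n), n // 2):      # enforce x^T 1 = 0
        x = -np.ones(n)
        x[list(plus)] = 1.0
        val = score(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x
```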

  8. Semidefinite Programming Relaxation. Define $Z \triangleq xx^T$ and $B \triangleq A \ast yy^T$. The semidefinite relaxation arises from: $x^T A x = \mathrm{Tr}(xx^T A)$ and $x^T B x = \mathrm{Tr}(xx^T B)$; substituting $xx^T \to Z$; relaxing the rank-1 constraint on $Z$ to a positive semidefiniteness constraint.
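  Spelling out the lifting step as a short worked derivation (a sketch consistent with the definitions above, shown for the unknown-$y$ objective; the known-$y$ case is identical with $T_1 B + T_2 A$ in place of $A$):

```latex
% Lift x to Z = xx^T and relax the rank constraint.
\begin{aligned}
\max_{x \in \{\pm 1\}^n,\ x^T \mathbf{1} = 0} x^T A x
  &= \max_{Z}\ \langle Z, A \rangle \quad \text{over } Z = xx^T
  \ \Longleftrightarrow\ Z \succeq 0,\ Z_{ii} = 1,\ \operatorname{rank}(Z) = 1,\
    \langle Z, J \rangle = (x^T \mathbf{1})^2 = 0; \\
\text{relaxation:}\quad
  &\text{drop } \operatorname{rank}(Z) = 1 \text{ and keep } Z \succeq 0,\ Z_{ii} = 1,\
    \langle Z, J \rangle = 0 .
\end{aligned}
```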

  9. Semidefinite Programming Relaxation. When $y$ is known:
  $$\hat{Z} = \arg\max_{Z} \; \langle Z,\, T_1 B + T_2 A \rangle \quad \text{subject to} \quad Z \succeq 0,\ Z_{ii} = 1 \text{ for } i \in [n],\ \langle Z, J \rangle = 0. \quad (3)$$
  When $y$ is unknown:
  $$\hat{Z} = \arg\max_{Z} \; \langle Z, A \rangle \quad \text{subject to} \quad Z \succeq 0,\ Z_{ii} = 1 \text{ for } i \in [n],\ \langle Z, J \rangle = 0, \quad (4)$$
  where $J$ is the all-ones matrix.
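  A minimal numerical sketch of programs (3) and (4) (not from the slides; it assumes Python with cvxpy and numpy, the function name sdp_detect is hypothetical, the default T1/T2 values are illustrative, and rounding by the leading eigenvector of the optimal Z is one common choice rather than the authors' prescription):

```python
import cvxpy as cp
import numpy as np

def sdp_detect(A, y=None, T1=1.0, T2=1.0):
    """Solve the SDP relaxation: max <Z, C> s.t. Z PSD, Z_ii = 1, <Z, J> = 0."""
    n = A.shape[0]
    # C = T1*B + T2*A when y is known (B = A * yy^T); C = A when y is unknown
    C = T1 * (A * np.outer(y, y)) + T2 * A if y is not None else A
    Z = cp.Variable((n, n), symmetric=True)
    constraints = [Z >> 0,            # Z positive semidefinite
                   cp.diag(Z) == 1,   # unit diagonal
                   cp.sum(Z) == 0]    # <Z, J> = 0 (balanced communities)
    cp.Problem(cp.Maximize(cp.trace(C @ Z)), constraints).solve()
    # Round: sign of the leading eigenvector of the optimal Z
    eigvals, eigvecs = np.linalg.eigh(Z.value)
    return np.sign(eigvecs[:, -1])
```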

  10. Exact Recovery Conditions. Exact recovery metric:
  $$\lim_{n \to \infty} P(e = 0) = 1. \quad (5)$$
  Lemma: Consider the Lagrange multipliers $D^* = \mathrm{diag}(d_i^*)$, $\lambda^*$, $S^*$. If we have
  $$S^* = \begin{cases} D^* + \lambda^* J - T_1 B - T_2 A & \text{when } y \text{ is known} \\ D^* + \lambda^* J - A & \text{when } y \text{ is unknown,} \end{cases}$$
  with $S^* \succeq 0$, $\lambda_2(S^*) > 0$, and $S^* x^* = 0$, then $(\lambda^*, D^*, S^*)$ is the dual optimal solution and $Z_{\mathrm{SDP}} = x^* x^{*T}$ is the unique primal optimal solution of (3) and (4).
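  To see why such a certificate works, here is a one-step check (a sketch for program (4); for (3) replace $A$ by $T_1 B + T_2 A$), using only the constraints $Z \succeq 0$, $Z_{ii} = 1$, $\langle Z, J \rangle = 0$ and the conditions of the Lemma:

```latex
% For any feasible Z, substitute A = D^* + \lambda^* J - S^*:
\begin{aligned}
\langle A, Z \rangle
  &= \langle D^*, Z \rangle + \lambda^* \langle J, Z \rangle - \langle S^*, Z \rangle
   = \operatorname{Tr}(D^*) - \langle S^*, Z \rangle \le \operatorname{Tr}(D^*)
   && (Z_{ii} = 1,\ \langle J, Z \rangle = 0,\ \langle S^*, Z \rangle \ge 0), \\
\langle A, Z^* \rangle
  &= \operatorname{Tr}(D^*) - x^{*T} S^* x^* = \operatorname{Tr}(D^*)
   && (S^* x^* = 0,\ Z^* = x^* x^{*T}),
\end{aligned}
% so Z^* attains the upper bound; \lambda_2(S^*) > 0 then yields uniqueness.
```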

  11. Outline of Proof. Optimality: via properties of the Lagrangian and the conditions of the Lemma. Uniqueness: a simple contraposition exercise. Showing $S^* \succeq 0$ and $\lambda_2(S^*) > 0$ with probability at least $1 - o(1)$; in other words, we show that
  $$P\left( \inf_{V \perp x^*,\ \|V\| = 1} V^T S^* V > 0 \right) \geq 1 - o(1).$$

  12. Parameters and Quality Metrics. $\rho_y$ is the empirical fraction of nodes with $y_i = 1$. The parameters $a, b, c, d$ define the distribution of graph edges conditioned on the latent variables:
  $$A_{ij} \sim \begin{cases} \mathrm{Bern}\!\left(a \tfrac{\log n}{n}\right) & \text{if } x_i = x_j,\ y_i = y_j \\ \mathrm{Bern}\!\left(b \tfrac{\log n}{n}\right) & \text{if } x_i = x_j,\ y_i \neq y_j \\ \mathrm{Bern}\!\left(c \tfrac{\log n}{n}\right) & \text{if } x_i \neq x_j,\ y_i = y_j \\ \mathrm{Bern}\!\left(d \tfrac{\log n}{n}\right) & \text{if } x_i \neq x_j,\ y_i \neq y_j \end{cases}$$
  Quality metrics $\eta_1, \eta_2$ reflect the likelihood of a graph edge with respect to $x$; quality metrics $\eta_3, \eta_4$ reflect the likelihood of a graph edge with respect to $x$, averaged over $y$.
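  For illustration, a short sampling sketch of this edge model (not from the slides; it assumes Python with numpy, and the function name sample_graph is hypothetical):

```python
import numpy as np

def sample_graph(x, y, a, b, c, d):
    """Draw A_ij ~ Bern(q * log(n)/n), where q in {a, b, c, d} depends on whether
    x_i = x_j and y_i = y_j; returns a symmetric 0/1 matrix with no self-loops."""
    n = len(x)
    scale = np.log(n) / n
    same_x = np.equal.outer(x, x)
    same_y = np.equal.outer(y, y)
    q = np.where(same_x, np.where(same_y, a, b), np.where(same_y, c, d))
    P = np.triu(q * scale, k=1)                  # edge probabilities above the diagonal
    A = (np.random.rand(n, n) < P).astype(int)
    return A + A.T
```

  For example, calling it with balanced $x, y \in \{\pm 1\}^n$ and parameter values matching the figures on the following slides (e.g., $b = 3$, $c = d = 1$) produces graphs in the regime plotted there.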

  13. Achievability. Theorem: When $y$ is known, if $\min\{\eta_1, \eta_2\} > 1$, and when $y$ is unknown, if $\min\{\eta_3, \eta_4\} > 1$, then the semidefinite programming estimator is asymptotically optimal, i.e., $P(Z_{\mathrm{SDP}} = Z^*) \geq 1 - o(1)$.

  14. Converses. Theorem: When $y$ is known, if $\min\{\eta_1, \eta_2\} < 1$, and when $y$ is unknown, if $\min\{\eta_3, \eta_4\} < 1$, then for any sequence of estimators $\hat{Z}_n$, $P(\hat{Z}_n = Z^*) \to 0$ as $n \to \infty$. The converse is obtained via the failure of maximum likelihood.

  15. Results & Discussion Let γ 1 � min { η 1 , η 2 } and γ 2 � min { η 3 , η 4 } 2.5 y = 0.3, 1 y = 0.3, 2 2 y = 0.4, 1 y = 0.4, 2 y = 0.5, 1 1.5 y = 0.5, 2 1 , 2 Exact recovery region 1 0.5 0 4 6 8 10 12 14 a Figure: Exact recovery region of x with b = 3 , c = d = 1. 15 / 18

  16. Results & Discussion 2.5 y = 0.3, 1 y = 0.3, 2 2 y = 0.4, 1 y = 0.4, 2 y = 0.5, 1 1.5 y = 0.5, 2 1 , 2 Exact recovery region 1 0.5 0 4 6 8 10 12 14 a Figure: Exact recovery region of x with b = c = d = 1. 16 / 18

  17. Results & Discussion We introduced a generalization for stochastic block models with a secondary latent variable. Semidefinite programming relaxation of ML detector achieves exact recovery down to the optimal threshold 17 / 18

  18. Thank You
