Community detection with the non-backtracking operator
Marc Lelarge, INRIA-ENS
Aalto University, Helsinki, October 2016
Motivation
Community detection in social or biological networks in the sparse regime, with a small average degree (Adamic, Glance ’05). Performance analysis of spectral algorithms on a toy model (where the ground truth is known!).
A model: the stochastic block model
The sparse stochastic block model
A random graph model on $n$ nodes with three parameters $a, b, c \ge 0$.
Assign each vertex spin $+1$ or $-1$ uniformly at random.
Independently for each pair $(u, v)$:
if $\sigma_u = \sigma_v = +1$, draw the edge w.p. $a/n$;
if $\sigma_u \ne \sigma_v$, draw the edge w.p. $b/n$;
if $\sigma_u = \sigma_v = -1$, draw the edge w.p. $c/n$.
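As a concrete aid, here is a minimal sketch of sampling from this model; the function name and the numpy-based implementation are my own, not from the talk:

```python
import numpy as np

def sample_sbm(n, a, b, c, rng=None):
    """Sample one graph from the three-parameter sparse block model.

    Spins are i.i.d. uniform on {+1, -1}; each pair (u, v) is connected
    independently w.p. a/n, b/n or c/n depending on the spin pattern.
    Returns the spin vector sigma and the adjacency matrix A.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = rng.choice([1, -1], size=n)
    # Per-pair edge probability, chosen by the spins of the endpoints.
    plus = sigma == 1
    p = np.where(np.outer(plus, plus), a / n,
                 np.where(np.outer(~plus, ~plus), c / n, b / n))
    # Draw the upper triangle only, then symmetrize (simple graph).
    upper = np.triu(rng.random((n, n)) < p, k=1)
    A = (upper | upper.T).astype(int)
    return sigma, A
```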
Community detection problem
Reconstruct the underlying communities (i.e. the spin configuration $\sigma$) from one realization of the graph.
Asymptotics: $n \to \infty$. Sparse graph: the parameters $a, b, c$ are fixed.
Notion of performance: w.h.p. strictly less than half of the vertices are misclassified, i.e. the output is a positively correlated partition.
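To make "positively correlated" concrete, here is a sketch of the usual overlap score (the function name is mine): since the two labels are interchangeable, the score is taken up to a global sign flip, and a positively correlated partition keeps it bounded away from $1/2$ as $n$ grows.

```python
import numpy as np

def overlap(sigma_true, sigma_hat):
    """Fraction of correctly classified vertices, up to a global sign
    flip (the two community labels play symmetric roles).
    """
    agree = np.mean(np.asarray(sigma_true) == np.asarray(sigma_hat))
    return max(agree, 1.0 - agree)
```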
A first attempt: looking at degrees
The degree of a vertex in community $+1$ is
$$D_+ \sim \mathrm{Bin}\left(\tfrac{n}{2} - 1, \tfrac{a}{n}\right) + \mathrm{Bin}\left(\tfrac{n}{2}, \tfrac{b}{n}\right).$$
We have $\mathbb{E}[D_+] \approx \frac{a+b}{2}$ and $\mathrm{Var}(D_+) \approx \frac{a+b}{2}$, and similarly, in community $-1$: $\mathbb{E}[D_-] \approx \frac{c+b}{2}$ and $\mathrm{Var}(D_-) \approx \frac{c+b}{2}$.
Clustering based on degrees should 'work' as soon as
$$(\mathbb{E}[D_+] - \mathbb{E}[D_-])^2 \succ \max(\mathrm{Var}(D_+), \mathrm{Var}(D_-)),$$
i.e. (ignoring constant factors) $(a - c)^2 \succ b + \max(a, c)$.
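A minimal sketch of this degree heuristic (implementation details are my own):

```python
import numpy as np

def cluster_by_degree(A):
    """Naive degree-based clustering: split vertices at the empirical
    mean degree. By the analysis above, this can only work when
    (a - c)^2 >> b + max(a, c); when a = c the two communities have
    the same expected degree and this classifier is blind.
    """
    degrees = A.sum(axis=1)
    # Labels are only meaningful up to a global sign flip.
    return np.where(degrees >= degrees.mean(), 1, -1)
```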
Is it any good?
Data: $A$, the adjacency matrix of the graph. We define the mean column for each community:
$$A_+ = \tfrac{1}{n}(a, \ldots, a, b, \ldots, b)^T, \quad A_- = \tfrac{1}{n}(b, \ldots, b, c, \ldots, c)^T.$$
The variance of each entry is $\le \max(a, b, c)/n$.
Pretend the columns are i.i.d. spherical Gaussians, with $k = n$ samples...
Clustering a mixture of Gaussians
Consider a mixture of two spherical Gaussians in $\mathbb{R}^n$ with respective means $m_1$ and $m_2$ and variance $\sigma^2$.
Problem: given $k$ samples from $\tfrac{1}{2}\mathcal{N}(m_1, \sigma^2) + \tfrac{1}{2}\mathcal{N}(m_2, \sigma^2)$, recover the unknown parameters $m_1$, $m_2$ and $\sigma^2$.
Doing better than the naive algorithm
If $\|m_1 - m_2\|^2 \succ n\sigma^2$, the densities 'do not overlap' in $\mathbb{R}^n$.
Projection preserves the variance $\sigma^2$, so projecting onto the line through $m_1$ and $m_2$ gives 1-dimensional Gaussian variables with no overlap as soon as $\|m_1 - m_2\|^2 \succ \sigma^2$. We gain a factor of $n$.
Algorithm for clustering a mixture of Gaussians
Each sample is a column of the matrix $A = (A_1, A_2, \ldots, A_k) \in \mathbb{R}^{n \times k}$. Consider the SVD of $A$:
$$A = \sum_{i=1}^{n} \lambda_i u_i v_i^T, \qquad u_i \in \mathbb{R}^n, \; v_i \in \mathbb{R}^k, \; \lambda_1 \ge \lambda_2 \ge \ldots$$
Then the best approximation of the direction $(m_1, m_2)$ given by the data is $u_1$. Project the points from $\mathbb{R}^n$ onto this line and then cluster. Provided $k$ is large enough, this 'works' as soon as $\|m_1 - m_2\|^2 \succ \sigma^2$.
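A sketch of this projection step (my own illustration; centering the columns first, so that $u_1$ aligns with the direction $m_1 - m_2$ rather than with the overall mean, is a standard practical tweak and not part of the slide's statement):

```python
import numpy as np

def svd_project_and_split(A):
    """Project the k samples (columns of the n x k matrix A) onto the
    top left singular vector and split at zero.
    """
    centered = A - A.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    proj = u[:, 0] @ centered  # 1-dim projections of the k samples
    return np.where(proj >= 0, 1, -1)
```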
Back to our clustering problem
Data: $A$, the adjacency matrix of the graph. The mean columns for each community are
$$A_+ = \tfrac{1}{n}(a, \ldots, a, b, \ldots, b)^T, \quad A_- = \tfrac{1}{n}(b, \ldots, b, c, \ldots, c)^T.$$
The variance of each entry is $\le \max(a, b, c)/n$.
Heuristics for community detection
The naive algorithm should work as soon as
$$\|A_+ - A_-\|^2 \succ n \cdot \underbrace{\max(a, b, c)/n}_{\mathrm{Var}}, \quad \text{i.e.} \quad (a-b)^2 + (b-c)^2 \succ n \max(a, b, c).$$
Spectral clustering should allow a gain of a factor of $n$, i.e.
$$(a-b)^2 + (b-c)^2 \succ \max(a, b, c).$$
Our previous analysis shows that clustering based on degrees works as soon as $(a-c)^2 \succ \max(a, b, c)$. When $a = c$, the degrees give no information.
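For a concrete instance (the numbers are my own, not from the talk): take $a = c = 5$ and $b = 1$. Degrees give no signal since $a = c$, yet $(a-b)^2 + (b-c)^2 = 32 \succ \max(a,b,c) = 5$, so the spectral heuristic still predicts a positively correlated partition.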
The sparse symmetric stochastic block model
A random graph model on $n$ nodes with two parameters $a, b \ge 0$ (the previous model with $c = a$, so the connection probabilities are $a/n$, $b/n$, $a/n$).
Independently for each pair $(u, v)$: if $\sigma_u = \sigma_v$, draw the edge w.p. $a/n$; if $\sigma_u \ne \sigma_v$, draw the edge w.p. $b/n$.
Heuristic: spectral clustering should work as soon as $(a-b)^2 \succ a + b$.
Efficiency of Spectral Algorithms
Boppana ’87; Condon, Karp ’01; Carson, Impagliazzo ’01; McSherry ’01; Kannan, Vempala, Vetta ’04...
Theorem (Coja-Oghlan ’10). Suppose that for sufficiently large $K$ and $K'$,
$$\frac{(a-b)^2}{a+b} \ge K + K' \ln(a+b);$$
then 'trimming + spectral + greedy improvement' outputs a positively correlated (almost exact) partition w.h.p.
Heuristic based on the analogy with a mixture of Gaussians: $(a-b)^2 \succ a + b$.
Another look at spectral algorithms
Take a finite, simple, non-oriented graph $G = (V, E)$.
Adjacency matrix: symmetric, indexed by the vertices; for $u, v \in V$, $A_{uv} = \mathbb{1}(\{u, v\} \in E)$.
Low-rank approximation of the adjacency matrix works as soon as $(a-b)^2 \succ a + b$.
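A sketch of the vanilla spectral method on the adjacency matrix (my own illustration, not the talk's algorithm; in the very sparse regime high-degree vertices pollute the top eigenvectors, which is what motivates the trimming step above and, ultimately, the non-backtracking operator of the title):

```python
import numpy as np

def spectral_partition(A):
    """Split by the sign of the eigenvector of the second-largest
    eigenvalue of A: the top eigenvector mostly tracks degrees,
    while the second carries the community signal.
    """
    eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    v2 = eigvecs[:, -2]                   # second-largest eigenvalue
    return np.where(v2 >= 0, 1, -1)
```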