CSE 6240: Web Search and Text Mining. Spring 2020 Random Graph Models Prof. Srijan Kumar 1 Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining

Today’s Lecture: Networks
• Networks introduction
• Web as a network
• Network properties
• Random graph model: Erdős-Rényi Random Graph Model
• Random graph model: Small-world Random Graph Model
Some slides are inspired by Prof. Jure Leskovec’s slides

Simplest Model of Graphs
• Erdős-Rényi Random Graphs [Erdős-Rényi, 1960]
• Two variants:
– $G_{n,p}$: undirected graph on n nodes where each edge (u,v) appears i.i.d. with probability p
– $G_{n,m}$: undirected graph with n nodes and m edges, where the edges are picked uniformly at random
• What kind of networks do such models produce?

Random Graph Models: Intuition
• n and p do not uniquely determine the graph!
– The graph is the result of a random process
• We can have many different realizations given the same n and p
(Figure: example realizations for n = 10, p = 1/6)
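The point that n and p do not pin down a single graph can be checked with a minimal sketch like the one below. It uses only the Python standard library; the helper names `sample_gnp` and `sample_gnm` are illustrative choices, not part of the lecture, and the quadratic loop over all pairs is only meant for small n.

```python
import itertools
import random


def sample_gnp(n, p, rng):
    """One G(n,p) realization: each of the n(n-1)/2 pairs is kept with prob. p."""
    return [(u, v) for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]


def sample_gnm(n, m, rng):
    """One G(n,m) realization: m distinct pairs chosen uniformly at random."""
    all_pairs = list(itertools.combinations(range(n), 2))
    return rng.sample(all_pairs, m)


rng = random.Random(0)  # fixed seed only so the run is repeatable
g1 = sample_gnp(10, 1 / 6, rng)
g2 = sample_gnp(10, 1 / 6, rng)
g3 = sample_gnm(10, 7, rng)

# Same n and p, but the two G(n,p) draws are separate realizations,
# so their edge lists (and even their edge counts) can differ.
print("G(n,p) draw 1:", len(g1), "edges")
print("G(n,p) draw 2:", len(g2), "edges")
print("G(n,m) draw:  ", len(g3), "edges")
```

In G(n,m) the number of edges is fixed by construction, while in G(n,p) it is itself a random quantity.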

Random Graph Model: Edges
• How likely is a graph on E edges?
• P(E): the probability that a given $G_{np}$ generates a graph on exactly E edges:
$$P(E) = \binom{E_{\max}}{E} \, p^{E} (1-p)^{E_{\max}-E}$$
where $E_{\max} = n(n-1)/2$ is the maximum possible number of edges in an undirected graph of n nodes
• P(E) is a Binomial distribution: the number of successes in a sequence of $E_{\max}$ independent yes/no experiments
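As a quick numerical check of the binomial form of P(E), the sketch below evaluates it for every possible edge count. The function name `edge_count_pmf` and the values n = 10, p = 1/6 are assumptions made for illustration.

```python
from math import comb


def edge_count_pmf(n, p, e):
    """P(E = e) for G(n,p): Binomial(E_max, p) with E_max = n(n-1)/2."""
    e_max = n * (n - 1) // 2
    return comb(e_max, e) * p**e * (1 - p) ** (e_max - e)


n, p = 10, 1 / 6
e_max = n * (n - 1) // 2

pmf = [edge_count_pmf(n, p, e) for e in range(e_max + 1)]
total = sum(pmf)                                   # probabilities sum to 1
mean_edges = sum(e * q for e, q in enumerate(pmf)) # equals p * E_max

print("E_max =", e_max)
print("sum of P(E) =", round(total, 6))
print("expected number of edges =", round(mean_edges, 6))
```

The expected number of edges is p·E_max, the mean of the binomial distribution.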

Node Degrees in a Random Graph
• What is the expected degree of a node?
$$E[X_v] = \sum_{u=1}^{n-1} E[X_{vu}] = (n-1)p$$
where $X_v$ is the degree of node v and $X_{vu}$ is 1 if the edge (v,u) exists and 0 otherwise
• Probability of node v linking to node u is p
• v can link (flips a coin) to all other (n-1) nodes
• Thus, the expected degree of node v is: p(n-1)

Key Network Properties
• Degree distribution: P(k)
• Clustering coefficient: C
• Path length: h
What are the values of these properties for $G_{np}$?

Degree Distribution
• Degree distribution of $G_{np}$ is binomial
• Let P(k) denote the fraction of nodes with degree k:
$$P(k) = \binom{n-1}{k} \, p^{k} (1-p)^{n-1-k}$$
– $\binom{n-1}{k}$: select k nodes out of n-1
– $p^{k}$: probability of having k edges
– $(1-p)^{n-1-k}$: probability of missing the rest of the n-1-k edges
• Mean and variance of a binomial distribution:
$$\bar{k} = p(n-1), \qquad \sigma^2 = p(1-p)(n-1)$$
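The binomial mean and variance can be compared against one sampled graph. This is a rough sketch under assumed values (n = 500, p = 0.02); the function names are hypothetical, and a single realization will only match the theory approximately.

```python
import itertools
import random
from math import comb


def degree_pmf(n, p, k):
    """Binomial probability that a node of G(n,p) has degree k."""
    return comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def sample_degrees(n, p, rng):
    """Degrees of all nodes in one G(n,p) realization."""
    deg = [0] * n
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            deg[u] += 1
            deg[v] += 1
    return deg


n, p = 500, 0.02
rng = random.Random(1)
deg = sample_degrees(n, p, rng)

mean_emp = sum(deg) / n
var_emp = sum((d - mean_emp) ** 2 for d in deg) / n
mean_th = p * (n - 1)            # binomial mean
var_th = p * (1 - p) * (n - 1)   # binomial variance

print("mean degree: empirical", round(mean_emp, 3), "theory", round(mean_th, 3))
print("variance:    empirical", round(var_emp, 3), "theory", round(var_th, 3))
```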

Degree Distribution
• As the network size increases, the distribution becomes increasingly narrow: we are increasingly confident that the degree of a node is in the vicinity of $\bar{k}$.
$$\frac{\sigma}{\bar{k}} = \left[ \frac{1-p}{p} \cdot \frac{1}{n-1} \right]^{1/2} \approx \frac{1}{(n-1)^{1/2}}$$
(Figure: P(k) versus k)
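The ratio follows directly from the binomial mean and variance. A short worked derivation, with an illustrative choice of p = 1/2 that is not from the slides:

```latex
\frac{\sigma}{\bar{k}}
  = \frac{\sqrt{p(1-p)(n-1)}}{p(n-1)}
  = \left[ \frac{1-p}{p} \cdot \frac{1}{n-1} \right]^{1/2}

% Example with p = 1/2, so that (1-p)/p = 1:
%   n = 101:    \sigma / \bar{k} = (1/100)^{1/2}   = 0.1
%   n = 10001:  \sigma / \bar{k} = (1/10000)^{1/2} = 0.01
```

For fixed p, increasing n by a factor of 100 shrinks the relative spread of the degrees by a factor of 10.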

Clustering Coefficient of $G_{np}$
• Clustering coefficient:
$$C_i = \frac{2 e_i}{k_i (k_i - 1)}$$
– Where $e_i$ is the number of edges between i’s neighbors
• In $G_{np}$, the expected value of $e_i$ is:
$$e_i = p \, \frac{k_i (k_i - 1)}{2}$$
– p: each pair is connected with prob. p
– $k_i (k_i - 1)/2$: number of distinct pairs of neighbors of node i of degree $k_i$
• So,
$$C = \frac{p \cdot k_i (k_i - 1)}{k_i (k_i - 1)} = p = \frac{\bar{k}}{n-1} \approx \frac{\bar{k}}{n}$$
• Clustering coefficient of a random graph is small
– Bigger graphs with the same average degree $\bar{k}$ have lower clustering coefficient
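The claim that the clustering coefficient of a random graph is close to p can be checked empirically. The sketch below computes the local clustering coefficient from its definition on one sampled graph; the function names and the values n = 400, p = 0.05 are assumptions for illustration, and nodes with fewer than two neighbors are given a coefficient of 0 by convention.

```python
import itertools
import random


def sample_adjacency(n, p, rng):
    """Adjacency sets of one G(n,p) realization."""
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def local_clustering(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)); taken as 0 when k_i < 2."""
    k = len(adj[i])
    if k < 2:
        return 0.0
    # e_i: number of edges among the neighbors of node i
    e = sum(1 for a, b in itertools.combinations(adj[i], 2) if b in adj[a])
    return 2 * e / (k * (k - 1))


n, p = 400, 0.05
rng = random.Random(2)
adj = sample_adjacency(n, p, rng)
avg_c = sum(local_clustering(adj, i) for i in range(n)) / n

print("average clustering coefficient:", round(avg_c, 4))
print("edge probability p:            ", p)
```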

Key Network Properties
• Degree distribution:
$$P(k) = \binom{n-1}{k} \, p^{k} (1-p)^{n-1-k}$$
• Clustering coefficient: $C = p = \bar{k}/(n-1) \approx \bar{k}/n$
• Path length: h
What are the values of these properties for $G_{np}$?

Average Shortest Path
• Average path length = O(log n)
• Erdős-Rényi networks can grow to be very large but nodes will be just a few hops apart
(Figure: average shortest path versus number of nodes, for up to 1,000,000 nodes)
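A small experiment in the same spirit as the plot can be sketched as follows. Assumptions to note: the average degree is fixed at 8, graph sizes are far smaller than on the slide, path lengths are estimated by breadth-first search from 20 randomly chosen source nodes, and unreachable pairs are ignored. The ratio log n / log k is printed only as a commonly used rough reference, not as a result from the lecture.

```python
import itertools
import random
from collections import deque
from math import log


def sample_adjacency(n, p, rng):
    """Adjacency sets of one G(n,p) realization."""
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def avg_path_length(adj, sources):
    """Mean distance from the given sources to all nodes they can reach."""
    total = count = 0
    for s in sources:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                count += 1
    return total / count if count else float("nan")


rng = random.Random(3)
avg_degree = 8
results = []
for n in (200, 400, 800, 1600):
    adj = sample_adjacency(n, avg_degree / (n - 1), rng)
    sources = rng.sample(range(n), 20)
    results.append((n, avg_path_length(adj, sources)))

for n, h in results:
    print(n, round(h, 2), "log n / log k =", round(log(n) / log(avg_degree), 2))
```

Doubling n several times should move the estimated path length only slightly, which is the behavior the O(log n) bound describes.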

MSN Network Properties vs. $G_{np}$ Properties
Property                  MSN        G_np
Degree distribution       (figure)   (figure)
Path length               6.6        O(log n) ≈ 8.2
Clustering coefficient    0.11       k/n ≈ 8·10^-8

Clustering Implies Edge Locality
• MSN network has about 6 orders of magnitude larger clustering than the corresponding $G_{np}$ (0.11 vs. 8·10^-8)!
• Other examples:
– Actor Collaborations (IMDB): N = 225,226 nodes, avg. degree k = 61
– Electrical power grid: N = 4,941 nodes, k = 2.67
– Network of neurons: N = 282 nodes, k = 14
Network        h_actual   h_random   C_actual    C_random
Film actors    3.65       2.99       (missing)   0.00027
Power Grid     18.70      12.40      (missing)   0.005
C. elegans     2.65       2.25       (missing)   0.05

$G_{np}$ Simulation Experiment: Giant Component
• n = 100,000, k = p(n-1) = 0.5 … 3
• Emergence of a giant component: average degree k = 2E/n or p = k/(n-1)
– k = 1-ε: all components are of size O(log n)
– k = 1+ε: 1 component of size Ω(n), others have size O(log n)
(Figure: fraction of nodes in the largest component versus k; the transition occurs at p·(n-1) = 1)
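The emergence of the giant component can be reproduced with a small union-find sketch. Two simplifications are assumed here and are not from the lecture: n is 20,000 rather than 100,000, and instead of flipping a coin for every pair the code draws k·n/2 random node pairs (a G(n,m)-style approximation in which rare duplicates and self-pairs are simply tolerated). The function name is hypothetical.

```python
import random


def largest_component_fraction(n, avg_degree, rng):
    """Fraction of nodes in the largest component of a sparse random graph."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Find the root of x, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    m = int(avg_degree * n / 2)  # number of edges for the target average degree
    for _ in range(m):
        u, v = rng.randrange(n), rng.randrange(n)
        ru, rv = find(u), find(v)
        if ru != rv:
            # Union by size: attach the smaller tree under the larger one.
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
    return max(size[find(x)] for x in range(n)) / n


rng = random.Random(4)
n = 20_000
fractions = {k: largest_component_fraction(n, k, rng)
             for k in (0.5, 1.0, 1.5, 2.0, 3.0)}

for k, frac in fractions.items():
    print("avg degree", k, "-> largest component fraction", round(frac, 3))
```

Below an average degree of 1 the largest component covers a negligible fraction of the nodes, while above 1 it grows to a constant fraction of the graph.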

Real Networks vs. $G_{np}$
• Are real networks like random graphs?
– Giant connected component: YES
– Average path length: YES
– Clustering Coefficient: NO
– Degree Distribution: NO

Real Networks vs. $G_{np}$
• Problems with the random networks model:
– Degree distribution differs from that of real networks
– Giant component in most real networks does NOT emerge through a phase transition
– No local structure: clustering coefficient is too low
• Most important: Are real networks random?
– The answer is simply: NO!

Real Networks vs. $G_{np}$
• If $G_{np}$ is wrong, why did we spend time on it?
– It is the reference model for the rest of the class.
– It will help us calculate many quantities that can then be compared to the real data
– It will help us understand to what degree a particular property is the result of some random process
• While $G_{np}$ is not realistic, it is useful

Problem with the ER Model
• $G_{np}$ model has short paths: O(log n)
– This is the smallest diameter we can get if we have a constant degree.
– But clustering is low!
• But real networks have “local” structure
– Triadic closure: Friend of a friend is my friend
– High clustering but diameter is also high
(Figure: one graph labeled low diameter, low clustering coefficient; another labeled high clustering coefficient, high diameter)
• Can we generate graphs with high clustering coefficient while having short paths (low diameter)?
• Solution: Small-World Model

Six Degrees of Kevin Bacon
Origins of a small-world idea:
• The Bacon number:
– Create a network of Hollywood actors
– Connect two actors if they co-appeared in a movie
– Bacon number: number of steps to Kevin Bacon
• As of Dec 2007, the highest Bacon number reported is 8
• Only approx. 12% of all actors cannot be linked to Bacon

Erdős Number
• Erdős number: number of hops in the scientific co-author graph to reach Paul Erdős
• Srijan’s Erdős number is 4.
• Find out your Erdős number: http://www.ams.org/mathscinet/collaborationDistance.html

The Small-World Experiment
• What is the typical shortest path length between any two people?
– Experiment on the global friendship network
– Can’t measure, need to probe explicitly
• Small-world experiment [Milgram ’67]
– Picked 300 people in Omaha, Nebraska and Wichita, Kansas
– Asked them to get a letter to a stock-broker in Boston by passing it through friends only
• How many steps do you think it took?

The Small-World Experiment
• 64 chains completed (letters reached)
– It took 6.2 steps on average, thus “6 degrees of separation”
(Figure: Milgram’s small world experiment)
• Further observations:
– People who owned stock had shorter paths to the stockbroker than random people: 5.4 vs. 6.7
– People from the Boston area have even closer paths: 4.4
• On average, you are 6 hops away from anyone in the world!
