
Random Graph Models (Prof. Srijan Kumar)


  1. CSE 6240: Web Search and Text Mining, Spring 2020
     Random Graph Models
     Prof. Srijan Kumar
     Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining

  2. Today’s Lecture: Networks
     • Networks introduction
     • Web as a network
     • Network properties
     • Random graph model: Erdős-Rényi Random Graph Model
     • Random graph model: Small-world Random Graph Model
     Some slides are inspired by Prof. Jure Leskovec’s slides.

  3. Simplest Model of Graphs
     • Erdős-Rényi Random Graphs [Erdős-Rényi, 1960]
     • Two variants:
       – $G_{n,p}$: undirected graph on n nodes in which each edge (u,v) appears i.i.d. with probability p
       – $G_{n,m}$: undirected graph with n nodes and m edges, where the edges are picked uniformly at random
     • What kind of networks do such models produce?
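
The two variants can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the function names `sample_gnp` and `sample_gnm` are made up for this example, nodes are labeled 0..n-1, and the values n = 10, p = 1/6, m = 8 are arbitrary.

```python
import itertools
import random

def sample_gnp(n, p, rng):
    """One G(n,p) realization: each of the n(n-1)/2 pairs becomes an edge with probability p."""
    return {(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p}

def sample_gnm(n, m, rng):
    """One G(n,m) realization: m distinct edges drawn uniformly from all node pairs."""
    all_pairs = list(itertools.combinations(range(n), 2))
    return set(rng.sample(all_pairs, m))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
edges_gnp = sample_gnp(10, 1 / 6, rng)
edges_gnm = sample_gnm(10, 8, rng)
print(len(edges_gnp), "edges in the G(n,p) sample")
print(len(edges_gnm), "edges in the G(n,m) sample")
```

In G(n,m) the edge count is fixed by construction, while in G(n,p) it varies from one realization to the next.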

  4. Random Graph Models: Intuition
     • n and p do not uniquely determine the graph!
       – The graph is the result of a random process
     • We can have many different realizations given the same n and p
     [Figure: example realizations with n = 10, p = 1/6]

  5. Random Graph Model: Edges
     • How likely is a graph with E edges?
     • P(E): the probability that a given $G_{n,p}$ generates a graph with exactly E edges:
       $P(E) = \binom{E_{\max}}{E} p^{E} (1-p)^{E_{\max}-E}$
       where $E_{\max} = n(n-1)/2$ is the maximum possible number of edges in an undirected graph of n nodes
     • P(E) is a binomial distribution: the number of successes in a sequence of $E_{\max}$ independent yes/no experiments
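
As a quick numerical check of the binomial form of P(E), the sketch below evaluates it for every possible edge count. The helper name `prob_num_edges` and the values n = 10, p = 1/6 are chosen only for illustration.

```python
import math

def prob_num_edges(n, p, e):
    """P(E = e) for G(n,p): a binomial over the E_max = n(n-1)/2 possible edges."""
    e_max = n * (n - 1) // 2
    return math.comb(e_max, e) * p**e * (1 - p) ** (e_max - e)

n, p = 10, 1 / 6
e_max = n * (n - 1) // 2
pmf = [prob_num_edges(n, p, e) for e in range(e_max + 1)]

print("E_max =", e_max)
print("sum of P(E) over all E =", sum(pmf))
print("expected number of edges =", sum(e * q for e, q in enumerate(pmf)))
```

The probabilities sum to 1 and the expected edge count equals p times E_max, as expected for a binomial distribution.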

  6. Node Degrees in a Random Graph
     • What is the expected degree of a node?
       $E[X_v] = \sum_{u=1}^{n-1} E[X_{vu}] = (n-1)p$
       where $X_v$ is the degree of node v and $X_{vu}$ is 1 if the edge (v,u) exists and 0 otherwise
     • The probability of node u linking to node v is p
     • u can link (flips a coin) to each of the other (n-1) nodes
     • Thus, the expected degree of node u is p(n-1)
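
The expected-degree result can be checked empirically by sampling one graph and averaging the observed degrees. This is a sketch under assumed values (n = 200, p = 0.05, a fixed seed); `degrees_gnp` is a name invented for this example.

```python
import itertools
import random

def degrees_gnp(n, p, rng):
    """Sample one G(n,p) graph and return the degree of every node."""
    deg = [0] * n
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:  # the coin flip for the pair (u, v)
            deg[u] += 1
            deg[v] += 1
    return deg

n, p = 200, 0.05
deg = degrees_gnp(n, p, random.Random(42))
mean_degree = sum(deg) / n

print("empirical mean degree:", mean_degree)
print("predicted (n - 1) * p:", (n - 1) * p)
```

A single sample will not match the prediction exactly; the two numbers should be close, and the gap shrinks as n grows.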

  7. Key Network Properties
     • Degree distribution: P(k)
     • Clustering coefficient: C
     • Path length: h
     What are the values of these properties for $G_{n,p}$?

  8. Degree Distribution
     • The degree distribution of $G_{n,p}$ is binomial
     • Let P(k) denote the fraction of nodes with degree k:
       $P(k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k}$
       – $\binom{n-1}{k}$: select k nodes out of n-1
       – $p^{k}$: probability of having k edges
       – $(1-p)^{n-1-k}$: probability of missing the rest of the n-1-k edges
     • Mean and variance of a binomial distribution:
       $\bar{k} = p(n-1)$, $\sigma^2 = p(1-p)(n-1)$
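
The mean and variance can be recovered directly from the degree distribution. The sketch below does this numerically; `degree_pmf` and the values n = 50, p = 0.1 are assumptions made for the example.

```python
import math

def degree_pmf(n, p):
    """P(k) for k = 0..n-1: binomial over the n-1 possible neighbors of a node."""
    return [math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k) for k in range(n)]

n, p = 50, 0.1
pmf = degree_pmf(n, p)
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print("mean from P(k):", mean, "  closed form p(n-1):", p * (n - 1))
print("variance from P(k):", var, "  closed form p(1-p)(n-1):", p * (1 - p) * (n - 1))
```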

  9. Degree Distribution
     • As the network size increases, the distribution becomes increasingly narrow: we are increasingly confident that the degree of a node is in the vicinity of $\bar{k}$.
       $\frac{\sigma}{\bar{k}} = \left[\frac{1-p}{p}\,\frac{1}{n-1}\right]^{1/2} \approx \frac{1}{(n-1)^{1/2}}$
     [Figure: P(k) versus k]
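
To see the narrowing numerically, the sketch below evaluates the ratio of standard deviation to mean degree for growing n. The function name `relative_spread` is invented here, and p = 0.5 is chosen deliberately so that the prefactor (1-p)/p equals 1 and the ratio coincides with 1/sqrt(n-1).

```python
import math

def relative_spread(n, p):
    """sigma / mean of the binomial degree distribution of G(n,p)."""
    mean = p * (n - 1)
    sigma = math.sqrt(p * (1 - p) * (n - 1))
    return sigma / mean

p = 0.5  # with this choice (1 - p) / p == 1
for n in (11, 101, 1001, 10001):
    print(n, relative_spread(n, p), 1 / math.sqrt(n - 1))
```

For other fixed values of p the ratio differs from 1/sqrt(n-1) by the constant factor sqrt((1-p)/p), but it shrinks with n at the same rate.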

  10. Clustering Coefficient of $G_{n,p}$
      • Clustering coefficient: $C_i = \frac{2 e_i}{k_i (k_i - 1)}$
        – where $e_i$ is the number of edges between i’s neighbors
      • In $G_{n,p}$ (in expectation): $e_i = p \cdot \frac{k_i (k_i - 1)}{2}$
        – p: each pair is connected with probability p
        – $k_i (k_i - 1)/2$: number of distinct pairs of neighbors of node i of degree $k_i$
      • So, $C = \frac{p \cdot k_i (k_i - 1)}{k_i (k_i - 1)} = p = \frac{\bar{k}}{n-1} \approx \frac{\bar{k}}{n}$
      • The clustering coefficient of a random graph is small
        – Bigger graphs with the same average degree $\bar{k}$ have a lower clustering coefficient
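
The claim that C equals p can be checked on a sampled graph. This is a minimal sketch with assumed values (n = 200, p = 0.1, fixed seed) and invented helper names; it also assumes the common convention that a node with fewer than two neighbors has clustering 0.

```python
import itertools
import random

def sample_adjacency(n, p, rng):
    """Sample G(n,p) and return it as a list of neighbor sets."""
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def local_clustering(adj, i):
    """C_i = 2 e_i / (k_i (k_i - 1)), with e_i the number of edges among i's neighbors."""
    k = len(adj[i])
    if k < 2:
        return 0.0  # convention assumed for this sketch
    e_i = sum(1 for u, v in itertools.combinations(adj[i], 2) if v in adj[u])
    return 2 * e_i / (k * (k - 1))

n, p = 200, 0.1
adj = sample_adjacency(n, p, random.Random(7))
avg_c = sum(local_clustering(adj, i) for i in range(n)) / n

print("average clustering coefficient:", avg_c)
print("edge probability p:", p)
```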

  11. Key Network Properties
      • Degree distribution: $P(k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k}$
      • Clustering coefficient: $C = p \approx \bar{k}/n$
      • Path length: h
      What are the values of these properties for $G_{n,p}$?

  12. Average Shortest Path
      • Average path length = O(log n)
      • Erdős-Rényi networks can grow to be very large, but nodes will be just a few hops apart
      [Figure: average shortest path (0 to 20) versus number of nodes (0 to 1,000,000)]
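
The slow growth of path length can be observed with breadth-first search on sampled graphs. The sketch below is illustrative only: the sizes (100, 400, 1600), the mean degree of 8, the 30 randomly chosen BFS sources per graph, and all helper names are assumptions, and unreachable pairs are simply skipped.

```python
import itertools
import math
import random
from collections import deque

def sample_adjacency(n, p, rng):
    """Sample G(n,p) and return it as a list of neighbor sets."""
    adj = [set() for _ in range(n)]
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_path_length(adj, sources):
    """Mean hop distance from the given sources to all nodes they can reach."""
    total, pairs = 0, 0
    for s in sources:
        dist = bfs_distances(adj, s)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else float("nan")

rng = random.Random(3)
mean_degree = 8  # held fixed while n grows
results = {}
for n in (100, 400, 1600):
    adj = sample_adjacency(n, mean_degree / (n - 1), rng)
    sources = rng.sample(range(n), 30)
    results[n] = average_path_length(adj, sources)
    print(n, round(results[n], 2), "log n =", round(math.log(n), 2))
```

Multiplying n by 16 should add only a small, roughly constant number of hops, which is what logarithmic growth looks like.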

  13. MSN Network Properties vs. $G_{n,p}$ Properties
      Property                 MSN        G_{n,p}
      Degree distribution      [figure]   [figure]
      Path length              6.6        O(log n) ~ 8.2
      Clustering coefficient   0.11       $\bar{k}/n \approx 8 \cdot 10^{-8}$

  14. Clustering Implies Edge Locality
      • The MSN network has 7 orders of magnitude larger clustering than the corresponding $G_{n,p}$!
      • Other examples:
        – Actor collaborations (IMDB): N = 225,226 nodes, avg. degree $\bar{k}$ = 61
        – Electrical power grid: N = 4,941 nodes, $\bar{k}$ = 2.67
        – Network of neurons: N = 282 nodes, $\bar{k}$ = 14
      Network       h_actual   h_random   C_actual    C_random
      Film actors   3.65       2.99       (missing)   0.00027
      Power Grid    18.70      12.40      (missing)   0.005
      C. elegans    2.65       2.25       (missing)   0.05

  15. $G_{n,p}$ Simulation Experiment: Giant Component
      • n = 100,000, $\bar{k} = p(n-1)$ = 0.5 … 3
      • Emergence of a giant component: average degree $\bar{k} = 2E/n$, or $p = \bar{k}/(n-1)$
        – When $\bar{k} = 1 - \varepsilon$: all components are of size Ω(log n)
        – When $\bar{k} = 1 + \varepsilon$: 1 component of size Ω(n), others have size Ω(log n)
      [Figure: fraction of nodes in the largest component; the threshold p(n-1) = 1 is marked]
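
A scaled-down version of this experiment can be sketched with a union-find structure. The slide uses n = 100,000; the sketch below assumes n = 1,000 to keep the run short, and the function name `largest_component_fraction` and the seed are invented for this example.

```python
import itertools
import random

def largest_component_fraction(n, p, rng):
    """Sample G(n,p) and return the fraction of nodes in its largest component."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            ru, rv = find(u), find(v)
            if ru != rv:
                if size[ru] < size[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru  # union by size
                size[ru] += size[rv]
    return max(size[find(x)] for x in range(n)) / n

n = 1000  # assumption: much smaller than the slide's 100,000
rng = random.Random(11)
fractions = {}
for mean_degree in (0.5, 1.0, 1.5, 2.0, 3.0):
    fractions[mean_degree] = largest_component_fraction(n, mean_degree / (n - 1), rng)
    print("mean degree", mean_degree, "-> largest component fraction", fractions[mean_degree])
```

Below a mean degree of 1 the largest component should hold only a tiny share of the nodes; above it the share should rise quickly.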

  16. Real Networks vs. $G_{n,p}$
      • Are real networks like random graphs?
        – Giant connected component: YES
        – Average path length: YES
        – Clustering coefficient: NO
        – Degree distribution: NO

  17. Real Networks vs. $G_{n,p}$
      • Problems with the random networks model:
        – Degree distribution differs from that of real networks
        – Giant component in most real networks does NOT emerge through a phase transition
        – No local structure: the clustering coefficient is too low
      • Most important: Are real networks random?
        – The answer is simply: NO!

  18. Real Networks vs. $G_{n,p}$
      • If $G_{n,p}$ is wrong, why did we spend time on it?
        – It is the reference model for the rest of the class.
        – It will help us calculate many quantities that can then be compared to real data
        – It will help us understand to what degree a particular property is the result of some random process
      • While $G_{n,p}$ is not realistic, it is useful

  19. Problem with the ER Model
      • The $G_{n,p}$ model has short paths: O(log n)
        – This is the smallest diameter we can get if we have a constant degree.
        – But clustering is low!
        [Figure: low diameter, low clustering coefficient]
      • But real networks have “local” structure
        – Triadic closure: a friend of a friend is my friend
        – High clustering, but the diameter is also high
        [Figure: high clustering coefficient, high diameter]
      • Can we generate graphs with a high clustering coefficient while having short paths (low diameter)?
      • Solution: Small-World Model

  20. Today’s Lecture: Networks
      • Networks introduction
      • Web as a network
      • Network properties
      • Random graph model: Erdős-Rényi Random Graph Model
      • Random graph model: Small-world Random Graph Model
      Some slides are inspired by Prof. Jure Leskovec’s slides.

  21. Six Degrees of Kevin Bacon
      Origins of a small-world idea:
      • The Bacon number:
        – Create a network of Hollywood actors
        – Connect two actors if they co-appeared in a movie
        – Bacon number: number of steps to Kevin Bacon
      • As of Dec 2007, the highest Bacon number reported is 8
      • Only approx. 12% of all actors cannot be linked to Bacon
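
The Bacon number construction amounts to a breadth-first search on the co-appearance graph. The sketch below uses a hypothetical toy cast list: the movie titles and "Actor N" names are placeholders, not real filmographies, and the function names are invented for this example.

```python
from collections import deque

# Hypothetical toy data: placeholder movies and actors, not real filmographies.
movies = {
    "Movie A": ["Kevin Bacon", "Actor 1", "Actor 2"],
    "Movie B": ["Actor 2", "Actor 3"],
    "Movie C": ["Actor 3", "Actor 4"],
    "Movie D": ["Actor 5", "Actor 6"],
}

def build_coappearance_graph(movies):
    """Connect two actors if they appear in the same movie."""
    graph = {}
    for cast in movies.values():
        for actor in cast:
            graph.setdefault(actor, set()).update(a for a in cast if a != actor)
    return graph

def bacon_numbers(graph, root="Kevin Bacon"):
    """Breadth-first search: number of steps from the root to each reachable actor."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        actor = queue.popleft()
        for other in graph[actor]:
            if other not in dist:
                dist[other] = dist[actor] + 1
                queue.append(other)
    return dist

numbers = bacon_numbers(build_coappearance_graph(movies))
for actor in sorted(numbers):
    print(actor, numbers[actor])
```

Actors who cannot be linked to the root, such as the cast of "Movie D" in this toy data, do not appear in the result at all.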

  22. Erdős Number
      • Erdős number: number of hops in the scientific co-author graph to reach Paul Erdős
      • Srijan’s Erdős number is 4.
      • Find out your Erdős number: http://www.ams.org/mathscinet/collaborationDistance.html

  23. The Small-World Experiment
      • What is the typical shortest path length between any two people?
        – Experiment on the global friendship network
        – Can’t measure it directly, need to probe explicitly
      • Small-world experiment [Milgram ’67]
        – Picked 300 people in Omaha, Nebraska and Wichita, Kansas
        – Asked them to get a letter to a stockbroker in Boston by passing it through friends only
      • How many steps do you think it took?

  24. The Small-World Experiment
      • 64 chains completed (letters reached the target)
        – It took 6.2 steps on average, hence “6 degrees of separation”
        [Figure: Milgram’s small-world experiment]
      • Further observations:
        – People who owned stock had shorter paths to the stockbroker than random people: 5.4 vs. 6.7
        – People from the Boston area had even shorter paths: 4.4
      • On average, you are 6 hops away from anyone in the world!
