
  1. CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University. http://cs224w.stanford.edu

  2. Course website: http://cs224w.stanford.edu
     • Slides will be available online
     • Reading material will be posted online:
       • Chapters from the book by Jon Kleinberg and David Easley (Cornell)
       • The whole book is available at: http://www.cs.cornell.edu/home/kleinber/networks-book
     9/22/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis

  3. Contact (buddy) list; messaging window

  4. Observe social and communication phenomena at a planetary scale
     • Largest social network analyzed to date
     Questions:
     • What is the structure of the communication network?

  5. Data for June 2006
     • Log size: 150 GB/day (compressed)
     • Total: 1 month of communication data: 4.5 TB of compressed data
     Activity over June 2006 (30 days)
     • 245 million users logged in
     • 180 million users engaged in conversations
     • 17.5 million new accounts activated
     • More than 30 billion conversations
     • More than 255 billion exchanged messages

  6. Activity on a typical day (June 1, 2006):
     • 1 billion conversations
     • 93 million users log in
     • 65 million different users talk (exchange messages)
     • 1.5 million invitations for new accounts sent

  7. Fraction of country’s population on MSN:
     • Iceland: 35%
     • Spain: 28%
     • Netherlands, Canada, Sweden, Norway: 26%
     • France, UK: 18%
     • USA, Brazil: 8%

  8. Buddy Conversation

  9. Buddy graph
     • 240 million people (people who logged in during June ’06)
     • 9.1 billion buddy edges (friendship links)
     Communication graph (take only 2-user conversations)
     • Edge if the users exchanged at least 1 message
     • 180 million people
     • 1.3 billion edges
     • 30 billion conversations
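The communication-graph rule above (keep only 2-user conversations, add an edge once at least one message was exchanged) can be sketched as follows. The `Conversation` record layout and the toy log are assumptions made for illustration; the slides do not describe the actual MSN log schema.

```python
from collections import namedtuple

# Hypothetical record layout, assumed for this sketch only.
Conversation = namedtuple("Conversation", ["participants", "messages_by_user"])


def build_communication_graph(conversations):
    """Return the undirected edge set of the communication graph.

    Only 2-user conversations are considered, and an edge is added when
    the two users exchanged at least one message in that conversation.
    """
    edges = set()
    for conv in conversations:
        users = set(conv.participants)
        if len(users) != 2:
            continue  # skip group conversations (and degenerate records)
        total_messages = sum(conv.messages_by_user.get(u, 0) for u in users)
        if total_messages >= 1:
            edges.add(frozenset(users))
    return edges


# Toy log with made-up user names.
toy_log = [
    Conversation(("a", "b"), {"a": 2, "b": 1}),
    Conversation(("a", "b", "c"), {"a": 1}),   # 3 users: ignored
    Conversation(("b", "c"), {}),              # no messages: no edge
    Conversation(("a", "b"), {"b": 3}),        # repeat pair: same edge
    Conversation(("c", "d"), {"d": 1}),
]
toy_edges = build_communication_graph(toy_log)
toy_nodes = set().union(*toy_edges)
```

Storing each edge as a `frozenset` makes the graph undirected and collapses repeated conversations between the same pair into a single edge.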

  11. Remove nodes (in some order) and observe how the network falls apart:
     • Number of edges deleted
     • Size of largest connected component
     Order nodes by:
     • Number of links
     • Total conversations
     • Total conv. duration
     • Messages/conversation
     • Avg. sent, avg. duration
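A minimal sketch of the node-removal experiment, assuming the network fits in memory as a dictionary of neighbor sets. It removes nodes in a given order and records the cumulative number of edges deleted and the size of the largest connected component after each removal. The toy graph and the "number of links" ordering are illustrative; the function names are hypothetical.

```python
def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are gone."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:  # iterative depth-first search
            node = stack.pop()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


def removal_curve(adj, order):
    """List of (edges deleted so far, largest component size) per removal."""
    removed = set()
    edges_deleted = 0
    curve = [(0, largest_component_size(adj, removed))]
    for node in order:
        # Only edges to still-present neighbors are newly deleted.
        edges_deleted += sum(1 for nb in adj[node] if nb not in removed)
        removed.add(node)
        curve.append((edges_deleted, largest_component_size(adj, removed)))
    return curve


# Toy graph: a hub "h" with four neighbors, plus one extra edge a-b.
toy_adj = {
    "h": {"a", "b", "c", "d"},
    "a": {"h", "b"},
    "b": {"h", "a"},
    "c": {"h"},
    "d": {"h"},
}
# Order nodes by number of links (highest first), ties broken by name.
by_degree = sorted(toy_adj, key=lambda n: (-len(toy_adj[n]), n))
toy_curve = removal_curve(toy_adj, by_degree)
```

Recomputing components from scratch after every removal is quadratic and only suitable for small graphs; at the scale described in the slides a different approach would be needed.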

  13. Origins of a small-world idea:
     • Bacon number:
       • Create a network of Hollywood actors
       • Connect two actors if they co-appeared in a movie
       • Bacon number: number of steps to Kevin Bacon
     • As of Dec 2007, the highest (finite) Bacon number reported is 8
     • Only approx. 12% of all actors cannot be linked to Bacon
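The Bacon number is a shortest-path distance in the co-appearance graph, so it can be computed with a breadth-first search from Kevin Bacon. The sketch below assumes the graph is given as a dictionary mapping each actor to the set of co-stars; the single-letter actor names are placeholders, not real data.

```python
from collections import deque


def bacon_numbers(costars, source="Kevin Bacon"):
    """Breadth-first search: number of co-appearance steps from `source`.

    Actors that cannot be linked to `source` are absent from the result,
    which corresponds to an infinite Bacon number.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for other in costars.get(actor, ()):
            if other not in dist:
                dist[other] = dist[actor] + 1
                queue.append(other)
    return dist


# Placeholder co-appearance graph (made-up actors A..E).
toy_costars = {
    "Kevin Bacon": {"A", "B"},
    "A": {"Kevin Bacon", "C"},
    "B": {"Kevin Bacon"},
    "C": {"A"},
    "D": {"E"},  # D and E never co-appear with anyone linked to Bacon
    "E": {"D"},
}
toy_bacon = bacon_numbers(toy_costars)
```

The same routine gives Erdős numbers if the graph is a co-authorship network and the source is Paul Erdős.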

  14. Erdős numbers are small. Hollywood and science are small worlds.

  16. What is the typical shortest path length between any two people?
     • Experiment on the global friendship network
     • Can’t measure, need to probe explicitly
     The small-world experiment [Stanley Milgram ’67]
     • Picked 300 people at random
     • Asked them to get a letter to a stockbroker in Boston by passing it through friends
     • How many steps does it take?

  17. Milgram’s small-world experiment
     • 64 chains completed:
       • 6.2 steps on average, thus “6 degrees of separation”
     • Further observations:
       • People who owned stock had shorter paths to the stockbroker than random people: 5.4 vs. 5.7
       • People from the Boston area had even shorter paths: 4.4

  18. MSN Messenger network: number of steps between pairs of people

      | Hops | Nodes      |
      |------|------------|
      | 0    | 1          |
      | 1    | 10         |
      | 2    | 78         |
      | 3    | 3,96       |
      | 4    | 8,648      |
      | 5    | 3,299,252  |
      | 6    | 28,395,849 |
      | 7    | 79,059,497 |
      | 8    | 52,995,778 |
      | 9    | 10,321,008 |
      | 10   | 1,955,007  |
      | 11   | 518,410    |
      | 12   | 149,945    |
      | 13   | 44,616     |
      | 14   | 13,740     |
      | 15   | 4,476      |
      | 16   | 1,542      |
      | 17   | 536        |
      | 18   | 167        |
      | 19   | 71         |
      | 20   | 29         |
      | 21   | 16         |
      | 22   | 10         |
      | 23   | 3          |
      | 24   | 2          |
      | 25   | 3          |

      Avg. path length: 6.6. 90% of the people can be reached in < 8 hops.

  19. People use different networks: Boston vs. occupation
     Criticism:
     • Funneling:
       • 31 of 64 chains passed through 1 of 3 people as their final step
       • Not all links/nodes are equal
     • Choice of starting points and the target was non-random
     • People refuse to participate (25% for Milgram)
     • Some sort of social search: people in the experiment follow some strategy (e.g., geographic routing) instead of forwarding the letter to everyone. They are not finding the shortest path.
     • There are not many samples.
     • People might have used extra information resources.

  20. What is the structure of a social network?
     • How do people behave in those networks, and which mechanisms do they use to route and find information?

  21. [Dodds-Muhamad-Watts, ’03]
     In 2003 Dodds, Muhamad and Watts performed the experiment using email:
     • 18 targets of various backgrounds
     • 24,000 first steps (~1,500 per target)
     • 65% dropout per step
     • 384 chains completed (1.5%)
     Avg. chain length = 4.01
     PROBLEM: Huge drop-out rate, i.e., longer chains are less likely to complete
     [Figure: x-axis is chain length, L]
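To see why a 65% per-step dropout biases the observed average toward short chains, here is a small sketch. It assumes, as a simplification not stated in the slides, that every step drops out independently with the same probability, so a chain needing L steps completes with probability (1 - 0.65)^L.

```python
def completion_probability(length, dropout=0.65):
    """Chance that a chain needing `length` forwarding steps completes.

    Assumes each step is dropped independently with probability `dropout`
    (a simplifying assumption made for this illustration).
    """
    return (1.0 - dropout) ** length


# Completion chances for chains of 1..8 steps under 65% dropout.
toy_completion = {L: completion_probability(L) for L in range(1, 9)}
```

Under this assumption a 4-step chain completes about 1.5% of the time while a 7-step chain completes well under 0.1% of the time, so long chains are heavily under-represented among the completed ones.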

  22. Huge drop-out rate: longer chains don’t complete
     Correction proposed by Harrison White. Let:
     • $f_j$ = true (unobserved) fraction of chains that would have length $j$
     • $N$ = total # of starters
     • $N_j$ = # of starters who reached the target in $j$ steps
     Then:
     • $f_j^* := N_j / N$
     • Assume drop-out rate $1-\alpha$ in each step, so $f_j^* := f_j \, \alpha^j$
     • $\sum_j f_j = 1 \;\Rightarrow\; \sum_j f_j^* \, \alpha^{-j} = 1$
     • Observe $f_j^*$, calculate the average dropout rate $1-\alpha$ and $f_j$
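The correction can be sketched numerically. Writing α for the per-step probability that a chain is forwarded (so the drop-out rate is 1 − α; the symbol is an assumption, since it is not legible in this transcript), the observed fractions satisfy f*_j = f_j · α^j, and the constraint that the true fractions sum to 1 gives Σ_j f*_j · α^(−j) = 1. The left-hand side decreases as α grows, so α can be found by bisection. This is a minimal sketch under those assumptions with synthetic numbers, not the procedure used in the cited studies.

```python
def solve_alpha(f_star, tol=1e-12):
    """Find alpha in (0, 1] with sum_j f_star[j] * alpha**(-j) == 1."""
    if sum(f_star.values()) > 1.0:
        raise ValueError("observed fractions must sum to at most 1")
    if not any(j >= 1 and frac > 0 for j, frac in f_star.items()):
        raise ValueError("need at least one completed chain of length >= 1")

    def total(alpha):
        return sum(frac * alpha ** (-j) for j, frac in f_star.items())

    hi = 1.0            # total(1) <= 1 by the check above
    lo = 0.5
    while total(lo) <= 1.0:  # shrink lo until the sum exceeds 1
        lo /= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def corrected_fractions(f_star):
    """Return (alpha, {j: f_j}) with f_j = f_star[j] * alpha**(-j)."""
    alpha = solve_alpha(f_star)
    return alpha, {j: frac * alpha ** (-j) for j, frac in f_star.items()}


# Synthetic check: true lengths 2 and 4 are equally likely and alpha = 0.5,
# so the observed fractions are 0.5 * 0.5**2 and 0.5 * 0.5**4.
toy_f_star = {2: 0.125, 4: 0.03125}
toy_alpha, toy_f = corrected_fractions(toy_f_star)
observed_mean = sum(j * v for j, v in toy_f_star.items()) / sum(toy_f_star.values())
corrected_mean = sum(j * v for j, v in toy_f.items())
```

In this synthetic case the mean length among completed chains is 2.4, while the corrected mean is 3, which illustrates how ignoring drop-out underestimates the typical path length.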

  23. After the correction:
     • Typical path length L = 7
     Some phenomena in social networks that are not well understood:
     • Funneling effect: some of the target’s friends are more likely to be the final step.
       • Conjecture: High reputation/authority
     • Effects of target’s characteristics: structurally, why are high-status targets easier to find?
       • Conjecture: Core-periphery net structure

  24. • N: # of people assigned to correspond to the target
     • N_c: # of completed chains
     • r: fraction of people who did not forward
     • L: mean path length

  25. Assume each human is connected to 100 other people. So:
     • In step 1 she can reach 100 people
     • In step 2 she can reach 100*100 = 10,000 people
     • In step 3 she can reach 100*100*100 = 1,000,000 people
     • In 5 steps she can reach 10 billion people
     What’s wrong here?
     • Many edges are local (“short”): friend of a friend
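The arithmetic behind this estimate is easy to check. The sketch below assumes, as the naive argument does, that every acquaintance at each step is a new person, so the reach after k steps is simply 100^k; the function name is hypothetical.

```python
def naive_reach(degree, steps):
    """People reachable in exactly `steps` hops if no acquaintances overlap."""
    return degree ** steps


# Reach after 1..5 steps with 100 acquaintances per person.
toy_reach = [naive_reach(100, k) for k in range(1, 6)]
```

The count is an upper bound: because many edges are local, friends of friends overlap heavily, and the real number of distinct people reached grows more slowly.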
