Estimating Sizes of Social Networks via Biased Sampling

Liran Katzir, Edo Liberty, and Oren Somekh. Yahoo! Labs, Haifa, Israel. International World Wide Web Conference, 28th March - 1st April 2011, Hyderabad, India.


  1. Estimating Sizes of Social Networks via Biased Sampling. Liran Katzir, Edo Liberty, and Oren Somekh. Yahoo! Labs, Haifa, Israel. International World Wide Web Conference, 28th March - 1st April 2011, Hyderabad, India. Yahoo! Labs: WWW’2011

  2. Social network size estimation. Goal: obtain estimates of the sizes of (sub)populations in a social network. Why: advertising (estimating market share) and business development (mergers and acquisitions, asset valuation).

  3. The problem. Difficulties: Social networks have become very large: Facebook (650,000,000), Qzone (200,000,000), Twitter (175,000,000), ... There is no public API for population size queries such as: What is the total number of registered users? What is the number of registered (self-declared) 20–30 year olds living in New York? Even if a public API is provided, an independent estimate is needed. An exhaustive crawl is time-, space-, and communication-intensive and violates “politeness”.

  4. Population size estimation. Population sizes can be estimated efficiently using the “birthday paradox”: given $r$ uniform samples from a set of $n$ elements, the expected number of collisions is $\frac{r(r-1)}{2n}$. A collision is a pair of identical samples. Example: for the samples $X = (d, b, b, a, b, e)$ there are 3 collisions in total: $(x_2, x_3)$, $(x_2, x_5)$, and $(x_3, x_5)$.
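The collision count on this slide can be checked with a few lines of Python. This is only a sketch: the function name `count_collisions` is made up for illustration and does not come from the slides.

```python
from collections import Counter
from math import comb


def count_collisions(samples):
    """Count unordered pairs of identical samples.

    A value seen k times contributes k choose 2 pairs.
    """
    return sum(comb(k, 2) for k in Counter(samples).values())


# The sample sequence from the slide: b appears three times.
X = ["d", "b", "b", "a", "b", "e"]
print(count_collisions(X))
```

Counting per-value multiplicities avoids comparing all $r(r-1)/2$ pairs explicitly.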

  5. Population size estimation. Using the birthday paradox inversely: when $C$ collisions are observed, the population can be estimated by $n \simeq \frac{r^2}{2C}$. If $r = \text{const} \cdot \sqrt{n}$ this gives a rather good estimator. This is similar to mark-and-recapture, which counts collisions between two sample sets (but is essentially equivalent). A newer version of mark-and-recapture also handles non-uniform but a priori known distributions [Chao, 1987]. Social network size estimation: [Ye and Wu, 2010]. Alas, we cannot sample users uniformly from most social networks...
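A minimal sketch of the inverse birthday-paradox estimate, assuming the samples are uniform and independent. The name `birthday_estimate` and the choice to return `None` when no collision has been seen are illustrative assumptions, not part of the talk.

```python
from collections import Counter
from math import comb


def birthday_estimate(samples):
    """Estimate the population size as r^2 / (2C) from uniform samples.

    Returns None when there are no collisions, since the estimate
    is undefined in that case.
    """
    r = len(samples)
    collisions = sum(comb(k, 2) for k in Counter(samples).values())
    if collisions == 0:
        return None
    return r * r / (2 * collisions)


# With the six samples from the previous slide: r = 6, C = 3.
print(birthday_estimate(["d", "b", "b", "a", "b", "e"]))
```

With only six samples the number is of course very noisy; the slide's point is that the estimate becomes reasonable once $r$ is on the order of $\sqrt{n}$.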

  6. Uniform distribution on graphs. A social network can be viewed as an undirected graph that we can traverse using its public API. Special random walks can generate close-to-uniform samples: (1) bipartite query-web page graph [Bharat and Broder, 1998] [Bar-Yossef and Gurevich, 2007]; (2) social network [Gjoka et al, 2010]. This uses only $r = \text{const} \cdot \sqrt{n}$ samples, but obtaining each sample might be hard.

  7. Graph size estimation. It is possible to estimate the size of some graphs directly: (1) the size of a tree [Knuth, 1974]; (2) the size of a directed acyclic graph [Pitt, 1987]. We give an estimator for the size of undirected graphs (and subgraphs) which: (1) counts collisions but uses the graph’s stationary distribution (it does not require a uniform sample); (2) requires asymptotically fewer than $\sqrt{n}$ samples to converge; (3) obtains samples efficiently (a provably small number of random walk steps).

  8. Assumptions. The graph can be traversed from nodes to neighboring nodes. We can perform a random walk on the graph: start at any node; in each step, proceed to one of the neighbors chosen uniformly at random.
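The random walk described here might look as follows. Everything concrete in this sketch is an assumption for illustration: the toy graph `GRAPH` is made up (it is not the graph from the slides), and the `thinning` parameter, which discards intermediate steps between recorded samples, is a hypothetical knob rather than a value given in the talk.

```python
import random

# Hypothetical undirected toy graph as an adjacency list.
GRAPH = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a", "b", "d", "e"],
    "d": ["c", "e", "f"],
    "e": ["c", "d", "f"],
    "f": ["d", "e"],
}


def random_walk_samples(graph, start, r, thinning=10, rng=None):
    """Collect r samples, taking `thinning` walk steps before each one.

    Each step moves to a neighbor of the current node chosen
    uniformly at random, using only local neighbor lookups.
    """
    rng = rng or random.Random()
    node = start
    samples = []
    for _ in range(r):
        for _ in range(thinning):
            node = rng.choice(graph[node])
        samples.append(node)
    return samples


print(random_walk_samples(GRAPH, "a", r=5, rng=random.Random(0)))
```

In a real setting `graph[node]` would be replaced by a call to the network's public API for listing a user's neighbors.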

  9. Facts about random walks. This random walk yields the stationary distribution: (1) the probability of getting the $i$-th node is $\frac{d_i}{D}$; (2) $d_i$ is the $i$-th node’s degree; (3) $D = \sum_{i=1}^{n} d_i$. Taking a few steps between samples, or running several walks, ensures independence between two consecutive samples.
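That the degree-proportional distribution is stationary can be verified exactly on a small graph. The sketch below uses a made-up toy graph (not the one from the slides) and hypothetical helper names; it checks that one step of the walk leaves the distribution $d_i / D$ unchanged.

```python
from fractions import Fraction

# Hypothetical undirected toy graph as an adjacency list.
GRAPH = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a", "b", "d", "e"],
    "d": ["c", "e", "f"],
    "e": ["c", "d", "f"],
    "f": ["d", "e"],
}


def stationary_distribution(graph):
    """Return the degree-proportional distribution d_i / D."""
    total_degree = sum(len(nbrs) for nbrs in graph.values())
    return {v: Fraction(len(nbrs), total_degree) for v, nbrs in graph.items()}


def one_step(graph, dist):
    """Push a distribution through one step of the simple random walk."""
    out = {v: Fraction(0) for v in graph}
    for v, nbrs in graph.items():
        for u in nbrs:
            out[u] += dist[v] / len(nbrs)
    return out


pi = stationary_distribution(GRAPH)
print(one_step(GRAPH, pi) == pi)
```

Exact fractions are used so the equality check is not affected by floating-point rounding.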

  10. Algorithm outline. (1) Sample $r$ users using a random walk. (2) $C$: the number of collisions. (3) $\Psi_1$: the sum of the sampled nodes’ degrees. (4) $\Psi_{-1}$: the sum of the inverses of the sampled nodes’ degrees. The estimated number of nodes: $\hat{n} = \frac{\Psi_1 \Psi_{-1}}{2C}$.
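One way to see why this ratio targets $n$ is the following back-of-the-envelope calculation. It is a sketch, not necessarily the authors' argument, and it assumes the samples $x_1, \dots, x_r$ are independent draws from the stationary distribution $d_i / D$.

```latex
\begin{align*}
\mathbb{E}[C] &= \binom{r}{2} \sum_{i=1}^{n} \left(\frac{d_i}{D}\right)^{2}, \\
\mathbb{E}\left[\sum_{j \neq k} \frac{d_{x_j}}{d_{x_k}}\right]
  &= r(r-1) \left(\sum_{i=1}^{n} \frac{d_i}{D}\, d_i\right)
            \left(\sum_{i=1}^{n} \frac{d_i}{D} \cdot \frac{1}{d_i}\right)
   = r(r-1)\, \frac{n}{D^{2}} \sum_{i=1}^{n} d_i^{2}
   = 2n\, \mathbb{E}[C], \\
\Psi_1 \Psi_{-1} &= \sum_{j \neq k} \frac{d_{x_j}}{d_{x_k}} + r .
\end{align*}
```

So the off-diagonal part of $\Psi_1 \Psi_{-1}$ has expectation $2n\,\mathbb{E}[C]$. The product on the slide additionally contains the $r$ diagonal terms, each equal to 1; since $\Psi_1 \Psi_{-1} \ge r^2$ by the Cauchy-Schwarz inequality, these account for at most a $1/r$ fraction of the product.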

  11. Example. Sampling process, one column per sample (the figure of the input social network graph is not reproduced here):

      Sampled nodes:          d      f      f      c
      Sampled node degree:    3      2      2      4
      $C$:                    0      0      1      1
      $\Psi_1$:               3      5      7      11
      $\Psi_{-1}$:            1/3    5/6    16/12  19/12
      $\hat{n}$:              –      –      4      8
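The running values in this example can be reproduced with a short script. This is a sketch of the estimator as stated on the algorithm outline slide; the function name `estimate_size` and the convention of returning `None` before the first collision are assumptions made for illustration. The degrees are the ones shown on the slide.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def estimate_size(samples, degree):
    """Return Psi_1 * Psi_{-1} / (2C), or None while C is still zero."""
    collisions = sum(comb(k, 2) for k in Counter(samples).values())
    if collisions == 0:
        return None
    psi_1 = sum(degree[v] for v in samples)
    psi_minus_1 = sum(Fraction(1, degree[v]) for v in samples)
    return psi_1 * psi_minus_1 / (2 * collisions)


# Sampled nodes and their degrees, as listed in the example.
degree = {"d": 3, "f": 2, "c": 4}
walk = ["d", "f", "f", "c"]
for t in range(1, len(walk) + 1):
    print(walk[:t], estimate_size(walk[:t], degree))
```

After the third and fourth samples the exact values are 14/3 (about 4.67) and 209/24 (about 8.71), which agree with the 4 and 8 shown in the example if the slide rounds down to whole numbers.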
