
Inference in OSNs via Lightweight Partial Crawls

Jithin K. Sreedharan (Inria, France), Konstantin Avrachenkov (Inria, France), Bruno Ribeiro (Purdue University, USA). Sigmetrics 2016, June 16. Motivation: Estimation and inference in Online Social


  1. Estimator. Key property of tours: [figure]. Labels on the estimator formula: length of the k-th tour; samples in the k-th tour; degree of the super-node; true value of the contracted graph. Definition: f(u, v) := g(u, v), except when u or v is S_n. Jithin K. Sreedharan (jithin.sreedharan@inria.fr)
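The labels on this slide (tour length, samples within a tour, super-node degree) point to a renewal-reward estimator built from random-walk tours. The following is a minimal Python sketch under stated assumptions, not the formula from the slide: the graph is an undirected adjacency dict, the function name `tour_estimate` and its parameters are hypothetical, and the scaling `d_super / (2 * num_tours)` comes from the standard return-time identity for random walks. It estimates the sum of g over edges with both endpoints outside the super-node; edges touching the super-node are assumed to be known exactly and would be added separately.

```python
import random

def tour_estimate(adj, super_node, g, num_tours, rng):
    """Estimate the sum of g(u, v) over edges with both endpoints outside
    the super-node, using random-walk tours that start and end in it."""
    # Boundary edges (s, v): s inside the super-node, v outside.
    # Their count is the degree of the contracted super-node.
    boundary = [(s, v) for s in sorted(super_node)
                for v in adj[s] if v not in super_node]
    d_super = len(boundary)
    total = 0.0
    for _ in range(num_tours):
        # Leave the super-node through a uniformly chosen boundary edge.
        _, u = rng.choice(boundary)
        while True:
            v = rng.choice(adj[u])
            if v in super_node:   # the tour ends on return to the super-node
                break
            total += g(u, v)      # f(u, v) = g(u, v) away from the super-node
            u = v
    # Every undirected edge can be crossed in two directions, hence the 2.
    return d_super * total / (2 * num_tours)

# Toy graph (hypothetical): a 6-cycle with chords 0-3 and 2-5.
adj = {0: [1, 5, 3], 1: [0, 2], 2: [1, 3, 5],
       3: [2, 4, 0], 4: [3, 5], 5: [4, 0, 2]}
estimate = tour_estimate(adj, {0}, lambda u, v: 1.0, 20000, random.Random(0))
```

With g identically 1 and node 0 as the super-node, the quantity being estimated is the number of edges that do not touch node 0, which is 5 in this toy graph; the estimate approaches that value as the number of tours grows.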

  6. Estimator properties: unbiased (unlike the asymptotically unbiased estimator in [Ribeiro and Towsley '10]); strongly consistent. Also shown: confidence interval; sampled variance.
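The interval formula on this slide is not captured in the transcript. As a hedged illustration only, the sketch below computes the textbook normal-approximation interval from the sample variance of per-tour estimates, assuming tours from a fixed super-node are independent and identically distributed. The function name `tour_confidence_interval` and the default `z = 1.96` are assumptions.

```python
import math
import statistics

def tour_confidence_interval(per_tour_values, z=1.96):
    """Normal-approximation confidence interval for the mean of i.i.d.
    per-tour estimates (z = 1.96 gives roughly 95% coverage)."""
    m = len(per_tour_values)
    mean = statistics.fmean(per_tour_values)
    var = statistics.variance(per_tour_values)  # unbiased sample variance
    half_width = z * math.sqrt(var / m)
    return mean, var, (mean - half_width, mean + half_width)
```

The interval narrows at rate 1/sqrt(m) in the number of tours m, so halving its width takes about four times as many tours.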

  11. Bayesian formulation: find a posterior probability distribution, given a suitable prior distribution.

  12. Bayesian formulation (contd.)
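The transcript does not say which prior the talk uses. Purely as an illustration of "posterior from a suitable prior", the sketch below assumes per-tour estimates are approximately normal with unknown mean and variance, and applies the conjugate Normal-Inverse-Gamma update. The prior family, the function name `nig_posterior`, and all hyperparameter defaults are assumptions, not the authors' choices.

```python
import math

def nig_posterior(values, mu0=0.0, nu0=1e-3, alpha0=1.0, beta0=1.0):
    """Conjugate update for a normal likelihood with unknown mean and variance.

    Prior: mean | var ~ Normal(mu0, var / nu0), var ~ InvGamma(alpha0, beta0).
    Returns the Student-t marginal posterior of the mean."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    nu_n = nu0 + n
    mu_n = (nu0 * mu0 + n * mean) / nu_n
    alpha_n = alpha0 + n / 2
    beta_n = beta0 + 0.5 * ss + 0.5 * n * nu0 * (mean - mu0) ** 2 / nu_n
    # Marginal of the mean: Student-t, 2*alpha_n degrees of freedom.
    scale = math.sqrt(beta_n / (alpha_n * nu_n))
    return {"dof": 2 * alpha_n, "loc": mu_n, "scale": scale}
```

A small `nu0` makes the prior on the mean weak, so the posterior location stays close to the sample mean of the per-tour estimates.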

  16. Simulations on real-world networks. Dogster network: online social network for dogs?

  18. Simulations on real-world networks: Dogster network. 415K nodes, 8.27M edges. Percentage of graph covered: 2.72% (edges), 14.86% (nodes). [Figure: estimated value]

  31. Simulations on real-world networks: Friendster network. 64K nodes, 1.25M edges. Percentage of graph covered: 7.43% (edges), 18.52% (nodes). [Two figures: estimated value]

  34. Simulations on real-world networks: ADD Health data. A friendship network among high school students in the USA. 1545 nodes, 4003 edges. Percentage of graph covered: 10.87% (edges), 19.76% (nodes). [Two figures: estimated value]

  37. What if the super-node is not that "super"? Adaptive crawler: the super-node gets bigger as crawling progresses. How to add nodes to the super-node: via any method, as long as it is independent of already observed tours. This emulates retrospectively adding the new node i into the super-node from the start. It checks previous tours and breaks them where i is found [figure: an original tour with sample 1, sample 2, ..., sample k = S_n, broken at node i into Tour 1 to Tour 4]. It then starts k new "correction" tours from the newly added node i, each starting at i and ending in i or S_4 [figure], with k ~ negative binomial distribution (a function of the degrees of i and the number of tours). Theorem: dynamic and static super-node sample paths are equivalent in distribution.
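The "checks previous tours, breaks them" step can be sketched as a simple list split. This is a hedged illustration with hypothetical names (`break_tours`, `tours`, `new_node`): it assumes each tour is stored as the list of nodes visited strictly between two visits to the super-node, and it covers only the splitting. It does not draw the negative-binomial number of correction tours, whose parameters are not given in the transcript.

```python
def break_tours(tours, new_node):
    """Split each recorded tour at every visit to `new_node`.

    Once `new_node` joins the super-node, each visit to it closes one
    tour and opens the next."""
    broken = []
    for tour in tours:
        segment = []
        for node in tour:
            if node == new_node:
                if segment:        # skip pieces that never left the super-node
                    broken.append(segment)
                segment = []
            else:
                segment.append(node)
        if segment:
            broken.append(segment)
    return broken
```

For example, a tour that visits a, b, i, c, d, i, e splits into three shorter tours (a, b), (c, d), and (e) once i is part of the super-node; tours that never visit i are kept unchanged.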

  46. From the metric μ(G), does the network look random?

  47. Estimation and hypothesis testing in the Chung-Lu or configuration model. Assumption: edge labels can be written as a function of node labels. Does the true value of the given graph belong to the class of values obtained when the edges are formed purely at random?
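To make "the value when edges are formed purely at random" concrete, the sketch below computes the expected edge-label sum under the Chung-Lu model, in which nodes u and v are joined independently with probability d_u d_v / (2|E|). It is an illustration under assumptions, not the test used in the talk: it takes the full degree sequence and node labels as known, and the names `chung_lu_expected_value`, `degrees`, `labels`, and `g_label` are hypothetical.

```python
def chung_lu_expected_value(degrees, labels, g_label):
    """Expected sum of edge labels over node pairs u < v when each edge
    appears independently with probability d_u * d_v / (2 * |E|)."""
    nodes = sorted(degrees)
    two_m = sum(degrees.values())  # the degrees sum to 2 * |E|
    expected = 0.0
    for idx, u in enumerate(nodes):
        for v in nodes[idx + 1:]:
            # Cap at 1 in case a high-degree pair exceeds a valid probability.
            p_edge = min(1.0, degrees[u] * degrees[v] / two_m)
            expected += p_edge * g_label(labels[u], labels[v])
    return expected
```

Here `g_label` takes the two node labels, matching the slide's assumption that edge labels are a function of node labels. Comparing an estimate of μ(G) against this expectation is one informal way to ask whether the observed value is compatible with random edge formation.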
