Scaling choice models of relational social data

Jan Overgoor, Stanford University. SIAM-NS, July 09, 2020. Slides: bit.ly/c2g-venmo. Joint work with George Pakapol Supaniratisai (Stanford) and Johan Ugander (Stanford).


  1. Scaling choice models of relational social data Jan Overgoor · Stanford University SIAM-NS July 09, 2020 Slides: bit.ly/c2g-venmo Joint work with George Pakapol Supaniratisai (Stanford) & Johan Ugander (Stanford)

  2. Events on networks

  3. Observed data

  4. "Choosing to Grow a Graph" [Overgoor, Benson & Ugander, WWW’19] • Model edges as choices • Conditional on i initiating an edge, which j to pick from choice set C? • Conditional Logit model
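The slide's formula did not survive extraction. As a minimal sketch, assuming the standard conditional logit form P(j | C) = exp(θ·x_j) / Σ_{k∈C} exp(θ·x_k), the choice probabilities could be computed as below. The function name, feature matrix, and θ value are hypothetical and chosen only for illustration.

```python
import numpy as np

def conditional_logit_probs(features, theta):
    """Choice probabilities over a choice set C.

    features: array of shape (|C|, d), one feature row x_j per alternative.
    theta:    array of shape (d,), model parameters.
    """
    utilities = features @ theta
    utilities = utilities - utilities.max()  # stabilise the exponentials
    expu = np.exp(utilities)
    return expu / expu.sum()

# Hypothetical example: 4 candidate nodes, one feature log(degree), theta = 1.
degrees = np.array([1.0, 2.0, 3.0, 4.0])
probs = conditional_logit_probs(np.log(degrees)[:, None], np.array([1.0]))
```

With a single log-degree feature and θ = 1, each probability is proportional to degree, which is how preferential attachment appears as a special case under this assumed form.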

  5. Conditional Logit choice process

  6. "Choosing to Grow a Graph" [Overgoor, Benson & Ugander, WWW’19] • Generalizes multiple known formation models and dynamics: preferential attachment, local search, fitness, homophily, … • Efficient maximum likelihood estimation of model parameters with existing tools

  7. "Choosing to Grow a Graph" [Overgoor, Benson & Ugander, WWW’19] • Generalizes multiple known formation models and dynamics: preferential attachment, local search, fitness, homophily, … • Efficient maximum likelihood estimation of model parameters with existing tools • Straightforward extension to events

  8. Two problems at scale 1. Estimation on large networks is infeasible: there are n options for each of the m choices, and features change at each event

  9. Two problems at scale 1. Estimation on large networks is infeasible: there are n options for each of the m choices 2. The conditional logit model class is less realistic: it assumes availability of complete information

  10. Solution to Problem #1: Negative sampling • Sample non-chosen alternatives and estimate on the reduced choice set (also called case-control sampling; see Vu 2015, Lerner 2019) • Update the likelihood with the sampling probabilities of the data points • Estimates on data with reduced choice sets generated with importance sampling are consistent for the estimates using complete choice sets [McFadden 1977]
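The likelihood update on this slide was lost in extraction. A minimal sketch, assuming the usual correction in which each item's utility is shifted by minus the log of its sampling probability q_j, might look as follows. The function name, the `log_q` argument, and the toy data are hypothetical.

```python
import numpy as np

def sampled_log_likelihood(theta, features, log_q, chosen=0):
    """Log-likelihood of one choice on a reduced (sampled) choice set.

    features: (s + 1, d) features of the chosen item plus s sampled negatives.
    log_q:    (s + 1,) log sampling probability of each item in the reduced set.
    chosen:   row index of the item that was actually chosen.
    """
    adjusted = features @ theta - log_q  # assumed correction: utility minus log q_j
    adjusted = adjusted - adjusted.max()  # stabilise the exponentials
    return adjusted[chosen] - np.log(np.exp(adjusted).sum())

# Hypothetical toy data: one chosen item plus four sampled negatives.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
theta = np.array([0.5, -1.0])
ll_plain = sampled_log_likelihood(theta, x, np.zeros(5))
ll_uniform = sampled_log_likelihood(theta, x, np.full(5, np.log(0.01)))
```

When every item has the same sampling probability, the correction shifts all utilities equally and drops out of the ratio, so `ll_plain` and `ll_uniform` agree.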

  11. Negative sampling strategies ● Uniform sampling: + no adjustment necessary, weights cancel out; − inefficient for rare (but important) features

  12. Negative sampling strategies ● Uniform sampling: + no adjustment necessary, weights cancel out; − inefficient for rare (but important) features ● Stratified sampling: sample according to strata, adjust with the per-stratum sampling probabilities

  13. Negative sampling strategies ● Uniform sampling: + no adjustment necessary, weights cancel out; − inefficient for rare (but important) features ● Stratified sampling: sample according to strata, adjust with the per-stratum sampling probabilities ● Importance sampling: sample according to likelihood of being chosen; − optimal weights are what we’re trying to estimate
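A rough Python sketch of stratified negative sampling, assuming sampling without replacement inside each stratum so that a candidate's inclusion probability is k / N for that stratum. The stratum names, sizes, and function name are hypothetical, and the returned log-probabilities are the quantities that would enter the likelihood adjustment.

```python
import numpy as np

def stratified_negative_sample(strata, per_stratum, rng):
    """Sample non-chosen alternatives stratum by stratum.

    strata:      dict mapping stratum name -> array of candidate node ids.
    per_stratum: dict mapping stratum name -> number of samples to draw.
    Returns the sampled ids and the log sampling probability of each one.
    """
    ids, log_q = [], []
    for name, candidates in strata.items():
        k = min(per_stratum[name], len(candidates))
        if k == 0:
            continue
        picked = rng.choice(candidates, size=k, replace=False)
        ids.append(picked)
        # Each candidate in this stratum is included with probability k / N.
        log_q.append(np.full(k, np.log(k / len(candidates))))
    return np.concatenate(ids), np.concatenate(log_q)

# Hypothetical strata: 20 friends-of-friends and 4980 other nodes.
rng = np.random.default_rng(0)
strata = {"fof": np.arange(0, 20), "rest": np.arange(20, 5000)}
ids, log_q = stratified_negative_sample(strata, {"fof": 10, "rest": 10}, rng)
```

Oversampling the small "fof" stratum keeps rare but informative alternatives in the reduced choice set, while the log-probabilities record how much each stratum was over- or under-sampled.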

  14. Sampling with synthetic data • Simulate 160k events with 5k nodes • Utility function with popularity, repetition, reciprocity, and FoFs • Estimate known parameter values • Number of data points n held constant at 10k, vary s • Stratification requires factors fewer negative samples for comparable MSE [Figure "n Constant": MSE against number of samples (s), uniform vs. importance sampling]

  15. Run time is linear in n and s [Figure: runtime (sec) against number of data points (n), by number of samples (s)]

  16. Sampling with synthetic data • Simulate 160k events with 5k nodes • Utility function with popularity, repetition, reciprocity, and FoFs • Estimate known parameter values • Value of n and s at constant n*s budget • More choice samples (n) are better, but with diminishing returns below s = 24 [Figure "n*s Constant": MSE against number of samples (s), uniform vs. importance sampling]

  17. Back to problem #2 2. The conditional logit model class is less realistic

  18. Mixed Logit • Combines multiple latent logits • Each "mode" has its own utility function and choice set, for example the social neighborhood. Problems: • Log-likelihood is not convex in general, so estimation needs much slower EM • No sampling guarantees
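For reference, a mixed logit of this kind is commonly written as below. The notation is an assumption rather than the slide's own: π_m is the probability of mode m, C_m its choice set, and θ_m its utility parameters.

```latex
P(j) \;=\; \sum_{m=1}^{M} \pi_m \,
  \mathbf{1}[\, j \in C_m \,]\,
  \frac{\exp(\theta_m^{\top} x_j)}{\sum_{k \in C_m} \exp(\theta_m^{\top} x_k)}
```

The log-likelihood takes the logarithm of this sum over modes, which is why it is not convex in general.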

  19. Solution to Problem #2: De-mixed logit • Simplify: assume that each mode has a disjoint choice set • Reduces to m individual conditional logits, simple to estimate • The chosen item indicates the mode [Figure: choice set split into Friends, FoFs, and Rest]

  20. De-mixed logit choice process [Figure labels: chooser, neighborhood]
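A small Python sketch of the de-mixing step, assuming each event records the chooser, the chosen node, and the candidate set. The `demix_events` and `mode_of` helpers and the toy friendship graph are hypothetical. Each event is assigned to the mode of its chosen item, and its choice set is restricted to that mode.

```python
from collections import defaultdict

def demix_events(events, mode_of):
    """Group choice events by the mode of the chosen item.

    events:  list of (chooser, chosen, candidates) tuples.
    mode_of: function (chooser, node) -> mode label.
    Returns a dict: mode label -> list of events with mode-restricted choice sets.
    """
    by_mode = defaultdict(list)
    for chooser, chosen, candidates in events:
        mode = mode_of(chooser, chosen)  # the chosen item indicates the mode
        reduced = [c for c in candidates if mode_of(chooser, c) == mode]
        by_mode[mode].append((chooser, chosen, reduced))
    return dict(by_mode)

# Hypothetical toy graph: node -> set of friends.
friends = {0: {1, 2}, 1: {0}, 2: {0}, 3: set()}

def mode_of(chooser, node):
    return "local" if node in friends[chooser] else "rest"

events = [(0, 1, [1, 2, 3]), (0, 3, [1, 2, 3]), (1, 2, [0, 2, 3])]
split = demix_events(events, mode_of)
```

Each entry of `split` can then be passed to an ordinary conditional logit fit, one per mode.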

  21. De-mixing with synthetic data • Simulate 80k events with 5k nodes • "local" and "rest" modes with different utility functions = 0.75

  22. De-mixing with synthetic data • Simulate 80k events with 5k nodes • "local" and "rest" modes with different utility functions = 0.75 • Conditional logit • Estimates in between the two modes (true values are 0.5 and 1.0) • Importance sampling doesn’t help accuracy [Figure "log Degree": CL estimates against s, uniform vs. importance sampling]

  23. De-mixing with synthetic data • Simulate 80k events with 5k nodes • "local" and "rest" modes with different utility functions = 0.75 • Conditional logit • Estimates not stable for different values of s • !! Outside the model class [Figure "Reciprocity (ind)": CL estimates against s, uniform vs. importance sampling]

  24. De-mixing with synthetic data • Simulate 80k events with 5k nodes • "local" and "rest" modes with different utility functions = 0.75 • De-mixed logit • Estimates accurate and stable [Figure "Reciprocity (ind)": de-mixed ML estimates against s, uniform vs. importance sampling]

  25. Venmo Data ● Scraped public transactions ● 25M users and 501M transactions ● 80% of transactions are "local" ● Analyze stratified CL and de-mixed CL [Figure: transactions per week, 2012 to 2018]

  26. Venmo Non-parametric estimates ● Easy to test hypotheses over different modes ● Degree is the number of incoming transactions ● Degree is less important within the social neighborhood, super-linear outside [Figure: relative probability against in-degree, local vs. non-local, logarithmic axes]

  27. Discussion ● Leverage existing results from the sampling and econometrics literatures ● Make it feasible to estimate complex models on very large graphs ● Think carefully about the limitations of the model class. Future work ● Theory on "to sample or to negatively sample?" ● Sampling guarantees for mixed logit ● Empirical comparison with similar modeling frameworks (SAOM, REM) ● More applications. THANKS! bit.ly/c2g-code overgoor@stanford.edu
