on network analysis and user behavior

On network analysis and user behavior Ramayya Krishnan iLab, The H. - PowerPoint PPT Presentation



  1. On network analysis and user behavior Ramayya Krishnan iLab, The H. John Heinz III College Carnegie Mellon University Pittsburgh, PA rk2x@cmu.edu

  2. Outline • Two examples – Intra-organizational KM – the role of triadic closure or cliques in determining user behavior – Product adoption – the role of social influence vs. homophily • Key points – Multi-disciplinary perspective that blends computational and social science is needed – New estimation methods to work with novel data sets – Need for new methods to design and conduct experiments in a networked world

  3. Example 1: Social Media and Knowledge Management in a Global Organization

  4. Sample data posting of query and responses

  5. Sample Query • Query on: Singleton class and threads in Java • Responses: 1. Singleton class means that any given time only one instance of the class is present, in one JVM. So, it is present at JVM level. 2. The thing is if two users(on two different machines which has separate JVMs) are requesting for singleton class then both can get one-one instance of that class in their JVM.

  6. Data description • Message level and thread-level data from forum • Message characteristics – Posting time, EmployeeID, Thread, Type of message (query or response), content of message etc. • User characteristics – EmployeeID, Tenure at firm, Age, Gender, Location, Division, Job Title

  7. Network structure evolution Sequence of Actions: • User 301 posts a query Q1000 • Users 502, 641 post responses • User 900 posts a query Q1001 • Users 301, 641 post responses [Figure: Directed Response Graph over users 301, 502, 641, and 900]
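
The sequence of actions above can be turned into a directed response graph with a few lines of Python. This is a minimal sketch, not the authors' pipeline: the event tuple layout and the function name `build_response_graph` are hypothetical, and the edge direction (responder to asker) is an assumption based on the slide title.

```python
from collections import defaultdict

# Events from the slide as (user, action, query_id); the tuple layout is hypothetical.
events = [
    (301, "query", "Q1000"),
    (502, "response", "Q1000"),
    (641, "response", "Q1000"),
    (900, "query", "Q1001"),
    (301, "response", "Q1001"),
    (641, "response", "Q1001"),
]


def build_response_graph(events):
    """Return {responder: {asker: number_of_responses}} for a stream of forum events."""
    asker_of = {}  # query_id -> user who posted the query
    graph = defaultdict(lambda: defaultdict(int))
    for user, action, query_id in events:
        if action == "query":
            asker_of[query_id] = user
        else:
            # Assumed direction: an edge runs from the responder to the asker.
            graph[user][asker_of[query_id]] += 1
    return {u: dict(nbrs) for u, nbrs in graph.items()}


response_graph = build_response_graph(events)
```

Keeping a count per edge, rather than a plain 0/1 flag, leaves room for count-valued analyses such as the number of responses from one user to another.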

  8. Network structure Asymmetric tie: • A has responded to B’s query but B has not responded to A Sole-symmetric tie: • Users have responded to each other, but not as part of a clique Simmelian Tie: • Users are part of a ‘clique’, whose members have all responded to one another
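
The three tie types can be sketched as a small classifier over directed response edges. The function name `classify_ties` is hypothetical, and the sketch assumes the smallest possible clique (a triad): a reciprocated pair counts as Simmelian when both users also have reciprocated ties to at least one common third user. The study may have used a different clique definition.

```python
def classify_ties(edges):
    """Label each connected pair as 'asymmetric', 'sole-symmetric', or 'simmelian'.

    edges: iterable of (responder, asker) pairs.
    Returns {frozenset({a, b}): label}.
    """
    edges = {(a, b) for a, b in edges if a != b}  # ignore self-responses
    nodes = {u for edge in edges for u in edge}
    mutual = {frozenset(e) for e in edges if (e[1], e[0]) in edges}

    labels = {}
    for a, b in edges:
        pair = frozenset((a, b))
        if pair not in mutual:
            labels[pair] = "asymmetric"
            continue
        # Assumption: a triad of mutually responding users is enough for a clique.
        in_clique = any(
            frozenset((a, c)) in mutual and frozenset((b, c)) in mutual
            for c in nodes - {a, b}
        )
        labels[pair] = "simmelian" if in_clique else "sole-symmetric"
    return labels


# Users 1, 2, 3 all respond to one another; 3 and 4 reciprocate; 5 only responds to 4.
example_edges = [(1, 2), (2, 1), (2, 3), (3, 2), (1, 3), (3, 1), (3, 4), (4, 3), (5, 4)]
tie_labels = classify_ties(example_edges)
```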

  9. Simmelian Ties Research Questions 1. Can Simmelian ties be established in an electronic communications medium with repeated interactions? Will they matter? 2. Do these ties depend upon the context? Do more instrumental contexts result in weaker or less effective Simmelian ties? 3. Do both the current context (what type of query) and the past context in which the tie was established matter?

  10. Dyadic QAP Regression Results Dependent variable: Number of responses by A to B in period two

  11. Dyadic QAP Regression Results Dependent variable: Number of responses by A to B in period two Explanatory Variables: Dyadic Homophily Measures, Structural Properties in period one

  12. Dyadic QAP Regression Results Dependent variable: Number of responses by A to B in period two Explanatory Variables: Dyadic Homophily Measures, Structural Properties in period one

  13. Dyadic QAP Regression Results Dependent variable: Number of responses by A to B in period two Explanatory Variables: Dyadic Homophily Measures, Structural Properties in period one

  14. Dyadic QAP Regression Results Dependent variable: Number of responses by A to B in period two Explanatory Variables: Dyadic Homophily Measures, Structural Properties in period one

  15. Dyadic QAP Regression Results Dependent variable: Number of responses by A to B in period two Explanatory Variables: Dyadic Homophily Measures, Structural Properties in period one
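
For readers unfamiliar with dyadic QAP regression, the idea can be sketched as ordinary least squares over all ordered pairs (A, B), with significance judged by permuting the rows and columns of the dependent matrix together so that the network's dependence structure is preserved. The sketch below uses the simple Y-permutation variant on synthetic data; the function name `qap_regression` is hypothetical, and the slides do not say which QAP variant was actually used.

```python
import numpy as np


def qap_regression(y, xs, n_perm=200, seed=0):
    """OLS over off-diagonal dyads with a Y-permutation QAP test.

    y:  (n, n) dependent matrix, e.g. responses by A to B in period two.
    xs: list of (n, n) explanatory matrices, e.g. homophily or period-one structure.
    Returns (slopes, p_values), one entry per explanatory matrix.
    """
    n = y.shape[0]
    off = ~np.eye(n, dtype=bool)  # drop self-dyads
    design = np.column_stack([np.ones(off.sum())] + [x[off] for x in xs])

    def fit(mat):
        coef, *_ = np.linalg.lstsq(design, mat[off], rcond=None)
        return coef[1:]  # slopes only; the intercept is not tested

    observed = fit(y)
    rng = np.random.default_rng(seed)
    exceed = np.zeros(len(observed))
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Relabel actors: the same permutation is applied to rows and columns.
        exceed += np.abs(fit(y[np.ix_(p, p)])) >= np.abs(observed)
    p_values = (exceed + 1) / (n_perm + 1)
    return observed, p_values


# Synthetic illustration only: 30 actors, one explanatory matrix with true slope 2.
rng = np.random.default_rng(1)
x = rng.normal(size=(30, 30))
y = 2.0 * x + rng.normal(size=(30, 30))
slopes, p_values = qap_regression(y, [x])
```

Permuting actors rather than individual dyads is the point of the method: dyads that share an actor are not independent, so ordinary OLS standard errors would be too optimistic.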

  16. Example 2: Social Influence vs. Homophily in product/service adoption • Focus on identifying users that can help diffuse “information” over the network • Learn about the power of “social influence” as a trigger for the diffusion process • Learn about how social influence is associated with “contagious churn”

  17. Research Question • Can we predict consumers’ product purchase decisions… • Using social network information?

  18. Theoretical Foundation • Homophily (McPherson et al. 2001) • “Birds of a feather flock together” [Figure: connected consumers exchanging opinions on a product: “Looks good”, “Like this?”, “Looks good”]

  19. The Challenge • Large-scale network [Figure: Adam: “I like it”; Bob: “?”; Chris: “No, I don’t”]

  20. Literature • A rich literature on networks from various fields (e.g. Kleinberg 1999, Brin and Page 1998) • Network-based marketing – Network Neighbors: Hill, Provost, Volinsky (2006) – Viral Marketing: Richardson and Domingos (2002) – Classification: Macskassy and Provost (2003, 2007) • What about unobserved product taste? – For small, tightly connected groups: Hartmann (2010) – But what about large-scale networks of arbitrary connection structure?

  21. This Study • Model correlated purchase behaviors of consumers in a large social network… – Using Gaussian Markov Random Field (GMRF) to characterize latent product taste – Handle networks of arbitrary topology – Encapsulate conditional independence • Estimation result confirms the positive taste correlation among connected people • Predictive performance better than existing LR-based models, and better than SVM-based models, too.

  22. Data • Obtained from a large Asian telecom company – 231,416 customers – 6-month period • Detailed phone call data – Who called whom, when • Demographic information: gender, age • Purchase records of caller ringback tone (CRBT) – Who purchased what, when • Can we predict CRBT adoption decisions?

  23. Descriptive Statistics

      | Variable | Mean | SD | Min | Max |
      |---|---|---|---|---|
      | Age | 40.56 | 13.67 | | |
      | Number of Consumers Called by Each Consumer | 13.73 | 22.9 | 1 | 2858 |
      | Number of Phone Calls Per Consumer | 410.4 | 942.7 | 1 | 59016 |

      | Gender | Number of Consumers |
      |---|---|
      | Male | 218017 |
      | Female | 13399 |

      | Adoption | Number | Percentage |
      |---|---|---|
      | Number of Consumers | 231416 | |
      | Number of Consumers Who Adopted CRBT | 79505 | 34.36% |
      | Adoption Percentage by Gender: Male | | 34.50% |
      | Adoption Percentage by Gender: Female | | 31.89% |

      Preliminary analysis: gender doesn’t help much in prediction…

  24. Data – Preliminary Analysis Age doesn’t help much, either… [Figure: Adoption By Age. x-axis: Age (<20, 20-29, 30-39, 40-49, 50-59, >=60); left axis: Number of Consumers; right axis: Adoption Percentage]

  25. Data – Preliminary Analysis Node degree helps a lot (need for social network)! [Figure: Consumer Adoptions By Degree. x-axis: Degree (0-9, 10-19, …, 80-89, 90+); left axis: Number of Consumers (log scale); right axis: Adoption Percentage]
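
A breakdown like the one in this figure can be computed by bucketing consumers by degree and taking the adoption rate within each bucket. The sketch below uses the bucket boundaries from the figure's x-axis; the function name `adoption_by_degree` and the toy inputs are hypothetical and are not the telecom data.

```python
from collections import Counter


def adoption_by_degree(degrees, adopted, width=10, cap=90):
    """Return {bucket_label: (n_consumers, adoption_rate)}.

    Buckets are 0-9, 10-19, ..., with everything at or above `cap` pooled as '90+'.
    """
    totals, adopters = Counter(), Counter()
    for degree, flag in zip(degrees, adopted):
        low = min(degree // width * width, cap)
        label = f"{low}+" if low == cap else f"{low}-{low + width - 1}"
        totals[label] += 1
        adopters[label] += int(flag)
    return {label: (totals[label], adopters[label] / totals[label]) for label in totals}


# Toy inputs for illustration only.
toy_degrees = [3, 7, 12, 15, 95, 120]
toy_adopted = [0, 1, 1, 1, 1, 0]
by_bucket = adoption_by_degree(toy_degrees, toy_adopted)
```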

  26. Data – Preliminary Analysis Can we do better? [Figure: small network of consumers A, B, C, D; legend: Non-Adopter, Adopter] Maybe, but need the discipline of a model

  27. Model There are $I$ consumers in a social network • Connection matrix: $C = [c_{ij}]$, where $c_{ij} = 1$ if consumers $i$ and $j$ are connected, and $c_{ij} = 0$ otherwise • Adoption decision: $D_i = 1$ if consumer $i$ adopts the product, and $D_i = 0$ otherwise

  28. Adoption Probability • Binary Probit Model: $\Pr(D_i = 1) = \Pr(U_i > 0)$, with $U_i = X_i \beta + \theta_i + \varepsilon_i$ • $\varepsilon_i \sim N(0, 1)$: random disturbance • $X_i$: observed individual characteristics (gender, age, connection degree) • $\theta_i$: unobserved product taste, modeled as a GMRF!
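
Because the random disturbance is standard normal, the probit model implies that, conditional on the latent taste, the adoption probability is the standard normal CDF evaluated at the rest of the utility. A minimal sketch follows; the names `beta`, `taste`, and `adoption_probability` are illustrative labels for the coefficient vector and the unobserved taste term, not notation confirmed by the slide.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def adoption_probability(x, beta, taste):
    """Pr(D_i = 1 | x, taste) under a probit model with unit-variance noise.

    x:     observed characteristics of one consumer (e.g. gender, age, degree)
    beta:  coefficients on x (hypothetical values in the example below)
    taste: latent product taste for this consumer
    """
    systematic = sum(xi * bi for xi, bi in zip(x, beta)) + taste
    # Pr(systematic + eps > 0) = Pr(eps < systematic) by symmetry of N(0, 1).
    return normal_cdf(systematic)


# Hypothetical numbers: zero systematic utility gives a 50% adoption probability.
p_neutral = adoption_probability([1.0, 2.0], [0.0, 0.0], taste=0.0)
p_positive_taste = adoption_probability([1.0, 2.0], [0.0, 0.0], taste=1.0)
```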

  29. Gaussian Markov Random Field (GMRF) Definition (GMRF): A random vector $x = (x_1, \ldots, x_n)^T$ is called a GMRF w.r.t. the undirected graph $G = (V = \{1, \ldots, n\}, E)$ with mean $\mu$ and precision matrix $Q > 0$ if and only if its density has the form $\pi(x) = (2\pi)^{-n/2} |Q|^{1/2} \exp\left(-\frac{1}{2}(x - \mu)^T Q (x - \mu)\right)$ and $Q_{ij} \neq 0 \iff \{i, j\} \in E$ for all $i \neq j$ • A multivariate normal vector • Connection structure encoded in its precision matrix • Non-zero off-diagonal elements correspond to connections

  30. Properties of GMRF • Can model connections of arbitrary topology – Better than using in-group correlation • Encodes conditional independence: $x_i \perp x_j \mid x_{-ij} \iff Q_{ij} = 0$ – e.g. in the chain graph 1-2-3, consumers 1 and 3 should be correlated, but conditional on consumer 2 they should be independent • Model parameters have intuitive explanations
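
The chain example can be checked numerically. The sketch below builds a precision matrix for three consumers connected as 1-2-3 (the value 0.4 is an arbitrary illustration), inverts it to obtain the marginal covariance, and compares marginal correlations with the conditional correlations read off the precision matrix.

```python
import numpy as np

# Precision matrix for the chain 1-2-3: no direct link between consumers 1 and 3.
Q = np.array([
    [1.0, -0.4, 0.0],
    [-0.4, 1.0, -0.4],
    [0.0, -0.4, 1.0],
])

# Marginal covariance and correlation come from inverting the precision matrix.
cov = np.linalg.inv(Q)
sd = np.sqrt(np.diag(cov))
marginal_corr = cov / np.outer(sd, sd)

# Conditional correlation given all other variables: -Q_ij / sqrt(Q_ii * Q_jj).
scale = np.sqrt(np.diag(Q))
conditional_corr = -Q / np.outer(scale, scale)
np.fill_diagonal(conditional_corr, 1.0)
```

Here `marginal_corr[0, 2]` is positive, since consumers 1 and 3 are linked through consumer 2, while `conditional_corr[0, 2]` is zero because the corresponding precision entry is zero.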

  31. Model Latent Product Taste Using GMRF • $(\theta_1, \ldots, \theta_I)^T \sim N(\mu, Q^{-1})$, with $Q = [q_{ij}]$, where $q_{ij} = 0$ if $c_{ij} = 0$ • Straightforward interpretation: $\mathrm{Prec}(\theta_i \mid \theta_{-i}) = q_{ii}$ and $\mathrm{Cor}(\theta_i, \theta_j \mid \theta_{-ij}) = -q_{ij} / \sqrt{q_{ii} q_{jj}}$ • Parameterization (base model, model B): $Q$ is a sparse $I \times I$ matrix with two parameters: a common conditional precision on the diagonal, and off-diagonal entries governed by $r$, the conditional correlation between connected consumers, with zeros for unconnected pairs
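
One way to realize a two-parameter precision matrix of this kind is sketched below. It assumes the form $Q = \kappa (I - rC)$, which gives every consumer conditional precision $\kappa$ and every connected pair conditional correlation $r$. The symbol `kappa`, this exact form, the zero mean, and the function names are assumptions chosen to match the interpretation of the parameters, not something the slide confirms.

```python
import numpy as np


def build_precision(C, kappa, r):
    """Assumed base-model precision: kappa on the diagonal, -r * kappa for connected pairs."""
    C = np.asarray(C, dtype=float)
    Q = kappa * (np.eye(C.shape[0]) - r * C)
    np.linalg.cholesky(Q)  # raises LinAlgError if Q is not positive definite
    return Q


def sample_taste(Q, rng):
    """Draw one zero-mean taste vector with covariance Q^{-1}."""
    L = np.linalg.cholesky(Q)  # Q = L @ L.T
    z = rng.standard_normal(Q.shape[0])
    return np.linalg.solve(L.T, z)  # Cov = L^{-T} L^{-1} = Q^{-1}


# Hypothetical 4-consumer chain network and parameter values.
C = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
Q = build_precision(C, kappa=2.0, r=0.3)
taste = sample_taste(Q, np.random.default_rng(0))
implied_r = -Q[0, 1] / np.sqrt(Q[0, 0] * Q[1, 1])
```

Sampling through the Cholesky factor of the precision matrix avoids forming the dense covariance, which matters when the network has hundreds of thousands of consumers and $Q$ is sparse.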
