
APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS



  1. APPLICATIONS OF MINING HETEROGENEOUS INFORMATION NETWORKS • Yizhou Sun, College of Computer and Information Science, Northeastern University • yzsun@ccs.neu.edu • July 25, 2015

  2. Heterogeneous Information Networks • Multiple object types and/or multiple link types • [Figure: the DBLP bibliographic network (Venue, Paper, Author); the IMDB movie network (Movie, Studio, Director, Actor); the Facebook network] • 1. Homogeneous networks are information-losing projections of heterogeneous networks! • 2. New problems are emerging in heterogeneous networks! • Directly mine the information-richer heterogeneous networks

  3. Outline • Why Heterogeneous Information Networks? • Entity Recommendation • Information Diffusion • Ideology Detection • Summary

  4. Recommendation Paradigm • [Figure: the user community supplies user-item feedback to the recommender system, which also draws on product features and external knowledge and returns recommendations to the user] • Collaborative Filtering: e.g., K-Nearest Neighbor (Sarwar WWW’01), Matrix Factorization (Hu ICDM’08, Koren IEEE-CS’09), Probabilistic Model (Hofmann SIGIR’03) • Content-Based Methods: e.g., (Balabanovic Comm. ACM’97, Zhang SIGIR’02) • Hybrid Methods: e.g., Content-Based CF (Antonopoulus, IS’06), External Knowledge CF (Ma WSDM’11)

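  5. Problem Definition • [Figure: the user gives implicit user feedback to the recommender system; the recommender system combines that feedback with an information network and returns recommendations to the user] • Approach: hybrid collaborative filtering with information networks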

  6. Hybrid Collaborative Filtering with Networks • Utilizing network relationship information can enhance recommendation quality • However, most previous studies use only a single type of relationship between users or items (e.g., social network: Ma, WSDM’11; trust relationship: Ester, KDD’10; service membership: Yuan, RecSys’11)

  7. The Heterogeneous Information Network View of Recommender System • [Figure: network linking the movies Avatar, Titanic, Aliens, and Revolutionary Road with director James Cameron, actors Zoe Saldana, Leonardo DiCaprio, and Kate Winslet, and genres Romance and Adventure]

  8. Relationship Heterogeneity Alleviates Data Sparsity • Collaborative filtering methods suffer from the data sparsity issue • [Figure: # of ratings plotted against # of users or items] A small number of users and items have a large number of ratings; most users and items have a small number of ratings • Heterogeneous relationships complement each other • Users and items with limited feedback can be connected to the network by different types of paths • Connect new users or items (cold start) in the information network

  9. Relationship Heterogeneity Based Personalized Recommendation Models • Different users may have different behaviors or preferences • [Figure: a James Cameron fan, an 80s Sci-fi fan, and a Sigourney Weaver fan all connected to Aliens] Different users may be interested in the same movie for different reasons • Two levels of personalization • Data level: most recommendation methods use one model for all users and rely on personal feedback to achieve personalization • Model level: with different entity relationships, we can learn personalized models for different users to further distinguish their differences

  10. Preference Propagation-Based Latent Features • [Figure: example network linking users Alice, Bob, and Charlie; movies King Kong, Titanic, Skyfall, and Revolutionary Road; actors Naomi Watts, Ralph Fiennes, and Kate Winslet; director Sam Mendes; genre: drama; tag: Oscar Nomination] • Step 1: Generate L different meta-paths (path types) connecting users and items • Step 2: Propagate user implicit feedback along each meta-path • Step 3: Calculate latent features for users and items for each meta-path with an NMF-related method
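The propagation step can be illustrated with a minimal sketch. It assumes a single hypothetical meta-path (user-movie-actor-movie) and uses raw path counts as the item-to-item weights; the toy matrices and helper names are illustrative only, and the slides do not say which similarity measure weights the paths. The factorization step (NMF) would then be applied to the resulting diffused matrix and is not shown.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


# Hypothetical toy data: 2 users x 3 movies, 1 = implicit feedback observed.
feedback = [
    [1, 0, 0],
    [0, 0, 1],
]

# Hypothetical movie x actor incidence matrix (3 movies, 2 actors).
movie_actor = [
    [1, 0],
    [1, 1],
    [0, 1],
]

# Path counts along movie-actor-movie (assumption: unnormalized counts).
movie_movie = matmul(movie_actor, transpose(movie_actor))

# Diffuse each user's feedback along user-movie-actor-movie.
diffused = matmul(feedback, movie_movie)

for user_row in diffused:
    print(user_row)
```

After diffusion, each user has non-zero preference scores for movies they never interacted with but that share an actor with a movie they did, which is how additional path types can connect sparse users and items.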

  11. Recommendation Models • Observation 1: Different meta-paths may have different importance • Global Recommendation Model (Eq. 1): the ranking score combines the features for user i and item j from each meta-path, where q denotes the q-th of L meta-paths • Observation 2: Different users may require different models • Personalized Recommendation Model (Eq. 2): per-cluster models weighted by user-cluster similarity, with c total soft user clusters
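The equations themselves did not survive extraction. One form consistent with the surviving labels is sketched below; the symbols $\theta_q$ (meta-path weight), $\hat{U}_i^{(q)}$ and $\hat{V}_j^{(q)}$ (latent features of user $i$ and item $j$ for the $q$-th meta-path), and $\mathrm{sim}(C_k, u_i)$ (user-cluster similarity) are assumed notation, not taken from the slide.

```latex
% Assumed reconstruction of Eq. (1): global model, weighted sum over L meta-paths
r(u_i, e_j) = \sum_{q=1}^{L} \theta_q \, \hat{U}_i^{(q)} \hat{V}_j^{(q)\top}

% Assumed reconstruction of Eq. (2): personalized model over c soft user clusters
r^{*}(u_i, e_j) = \sum_{k=1}^{c} \mathrm{sim}(C_k, u_i)
                  \sum_{q=1}^{L} \theta_q^{(k)} \, \hat{U}_i^{(q)} \hat{V}_j^{(q)\top}
```

In this reading, the global model learns one weight per meta-path, while the personalized model learns one weight vector per user cluster and blends the cluster models by each user's similarity to the clusters.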

  12. Parameter Estimation • Bayesian personalized ranking (Rendle UAI’09) • Objective function (Eq. 3): $\min_\Theta$ of a loss built from the sigmoid function, summed over each correctly ranked item pair, i.e., $u_i$ gave feedback to $e_a$ but not $e_b$ • Learning the Personalized Recommendation Model • Step 1: Soft cluster users with NMF + k-means • Step 2: For each user cluster, learn one model with Eq. (3) • Step 3: Generate a personalized model for each user on the fly with Eq. (2)
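A minimal sketch of pairwise ranking optimization in the style of Bayesian personalized ranking follows. It assumes a linear scoring model with one weight per meta-path feature and omits regularization; the feature values, learning rate, and function names are hypothetical, and this is not presented as the authors' exact objective.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score(theta, feats):
    """Linear ranking score: one weight per meta-path feature."""
    return sum(t * f for t, f in zip(theta, feats))


def bpr_loss(theta, pairs):
    """Negative log-likelihood that each preferred item outranks the other."""
    return -sum(
        math.log(sigmoid(score(theta, pos) - score(theta, neg)))
        for pos, neg in pairs
    )


def bpr_sgd(theta, pairs, lr=0.1, epochs=50):
    """Plain SGD on the pairwise loss (no regularization in this sketch)."""
    theta = list(theta)
    for _ in range(epochs):
        for pos, neg in pairs:
            diff = [a - b for a, b in zip(pos, neg)]
            grad_scale = 1.0 - sigmoid(score(theta, diff))
            theta = [t + lr * grad_scale * d for t, d in zip(theta, diff)]
    return theta


# Hypothetical pairs: (features of an item with feedback, features of one without).
pairs = [([0.9, 0.1], [0.2, 0.3]), ([0.7, 0.4], [0.1, 0.5])]
theta0 = [0.0, 0.0]
theta = bpr_sgd(theta0, pairs)
print(bpr_loss(theta0, pairs), bpr_loss(theta, pairs))
```

Because only the difference between the two scores enters the loss, the model is trained to order items correctly rather than to predict absolute feedback values, which suits implicit feedback.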

  13. Experiment Setup • Datasets • Comparison methods: • Popularity: recommend the most popular items to users • Co-click: conditional probabilities between items • NMF: non-negative matrix factorization on user feedback • Hybrid-SVM: use Rank-SVM with plain features (utilize both user feedback and information network)

  14. Performance Comparison • HeteRec personalized recommendation (HeteRec-p) provides the best recommendation results

  15. Performance under Different Scenarios • HeteRec-p consistently outperforms other methods in different scenarios • Better recommendation results if users provide more feedback • Better recommendation for users who like less popular items

  16. Contributions: Entity Recommendation in Information Networks with Implicit User Feedback (RecSys’13, WSDM’14a) • Propose latent representations for users and items by propagating user preferences along different meta-paths • Employ a Bayesian ranking optimization technique to correctly evaluate recommendation models • Further improve recommendation quality by considering user differences at the model level and defining personalized recommendation models • Two levels of personalization

  17. Outline • Why Heterogeneous Information Networks? • Entity Recommendation • Information Diffusion • Ideology Detection • Summary

  18. Information Diffusion in Networks • The action of a node is triggered by the actions of its neighbors

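  19. Linear Threshold Model • [Granovetter, 1978] • If the weighted number of activated neighbors of node $u$ is bigger than a pre-specified threshold $\theta_u$, node $u$ is going to be activated • In other words: $p_u(t+1) = E\big[\mathbf{1}\{\sum_{v \in \Gamma(u)} w_{v,u}\,\delta(v,t) > \theta_u\}\big]$, where $\delta(v,t)$ indicates whether neighbor $v$ is active at time $t$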
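A minimal sketch of one synchronous update under a linear threshold rule is given below. The graph, weights, and thresholds are hypothetical toy values, and the dictionary layout `weights[v][u]` (influence of `v` on `u`) is an assumption made for illustration.

```python
def lt_step(weights, thresholds, active):
    """One synchronous step of a linear threshold model.

    weights[v][u] is the influence weight of node v on node u;
    thresholds[u] is the activation threshold of node u;
    active is the set of nodes already active at time t.
    """
    newly_active = set()
    for u, theta in thresholds.items():
        if u in active:
            continue
        # Sum the weights of currently active in-neighbors of u.
        incoming = sum(weights.get(v, {}).get(u, 0.0) for v in active)
        if incoming > theta:
            newly_active.add(u)
    return active | newly_active


# Hypothetical toy network: a and b influence c, and c influences d.
weights = {"a": {"c": 0.4}, "b": {"c": 0.3}, "c": {"d": 0.6}}
thresholds = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}

step1 = lt_step(weights, thresholds, {"a", "b"})
step2 = lt_step(weights, thresholds, step1)
print(sorted(step1), sorted(step2))
```

Neither `a` nor `b` alone could activate `c` here; the activation happens only because their weights add up past the threshold, which is the defining behavior of the model.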

  20. Heterogeneous Bibliographic Network • Multiple types of objects • Multiple types of links

  21. Derived Multi-Relational Bibliographic Network • Collaboration: Author-Paper-Author • Citation: Author-Paper->Paper-Author • Sharing Co-authors: Author-Paper-Author-Paper-Author • Co-attending venues: Author-Paper-Venue-Paper-Author • How to generate these meta-paths? PathSim: Sun et al., VLDB’11
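As an illustration of how author-to-author relations can be derived from such meta-paths, the sketch below builds three of them from a tiny hypothetical bibliography. The data and function names are invented for the example, and self-pairs are dropped as a simplifying assumption.

```python
# Hypothetical toy bibliography.
writes = {"p1": {"ann", "bob"}, "p2": {"bob", "cat"}, "p3": {"dan"}}
cites = {"p3": {"p1"}}  # p3 cites p1
venue = {"p1": "KDD", "p2": "VLDB", "p3": "KDD"}


def collaboration(writes):
    """Author-Paper-Author: pairs of distinct co-authors."""
    return {
        (a, b)
        for authors in writes.values()
        for a in authors
        for b in authors
        if a != b
    }


def citation(writes, cites):
    """Author-Paper->Paper-Author: citing author to cited author."""
    return {
        (a, b)
        for citing, cited_papers in cites.items()
        for cited in cited_papers
        for a in writes[citing]
        for b in writes[cited]
        if a != b
    }


def co_venue(writes, venue):
    """Author-Paper-Venue-Paper-Author: authors who published at the same venue."""
    return {
        (a, b)
        for p, authors_p in writes.items()
        for q, authors_q in writes.items()
        if venue[p] == venue[q]
        for a in authors_p
        for b in authors_q
        if a != b
    }


print(sorted(collaboration(writes)))
print(sorted(citation(writes, cites)))
print(sorted(co_venue(writes, venue)))
```

Each function yields a different homogeneous author network from the same underlying data, which is what makes the derived network multi-relational.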

  22. How Are Topics Propagated among Authors? • To apply existing approaches, either: • Select one relation between authors (say, A-P-A) • Use all the relations, but ignore the relation types • Do different relation types play different roles? • Need new models!

  23. Two Assumptions for Topic Diffusion in Multi-Relational Networks • Assumption 1: Relation-independent diffusion • Model-level aggregation

  24. • Assumption 2: Relation-interdependent diffusion • Relation-level aggregation

  25. Two Models under the Two Assumptions • Two multi-relational linear threshold models • Model 1: MLTM-M • Model-level aggregation • Model 2: MLTM-R • Relation-level aggregation

  26. MLTM-M • For each relation type k: the activation probability for object i at time t+1 • The collective model: the final activation probability for object i is an aggregation over all relation types

  27. Properties of MLTM-M

  28. MLTM-R • Aggregate the multi-relational network with different weights • Treat the activation as in a single-relational network • To make sure the activation probability is non-negative, the weights $\beta$'s are required to be non-negative
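Relation-level aggregation can be sketched as follows: the per-relation edge weights are merged into one weighted network using non-negative relation weights, and activation is then computed as in a single-relational network. The sketch assumes thresholds drawn uniformly from [0, 1], so that the activation probability equals the summed weight of active in-neighbors capped at 1. That assumption, the variable name `beta`, and the toy numbers are illustrative and not taken from the slides.

```python
def aggregate_relations(relation_weights, beta):
    """Merge per-relation edge weights into one network.

    relation_weights[relation][(src, dst)] is an edge weight;
    beta[relation] is the non-negative weight of that relation type.
    """
    if any(b < 0 for b in beta.values()):
        raise ValueError("relation weights must be non-negative")
    combined = {}
    for relation, edges in relation_weights.items():
        for edge, w in edges.items():
            combined[edge] = combined.get(edge, 0.0) + beta[relation] * w
    return combined


def activation_probability(combined, active, node):
    """Activation probability of `node`, assuming a uniform [0, 1] threshold."""
    total = sum(
        w for (src, dst), w in combined.items() if dst == node and src in active
    )
    return min(1.0, total)


# Hypothetical toy weights for two relation types.
relation_weights = {
    "APA": {("a", "c"): 0.5, ("b", "c"): 0.5},
    "AP->PA": {("a", "c"): 1.0},
}
beta = {"APA": 0.5, "AP->PA": 0.25}

combined = aggregate_relations(relation_weights, beta)
print(activation_probability(combined, {"a"}, "c"))
print(activation_probability(combined, {"a", "b"}, "c"))
```

The non-negativity check mirrors the constraint stated on the slide: a negative relation weight could drive the summed influence, and hence the probability, below zero.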

  29. Properties of MLTM-R

  30. How to Evaluate the Two Models? • Test on the real action log on multiple topics! • Action log: $\{\langle u_i, t_i \rangle\}$ • Diffusion model learning from the action log • MLE estimation over the $\beta$'s

  31. Two Real Datasets • DBLP (Computer Science): relation types APA, AP->PA, APAPA, APVPA • APS (Physics): relation types APA, AP->PA, APAPA, APOPA

  32. Topics Selected • Select topics with increasing trends

  33. Evaluation Methods • Global Prediction: how many authors are activated at t+1 • Error rate = ½(predicted#/true# + true#/predicted#) - 1 • Local Prediction: which author is likely to be activated at t+1 • AUPR (Area under Precision-Recall Curve)
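Both measures are easy to compute; a minimal sketch follows. The error rate implements the formula as stated. For AUPR, the sketch uses average precision, a common step-wise estimate of the area under the precision-recall curve; whether the original evaluation used this estimator or another interpolation is not stated, so treat that choice as an assumption. Ties between scores are not handled specially.

```python
def error_rate(predicted, true):
    """Symmetric error rate: 0 when the counts match, larger as they diverge."""
    return 0.5 * (predicted / true + true / predicted) - 1.0


def average_precision(scores, labels):
    """Step-wise estimate of the area under the precision-recall curve.

    scores: predicted activation scores, one per author;
    labels: 1 if the author was actually activated, else 0.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


print(error_rate(50, 100))
print(average_precision([0.9, 0.8, 0.3], [1, 0, 1]))
```

The error rate is symmetric in over- and under-prediction: predicting half the true count and predicting double the true count give the same value.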

  34. Global Prediction

  35. Local Prediction - AUPR • 1: Different relations play different roles in the diffusion process • 2: Relation-level aggregation is better than model-level aggregation

  36. Case Study

  37. Prediction Results on “social network” Diffusion

  38. 37

  39. WIN!

  40. Outline • Why Heterogeneous Information Networks? • Entity Recommendation • Information Diffusion • Ideology Detection • Summary

  41. • Topic-Factorized Ideal Point Estimation Model for Legislative Voting Network (KDD’14, Gu, Sun et al.)

  42. Background • [Figure: the United States Congress (the House and the Senate) votes on federal legislation (bills: Bill 1, Bill 2, ……), which becomes law] • [Figure: politicians, e.g., Ronald Paul (the House, Republican) and Barack Obama (Senate, Democrat), placed on a liberal-conservative scale]

  43. Legislative Voting Network
