

Tianbao Yang 1 , Rong Jin 1 , Yun Chi 2 , Shenghuo Zhu 2 1 Michigan State University 2 NEC Laboratories America Presenter: April Hua LIU Outline Background Conditional Link Model Discriminative Content Model Optimization Algorithms


  1. Tianbao Yang¹, Rong Jin¹, Yun Chi², Shenghuo Zhu² (¹Michigan State University, ²NEC Laboratories America)  Presenter: April Hua LIU

  2. Outline  Background  Conditional Link Model  Discriminative Content Model  Optimization Algorithms  Extensions  Experiments  Conclusion

  3. Background  Community detection in networks  Community:  Densely connected in links  Common topic in contents  Network data  Links between nodes: e.g. citations between papers  Content describing nodes: e.g. bag-of-words for papers

  4. Background (Cont.)  Most work on community detection  Link analysis, but links are sparse and noisy  Content analysis, but content can be misleading  Combining link and content  Most are based on generative models  Link model (PHITS) + topic model (PLSA)  Connected by the community memberships (hidden variable)

  5. Our contribution  Problems with existing models  Community membership is insufficient to model links  Our contribution: introduce popularity of nodes  Generative model, vulnerable to irrelevant attributes  Our contribution: discriminative content model

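  6. Notations  $\mathcal{V} = \{1, \dots, n\}$: nodes;  $\mathcal{E} = \{(i \to j) \mid s_{ij} \neq 0\}$: directed links;  $\mathcal{LO}(i) \subseteq \mathcal{V}$: link-out space of node $i$;  $\mathcal{LI}(i) \subseteq \mathcal{V}$: link-in space of node $i$;  $\mathcal{O}(i) \subseteq \mathcal{V}$: nodes cited by node $i$;  $\mathcal{I}(i) \subseteq \mathcal{V}$: nodes that cite node $i$;  $z_i \in \{1, \dots, K\}$: community of node $i$;  $\gamma_i = (\gamma_{i1}, \dots, \gamma_{iK})$: community membership of node $i$;  $x_i \in \mathbb{R}^d$: content vector of node $i$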

  7. Conditional link model  Popularity-based conditional link model (PCL)  Model the conditional link probability $\Pr(j \mid i)$: the probability of linking node $i$ to node $j$  Popularity of node $i$: $b_i \geq 0$  Large $b_i$ → high probability of being cited by other nodes  $\Pr(j \mid i) = \sum_{k=1}^{K} \Pr(z_i = k \mid i)\, \Pr(j \mid z_i = k) = \sum_{k=1}^{K} \gamma_{ik} \frac{\gamma_{jk} b_j}{\sum_{j' \in \mathcal{LO}(i)} \gamma_{j'k} b_{j'}}$
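
The PCL formula on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pcl_link_prob`, the list-of-lists layout for the memberships `gamma`, the popularity list `b`, and the `link_out` dictionary (standing for the link-out space of each node) are all hypothetical choices, and returning 0 for a node outside the link-out space is an assumption made here for convenience.

```python
def pcl_link_prob(i, j, gamma, b, link_out):
    """Pr(j | i) = sum_k gamma[i][k] * gamma[j][k] * b[j] / sum_{j' in LO(i)} gamma[j'][k] * b[j'].

    gamma[i][k]: membership of node i in community k (each row sums to 1).
    b[j]: non-negative popularity of node j.
    link_out[i]: candidate targets of node i (hypothetical container for the link-out space).
    """
    candidates = link_out[i]
    if j not in candidates:
        # Assumption: nodes outside the link-out space get zero probability.
        return 0.0
    prob = 0.0
    for k in range(len(gamma[i])):
        # Normalizer over the link-out space of node i for community k.
        denom = sum(gamma[jp][k] * b[jp] for jp in candidates)
        if denom > 0:
            prob += gamma[i][k] * gamma[j][k] * b[j] / denom
    return prob


if __name__ == "__main__":
    gamma = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
    link_out = {0: [1, 2]}
    uniform = [1.0, 1.0, 1.0]
    popular = [1.0, 5.0, 1.0]  # node 1 made more popular
    print(pcl_link_prob(0, 1, gamma, uniform, link_out))
    print(pcl_link_prob(0, 1, gamma, popular, link_out))
```

With memberships that sum to one and positive normalizers, the probabilities over the link-out space sum to one, and raising `b[j]` raises the probability of linking to node `j`.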

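  8. Analysis of PCL model  PCL model: $\Pr(j \mid i) = \sum_{k=1}^{K} \gamma_{ik} \frac{\gamma_{jk} b_j}{\sum_{j' \in \mathcal{LO}(i)} \gamma_{j'k} b_{j'}}$  With popularity $b_{jk}$ indexed by community $k$: $\Pr(j \mid i) = \sum_{k=1}^{K} \gamma_{ik} \frac{\gamma_{jk} b_{jk}}{\sum_{j' \in \mathcal{LO}(i)} \gamma_{j'k} b_{j'k}}$  PHITS model: $\Pr(j \mid i) = \sum_{k=1}^{K} \Pr(z = k \mid i)\, \Pr(j \mid z = k) = \sum_{k=1}^{K} \gamma_{ik} \beta_{jk}$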

  9. Maximum Likelihood Estimation  The log-likelihood:  We find the optimal $\gamma$, $b$ by maximizing the log-likelihood
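
The log-likelihood expression itself is not reproduced in this transcript. As a rough sketch, the code below assumes the usual form for a conditional link model, a sum over observed links of the link weight times the log of the PCL probability of that link. That form, the function name `pcl_log_likelihood`, and the `links` dictionary mapping `(i, j)` to a weight are assumptions for illustration, not taken from the slides.

```python
import math


def pcl_log_likelihood(links, gamma, b, link_out):
    """Assumed objective: sum over links (i -> j) of s_ij * log Pr(j | i) under PCL.

    links: dict mapping (i, j) to a positive link weight s_ij (hypothetical layout).
    gamma[i][k]: membership of node i in community k.
    b[j]: positive popularity of node j.
    link_out[i]: link-out space of node i; it must contain every observed target j.
    """
    total = 0.0
    for (i, j), s_ij in links.items():
        prob = 0.0
        for k in range(len(gamma[i])):
            # Per-community normalizer over the link-out space of node i.
            denom = sum(gamma[jp][k] * b[jp] for jp in link_out[i])
            prob += gamma[i][k] * gamma[j][k] * b[j] / denom
        total += s_ij * math.log(prob)
    return total


if __name__ == "__main__":
    gamma = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
    b = [1.0, 2.0, 1.0]
    links = {(0, 1): 1.0, (0, 2): 1.0}
    link_out = {0: [1, 2]}
    print(pcl_log_likelihood(links, gamma, b, link_out))
```

Any optimizer that increases this value over `gamma` and `b` would be a stand-in for the estimation step; the sketch only evaluates the objective.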

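  10. Discriminative Content (DC) model  A discriminative model that determines community memberships from node contents, where $w_k \in \mathbb{R}^d$ weights different content features  PCL + DC: $\Pr(j \mid i) = \sum_{k=1}^{K} \gamma_{ik} \frac{\gamma_{jk} b_j}{\sum_{j' \in \mathcal{LO}(i)} \gamma_{j'k} b_{j'}}$, with $\gamma_{ik}$ given by the DC model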
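
The slide's formula for the memberships is not reproduced in this transcript. One common discriminative choice, assumed here purely for illustration, is a softmax over per-community scores computed from a node's content vector and one weight vector per community. The function name `dc_memberships` and the list-based data layout are hypothetical.

```python
import math


def dc_memberships(x, W):
    """Assumed softmax form: gamma_k = exp(w_k . x) / sum_l exp(w_l . x).

    x: content vector of one node (list of d floats).
    W: list of K weight vectors, one per community, each of length d.
    """
    scores = [sum(w_d * x_d for w_d, x_d in zip(w_k, x)) for w_k in W]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


if __name__ == "__main__":
    x = [1.0, 0.0, 2.0]                      # toy bag-of-words counts
    W = [[0.5, 0.0, 0.1], [0.0, 0.3, 0.4]]   # one weight vector per community
    print(dc_memberships(x, W))
```

Under this assumption, features with near-zero weight in every community barely affect the memberships, which is one way a discriminative model can be less sensitive to irrelevant attributes.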

  11. Optimization Algorithm  We maximize the log-likelihood over the free parameters w and b  EM algorithm

  12. Experiments  Data sets

     | Data set | #nodes | #links | Content | Labels | K | Description |
     |---|---|---|---|---|---|---|
     | Political Blog | 1490 | 19090 | No | Yes | 2 | Blog network |
     | Wikipedia | 105 | 799 | No | No | 20 | Webpages hyperlinks |
     | Cora | 2708 | 5429 | Yes | Yes | 7 | Paper citation |
     | Citeseer | 3312 | 4732 | Yes | Yes | 6 | Paper citation |

  13. Experiments  Performance Metrics  Supervised metrics  normalized mutual information (NMI)  pairwise F-measure (PWF)  Unsupervised metrics  modularity (Modu)  normalized cut (Ncut)
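
As an illustration of one of the supervised metrics, the sketch below computes a pairwise F-measure using a common definition: precision and recall over node pairs, where a pair counts as correct when it shares both a predicted community and a ground-truth label. Whether the talk uses exactly this definition is an assumption, and the function name `pairwise_f_measure` is hypothetical.

```python
from itertools import combinations


def pairwise_f_measure(predicted, truth):
    """Pairwise F-measure between a predicted clustering and ground-truth labels.

    predicted[n], truth[n]: community id and label of node n (any hashable values).
    """
    same_pred = same_true = both = 0
    for a, c in combinations(range(len(predicted)), 2):
        in_pred = predicted[a] == predicted[c]
        in_true = truth[a] == truth[c]
        same_pred += in_pred
        same_true += in_true
        both += in_pred and in_true
    if both == 0:
        return 0.0
    precision = both / same_pred  # correct pairs among pairs clustered together
    recall = both / same_true     # correct pairs among pairs sharing a label
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(pairwise_f_measure([0, 0, 1, 1], [1, 1, 0, 0]))
    print(pairwise_f_measure([0, 0, 0, 0], [0, 0, 1, 1]))
```

The measure is invariant to renaming communities, so the first call scores a perfect match even though the ids are swapped.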

  14. Experiments: link prediction  Baselines: PHITS, PCL-b=1 (constant popularity)  Recall measure  PCL performs better than PHITS  Modeling popularity performs better than not modeling it

  15. Experiments  Community detection on two paper citation data sets

  16. Experiments  Link model: PCL is better than PHITS  On combining link with content:  PCL + content model performs better than other link models + content model  Link model + DC performs better than link model + topic model  PCL + DC performs better than the other combination models

  17. Conclusion  A conditional link model that captures the popularity of nodes  A discriminative model for content analysis  A unified model to combine link and content  Link structure → noisy estimation of community memberships $z$ (PCL)  $z$ used as supervised information → high-quality memberships $z$ (DC)  Encouraging empirical results

  18. Thanks Q&A?
