

  1. CQARank: Jointly Model Topics and Expertise in Community Question Answering Liu Yang, Minghui Qiu, Swapna Gottipati, Feida Zhu, Jing Jiang, Huiping Sun, Zhong Chen Peking University Singapore Management University

  2. Community Question Answering • Open platforms for sharing expertise • Large repositories of valuable knowledge CIKM2013 2

  3. Existing CQA Mechanism Challenges • Poor expertise matching • Low-quality answers • Under-utilized archived questions • Fundamental question: how to model topics and expertise in CQA sites

  4. Motivation • A case study of Stack Overflow [screenshot of a Stack Overflow page annotated with: Question, Tag, Vote, User, Answer]

  5. Motivation • Propose a principled approach to jointly model topics and expertise in CQA – No one is an expert in all topical interests – Each new question should be routed to answerers interested in related topics with the right level of expertise • Achieve a better understanding of both user topical interest and expertise by leveraging tagging and voting information – Tags are important user-generated category information on Q&A posts – Votes indicate a CQA community’s long-term review of a given user’s expertise under a specific topic

  6. Roadmap • Motivation • Related Work • Our Method – Method Overview – Topic Expertise Model – CQARank • Experiments • Summary

  7. Related Work • Link Analysis – HITS (Jurczyk and Agichtein, CIKM07) – ExpertiseRank and Z-score (Zhang et al., WWW07) – Find global experts without modeling user interests • Latent Topical Analysis – UQA Model (Guo et al., CIKM08) – Fails to capture to what extent a user’s expertise matches questions with similar topical interest • Topic-Sensitive PageRank – TwitterRank (Weng et al., WSDM10) – Topic-sensitive probabilistic model for expert finding (Zhou et al., CIKM12)

  8. Roadmap • Motivation • Related Work • Our Method – Method Overview – Topic Expertise Model – CQARank • Experiments • Summary

  9. Method Overview • Concepts – Topical Interest – Topical Expertise – Q&A Graph • Our Approach – Topic Expertise Model – CQARank to combine learning results from TEM with link analysis of Q&A graph

  10. Method Overview • CQARank Recommendation Framework

  11. Roadmap • Motivation • Related Work • Our Method – Method Overview – Topic Expertise Model – CQARank • Experiments • Summary

  12. Topic Expertise Model [plate diagram of the graphical model] • θ_u: user-specific topic distribution • π_{k,u}: user topical expertise distribution • φ_k, ψ_k: topic-specific word and tag distributions • Expertise-specific vote distribution: Gaussian with mean μ_e and variance Σ_e • Notation: U = # of users; N_u = # of posts; L_{u,n} = # of words; P_{u,n} = # of tags; z = topic label; e = expertise label; w = a word; t = a tag; v = a vote • Hyperparameters: α, β, γ, η and Normal-Gamma parameters (μ_0, κ_0, α_0, β_0)
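A toy sketch of TEM's generative story, with invented dimensions and hand-picked distributions (none of these numbers come from the paper): each post draws a topic from the user's topic distribution, an expertise level from the user's topical expertise distribution, words and tags conditioned on the topic, and a vote from the expertise level's Gaussian.

```python
import random

random.seed(0)

# Toy dimensions (illustrative only)
K, E, V_WORDS, V_TAGS = 3, 2, 50, 10

# Assumed per-user topic distribution theta_u, per-topic expertise
# distribution pi, and one Gaussian (mu_e, sigma_e) per expertise level.
theta_u = [1.0 / K] * K
pi = [[1.0 / E] * E for _ in range(K)]
mu = [2.0, 20.0]        # low- vs high-expertise vote means
sigma = [1.0, 5.0]

def sample(dist):
    """Draw an index from a discrete distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(dist):
        acc += p
        if r < acc:
            return i
    return len(dist) - 1

def generate_post(n_words=5, n_tags=2):
    """Generate one post: a topic, an expertise level, words and tags
    from the topic, and a vote from the expertise Gaussian."""
    z = sample(theta_u)                   # topic label for the post
    e = sample(pi[z])                     # expertise label for the post
    words = [random.randrange(V_WORDS) for _ in range(n_words)]
    tags = [random.randrange(V_TAGS) for _ in range(n_tags)]
    vote = random.gauss(mu[e], sigma[e])  # vote ~ N(mu_e, sigma_e^2)
    return z, e, words, tags, vote

post = generate_post()
```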

  13. Roadmap • Motivation • Related Work • Our Method – Method Overview – Topic Expertise Model – CQARank • Experiments • Summary

  14. CQARank • CQARank combines the textual content learning results of TEM with link analysis to reinforce user topical expertise learning • Construct the Q&A graph G = (V, E) – V is a set of nodes representing users – E is a set of directed edges from asker to answerer • e = (u_i, u_j), u_i ∈ V, u_j ∈ V • The weight W_ij is the number of answers given by u_j to questions of u_i
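The graph construction above can be sketched as follows; `qa_pairs` is an invented toy log of (asker, answerer) events, and edge weights count answers as the slide describes.

```python
from collections import defaultdict

# Build the directed Q&A graph: an edge u_i -> u_j for each answer
# u_j gave to a question of u_i; W[i][j] counts those answers.
qa_pairs = [("alice", "bob"), ("alice", "bob"), ("carol", "bob"),
            ("bob", "alice")]

W = defaultdict(lambda: defaultdict(int))
users = set()
for asker, answerer in qa_pairs:
    W[asker][answerer] += 1      # accumulate the edge weight W_ij
    users.update((asker, answerer))
```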

  15. CQARank • For each topic z, the transition probability from asker u_i to answerer u_j is defined as: o P_z(i → j) = (W_ij ∙ sim_z(i → j)) / (Σ_k W_ik ∙ sim_z(i → k)), if Σ_k W_ik ≠ 0 o P_z(i → j) = 0, otherwise • sim_z(i → j) is the similarity between u_i and u_j under topic z, which is defined as o sim_z(i → j) = 1 − |θ′_{i,z} − θ′_{j,z}| • The row-normalized transition matrix M is defined as o M_ij = P_z(i → j)
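A minimal sketch of the topic-specific transition matrix, with toy weights W and an assumed per-user topic weight vector theta_z standing in for the TEM estimates:

```python
# P_z(i->j) = W[i][j] * sim_z(i->j) / sum_k W[i][k] * sim_z(i->k),
# with sim_z(i->j) = 1 - |theta_z[i] - theta_z[j]| (toy numbers).
n = 3
W = [[0, 2, 1],
     [1, 0, 0],
     [0, 0, 0]]              # user 2 asked no questions
theta_z = [0.6, 0.5, 0.1]    # per-user weight on topic z (assumed)

def sim_z(i, j):
    return 1.0 - abs(theta_z[i] - theta_z[j])

M = [[0.0] * n for _ in range(n)]
for i in range(n):
    denom = sum(W[i][k] * sim_z(i, k) for k in range(n))
    if denom > 0:            # otherwise the row stays all zero
        for j in range(n):
            M[i][j] = W[i][j] * sim_z(i, j) / denom

row_sums = [sum(row) for row in M]
```

Rows with outgoing edges normalize to 1, matching the row-normalized matrix on the slide.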

  16. CQARank • Given topic z, the CQARank saliency score of u_i is computed based on the following formula: o R_z(u_i) = λ ∙ Σ_{j: u_j → u_i} R_z(u_j) ∙ M_ji + (1 − λ) ∙ θ_{u_i,z} ∙ E(z, u_i) o E(z, u_i) is the estimated expertise score of u_i under topic z, which is defined as the expectation of the user topical expertise distribution learnt by TEM: E(z, u_i) = Σ_e π_{z,u_i,e} ∙ μ_e o λ ∈ (0, 1) is a parameter to control the probability of the teleportation operation.
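The saliency score admits a simple fixed-point iteration; the transition matrix M, topic weights theta_z, expertise scores Ex, and damping lam (playing the role of λ) below are all toy values, not learnt quantities.

```python
# R_z(u_i) = lam * sum_j R_z(u_j) * M[j][i]
#            + (1 - lam) * theta_z[i] * Ex[i]
n = 3
M = [[0.0, 0.8, 0.2],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
theta_z = [0.6, 0.5, 0.1]
Ex = [1.0, 3.0, 0.5]     # stand-in for E(z, u_i) from TEM
lam = 0.8

R = [1.0 / n] * n
for _ in range(100):     # converges since lam < 1 and rows of M sum <= 1
    R = [lam * sum(R[j] * M[j][i] for j in range(n))
         + (1 - lam) * theta_z[i] * Ex[i]
         for i in range(n)]
```

User 1 combines a strong incoming link with the highest expertise score, so it ends up ranked first in this toy setting.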

  17. Roadmap • Motivation • Related Work • Our Method – Method Overview – Topic Expertise Model – CQARank • Experiments • Summary

  18. Experiments • Stack Overflow Data Set – All Q&A posts over three months (May 1st to August 1st, 2009) – Training data: 8,904 questions and 96,629 answers posted by 663 users (10,689 unique tags and 135 unique votes) – Testing data: 1,173 questions and 9,883 answers • Data Preprocessing – Tokenize text and discard all code snippets – Remove stop words and HTML tags in text • Parameter Settings – K = 15, E = 10, α = 50/K, β = 0.01, γ = 0.01, η = 0.001, λ = 0.2 – Normal-Gamma parameters – 500 iterations of Gibbs sampling
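The preprocessing steps listed above might look like this; the stop-word list and regexes are illustrative, not the paper's exact setup.

```python
import re

# Minimal preprocessing sketch: drop code snippets and HTML tags,
# tokenize, then remove stop words (illustrative stop-word list).
STOP_WORDS = {"the", "a", "an", "is", "to", "in", "how", "do", "i"}

def preprocess(post_html):
    text = re.sub(r"<code>.*?</code>", " ", post_html, flags=re.S)  # code snippets
    text = re.sub(r"<[^>]+>", " ", text)                            # HTML tags
    tokens = re.findall(r"[a-z0-9']+", text.lower())                # tokenize
    return [t for t in tokens if t not in STOP_WORDS]

tokens = preprocess("<p>How do I sort a <code>list</code> in Python?</p>")
```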

  19. TEM Results • Topic Analysis - topic tags – Top tags provide phrase level features to distill richer topic information

  20. TEM Results • Topic Analysis - topic words – Top words have strong correlation with top tags under the same topic

  21. TEM Results • Expertise Analysis – TEM learns different user expertise levels by clustering votes with its GMM component. – It learns 10 Gaussian distributions with different means for the generation of votes in the data. – The higher the mean is, the lower the precision is.
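The vote-clustering idea can be illustrated with a tiny 1-D EM for a two-component Gaussian mixture on synthetic votes (the paper's model uses 10 components inside TEM; this is only a standalone sketch with invented data).

```python
import math, random

random.seed(1)

# Synthetic votes from two expertise levels: low (mean 2) and high (mean 25).
votes = [random.gauss(2, 1) for _ in range(200)] + \
        [random.gauss(25, 3) for _ in range(200)]

mu, var, mix = [0.0, 10.0], [5.0, 5.0], [0.5, 0.5]
for _ in range(50):
    # E-step: responsibility of each component for each vote
    resp = []
    for v in votes:
        p = [mix[k] * math.exp(-(v - mu[k]) ** 2 / (2 * var[k]))
             / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
        s = p[0] + p[1]
        resp.append([p[0] / s, p[1] / s])
    # M-step: re-estimate means, variances, and mixing weights
    for k in range(2):
        nk = sum(r[k] for r in resp)
        mu[k] = sum(r[k] * v for r, v in zip(resp, votes)) / nk
        var[k] = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, votes)) / nk
        mix[k] = nk / len(votes)
```

The recovered high-mean component also has the larger variance, mirroring the slide's "higher mean, lower precision" observation.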

  22. Recommend Expert Users • Task – Given a new question q and a set of users U, rank the users by their interests and expertise to answer question q. – Recommendation score function: S(u, q) = Sim(u, q) ∙ Expert(u, q) = (1 − JS(θ_u, θ_q)) ∙ Σ_z θ_{q,z} ∙ Expert(u, z) – θ_{q,z} is the estimated posterior topic distribution of question q: θ_{q,z} ∝ p(z | w_q, t_q, u) = p(z | u) ∙ p(w_q | z) ∙ p(t_q | z) = θ_{u,z} ∙ Π_{w ∈ w_q} φ(z, w) ∙ Π_{t ∈ t_q} ψ(z, t)
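A sketch of the recommendation score with toy topic distributions and expertise scores; `js` is the Jensen-Shannon divergence (base 2, so it lies in [0, 1]), and all numbers are invented.

```python
import math

# S(u, q) = Sim(u, q) * Expert(u, q)
#         = (1 - JS(theta_u, theta_q)) * sum_z theta_qz * Expert(u, z)
def kl(p, q):
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def js(p, q):
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def score(theta_u, theta_q, expert_u):
    sim = 1.0 - js(theta_u, theta_q)
    return sim * sum(tq * e for tq, e in zip(theta_q, expert_u))

theta_q = [0.7, 0.2, 0.1]                 # question's topic distribution
u1 = ([0.6, 0.3, 0.1], [5.0, 1.0, 0.0])  # on-topic expert user
u2 = ([0.1, 0.1, 0.8], [0.0, 0.0, 9.0])  # off-topic user
s1, s2 = score(u1[0], theta_q, u1[1]), score(u2[0], theta_q, u2[1])
```

The off-topic user loses on both factors: low similarity and low expertise on the question's dominant topics.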

  23. Recommend Expert Users • Our method – CQARank • Baselines – Link analysis methods: In-Degree (ID), PageRank (PR) – Probabilistic generative models: TEM (part of our method), UQA (Guo et al., CIKM08) – Combining link analysis and topic models: Topic-Sensitive PageRank (TSPR) (Zhou et al., CIKM12)

  24. Recommend Expert Users • Evaluation Criteria – Ground truth: user rank list by average votes received for answering q – Metrics: nDCG, Pearson/Kendall correlation coefficients • Results
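A small nDCG@K helper of the usual form, assuming graded relevance corresponds to a user's average votes (the relevance values here are toy numbers):

```python
import math

def ndcg_at_k(ranked_rels, k):
    """nDCG@K for a list of relevance grades in ranked order."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_rels[:k]))
    ideal = sorted(ranked_rels, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

perfect = ndcg_at_k([3, 2, 1, 0], 4)   # ideal ordering
worse = ndcg_at_k([0, 1, 2, 3], 4)     # reversed ordering
```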

  25. Recommend Answers • Task – Given a new question q and a set of answers A, rank all answers in A. – Recommendation score function: S(a, q) = Sim(a, q) ∙ Expert(u, q) = (1 − JS(θ_a, θ_q)) ∙ Σ_z θ_{q,z} ∙ Expert(u, z), where u is the author of answer a • Baselines and evaluation criteria are the same as in the expert recommendation task • We use each answer’s votes to generate the ground-truth rank list

  26. Recommend Answers • Results

  27. Recommend Similar Questions • When a user asks a new question (referred to as the query question), the user will often get replies with links to other similar questions • We crawl 1,000 questions whose similar questions exist in the training data set as the query question set • For each query question with n similar questions, we randomly select another m (m = 1000) questions from the training data set to form the candidate similar questions

  28. Recommend Similar Questions • All compared methods rank these m + n candidate similar questions according to their similarity with the query question • The higher the similar questions are ranked, the better the method performs. • The recommendation score is computed based on the JS-divergence between the topic distributions of the query question and the candidate similar questions
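Ranking candidates by JS-divergence between topic distributions can be sketched as follows; the distributions and candidate names are invented toy values.

```python
import math

def kl(p, q):
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

def js(p, q):
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

query = [0.8, 0.1, 0.1]
candidates = {
    "q_similar":   [0.75, 0.15, 0.10],
    "q_unrelated": [0.05, 0.05, 0.90],
}
# Smaller divergence = more similar, so sort ascending.
ranked = sorted(candidates, key=lambda name: js(query, candidates[name]))
```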

  29. Recommend Similar Questions • Baselines – TSPR(LDA), UQA, SimTag • Evaluation Criteria – Precision@K, average rank of similar questions, mean reciprocal rank (MRR), cumulative distribution of ranks (CDR)

  30. Parameter Sensitivity Analysis • Performance of CQARank on expert user recommendation, varying the number of expertise levels (E) and topics (K)

  31. Roadmap • Motivation • Related Work • Our Method – Method Overview – Topic Expertise Model – CQARank • Experiments • Summary
