CQARank: Jointly Model Topics and Expertise in Community Question Answering
Liu Yang, Minghui Qiu, Swapna Gottipati, Feida Zhu, Jing Jiang, Huiping Sun, Zhong Chen
Peking University · Singapore Management University
Community Question Answering
• Open platforms for sharing expertise
• Large repositories of valuable knowledge
CIKM 2013
Existing CQA Mechanism Challenges
• Poor expertise matching
• Low-quality answers
• Under-utilized archived questions
• Fundamental question: how to model topics and expertise in CQA sites?
Motivation
• A case study of Stack Overflow (screenshot annotated with Question, Tag, Vote, User, Answer)
Motivation
• Propose a principled approach to jointly model topics and expertise in CQA
  – No one is an expert in all topics of interest
  – Each new question should be routed to answerers interested in related topics with the right level of expertise
• Achieve a better understanding of both user topical interest and expertise by leveraging tagging and voting information
  – Tags are important user-generated category information for Q&A posts
  – Votes reflect the CQA community's long-term assessment of a given user's expertise under a specific topic
Roadmap
• Motivation
• Related Work
• Our Method
  – Method Overview
  – Topic Expertise Model
  – CQARank
• Experiments
• Summary
Related Work
• Link analysis
  – HITS (Jurczyk and Agichtein, CIKM07)
  – ExpertiseRank and Z-score (Zhang et al., WWW07)
  – Find global experts without modeling user interests
• Latent topical analysis
  – UQA model (Guo et al., CIKM08)
  – Fails to capture to what extent a user's expertise matches questions of similar topical interest
• Topic-sensitive PageRank
  – TwitterRank (Weng et al., WSDM10)
  – Topic-sensitive probabilistic model for expert finding (Zhou et al., CIKM12)
Roadmap
• Motivation
• Related Work
• Our Method
  – Method Overview
  – Topic Expertise Model
  – CQARank
• Experiments
• Summary
Method Overview
• Concepts
  – Topical interest
  – Topical expertise
  – Q&A graph
• Our approach
  – Topic Expertise Model (TEM)
  – CQARank, which combines the learning results from TEM with link analysis on the Q&A graph
Method Overview
• CQARank recommendation framework (framework diagram)
Roadmap
• Motivation
• Related Work
• Our Method
  – Method Overview
  – Topic Expertise Model
  – CQARank
• Experiments
• Summary
Topic Expertise Model
(Plate diagram; the notation below reconstructs the slide's legend.)
• U: # of users; N_u: # of posts of user u; each post contains words, tags, and a vote
• z: topic label; e: expertise label; w: a word; t: a tag; v: a vote
• θ_u: user-specific topic distribution (Dirichlet prior α)
• φ_{u,z}: user topical expertise distribution, one per user-topic pair (K × U; prior β)
• ϕ_z: topic-specific word distribution (prior η); ω_z: topic-specific tag distribution (prior γ)
• (μ_e, σ_e): expertise-specific Gaussian vote distribution, with a Normal-Gamma prior on its parameters
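The generative story sketched on the slide can be paraphrased in code. This is an illustrative sketch, not the authors' implementation; all the toy distributions and the helper name `generate_post` are our own assumptions.

```python
import random

def generate_post(theta_u, phi_u, word_dists, tag_dists, vote_params,
                  n_words=5, n_tags=2):
    """One draw from TEM's generative story for a single post of user u:
    1. topic z ~ Mult(theta_u), the user's topic distribution;
    2. expertise e ~ Mult(phi_u[z]), the user's per-topic expertise dist;
    3. words and tags from the topic-specific distributions;
    4. the post's vote from the expertise-specific Gaussian (mu_e, sigma_e)."""
    z = random.choices(range(len(theta_u)), weights=theta_u)[0]
    e = random.choices(range(len(phi_u[z])), weights=phi_u[z])[0]
    vocab, pw = word_dists[z]
    words = random.choices(vocab, weights=pw, k=n_words)
    tag_vocab, pt = tag_dists[z]
    tags = random.choices(tag_vocab, weights=pt, k=n_tags)
    mu, sigma = vote_params[e]
    vote = random.gauss(mu, sigma)
    return z, e, words, tags, vote

# Toy example with 2 topics and 2 expertise levels (all values hypothetical).
random.seed(0)
z, e, words, tags, vote = generate_post(
    theta_u=[0.7, 0.3],                        # user leans toward topic 0
    phi_u=[[0.2, 0.8], [0.9, 0.1]],            # per-topic expertise distributions
    word_dists=[(["java", "jvm"], [0.6, 0.4]),
                (["css", "html"], [0.5, 0.5])],
    tag_dists=[(["java"], [1.0]), (["css"], [1.0])],
    vote_params=[(1.0, 0.5), (8.0, 2.0)],      # (mean, std) per expertise level
)
```

In the paper, inference inverts this story with Gibbs sampling; the sketch only shows the forward direction.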
Roadmap
• Motivation
• Related Work
• Our Method
  – Method Overview
  – Topic Expertise Model
  – CQARank
• Experiments
• Summary
CQARank
• CQARank combines the textual content learning results of TEM with link analysis to reinforce user topical expertise estimation
• Construct the Q&A graph G = (V, E)
  – V is the set of nodes representing users
  – E is the set of directed edges from asker to answerer: e = (u_i, u_j), u_i ∈ V, u_j ∈ V
  – Weight W_ij is the number of answers given by u_j to questions of u_i
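The weighted Q&A graph above is straightforward to build from (asker, answerer) pairs, one pair per answer. A minimal sketch, assuming that hypothetical input format:

```python
from collections import defaultdict

def build_qa_graph(answer_records):
    """Build the weighted Q&A graph: weights[i][j] counts the answers
    given by user j to questions asked by user i."""
    weights = defaultdict(lambda: defaultdict(int))
    for asker, answerer in answer_records:
        weights[asker][answerer] += 1
    return weights

# Toy example: user "a" asks; "b" answers twice, "c" answers once.
g = build_qa_graph([("a", "b"), ("a", "b"), ("a", "c")])
```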
CQARank
For each topic z, the transition probability from asker u_i to answerer u_j is defined as:
  P_z(i → j) = (W_ij · sim_z(i → j)) / (Σ_{k=1}^{|V|} W_ik · sim_z(i → k))   if Σ_k W_ik ≠ 0
  P_z(i → j) = 0   otherwise
• sim_z(i → j) is the similarity between u_i and u_j under topic z, defined as
  sim_z(i → j) = 1 − |θ'_{iz} − θ'_{jz}|
• The row-normalized transition matrix M is defined as M_ij = P_z(i → j)
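The topic-specific transition matrix can be sketched directly from this definition. The dictionary layouts for `weights` and `theta` are our assumptions for illustration:

```python
def transition_matrix(weights, theta, z, users):
    """Row-normalized, topic-specific transition matrix:
    P_z(i -> j) = W_ij * sim_z(i -> j) / sum_k W_ik * sim_z(i -> k),
    with sim_z(i -> j) = 1 - |theta[i][z] - theta[j][z]|.
    weights[i][j]: # of answers by user j to questions of user i;
    theta[u]: user u's topic-interest vector (hypothetical structures)."""
    M = []
    for i in users:
        row = []
        for j in users:
            w = weights.get(i, {}).get(j, 0)
            sim = 1.0 - abs(theta[i][z] - theta[j][z])
            row.append(w * sim)
        total = sum(row)
        M.append([x / total for x in row] if total > 0 else row)
    return M

# Toy example: "a" asks, "b" and "c" each answer once, but "b" shares
# "a"'s topical interest while "c" does not.
theta = {"a": [0.5], "b": [0.5], "c": [0.0]}
W = {"a": {"b": 1, "c": 1}}
M = transition_matrix(W, theta, z=0, users=["a", "b", "c"])
# Row for "a" favors "b" over "c" because of the similarity term.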
CQARank
• Given topic z, the CQARank saliency score of u_i is computed with the following formula:
  R_z(u_i) = λ Σ_{j: u_j → u_i} R_z(u_j) · M_ji + (1 − λ) · θ'_{u_i,z} · E(z, u_i)
• E(z, u_i) is the estimated expertise score of u_i under topic z, defined as the expectation of the user topical expertise distribution learned by TEM:
  E(z, u_i) = Σ_e φ_{z,u_i,e} · μ_e
• λ ∈ (0, 1) is a parameter that controls the probability of the teleportation operation.
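This is a PageRank-style fixed point and can be computed by power iteration. A sketch under stated assumptions: we normalize the teleportation vector to sum to one (our choice, so the scores form a distribution; the slide leaves normalization implicit), and `expertise_z[i]` stands in for E(z, u_i) from TEM.

```python
def cqarank_scores(M, theta_z, expertise_z, lam=0.2, iters=200):
    """Iterate the slide's update rule until (approximate) convergence:
    R_z(u_i) = lam * sum_j R_z(u_j) * M_ji + (1 - lam) * theta'_iz * E(z, u_i).
    M is the row-normalized transition matrix; lam is the teleportation
    parameter (0.2 in the paper's experiments)."""
    n = len(M)
    pers = [theta_z[i] * expertise_z[i] for i in range(n)]
    s = sum(pers) or 1.0
    pers = [p / s for p in pers]          # assumed normalization
    R = [1.0 / n] * n
    for _ in range(iters):
        R = [lam * sum(R[j] * M[j][i] for j in range(n)) + (1 - lam) * pers[i]
             for i in range(n)]
    return R

# Two users who only answer each other, with equal interest and expertise:
# by symmetry both should end up with the same saliency score.
M = [[0.0, 1.0], [1.0, 0.0]]
scores = cqarank_scores(M, theta_z=[0.5, 0.5], expertise_z=[1.0, 1.0])
```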
Roadmap
• Motivation
• Related Work
• Our Method
  – Method Overview
  – Topic Expertise Model
  – CQARank
• Experiments
• Summary
Experiments
• Stack Overflow data set
  – All Q&A posts over three months (May 1 to August 1, 2009)
  – Training data: 8,904 questions and 96,629 answers posted by 663 users (10,689 unique tags and 135 unique vote values)
  – Testing data: 1,173 questions and 9,883 answers
• Data preprocessing
  – Tokenize text and discard all code snippets
  – Remove stop words and HTML tags from the text
• Parameter settings
  – K = 15, E = 10, α = 50/K, β = 0.01, γ = 0.01, η = 0.001, λ = 0.2
  – Normal-Gamma hyperparameters for the vote model
  – 500 iterations of Gibbs sampling
TEM Results
• Topic analysis: topic tags
  – Top tags provide phrase-level features that distill richer topic information
TEM Results
• Topic analysis: topic words
  – Top words have a strong correlation with the top tags under the same topic
TEM Results
• Expertise analysis
  – TEM learns different user expertise levels by clustering votes with its Gaussian mixture component
  – 10 Gaussian distributions with varying means generate the votes in the data
  – The higher the mean, the lower the precision
Recommend Expert Users
• Task
  – Given a new question q and a set of users U, rank the users by their interest in and expertise for answering q
  – Recommendation score function:
    s(u, q) = Sim(u, q) · Expert(u, q) = (1 − JS(θ_u, θ_q)) · Σ_z θ_{q,z} · Expert(u, z)
  – θ_{q,z} is the estimated posterior topic distribution of question q:
    θ_{q,z} ∝ p(z | w_q, t_q, u) = p(z | u) p(w_q | z) p(t_q | z) = θ_{u,z} Π_{w ∈ w_q} ϕ(z, w) Π_{t ∈ t_q} ω(z, t)
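The score function combines a Jensen-Shannon similarity with an expectation over topics. A minimal sketch, where `expert_u[z]` stands in for the TEM-estimated Expert(u, z) and base-2 logs keep the JS divergence in [0, 1]:

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so values lie in [0, 1]."""
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(ai + bi) / 2 for ai, bi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def recommend_score(theta_u, theta_q, expert_u):
    """Sketch of the slide's ranking function:
    s(u, q) = (1 - JS(theta_u, theta_q)) * sum_z theta_qz * Expert(u, z)."""
    sim = 1.0 - js_divergence(theta_u, theta_q)
    return sim * sum(tq * e for tq, e in zip(theta_q, expert_u))

# A user whose interests exactly match the question (sim = 1) and whose
# expertise is 1.0 on topic 0 and 3.0 on topic 1.
score = recommend_score([0.5, 0.5], [0.5, 0.5], expert_u=[1.0, 3.0])
# Expertise term: 0.5 * 1.0 + 0.5 * 3.0 = 2.0
```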
Recommend Expert Users
• Our method: CQARank
• Baselines
  – Link analysis methods: In-Degree (ID), PageRank (PR)
  – Probabilistic generative models: TEM (part of our method), UQA (Guo et al., CIKM08)
  – Combining link analysis and topic models: Topic Sensitive PageRank (TSPR) (Zhou et al., CIKM12)
Recommend Expert Users
• Evaluation criteria
  – Ground truth: users ranked by the average votes they received for answering q
  – Metrics: nDCG, Pearson/Kendall correlation coefficients
• Results
Recommend Answers
• Task
  – Given a new question q and a set of answers A, rank all answers in A
  – Recommendation score function:
    s(a, q) = Sim(a, q) · Expert(u, q) = (1 − JS(θ_a, θ_q)) · Σ_z θ_{q,z} · Expert(u, z), where u is the user who posted answer a
• Baselines and evaluation criteria are the same as in the expert recommendation task
• Each answer's votes are used to generate the ground-truth rank list
Recommend Answers
• Results
Recommend Similar Questions
• When a user asks a new question (referred to as the query question), the user often gets replies with links to other similar questions
• Crawl 1,000 questions whose similar questions exist in the training data set to form the query question set
• For each query question with n similar questions, randomly select another m (m = 1000) questions from the training data set to form the candidate similar questions
Recommend Similar Questions
• All compared methods rank these m + n candidate similar questions according to their similarity to the query question
• The higher the true similar questions are ranked, the better the method performs
• The recommendation score is computed from the JS divergence between the topic distributions of the query question and each candidate similar question
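Ranking candidates by JS divergence can be sketched in a few lines. The `candidates` mapping from question id to topic distribution is a hypothetical structure chosen for illustration:

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logs, bounded by 1)."""
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(ai + bi) / 2 for ai, bi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rank_similar(query_theta, candidates):
    """Rank candidate question ids by topical closeness to the query.
    Lower JS divergence means more similar, so it ranks first."""
    return sorted(candidates,
                  key=lambda qid: js_divergence(query_theta, candidates[qid]))

# The candidate whose topic distribution is close to the query ranks first.
ranking = rank_similar([0.9, 0.1], {"q_far": [0.1, 0.9], "q_near": [0.8, 0.2]})
```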
Recommend Similar Questions
• Baselines: TSPR(LDA), UQA, SimTag
• Evaluation criteria: Precision@K, average rank of similar questions, mean reciprocal rank (MRR), cumulative distribution of ranks (CDR)
Parameter Sensitivity Analysis
• Performance of CQARank on expert user recommendation, varying the number of expertise levels (E) and topics (K)
Roadmap
• Motivation
• Related Work
• Our Method
  – Method Overview
  – Topic Expertise Model
  – CQARank
• Experiments
• Summary