NERank: Bringing Order to Named Entities from Texts
Chengyu Wang¹, Rong Zhang¹, Xiaofeng He¹, Guomin Zhou², Aoying Zhou¹
1) Institute for Data Science and Engineering, East China Normal University
2) Zhejiang Police College
Outline
• Introduction
• Problem Statement
• Proposed Approach
• Experiments
• Conclusion
Entity Ranking
• Ranking entities from texts
  – Input: a text collection
  – Output: a ranked list of named entities
• Why entity ranking?
  – Entity-oriented Web search: given a query, retrieve a list of entities from relevant documents
  – Web semantification: add semantic tags to Web documents
  – Knowledge base population: extract and rank entities, then link them to knowledge bases
Problem Statement
• Given a document collection D and a normalized named entity collection E detected from D, the goal is to assign each entity e ∈ E a rank r(e) denoting its relative importance, such that
  – 0 ≤ r(e) ≤ 1
  – Σ_{e∈E} r(e) = 1
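The two constraints simply require the ranks to form a probability distribution over the entities, so any raw scoring can be turned into valid ranks by normalization. A minimal sketch (function name and entity names are illustrative, not from the paper):

```python
def normalize_ranks(scores):
    """Rescale raw entity scores so that each rank lies in [0, 1]
    and all ranks sum to 1, as the problem statement requires."""
    total = sum(scores.values())
    if total == 0:
        # degenerate case: fall back to a uniform distribution
        return {e: 1.0 / len(scores) for e in scores}
    return {e: s / total for e, s in scores.items()}
```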
General Framework
Topical Tripartite Graph Modeling
• Topics in Egypt Revolution
• TTG construction
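Assuming TTG construction starts from standard topic-model outputs (a per-document topic distribution and a per-topic entity distribution), the tripartite graph can be sketched as two weighted edge lists. The dict-based representation below is an illustrative assumption, not the paper's data structure:

```python
def build_ttg(doc_topic, topic_entity):
    """Assemble a topical tripartite graph as two weighted edge lists:
    document-topic edges weighted by the doc-topic probabilities and
    topic-entity edges weighted by the topic-entity probabilities."""
    dt_edges = [(d, t, w)
                for d, row in doc_topic.items()
                for t, w in row.items() if w > 0]
    te_edges = [(t, e, w)
                for t, row in topic_entity.items()
                for e, w in row.items() if w > 0]
    return dt_edges, te_edges
```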
Prior Topic Rank Estimation: Three Quality Metrics
• Probabilities derived from TTG modeling
  – θ_{i,j}: probability of topic t_j in document d_i
  – φ'_{i,j}: probability of normalized entity e_j in topic t_i
• Quality metrics
  – Prior probability: pr(t_i) = (1/|D|) Σ_{j=1}^{|D|} θ_{j,i}
  – Entity richness: er(t_i) = (1/|E|) Σ_{j=1}^{|E|} φ'_{i,j} · a_{e_j}
  – Topic specificity: ts(t_i) = 0 if pr(t_i) < ε; otherwise ts(t_i) = (1/|D|) Σ_{j=1}^{|D|} θ_{j,i} log₂ θ_{j,i}
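A rough sketch of the three quality metrics, assuming theta is a |D|×|T| doc-topic matrix, phi a |T|×|E| topic-entity matrix, and a the per-entity weights treated here as given; the 1/|D| normalization of the specificity term and the threshold value are assumptions:

```python
import math

def topic_quality(theta, phi, a, eps=0.05):
    """Compute prior probability, entity richness, and topic
    specificity for each topic from the TTG probability matrices."""
    n_docs, n_topics = len(theta), len(theta[0])
    n_ents = len(phi[0])
    metrics = []
    for i in range(n_topics):
        # prior probability: average mass of topic i over all documents
        pr = sum(theta[j][i] for j in range(n_docs)) / n_docs
        # entity richness: weighted average of entity probabilities in topic i
        er = sum(phi[i][j] * a[j] for j in range(n_ents)) / n_ents
        # topic specificity: entropy-style concentration score,
        # zeroed out for topics whose prior falls below the threshold
        if pr < eps:
            ts = 0.0
        else:
            ts = sum(theta[j][i] * math.log2(theta[j][i])
                     for j in range(n_docs) if theta[j][i] > 0) / n_docs
        metrics.append((pr, er, ts))
    return metrics
```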
Prior Topic Rank Estimation: Ranking Function
• Linear ranking function: r⁽⁰⁾(t_i) = wᵀ F(t_i)
  – F(t_i) = <pr(t_i), er(t_i), ts(t_i)>
  – Σ_i w_i = 1
• Parameter learning
  – For two topics t_i and t_j, if t_i is a more important topic than t_j, we require r⁽⁰⁾(t_i) > r⁽⁰⁾(t_j)
  – Optimization objective: minimize ‖w‖² + C Σ_{i,j} ξ_{i,j}²
  – Constraints: wᵀ F(t_i) − wᵀ F(t_j) ≥ 1 − ξ_{i,j}
  – Train a linear SVM classifier to learn the weights
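The pairwise constraints can be fit with any margin-based learner; the sketch below uses a simple perceptron-style update on feature differences as a stand-in for the paper's linear SVM (function name, learning rate, and epoch count are illustrative):

```python
def learn_weights(pairs, dim=3, lr=0.1, epochs=200):
    """Pairwise learner over topic feature vectors F(t).
    `pairs` holds (f_better, f_worse) feature tuples; w is nudged so
    that w.f_better exceeds w.f_worse by a margin of 1, mimicking the
    constraint w.F(t_i) - w.F(t_j) >= 1 - xi."""
    w = [1.0 / dim] * dim
    for _ in range(epochs):
        for fb, fw in pairs:
            margin = sum(wi * (b - c) for wi, b, c in zip(w, fb, fw))
            if margin < 1.0:  # violated constraint: push w toward fb - fw
                w = [wi + lr * (b - c) for wi, b, c in zip(w, fb, fw)]
    total = sum(w)  # renormalize so the weights sum to 1
    return [wi / total for wi in w] if total else w
```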
Meta-Path Constrained Random Walk Algorithm
• Initialization
  – r(t_i) = r⁽⁰⁾(t_i)
• Probability propagation
  – Following the TDT (Topic-Doc-Topic) meta path (with prob. α > 0): t_i → d_j → t_k, with step probabilities θ_{j,i} / Σ_{d_l∈D} θ_{l,i} for t_i → d_j and θ_{j,k} / Σ_{t_l} θ_{j,l} for d_j → t_k
  – Following the TET (Topic-Entity-Topic) meta path (with prob. β > 0): t_i → e_j → t_k, with step probabilities φ'_{i,j} / Σ_{e_l∈E} φ'_{i,l} for t_i → e_j and φ'_{k,j} / Σ_{t_l} φ'_{l,j} for e_j → t_k
  – Random jump (with prob. 1 − α − β > 0)
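The propagation can be sketched directly from the update rule, assuming the two meta-path transition matrices (call them TDT and TET; row i holds the probabilities of reaching topic i from each topic) have already been built from θ and φ':

```python
def nerank_iterate(T0, TDT, TET, alpha=0.4, beta=0.4, iters=100):
    """Iteratively propagate topic ranks with the update
        T <- alpha * TDT @ T + beta * TET @ T + (1 - alpha - beta) * T0
    where T0 holds the prior topic ranks and TDT / TET are the
    Topic-Doc-Topic and Topic-Entity-Topic transition matrices."""
    n = len(T0)
    T = list(T0)
    for _ in range(iters):
        T = [alpha * sum(TDT[i][j] * T[j] for j in range(n))
             + beta * sum(TET[i][j] * T[j] for j in range(n))
             + (1 - alpha - beta) * T0[i]
             for i in range(n)]
    return T
```

With a uniform transition matrix the walk converges to a smoothed version of the prior, which makes the fixed point easy to check by hand.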
Proof of Convergence (1)
• Update rule of NERank
  – T⁽ᵏ⁾ = α·Θ̂ᵀΘ̃·T⁽ᵏ⁻¹⁾ + β·Φ̂ᵀΦ̃·T⁽ᵏ⁻¹⁾ + (1 − α − β)·T⁽⁰⁾
• Non-recursive form of NERank
  – T⁽ᵏ⁾ = Mᵏ T⁽⁰⁾ + (1 − α − β) Σ_{i=0}^{k−1} Mⁱ T⁽⁰⁾
  – where M = α·Θ̂ᵀΘ̃ + β·Φ̂ᵀΦ̃
• Matrix limit of T⁽ᵏ⁾
  – lim_{k→∞} T⁽ᵏ⁾ = lim_{k→∞} Mᵏ T⁽⁰⁾ + (1 − α − β) lim_{k→∞} Σ_{i=0}^{k−1} Mⁱ T⁽⁰⁾
  – lim_{k→∞} Mᵏ T⁽⁰⁾ = 0 (because Θ̂ᵀΘ̃ and Φ̂ᵀΦ̃ are transition matrices and 0 < α + β < 1)
  – lim_{k→∞} Σ_{i=0}^{k−1} Mⁱ T⁽⁰⁾ = (I − M)⁻¹ T⁽⁰⁾
Proof of Convergence (2)
• Matrix limit of T⁽ᵏ⁾
  – lim_{k→∞} T⁽ᵏ⁾ = (1 − α − β)(I − M)⁻¹ T⁽⁰⁾
• Closed form of T⁽ᵏ⁾
  – T* = (1 − α − β)(I − α·Θ̂ᵀΘ̃ − β·Φ̂ᵀΦ̃)⁻¹ T⁽⁰⁾
• Closed form of E⁽ᵏ⁾
  – E* = (1 − α − β)·Φ'ᵀ·(I − α·Θ̂ᵀΘ̃ − β·Φ̂ᵀΦ̃)⁻¹ T⁽⁰⁾
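The closed form can be sanity-checked without computing a matrix inverse: the limit T* is exactly the fixed point of the update, so (I − M)T* should equal (1 − α − β)T⁽⁰⁾, where M combines the two meta-path transition matrices with weights α and β. A minimal pure-Python check (illustrative helper, not from the paper):

```python
def closed_form_check(T_star, T0, M, alpha, beta, tol=1e-6):
    """Verify the fixed-point identity (I - M) T* = (1 - alpha - beta) T0,
    which is equivalent to T* = (1 - alpha - beta)(I - M)^{-1} T0,
    without ever inverting (I - M)."""
    n = len(T0)
    for i in range(n):
        # row i of (I - M) T*
        lhs = T_star[i] - sum(M[i][j] * T_star[j] for j in range(n))
        if abs(lhs - (1 - alpha - beta) * T0[i]) > tol:
            return False
    return True
```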
Experiments (1)
• Datasets
  – 50 newswire collections from TimelineData and CrisisData, each related to an international event
  – Example events: Egypt Revolution, Iraq War, BP Oil Spill, etc.
• Hyper-parameter settings
Experiments (2)
• Comparative study
  – Baselines: TF-IDF, TextRank, LexRank, and Kim et al.
  – Variants of our approach: NERank_Uni and NERank_{α=0}
Experiments (3)
• Case studies
Conclusion
• NERank
  – Effectively ranks named entities in documents with little human intervention
• Future work
  – A general framework for entity ranking over different types of texts (e.g., documents, tweets)
  – A complete benchmark for evaluating entity ranking
Thanks! Questions & Answers