Nders at NTCIR-13 Short Text Conversation
Han Ni, Liansheng Lin, Ge Xu
NetDragon Websoft Inc.
Dec. 2017
System Architecture
Figure 1: System Architecture
Preprocessing
• Traditional-Simplified Chinese conversion
• Convert full-width characters into half-width ones
• Word segmentation (PKU standard)
• Replace numbers, times and URLs with the tokens <_NUM>, <_TIME> and <_URL> respectively
• Filter meaningless words and special symbols
A small code sketch of these steps follows the examples below.
Short Text ID: test-post-10440
Raw Text:               去 到 美 國 , 还 是 吃 中 餐 ! 宮 保 雞 丁 家 的 感 覺 ~
                        (Go to the USA, still eat Chinese food, Kung Pao Chicken, feeling like at home)
Without T-S Conversion: 去 到 美 國 , 还 是 吃 中 餐 ! 宮 保 雞 丁 家 的 感 覺 ~
With T-S Conversion:    去 到 美 国 , 还 是 吃 中 餐 ! 宫 保 鸡 丁 家 的 感 觉 ~
Clean Result:           去 到 美 国 还 是 吃 中 餐 宫 保 鸡 丁 家 的 感 觉

Short Text ID: test-post-10640
Raw Text:                  汶 川 大 地 震 9 周 年 : 29 个 让 人 泪 流 满 面 的 瞬 间 。
                           (9th Anniversary of Wenchuan Earthquake: 29 moments making people tearful)
Without token replacement: 汶 川 大 地 震 9 周 年 : 29 个 让 人 泪 流 满 面 的 瞬 间 。
With token replacement:    汶 川 大 地 震 <_NUM> 周 年 : <_NUM> 个 让 人 泪 流 满 面 的 瞬 间 。
Clean Result:              汶 川 大 地 震 <_NUM> 周 年 <_NUM> 个 让 人 泪 流 满 面 的 瞬 间
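The slides list the preprocessing steps without naming any tools, so the sketch below is only a rough Python illustration: the regexes, the symbol filter, and the omission of the Traditional-Simplified converter and the PKU-style segmenter (e.g. OpenCC plus a word segmenter in practice) are all assumptions.

```python
import re

FULLWIDTH_OFFSET = 0xFEE0

def full_to_half(text):
    """Map full-width ASCII variants (U+FF01-U+FF5E) and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                        # ideographic space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:
            code -= FULLWIDTH_OFFSET
        out.append(chr(code))
    return "".join(out)

def replace_tokens(text):
    """Replace URLs, times and numbers with the placeholder tokens from the slide."""
    text = re.sub(r"https?://\S+", "<_URL>", text)
    text = re.sub(r"\d{1,2}:\d{2}(:\d{2})?", "<_TIME>", text)    # e.g. 12:30:05
    text = re.sub(r"\d+(\.\d+)?", "<_NUM>", text)
    return text

STOP = set("，。！？：；、,.!?:;~")                  # illustrative symbol/stopword filter

def clean(text):
    # Traditional->Simplified conversion and PKU-standard word segmentation
    # would normally run here; omitted to keep the sketch dependency-free.
    tokens = replace_tokens(full_to_half(text)).split()
    return [t for t in tokens if t not in STOP]

print(clean("汶 川 大 地 震 9 周 年 ： 29 个 让 人 泪 流 满 面 的 瞬 间 。"))
```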
Similarity Features
• TF-IDF
• LSA (Latent Semantic Analysis)
• LDA (Latent Dirichlet Allocation)
• Word2Vec (skip-gram)
• LSTM-Sen2Vec
We combine each post with its corresponding comments into a single document, then train the LSA and LDA models on these documents.
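As a concrete, though assumed, realization of these features, the sketch below uses gensim (version 4 or later). The library choice, the topic counts, the use of cosine similarity, and averaging word vectors for the Word2Vec feature are not specified on the slides; the toy documents are invented.

```python
from gensim import corpora, models, matutils

# Toy training data: each document is a post merged with its comments, as described above.
documents = [
    ["去", "美国", "吃", "中餐", "宫保鸡丁", "好吃", "想", "家"],
    ["汶川", "地震", "<_NUM>", "周年", "纪念", "难忘", "瞬间"],
]

dictionary = corpora.Dictionary(documents)
bow = [dictionary.doc2bow(doc) for doc in documents]

tfidf = models.TfidfModel(bow)                                        # TF-IDF
lsa = models.LsiModel(tfidf[bow], id2word=dictionary, num_topics=2)   # LSA
lda = models.LdaModel(bow, id2word=dictionary, num_topics=2)          # LDA
w2v = models.Word2Vec(documents, vector_size=50, sg=1, min_count=1)   # skip-gram

def sim_lsa(tokens_a, tokens_b):
    """Cosine similarity between two token lists in the LSA topic space."""
    va = lsa[tfidf[dictionary.doc2bow(tokens_a)]]
    vb = lsa[tfidf[dictionary.doc2bow(tokens_b)]]
    return matutils.cossim(va, vb)

def sim_w2v(tokens_a, tokens_b):
    """Cosine similarity between averaged word vectors."""
    return w2v.wv.n_similarity(tokens_a, tokens_b)

print(sim_lsa(["美国", "中餐"], ["宫保鸡丁", "好吃"]))
print(sim_w2v(["美国", "中餐"], ["宫保鸡丁", "好吃"]))
```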
LSTM

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)        (1)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)        (2)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)     (3)
C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t            (4)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)        (5)
h_t = o_t ∗ tanh(C_t)                      (6)

Figure 2: The LSTM Cell

Zaremba, Wojciech, I. Sutskever, and O. Vinyals. Recurrent Neural Network Regularization. arXiv preprint (2014).
Mikolov, Tomáš. Statistical Language Models Based on Neural Networks. Ph.D. thesis, Brno University of Technology (2012).
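Equations (1)-(6) translate almost line for line into code. The NumPy sketch below uses random, untrained weights purely to make the data flow concrete; it is not the trained model used in the system.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of the LSTM cell from Figure 2 / equations (1)-(6)."""
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])           # (1) forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])           # (2) input gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])       # (3) candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde           # (4) new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])           # (5) output gate
    h_t = o_t * np.tanh(c_t)                     # (6) hidden state
    return h_t, c_t

hidden, inputs = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(hidden, hidden + inputs)) for k in "fiCo"}
b = {k: np.zeros(hidden) for k in "fiCo"}
h = c = np.zeros(hidden)
for x in rng.normal(size=(5, inputs)):           # run a toy sequence of length 5
    h, c = lstm_step(x, h, c, W, b)
print(h)
```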
Attention Weight
Figure 3: Unidirectional weight distribution
Figure 4: Bidirectional weight distribution
LSTM-Sen2Vec
Figure 5: The Unidirectional LSTM
Figure 6: The Traditional Bidirectional LSTM
LSTM-Sen2Vec
Figure 7: The Modified Bidirectional LSTM
Candidates Generation
• Similar posts:
  Score_1(q, p) = Sim_LDA(q, p) ∗ Sim_W2V(q, p) ∗ Sim_LSTM(q, p)    (7)
  Score_2(q, p) = Sim_LSA(q, p) ∗ Sim_W2V(q, p) ∗ Sim_LSTM(q, p)    (8)
• Comment candidates:
  Score_1(q, c) = Sim_LSA(q, c) ∗ Sim_W2V(q, c)                     (9)
  Score_2(q, c) = Sim_LDA(q, c) ∗ Sim_W2V(q, c)                     (10)
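Read literally, equations (7)-(10) just multiply the individual similarities, as the hypothetical stubs below illustrate. The thresholds and the number of candidates kept per query are not given on the slide; the sim_* callables stand in for the feature models sketched earlier.

```python
def score_posts(q, p, sim_lda, sim_lsa, sim_w2v, sim_lstm):
    """Scores (7) and (8) used to retrieve posts similar to the query q."""
    score1 = sim_lda(q, p) * sim_w2v(q, p) * sim_lstm(q, p)   # (7)
    score2 = sim_lsa(q, p) * sim_w2v(q, p) * sim_lstm(q, p)   # (8)
    return score1, score2

def score_comments(q, c, sim_lda, sim_lsa, sim_w2v):
    """Scores (9) and (10) used to keep the comments of retrieved posts as candidates."""
    score1 = sim_lsa(q, c) * sim_w2v(q, c)                    # (9)
    score2 = sim_lda(q, c) * sim_w2v(q, c)                    # (10)
    return score1, score2

# With any similarity functions plugged in:
dummy = lambda a, b: 0.5
print(score_posts("query", "post", dummy, dummy, dummy, dummy))   # (0.125, 0.125)
```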
Ranking
• TextRank (words as vertices)
• Pattern-IDF
• Pattern-IDF + TextRank (sentences as vertices)
TextRank - A graph-based ranking model
Formally, let G = (V, E) be an undirected graph with a set of vertices V and a set of edges E, where E is a subset of V × V. For a given vertex V_i, let link(V_i) be the set of vertices linked with it. The score of a vertex V_i is defined as follows:

WS(V_i) = (1 − d) + d ∗ Σ_{j ∈ link(V_i)} w_ij ∗ WS(V_j)    (11)

where d is a damping factor¹ that is usually set to 0.85.

¹ Brin, Sergey, and L. Page. The anatomy of a large-scale hypertextual Web search engine. International Conference on World Wide Web, Elsevier Science Publishers B.V., 1998: 107-117.
TextRank - Vertices and Edges
• Vertices: each unique word in the candidates
• Edges: a co-occurrence relation
• Weighted by: the word2vec similarity between two words and the number of their co-occurrences
TextRank - Calculate Iteratively . M 1 k M 21 M 22 M 2 k . . . . . . M 12 . . ... . . . M k 1 M k 2 M k 3 M kk M 13 M 23 M 11 12 For N candidates, k words in total, we construct k × k matrix M . M ij = cnt ∗ sim ( D i , D j ) . Then we compute iteratively . . . ( 1 − d ) / k . . . ( 1 − d ) / k R ( t + 1 ) = + d R ( t ) . . . . . . . . . ( 1 − d ) / k . . . Stop when | R ( t + 1 ) − R ( t ) | < ϵ , ϵ = 10 − 7 . Here, cnt refers to the number of co-ocurrences within a sentence for D i and D j .
TextRank - Ranking
Since we obtain a score R(D_i) for each word D_i in the candidates, the score for each comment candidate c is calculated as:

Rank_TextRank(c) = ( Σ_{D_i ∈ c} R(D_i) ) / len(c)    (12)

Here, len(c) refers to the number of words in comment c.
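Putting (11)-(12) together, the sketch below builds the word graph, runs the damped iteration, and averages word scores per comment. Column-normalising M before iterating is an assumption the slides leave implicit (without it the damped iteration need not converge), and toy_sim stands in for the word2vec similarity.

```python
import numpy as np

def textrank_words(candidates, sim, d=0.85, eps=1e-7, max_iter=200):
    """candidates: list of comment token lists; sim(a, b): word similarity."""
    words = sorted({w for c in candidates for w in c})
    idx = {w: i for i, w in enumerate(words)}
    k = len(words)

    # M_ij = (# sentences where D_i and D_j co-occur) * sim(D_i, D_j)
    M = np.zeros((k, k))
    for c in candidates:
        for a in set(c):
            for b in set(c):
                if a != b:
                    M[idx[a], idx[b]] += sim(a, b)
    col_sums = M.sum(axis=0)
    M = np.divide(M, col_sums, out=np.zeros_like(M), where=col_sums > 0)

    R = np.full(k, 1.0 / k)
    for _ in range(max_iter):
        R_next = (1 - d) / k + d * M @ R
        if np.abs(R_next - R).sum() < eps:
            break
        R = R_next

    word_score = dict(zip(words, R))
    # (12): score of a comment = mean score of its words
    return [sum(word_score[w] for w in c) / len(c) for c in candidates]

toy_sim = lambda a, b: 1.0                       # stand-in for word2vec similarity
cands = [["美国", "中餐", "好吃"], ["中餐", "想家"], ["地震", "纪念"]]
print(textrank_words(cands, toy_sim))
```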
Pattern-IDF
For a word D_i (minor word) in the corresponding comment, given a word D_j (major word) in the post, we define (D_j, D_i) as a pattern. Inspired by IDF, we calculate the Pattern-IDF as:

PI(D_i | D_j) = 1 / log_2( count_c(D_i) ∗ count_p(D_j) / count_pair(D_i, D_j) )    (13)

Here count_c refers to the number of occurrences in comments, count_p in posts, and count_pair in post-comment pairs. Patterns whose count_pair(D_i, D_j) is less than 3 are eliminated.
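A direct reading of equation (13) as code. Counting each word at most once per post/comment and skipping patterns with X = 1 (where log₂ X = 0) are assumptions not spelled out on the slide; the toy pairs are invented.

```python
import math
from collections import Counter

def pattern_idf(pairs, min_pair_count=3):
    """pairs: list of (post_tokens, comment_tokens) from the training data."""
    count_p, count_c, count_pair = Counter(), Counter(), Counter()
    for post, comment in pairs:
        count_p.update(set(post))                # count each word once per post
        count_c.update(set(comment))             # and once per comment
        for dj in set(post):                     # major word, from the post
            for di in set(comment):              # minor word, from the comment
                count_pair[(dj, di)] += 1

    pi = {}
    for (dj, di), n_pair in count_pair.items():
        if n_pair < min_pair_count:
            continue                             # rare patterns are eliminated
        x = count_c[di] * count_p[dj] / n_pair   # the X of Figures 8-9
        if x > 1:                                # avoid division by log2(1) = 0
            pi[(dj, di)] = 1.0 / math.log2(x)    # (13)
    return pi

toy_pairs = [(["中国移动"], ["接通", "资费"])] * 3 + [(["中国移动"], ["接通"])]
print(pattern_idf(toy_pairs))
```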
Pattern-IDF
Let X = count_c(D_i) ∗ count_p(D_j) / count_pair(D_i, D_j); then X ∈ [1, ∞).
Figure 8: log(X)
Figure 9: 1/log(X)
PI - Example

Table 1: An example of Pattern-IDF
MajorWord                MinorWord                 PI
中国移动 (China Mobile)    接通 (connect)             0.071725
中国移动                  资费 (charges)             0.067261
中国移动                  cmcc                      0.062408
中国移动                  漫游 (roaming)             0.059949
中国移动                  营业厅 (business hall)      0.059234
中国移动                  ...                       ...
中国移动                  我 (me)                    0.028889
中国移动                  是 (be)                    0.026346

PI_norm(D_i | D_j) = PI(D_i | D_j) / Σ_{i=1}^{n} PI(D_i | D_j)                    (14)
H(D_j) = − Σ_{i=1}^{n} PI_norm(D_i | D_j) log_2 PI_norm(D_i | D_j)                (15)

Table 2: The entropy of Pattern-IDF for each major word
MajorWord                       H
眼病 (eye disease)               0.889971
丰收年 (harvest year)            0.988191
血浆 (plasma)                    1.033668
脊椎动物 (vertebrate)             1.083438
水粉画 (gouache painting)         1.180993
...                             ...
现在 (now)                       9.767768
什么 (what)                     10.219045
是 (be)                         10.934950
的 (of)                         ...
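Equations (14)-(15) normalise each major word's PI values into a distribution and take its entropy; Table 2 shows that generic function words such as 是 or 的 get a very high H. The sketch below computes H(D_j) from the output of the pattern_idf() sketch above.

```python
import math
from collections import defaultdict

def major_word_entropy(pi):
    """pi: dict {(major_word, minor_word): PI value}, e.g. from pattern_idf()."""
    by_major = defaultdict(dict)
    for (dj, di), v in pi.items():
        by_major[dj][di] = v

    entropy = {}
    for dj, minors in by_major.items():
        total = sum(minors.values())
        probs = [v / total for v in minors.values()]          # (14) PI_norm
        entropy[dj] = -sum(p * math.log2(p) for p in probs)   # (15) H(D_j)
    return entropy

# A flat distribution over many minor words yields a large H, as in Table 2.
```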
PI - Ranking
For each comment c in the candidates, given a query (new post) q, we calculate the score by PI as follows:

Score_PI(q, c) = ( Σ_{D_j ∈ q} Σ_{D_i ∈ c} PI(D_i | D_j) ) / ( len(c) ∗ len(q) )    (16)

Then we define the rank score as follows:

Rank_PI = Score_PI(q, c) / (1 + max Score_PI(q, c)) ∗ Sim_W2V(q, c) ∗ Sim_LSA(q, c)    (17)
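The ranking formulas (16)-(17) in code form. Here max_score_pi is taken to be the maximum Score_PI over the current candidate set, which is how the otherwise unspecified max in (17) is read; the similarity callables are the features sketched earlier.

```python
def score_pi(q_tokens, c_tokens, pi):
    """(16): average PI over all (query word, comment word) pairs."""
    total = sum(pi.get((dj, di), 0.0) for dj in q_tokens for di in c_tokens)
    return total / (len(c_tokens) * len(q_tokens))

def rank_pi(q_tokens, c_tokens, pi, max_score_pi, sim_w2v, sim_lsa):
    """(17): damp the PI score and combine it with the word2vec / LSA similarities."""
    return (score_pi(q_tokens, c_tokens, pi) / (1.0 + max_score_pi)
            * sim_w2v(q_tokens, c_tokens) * sim_lsa(q_tokens, c_tokens))
```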
TextRank + Pattern-IDF
In this method, we add each comment sentence in the candidates as a vertex in the graph and use sentence-level Word2Vec similarity as the edge weight between vertices. For N candidates, we construct an N × N matrix M with M_ij = Sim_W2V(candidate_i, candidate_j). At time t = 0, we initialize an N-dimensional vector P, where N is the number of comment candidates, and each entry of P is defined as the Pattern-IDF score between the query (new post) q and the corresponding comment c_i in the candidates:

P_i = Score_PI(q, c_i)    (18)
TextRank + Pattern-IDF
Then we compute iteratively

R(t+1) = [(1 − d)/N, …, (1 − d)/N]ᵀ + d ∗ M ∗ R(t)

and stop when |R(t+1) − R(t)| < ε, ε = 10^-7. Finally, we obtain a score for each comment in the candidates.
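A sketch of the sentence-level walk from (18) onward: M holds pairwise sentence similarities, the iteration starts from the Pattern-IDF scores, and the converged vector ranks the comments. As before, column-normalising M and zeroing its diagonal are assumptions; sim_sent and p_init stand for whatever sentence similarity and Score_PI values are available.

```python
import numpy as np

def textrank_sentences(candidates, sim_sent, p_init, d=0.85, eps=1e-7, max_iter=200):
    """candidates: comment token lists; p_init: Score_PI(q, c_i) for each candidate."""
    n = len(candidates)
    M = np.array([[sim_sent(ci, cj) for cj in candidates] for ci in candidates])
    np.fill_diagonal(M, 0.0)                     # no self-loops
    col = M.sum(axis=0)
    M = np.divide(M, col, out=np.zeros_like(M), where=col > 0)

    R = np.asarray(p_init, dtype=float)          # (18): start from Pattern-IDF scores
    for _ in range(max_iter):
        R_next = (1 - d) / n + d * M @ R
        if np.abs(R_next - R).sum() < eps:
            break
        R = R_next
    return R                                     # final score per comment

cands = [["美国", "中餐"], ["中餐", "好吃"], ["地震", "纪念"]]
toy_sim = lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b))   # stand-in similarity
print(textrank_sentences(cands, toy_sim, p_init=[0.3, 0.2, 0.1]))
```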
Experiment
• Nders-C-R5: LDA + Word2Vec + LSTM-Sen2Vec
• Nders-C-R4: LSA + Word2Vec + LSTM-Sen2Vec
• Nders-C-R3: R4 + TextRank (words as vertices)
• Nders-C-R2: R4 + Pattern-IDF
• Nders-C-R1: R4 + Pattern-IDF + TextRank (sentences as vertices)
Official Results

Table 3: The official results of the five runs for the Nders team
Run          Mean nG@1   Mean P+   Mean nERR@10
Nders-C-R1   0.4593      0.5394    0.5805
Nders-C-R2   0.4743      0.5497    0.5882
Nders-C-R3   0.4647      0.5317    0.5768
Nders-C-R4   0.4780      0.5338    0.5809
Nders-C-R5   0.4550      0.5495    0.5868
R2 vs. R4    ↓ 0.77%     ↑ 2.98%   ↑ 1.26%
Questions?