
Chinese Hypernym-Hyponym Extraction from User Generated Categories



  1. Chinese Hypernym-Hyponym Extraction from User Generated Categories Chengyu Wang, Xiaofeng He School of Computer Science and Software Engineering, East China Normal University Shanghai, China

  2. Outline • Introduction • Background and Related Work • Proposed Approach • Experiments • Conclusion 2

  3. Chinese Is-A Relation Extraction • Chinese is-a relation extraction – Chinese is-a relations are essential to construct large-scale Chinese taxonomies and knowledge graphs. – It is difficult to extract such relations due to the flexibility of language expression. • User generated categories – User generated categories are valuable knowledge sources, providing fine-grained candidate hypernyms of entities. – The semantic relations between an entity and its categories are not clear. 3

  4. Baidu Baike: one of the largest online encyclopedias in China, with 13M+ entries Barack Obama Categories: Political figure, Foreign country, Leader, Person 4

  5. The task: distinguishing is-a and not-is-a relations between Chinese words/phrases • Example: Barack Obama – Categories: Political figure (is-a), Leader (is-a), Person (is-a), Foreign country (not-is-a) 5

  6. Outline • Introduction • Background and Related Work • Proposed Approach • Experiments • Conclusion 6

  7. Background • Taxonomy: a hierarchical type system for knowledge graphs, consisting of is-a relations among classes and entities – Example: classes such as Person and Country, with subclasses such as Political Leader, Scientist and Developed Country, and entities at the leaves 7

  8. Describing the Task • Learning is-a relations for taxonomy expansion – A learning algorithm takes an existing taxonomy (e.g., Person and Country with subclasses Political Leader, Scientist and Developed Country) and attaches new entities to it • Key challenge: identify is-a relations from user generated categories 8

  9. Modeling the Task • Taxonomy – Directed acyclic graph G = (E, R) (E: entities/classes, R: is-a relations) • User generated categories – Collection of entities E* – Set of user generated categories Cat(e) for each e ∈ E* • Goal – Predict whether there is an is-a relation between e and c, where e ∈ E* and c ∈ Cat(e), based on the taxonomy G 9
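This setup can be sketched in a few lines of Python (names and data are illustrative assumptions, not the authors' code): the taxonomy as a set of is-a edges, categories as a map Cat(e), and the candidate pairs the model must label.

```python
# Illustrative sketch of the problem setup; names and data are assumptions.

# Taxonomy G = (E, R): R holds known is-a edges as (hyponym, hypernym).
R = {("Barack Obama", "Political figure"), ("Political figure", "Person")}

# User generated categories Cat(e) for entities e in E*.
Cat = {"Barack Obama": ["Political figure", "Foreign country", "Leader", "Person"]}

def candidate_pairs(cat):
    # Every (entity, category) pair is a candidate is-a relation to classify.
    return [(e, c) for e, cats in cat.items() for c in cats]

pairs = candidate_pairs(Cat)  # 4 pairs to label as is-a / not-is-a
```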

  10. Previous Approaches • Pattern matching-based approaches – Handcrafted patterns: high accuracy, low coverage • Hearst patterns: NP 1 such as NP 2 – Automatically generated patterns: higher coverage, lower accuracy – Not suitable for Chinese with its flexible expression • Thesauri and encyclopedia based approaches – Taxonomy construction based on existing knowledge sources • YAGO: Wikipedia + WordNet • More precise, but limited in scope by the underlying sources – Chinese: relatively low-resourced • No Chinese versions of WordNet or Freebase are available 10

  11. Previous Approaches • Text inference based approach – Infer relations using distributional similarity measures • Assumption: a hyponym can only appear in some of the contexts of its hypernym, while a hypernym can appear in all contexts of its hyponyms – Not suitable for Chinese, with its flexible and sparse contexts • Word embedding based approach – Represent words as dense, low-dimensional vectors – Learn semantic projection models from hyponyms to hypernyms – State-of-the-art approach for Chinese is-a relation extraction (ACL’14) 11 Figures taken from Mikolov et al., 2013

  12. Learning from Previous Work • Lessons learned from the state of the art – Use word embeddings to represent words – Learn relations between hyponyms and hypernyms in the embedding space • Basic approaches – Vector offsets – Linear projection 12 Figures taken from Mikolov et al., 2013
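The linear-projection idea can be fit by ordinary least squares. The sketch below (numpy only, synthetic vectors standing in for real hyponym/hypernym embeddings; not the paper's code) recovers a projection matrix M and offset b from noise-free pairs w_y = M w_x + b:

```python
import numpy as np

# Fit a single linear projection w_y ≈ M w_x + b by least squares.
# Synthetic embeddings stand in for real hyponym/hypernym vectors.
rng = np.random.default_rng(0)
d = 4
M_true = rng.normal(size=(d, d))
b_true = rng.normal(size=d)
X = rng.normal(size=(200, d))          # hyponym vectors
Y = X @ M_true.T + b_true              # hypernym vectors (noise-free)

X_aug = np.hstack([X, np.ones((len(X), 1))])   # absorb b as an extra column
W, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)  # W stacks [M^T; b]
M_hat, b_hat = W[:d].T, W[d]
```

Absorbing the offset as an extra all-ones column turns the affine fit into one plain least-squares solve.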

  13. Observations • Word vector offsets between Chinese is-a pairs – Multiple linguistic regularities may exist in is-a pairs • Different levels of hypernyms • Different types of is-a relations (instanceOf vs. subClassOf) • Different domains 13

  14. Outline • Introduction • Background and Related Work • Proposed Approach • Experiments • Conclusion 14

  15. General Framework • Initial stage – Train piecewise linear projection models based on the Chinese taxonomy • Iterative learning stage – Extract new is-a relations and adjust model parameters based on an incremental learning approach – Use Chinese Hypernym/Hyponym patterns to prevent “semantic drift” in each iteration 15

  16. Initial Model Training • Linear projection model – Projection: M w⃗_{x_i} + b⃗ = w⃗_{y_i}, with projection matrix M, word vectors w⃗_{x_i} (hyponym) and w⃗_{y_i} (hypernym), and offset vector b⃗ • Piecewise linear projection model – Partition a collection of is-a relations R⁺ ⊂ R* into K clusters (C_1, ⋯, C_k, ⋯, C_K) – Each cluster C_k shares a projection matrix M_k and an offset vector b⃗_k – Optimization function: J(M_k, b⃗_k; C_k) = (1/|C_k|) Σ_{(x_i, y_i) ∈ C_k} ‖M_k w⃗_{x_i} + b⃗_k − w⃗_{y_i}‖² 16
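A simplified stand-in for this piecewise model (not the authors' implementation; the clustering criterion is an assumption) is to cluster training pairs by their embedding offset and fit one least-squares projection per cluster:

```python
import numpy as np

def kmeans(points, k, iters=20):
    # Tiny Lloyd's k-means for clustering offset vectors w_x - w_y.
    # Naive deterministic init (adequate for this toy data).
    centroids = points[np.linspace(0, len(points) - 1, k).astype(int)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        labels = ((points[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(0)
    return labels, centroids

def fit_projection(X, Y):
    # Closed-form minimizer of (1/|C_k|) * sum ||M_k w_x + b_k - w_y||^2.
    X_aug = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    return W[:-1].T, W[-1]  # M_k, b_k

# Two synthetic clusters of is-a pairs, each a pure translation.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
shifts = np.array([[5.0, 0.0, 0.0], [-5.0, 0.0, 0.0]])
Y = X + shifts[np.repeat([0, 1], 50)]

labels, _ = kmeans(X - Y, k=2)              # cluster pairs by their offsets
models = {j: fit_projection(X[labels == j], Y[labels == j]) for j in (0, 1)}
```

On this toy data each cluster's fitted matrix is the identity and the offset recovers the cluster's translation.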

  17. Iterative Learning (1) • Initialization – Word pairs: positive is-a set R⁺, unlabeled set U – Model parameters: M_k and b⃗_k for each cluster • Iterative process (t = 1, ⋯, T) 1. Sample δ|U| word pairs from U, denoted as U^(t). 2. Use the model to predict the relation between the word pairs; denote the “positive” pairs as U_P^(t). 3. Use the pattern-based relation selection method to select a high-confidence subset U_S^(t) from U_P^(t). 4. Remove U_S^(t) from U and add it to R⁺. 17
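Steps 1-4 of the iterative stage can be sketched as a loop (the predict/select arguments are stand-ins; the real system plugs in the projection models and the pattern-based selection):

```python
import random

def iterate(R_pos, U, predict, select, delta=0.5, T=3, seed=0):
    # R_pos: positive is-a set; U: unlabeled word pairs (a set of tuples).
    rng = random.Random(seed)
    for t in range(T):
        n = max(1, int(delta * len(U)))
        U_t = rng.sample(sorted(U), min(n, len(U)))   # 1. sample delta*|U| pairs
        U_p = [p for p in U_t if predict(p)]          # 2. model prediction
        U_s = select(U_p)                             # 3. pattern-based selection
        U.difference_update(U_s)                      # 4. move selected pairs
        R_pos.update(U_s)
    return R_pos, U

# Toy run: "predict" knows the truth, "select" keeps everything.
truth = {("apple", "fruit"), ("oak", "tree")}
U = {("apple", "fruit"), ("oak", "tree"), ("apple", "company")}
R_pos, U = iterate(set(), U, predict=lambda p: p in truth, select=list)
```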

  18. Iterative Learning (2) • Iterative process (t = 1, ⋯, T, continued) 5. Update cluster centroids incrementally based on U_S^(t): c⃗_k^(t+1) = c⃗_k^(t) + η^(t) · (1/|U_{S,k}^(t)|) Σ_{(x_i, y_i) ∈ U_{S,k}^(t)} (w⃗_{x_i} − w⃗_{y_i} − c⃗_k^(t)), i.e. new centroid = old centroid + learning rate of the centroid shift × mean distance of the new pairs’ offsets from the old centroid. 6. Update model parameters based on the new cluster assignments: minimize J(M_k^(t), b⃗_k^(t); C_k^(t)) = (1/|C_k^(t)|) Σ_{(x_i, y_i) ∈ C_k^(t)} ‖M_k^(t) w⃗_{x_i} + b⃗_k^(t) − w⃗_{y_i}‖² 18
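The centroid update in step 5 is a running-mean shift: with learning rate 1 the centroid jumps to the mean of the new offsets, with a smaller rate it moves part-way there. A minimal sketch (function name is mine):

```python
import numpy as np

def update_centroid(c_k, new_offsets, eta):
    # c_k_new = c_k + eta * (1/|S|) * sum(offset - c_k)
    #         = c_k + eta * (mean(new_offsets) - c_k)
    return c_k + eta * (np.mean(new_offsets, axis=0) - c_k)

c = update_centroid(np.zeros(2), np.array([[2.0, 2.0], [4.0, 4.0]]), eta=0.5)
```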

  19. Iterative Learning (3) • Model prediction – The prediction of the final piecewise linear projection models – The transitive closure of existing is-a relations • Discussion – Combination of semantic and lexical extraction of is-a relations • Semantic level: word embedding based projection models • Lexical level: pattern-based relation selection – Incremental learning • Update of cluster centroids • Update of model parameters 19
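The transitive closure of is-a relations can be computed by repeatedly joining pairs until a fixed point (a simple sketch; Warshall's algorithm would be the efficient choice at scale):

```python
def transitive_closure(pairs):
    # is-a is transitive: (x, y) and (y, z) imply (x, z).
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

rels = transitive_closure({("Obama", "politician"), ("politician", "person")})
```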

  20. Pattern-based Relation Selection (1) • Two observations, illustrated with examples of Chinese Hypernym/Hyponym patterns – Positive evidence • Is-A patterns (between x_i/x_j and y), e.g. “x_i 是一个 y” (x_i is a y) • Such-As patterns, e.g. “y，例如 x_i、x_j” (y, such as x_i and x_j) • Hypothesis: x_i/x_j is-a y – Negative evidence (between x_i and x_j) • Such-As patterns • Co-Hyponym patterns, e.g. “x_i、x_j 等” (x_i, x_j and others) • Hypothesis: x_i not-is-a x_j and x_j not-is-a x_i 20
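Pattern categories like these might be matched with regular expressions along the following lines (simplified illustrations written for this summary; the paper's actual pattern set is richer):

```python
import re

# Simplified stand-ins for the three pattern categories on this slide.
# [^，。、\s]+ approximates a Chinese word/phrase (stops at punctuation).
IS_A    = re.compile(r'([^，。、\s]+)是一[个种]([^，。、\s]+)')           # "x 是一个 y"
SUCH_AS = re.compile(r'([^，。、\s]+)，?例如([^，。、\s]+)、([^，。、\s]+)')  # "y，例如 x1、x2"
CO_HYP  = re.compile(r'([^，。、\s]+)、([^，。、\s]+)等')                 # "x1、x2 等"

def extract_evidence(sentence):
    pos, neg = [], []   # pos: (hyponym, hypernym) pairs; neg: co-hyponym pairs
    for m in IS_A.finditer(sentence):
        pos.append((m.group(1), m.group(2)))
    for m in SUCH_AS.finditer(sentence):
        pos += [(m.group(2), m.group(1)), (m.group(3), m.group(1))]
        neg.append((m.group(2), m.group(3)))
    for m in CO_HYP.finditer(sentence):
        neg.append((m.group(1), m.group(2)))
    return pos, neg
```

For example, "水果，例如苹果、香蕉" ("fruit, such as apple and banana") yields positive evidence for apple/banana is-a fruit and negative (co-hyponym) evidence between apple and banana.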

  21. Pattern-based Relation Selection (2) • Positive and negative evidence scores – Positive score: PS(x_i, y_i) = β (1 − d^(t)(x_i, y_i) / max_{(x, y) ∈ U_P^(t)} d^(t)(x, y)) + (1 − β) (n₁(x_i, y_i) + δ) / (max_{(x, y) ∈ U_P^(t)} n₁(x, y) + δ), combining the confidence of the model prediction (projection distance d^(t)) with statistics of “positive” patterns (count n₁) – Negative score: NS(x_i, y_i) = log( (n₂(x_i, y_i) + δ) / ((n₂(x_i) + δ) · (n₂(y_i) + δ)) ), based on counts n₂ of “negative” patterns • Relation selection via optimization – Target: select m^(t) word pairs from U_P^(t) to generate U_S^(t): max Σ_{(x_i, y_i) ∈ U_S^(t)} PS(x_i, y_i) s.t. NS(x_i, y_i) < ε for every selected pair, U_S^(t) ⊆ U_P^(t), |U_S^(t)| = m^(t) 21
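A sketch of these scores, plus a greedy stand-in for the constrained selection (the helper names and the greedy strategy are mine, not the paper's exact algorithm):

```python
import math

def positive_score(d, d_max, n1, n1_max, beta=0.5, delta=1.0):
    # beta * model-confidence term + (1 - beta) * positive-pattern term.
    # A smaller projection distance d means a more confident prediction.
    return beta * (1 - d / d_max) + (1 - beta) * (n1 + delta) / (n1_max + delta)

def negative_score(n2_xy, n2_x, n2_y, delta=1.0):
    # Smoothed ratio (in log space) over co-hyponym ("negative") pattern counts.
    return math.log((n2_xy + delta) / ((n2_x + delta) * (n2_y + delta)))

def select(scored, m, eps):
    # Greedy stand-in for the optimization: among pairs whose negative
    # score stays below eps, keep the m with the highest positive score.
    ok = [(pair, ps) for pair, ps, ns in scored if ns < eps]
    return [pair for pair, _ in sorted(ok, key=lambda x: -x[1])[:m]]

scored = [(("苹果", "水果"), 0.9, -2.0),
          (("苹果", "香蕉"), 0.8, 1.5),   # strong co-hyponym evidence, rejected
          (("香蕉", "水果"), 0.7, -1.0)]
picked = select(scored, m=2, eps=0.0)
```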

  22. Pattern-based Relation Selection (3) • Relation selection algorithm 22

  23. Outline • Introduction • Background and Related Work • Proposed Approach • Experiments • Conclusion 23

  24. Experimental Data • Text corpus – Text contents from Baidu Baike, 1.088B words – Train 100-dimensional word vectors using Skip-gram model • Is-a relation sets – Training: A subset of is-a relations derived from a Chinese taxonomy – Unlabeled: Entities and categories from Baidu Baike – Testing: publicly available labeled dataset (ACL’14) Unlabeled set statistics 24

  25. Model Performance • With pattern-based relation selection – The performance increases first and becomes relatively stable. – A few false positive pairs are still inevitably selected by our approach. • Without pattern-based relation selection – The performance drops quickly despite the improvement in the first few iterations. 25

  26. Comparative Study • Comparing with the state of the art – Pattern-based – Dictionary-based – Distributional similarity-based – Word embedding-based 26
