Chinese Hypernym-Hyponym Extraction from User Generated Categories
Chengyu Wang, Xiaofeng He
School of Computer Science and Software Engineering, East China Normal University, Shanghai, China
Outline
• Introduction
• Background and Related Work
• Proposed Approach
• Experiments
• Conclusion
Chinese Is-A Relation Extraction
• Chinese is-a relation extraction
  – Chinese is-a relations are essential for constructing large-scale Chinese taxonomies and knowledge graphs.
  – Such relations are difficult to extract due to the flexibility of Chinese language expression.
• User generated categories
  – User generated categories are valuable knowledge sources, providing fine-grained candidate hypernyms of entities.
  – The semantic relations between an entity and its categories are not explicit.
Baidu Baike: one of the largest online encyclopedias in China, with 13M+ entries
• Example entry: Barack Obama
• Categories: Political figure, Foreign country, Leader, Person
The Task
• Distinguishing is-a and not-is-a relations between Chinese words/phrases
• Example (figure): Barack Obama, with categories Political figure (is-a), Leader (is-a), Person (is-a), and Foreign country (not-is-a)
Outline
• Introduction
• Background and Related Work
• Proposed Approach
• Experiments
• Conclusion
Background
• Taxonomy: a hierarchical type system for knowledge graphs, consisting of is-a relations among classes and entities
• Example (figure): classes such as Person and Country, subclasses such as Political Leader, Scientist, and Developed Country, with entities attached beneath them
Describing the Task
• Learning is-a relations for taxonomy expansion
• (Figure) A learning algorithm takes an existing taxonomy (Person, Country, Political Leader, Scientist, Developed Country) and expands it with new entities and is-a links
• Key challenge: identifying is-a relations from user generated categories
Modeling the Task
• Taxonomy
  – Directed acyclic graph $G = (E, R)$ ($E$: entities/classes, $R$: is-a relations)
• User generated categories
  – Collection of entities $E^*$
  – Set of user generated categories $Cat(e)$ for each $e \in E^*$
• Goal
  – Predict whether there is an is-a relation between $e$ and $c$, where $e \in E^*$ and $c \in Cat(e)$, based on the taxonomy $G$
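To make the setup concrete, here is a minimal Python sketch of the data model; the class and variable names are illustrative, not taken from the authors' code:

```python
# A minimal sketch of the task setup (illustrative names only).
from collections import defaultdict

class Taxonomy:
    """Taxonomy G = (E, R): nodes are entities/classes, edges are is-a relations."""
    def __init__(self):
        self.hypernyms = defaultdict(set)  # node -> set of direct hypernyms

    def add_isa(self, hyponym, hypernym):
        self.hypernyms[hyponym].add(hypernym)

# User generated categories: Cat(e) for each entity e in E*.
categories = {
    "Barack Obama": ["Political figure", "Foreign country", "Leader", "Person"],
}

# Goal: for every pair (e, c) with c in Cat(e), predict is-a vs. not-is-a.
candidate_pairs = [(e, c) for e, cats in categories.items() for c in cats]
print(candidate_pairs)
```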
Previous Approaches
• Pattern matching based approaches
  – Handcrafted patterns: high accuracy, low coverage
    • Hearst patterns: NP₁ such as NP₂
  – Automatically generated patterns: higher coverage, lower accuracy
  – Not suitable for Chinese, with its flexible expression
• Thesauri and encyclopedia based approaches
  – Taxonomy construction based on existing knowledge sources
    • YAGO: Wikipedia + WordNet
  – More precise, but limited in scope by the underlying sources
  – Chinese is relatively low-resourced: no Chinese version of WordNet or Freebase is available
Previous Approaches
• Text inference based approaches
  – Infer relations using distributional similarity measures
    • Assumption: a hyponym can appear in only some of the contexts of its hypernym, while a hypernym can appear in all contexts of its hyponyms
  – Not suitable for Chinese, with its flexible and sparse contexts
• Word embedding based approaches
  – Represent words as dense, low-dimensional vectors
  – Learn semantic projection models from hyponyms to hypernyms
  – State-of-the-art approach for Chinese is-a relation extraction (ACL'14)
(Figures taken from Mikolov et al., 2013)
Learning from Previous Work
• Lessons learned from the state of the art
  – Use word embeddings to represent words
  – Learn relations between hyponyms and hypernyms in the embedding space
• Basic approaches
  – Vector offsets
  – Linear projection (see the sketch below)
(Figures taken from Mikolov et al., 2013)
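As a concrete illustration of the linear projection baseline, here is a minimal sketch that fits a matrix M and offset b so that M v(x) + b approximates v(y) over known (hyponym x, hypernym y) pairs; it assumes X and Y are (n, d) NumPy arrays of word vectors:

```python
# Minimal linear projection baseline: learn M and b so that M v(x) + b ≈ v(y)
# over known (hyponym, hypernym) training pairs, via ordinary least squares.
import numpy as np

def fit_projection(X, Y):
    # Append a bias column so the offset b is learned jointly with M.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)  # least-squares fit, W is (d+1, d)
    return W[:-1].T, W[-1]                       # M with shape (d, d), b with shape (d,)

def predict_hypernym_vector(M, b, x):
    return M @ x + b                             # projected hypernym vector for x
```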
Observations
• Word vector offsets between Chinese is-a pairs
  – Multiple linguistic regularities may exist among is-a pairs:
    • Different levels of hypernyms
    • Different types of is-a relations (instanceOf vs. subClassOf)
    • Different domains
Outline
• Introduction
• Background and Related Work
• Proposed Approach
• Experiments
• Conclusion
General Framework
• Initial stage
  – Train piecewise linear projection models based on the Chinese taxonomy
• Iterative learning stage
  – Extract new is-a relations and adjust model parameters with an incremental learning approach
  – Use Chinese hypernym/hyponym patterns to prevent "semantic drift" in each iteration
Initial Model Training
• Linear projection model
  – Projection model: $M\,\vec{v}(x_i) + \vec{b} = \vec{v}(y_i)$
    ($M$: projection matrix, $\vec{v}(\cdot)$: word vector, $\vec{b}$: offset vector)
• Piecewise linear projection model (a training sketch follows below)
  – Partition a collection of is-a relations $R^+ \subset R^*$ into $K$ clusters $(C_1, \dots, C_k, \dots, C_K)$ by their word vector offsets
  – All pairs in cluster $C_k$ share the projection matrix $M_k$ and offset vector $\vec{b}_k$
  – Optimization function:
    $$J(M_k, \vec{b}_k; C_k) = \frac{1}{|C_k|} \sum_{(x_i, y_i) \in C_k} \left\| M_k\,\vec{v}(x_i) + \vec{b}_k - \vec{v}(y_i) \right\|^2$$
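A sketch of the initial training stage, assuming pairs are clustered by their offset vectors as above; scikit-learn's KMeans stands in here for whichever clustering method the authors used:

```python
# Piecewise linear projection sketch: cluster training pairs by offset vectors,
# then fit one (M_k, b_k) per cluster by least squares, minimizing J(M_k, b_k; C_k).
import numpy as np
from sklearn.cluster import KMeans

def fit_piecewise(X, Y, K):
    offsets = X - Y                                # offset vectors of is-a pairs
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(offsets)
    models, centroids = {}, {}
    for k in range(K):
        Xk, Yk = X[labels == k], Y[labels == k]
        X1 = np.hstack([Xk, np.ones((len(Xk), 1))])
        W, *_ = np.linalg.lstsq(X1, Yk, rcond=None)
        models[k] = (W[:-1].T, W[-1])              # (M_k, b_k) for cluster C_k
        centroids[k] = offsets[labels == k].mean(axis=0)
    return models, centroids
```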
Iterative Learning (1)
• Initialization
  – Word pairs: positive is-a set $R^+$, unlabeled set $U$
  – Model parameters: $M_k$ and $\vec{b}_k$ for each cluster
• Iterative process ($t = 1, \dots, T$); a code skeleton of steps 1–4 follows below
  1. Sample $\delta|U|$ word pairs from $U$, denoted $U^{(t)}$.
  2. Use the current model to predict the relation of each pair; denote the word pairs predicted "positive" as $U_P^{(t)}$.
  3. Use the pattern-based relation selection method to select a high-confidence subset $U_S^{(t)}$ from $U_P^{(t)}$.
  4. Remove $U_S^{(t)}$ from $U$ and add it to $R^+$.
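A skeleton of one iteration, steps 1–4; the helpers `predict_isa` and `select_by_patterns` are hypothetical stand-ins for the prediction and selection routines sketched on the neighboring slides:

```python
# Skeleton of one learning iteration (steps 1-4); helper functions are
# hypothetical stand-ins, not the authors' actual routines.
import random

def one_iteration(U, R_pos, models, delta, predict_isa, select_by_patterns):
    U_t = random.sample(sorted(U), int(delta * len(U)))        # 1. sample δ|U| pairs
    U_P = [pair for pair in U_t if predict_isa(models, pair)]  # 2. predicted positives
    U_S = select_by_patterns(U_P)                              # 3. high-confidence subset
    U.difference_update(U_S)                                   # 4. move U_S from U into R+
    R_pos.update(U_S)
    return U_S
```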
Iterative Learning (2)
• Iterative process ($t = 1, \dots, T$), continued
  5. Update cluster centroids incrementally based on $U_S^{(t)}$ (see the sketch below):
     $$\vec{c}_k^{(t+1)} = \vec{c}_k^{(t)} + \lambda^{(t)} \cdot \frac{1}{|U_k^{(t)}|} \sum_{(x_i, y_i) \in U_k^{(t)}} \left( \vec{v}(x_i) - \vec{v}(y_i) - \vec{c}_k^{(t)} \right)$$
     ($\vec{c}_k^{(t+1)}$: new centroid; $\vec{c}_k^{(t)}$: old centroid; $\lambda^{(t)}$: learning rate of the centroid shift; $U_k^{(t)}$: newly selected pairs assigned to cluster $k$; the summand is each pair's offset distance from the centroid)
  6. Update model parameters based on the new cluster assignments:
     $$J\left(M_k^{(t)}, \vec{b}_k^{(t)}; C_k^{(t)}\right) = \frac{1}{|C_k^{(t)}|} \sum_{(x_i, y_i) \in C_k^{(t)}} \left\| M_k^{(t)}\,\vec{v}(x_i) + \vec{b}_k^{(t)} - \vec{v}(y_i) \right\|^2$$
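A sketch of the incremental centroid update in step 5, following the decoded formula above: each centroid is shifted toward the mean offset of the newly selected pairs assigned to it, scaled by the learning rate:

```python
# Incremental centroid update (step 5), per the formula on this slide.
import numpy as np

def update_centroid(c_k, new_pairs_k, lam):
    """c_k: current centroid, shape (d,); new_pairs_k: list of (v_x, v_y) vector
    pairs newly assigned to cluster k; lam: learning rate λ^(t)."""
    if not new_pairs_k:
        return c_k
    shift = np.mean([vx - vy - c_k for vx, vy in new_pairs_k], axis=0)
    return c_k + lam * shift
```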
Iterative Learning (3)
• Model prediction
  – The predictions of the final piecewise linear projection models
  – The transitive closure of existing is-a relations (see the sketch below)
• Discussion
  – Combination of semantic and lexical extraction of is-a relations
    • Semantic level: word embedding based projection models
    • Lexical level: pattern-based relation selection
  – Incremental learning
    • Update of cluster centroids
    • Update of model parameters
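An illustrative fixpoint computation of the transitive closure step (quadratic per pass; fine as a sketch, not tuned for web-scale data):

```python
# Close the extracted is-a set under transitivity:
# x is-a y and y is-a z implies x is-a z.
def transitive_closure(isa_pairs):
    closure = set(isa_pairs)
    changed = True
    while changed:
        changed = False
        for x, y in list(closure):
            for y2, z in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

# Example: ("Obama", "Political Leader") and ("Political Leader", "Person")
# also yield ("Obama", "Person").
```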
Pattern-based Relation Selection (1)
• Two observations, based on example Chinese hypernym/hyponym patterns (a matching sketch follows below)
  – Positive evidence
    • Is-A patterns
    • Such-As patterns
    • Hypothesis: $x_i$ (or $x_j$) is-a $y$
  – Negative evidence
    • Such-As patterns
    • Co-Hyponym patterns
    • Hypothesis: $x_i$ not-is-a $x_j$, and $x_j$ not-is-a $x_i$
• Examples of Chinese hypernym/hyponym patterns:
  – Is-A (between $x_i$/$x_j$ and $y$): $x_i$ 是一个 $y$ ("$x_i$ is a kind of $y$")
  – Such-As (between $x_i$/$x_j$ and $y$): $y$，例如 $x_i$、$x_j$ ("$y$, such as $x_i$ and $x_j$")
  – Co-Hyponym (between $x_i$ and $x_j$): $x_i$、$x_j$ 等 ("$x_i$, $x_j$ and others")
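A rough sketch of how such patterns can be matched in raw text, using simplified regular expressions that cover only the slide's example surface forms:

```python
# Simplified pattern matchers for the three pattern categories above.
import re

ISA_PAT = re.compile(r"(\w+)是一[个种](\w+)")           # "x is a (kind of) y"
SUCHAS_PAT = re.compile(r"(\w+)[,，]例如(\w+)、(\w+)")   # "y, such as x1, x2"
COHYP_PAT = re.compile(r"(\w+)、(\w+)等")                # "x1, x2 and others"

def match_evidence(sentence):
    pos, neg = [], []
    for x, y in ISA_PAT.findall(sentence):
        pos.append((x, y))                 # hypothesis: x is-a y
    for y, x1, x2 in SUCHAS_PAT.findall(sentence):
        pos += [(x1, y), (x2, y)]          # hypothesis: x1/x2 is-a y
    for x1, x2 in COHYP_PAT.findall(sentence):
        neg.append((x1, x2))               # hypothesis: x1 not-is-a x2 (co-hyponyms)
    return pos, neg
```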
Pattern-based Relation Selection (2)
• Positive and negative evidence scores ($d^{(t)}$: prediction distance under the current model; $n_1$, $n_2$: corpus counts of positive and negative pattern matches; $\gamma$: smoothing)
  – Positive score, combining the confidence of the model prediction with statistics of "positive" patterns:
    $$PS(x_i, y_i) = \alpha \left( 1 - \frac{d^{(t)}(x_i, y_i)}{\max_{(x,y) \in U_P^{(t)}} d^{(t)}(x, y)} \right) + (1 - \alpha) \cdot \frac{n_1(x_i, y_i) + \gamma}{\max_{(x,y) \in U_P^{(t)}} n_1(x, y) + \gamma}$$
  – Negative score:
    $$NS(x_i, y_i) = \log \frac{n_2(x_i, y_i) + \gamma}{(n_2(x_i) + \gamma) \cdot (n_2(y_i) + \gamma)}$$
• Relation selection via optimization (a scoring sketch follows below)
  – Target: select $m$ word pairs from $U_P^{(t)}$ to generate $U_S^{(t)}$:
    $$\max \sum_{(x_i, y_i) \in U_S^{(t)}} PS(x_i, y_i) \quad \text{s.t.} \quad NS(x_i, y_i) < \theta,\; U_S^{(t)} \subseteq U_P^{(t)},\; |U_S^{(t)}| = m$$
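A sketch of the two scores and a simple greedy approximation of the selection step (the authors' actual selection algorithm appears on the next slide; α, γ, θ, and m are the weight, smoothing, threshold, and size parameters):

```python
# Evidence scores and a greedy approximation of the constrained selection.
import math

def positive_score(pair, d, n1, alpha, gamma):
    d_max = max(d.values())                       # normalizers over U_P^(t)
    n1_max = max(n1.values(), default=0)
    return (alpha * (1 - d[pair] / d_max)
            + (1 - alpha) * (n1.get(pair, 0) + gamma) / (n1_max + gamma))

def negative_score(pair, n2_pair, n2_word, gamma):
    x, y = pair
    return math.log((n2_pair.get(pair, 0) + gamma)
                    / ((n2_word.get(x, 0) + gamma) * (n2_word.get(y, 0) + gamma)))

def select_pairs(U_P, ps, ns, theta, m):
    """ps, ns: precomputed score dicts; keep the m admissible pairs with
    the highest positive scores (greedy stand-in for the optimization)."""
    admissible = [p for p in U_P if ns[p] < theta]            # NS constraint
    return sorted(admissible, key=lambda p: ps[p], reverse=True)[:m]
```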
Pattern-based Relation Selection (3)
• Relation selection algorithm (figure)
Outline
• Introduction
• Background and Related Work
• Proposed Approach
• Experiments
• Conclusion
Experimental Data
• Text corpus
  – Text contents from Baidu Baike, 1.088B words
  – Train 100-dimensional word vectors using the Skip-gram model (a hedged example follows below)
• Is-a relation sets
  – Training: a subset of is-a relations derived from a Chinese taxonomy
  – Unlabeled: entities and categories from Baidu Baike
  – Testing: a publicly available labeled dataset (ACL'14)
(Table of unlabeled set statistics)
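A hedged example of training 100-dimensional skip-gram vectors with gensim; this stands in for whatever word2vec implementation the authors used, and the toy corpus below stands in for the segmented Baidu Baike text:

```python
# Train skip-gram (sg=1) vectors; `sentences` is an iterable of
# word-segmented Chinese sentences (a toy corpus here).
from gensim.models import Word2Vec

sentences = [
    ["奥巴马", "是", "一", "个", "政治", "人物"],
    ["奥巴马", "是", "领导人"],
]
model = Word2Vec(sentences, vector_size=100, sg=1, window=5, min_count=1, workers=4)
vec = model.wv["奥巴马"]   # 100-dimensional vector for an entity
```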
Model Performance
• With pattern-based relation selection
  – Performance increases at first and then becomes relatively stable.
  – A few false positive pairs are still inevitably selected by our approach.
• Without pattern-based relation selection
  – Performance drops quickly, despite improvement in the first few iterations.
Comparative Study
• Comparing with the state of the art (figure): pattern-based, dictionary-based, distributional similarity-based, and word embedding-based methods