Integrating Semantic Knowledge to Tackle Zero-shot Text - PowerPoint PPT Presentation

Integrating Semantic Knowledge to Tackle Zero-shot Text Classification
Jingqing Zhang*, Piyawat Lertvittayakumjorn*1, and Yike Guo
Data Science Institute, Imperial College London, UK
Email 1: pl1515@imperial.ac.uk
* Both authors contributed equally to this work
The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2019).

Motivations
• Insufficient or even unavailable training data of emerging classes is a big challenge in real-world text classification.
• Zero-shot text classification – recognising text documents of classes that have never been seen in the learning stage.
• In this paper, we propose a two-phase framework together with data augmentation and feature augmentation to solve this problem.

Contents
• Introduction to Zero-shot Text Classification
• Our Proposed Framework
• Experiments and Discussions
• Conclusions and Future Work

Zero-shot Text Classification
• Let $C_S$ and $C_U$ be disjoint sets of seen and unseen classes of the classification, respectively.
• In the learning stage, a training set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ is given, where
– $x_i$ is the $i$-th document, containing a sequence of words $[w_1^i, w_2^i, \ldots, w_t^i]$
– $y_i \in C_S$ is the class of $x_i$
• In the inference stage, the goal is to predict the class of each document, $\hat{y}_i$, in a testing set
– $y_i$ comes from $C_S \cup C_U$
• Supportive semantic knowledge is needed to generally infer the features of unseen classes using patterns learned from seen classes.

Our Proposed Framework: Overview
• We integrate four kinds of semantic knowledge into our framework:
– Word embeddings
– Class descriptions
– Class hierarchy
– General knowledge graph

Our Proposed Framework: Overview (continued)
• The data augmentation technique helps the classifiers become aware of the existence of unseen classes without accessing their real data.
• Feature augmentation provides additional information that relates the document to the unseen classes, so that the zero-shot reasoning generalises.

Phase 1: Coarse-grained Classification
• Each seen class $c_s$ has its own CNN text classifier to predict $p(\hat{y}_i = c_s \mid x_i)$
– The classifier is trained with all documents of its class in the training set as positive examples and the rest as negative examples.
• For a test document $x_i$, this phase computes $p(\hat{y}_i = c_s \mid x_i)$ for every seen class $c_s \in C_S$.
– If there exists a class $c_s$ such that $p(\hat{y}_i = c_s \mid x_i) > \tau_s$, it predicts $\hat{y}_i \in C_S$.
– Otherwise, $\hat{y}_i \notin C_S$.
– $\tau_s$ is a classification threshold for the class $c_s$, calculated based on the threshold adaptation method from (Shu et al., 2017).
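The seen-versus-unseen decision of Phase 1 can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name and the confidence and threshold values are hypothetical, and the per-class CNN classifiers and the threshold adaptation step are assumed to have run already.

```python
from typing import Dict


def is_from_seen_class(confidences: Dict[str, float],
                       thresholds: Dict[str, float]) -> bool:
    """Return True if at least one seen class exceeds its own threshold.

    confidences[c] stands for the output of the classifier of seen class c,
    thresholds[c] for that class's adapted threshold (both assumed given).
    """
    return any(confidences[c] > thresholds[c] for c in confidences)


# Hypothetical toy values, chosen only to show both outcomes.
thresholds = {"Animal": 0.80, "Company": 0.70}
print(is_from_seen_class({"Animal": 0.91, "Company": 0.12}, thresholds))
print(is_from_seen_class({"Animal": 0.30, "Company": 0.20}, thresholds))
```

A document for which the function returns False is treated as belonging to some unseen class and is left for the second phase to resolve.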

Phase 1: Data Augmentation
• We use the idea of “Topic translation” – translating an original document from a seen class into an augmented document of an unseen class.
– Original document (Animal): Mitra perdulca is a species of sea snail a marine gastropod mollusk in the family Mitridae the miters or miter snails.
– Translated document (Athlete): Mira perdulca is a swimmer of sailing sprinter an Olympian limpets gastropod in the basketball Middy the miters or miter skater.
• Using analogy questions, e.g., animal:species :: athlete:? → ? = swimmer
– Solved by the 3CosMul method by Levy and Goldberg (2014)
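An analogy question of the form a:a* :: b:? can be answered with 3CosMul roughly as follows. The sketch uses tiny hand-made three-dimensional vectors that are purely hypothetical (real use would load pretrained word embeddings), and it follows the usual formulation in which cosines are shifted to [0, 1] and a small epsilon guards the division.

```python
import math
from typing import Dict, List


def shifted_cos(u: List[float], v: List[float]) -> float:
    """Cosine similarity mapped from [-1, 1] to [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return (dot / norm + 1.0) / 2.0


def three_cos_mul(emb: Dict[str, List[float]], a: str, a_star: str, b: str,
                  eps: float = 1e-3) -> str:
    """Solve a:a_star :: b:? by maximising cos(x,a_star)*cos(x,b)/(cos(x,a)+eps)."""
    best_word, best_score = "", float("-inf")
    for word, vec in emb.items():
        if word in (a, a_star, b):  # the query words are not valid answers
            continue
        score = (shifted_cos(vec, emb[a_star]) * shifted_cos(vec, emb[b])
                 / (shifted_cos(vec, emb[a]) + eps))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Hypothetical toy embeddings: axes loosely mean (animal, athlete, "kind of").
toy = {
    "animal": [1.0, 0.0, 0.0],
    "athlete": [0.0, 1.0, 0.0],
    "species": [1.0, 0.0, 1.0],
    "swimmer": [0.0, 1.0, 1.0],
    "snail": [1.0, 0.0, 0.2],
    "stadium": [0.2, 1.0, 0.0],
}
print(three_cos_mul(toy, "animal", "species", "athlete"))
```

With these toy vectors the call returns "swimmer", mirroring the slide's example; with real embeddings the answer depends on the vocabulary and the embedding model.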

Phase 2: Fine-grained Classification
• The traditional classifier is a multi-class classifier ($|C_S|$ classes) with a softmax output, so it requires only the word embeddings $v_w^i$ as an input.
• The zero-shot classifier is a binary classifier with a sigmoid output. It takes a text document $x_i$ and a class $c$ as inputs and predicts the confidence $p(\hat{y}_i = c \mid x_i)$.

Phase 2: Zero-shot Classifier
• The zero-shot classifier predicts $p(\hat{y}_i = c \mid x_i)$
– Input features: $v_w^i$, $v_c$
– Augmented features: $v_{w,c}^i$
• $v_{w_j,c}^i$ shows how the word $w_j$ and the class $c$ are related, considering the relations in a general knowledge graph – ConceptNet.
• This classifier is trained with training data from seen classes only.
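One way to picture how the three kinds of features feed the zero-shot classifier is a per-word concatenation of the word embedding, the class embedding, and the word-class relationship vector. The sketch below assumes that layout; the function name, vector sizes, and values are hypothetical, and the CNN that consumes the result is omitted.

```python
from typing import List


def build_classifier_input(word_vecs: List[List[float]],
                           class_vec: List[float],
                           relation_vecs: List[List[float]]) -> List[List[float]]:
    """Concatenate, for every word, its embedding, the class embedding,
    and its relationship vector with respect to the class."""
    if len(word_vecs) != len(relation_vecs):
        raise ValueError("need one relationship vector per word")
    # The same class embedding is repeated for every word position.
    return [w + class_vec + r for w, r in zip(word_vecs, relation_vecs)]


# Hypothetical toy document with two words, 2-d embeddings, 1-d class vector.
rows = build_classifier_input(
    word_vecs=[[0.1, 0.2], [0.3, 0.4]],
    class_vec=[1.0],
    relation_vecs=[[0.0, 1.0], [1.0, 1.0]],
)
print(len(rows), len(rows[0]))
```

Dropping one of the three parts from the concatenation gives the reduced input variants that an ablation over feature combinations would compare.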

Phase 2: Feature Augmentation
• Step 1: represent a class $c$ as three sets of nodes in ConceptNet
– (1) the_class_nodes
– (2) superclass_nodes
– (3) description_nodes
• If $c$ is the class “Educational Institution”
– (1) educational_institution, educational, institution
– (2) organization, agent
– (3) place, people, ages, education.
• Step 2: To construct $v_{w_j,c}^i$, we consider whether the word $w_j$ is connected to the members of the three sets within $K$ hops.
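The two steps above can be sketched with a breadth-first search over a small graph. Everything concrete here is an assumption for illustration: the toy edges are invented rather than taken from ConceptNet, and the encoding (one 0/1 indicator per hop count and per node set) is one plausible reading of "connected within K hops", not necessarily the exact vector used in the paper.

```python
from collections import deque
from typing import Dict, List, Set


def hop_distances(graph: Dict[str, List[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search returning hop counts for nodes within max_hops of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:  # do not expand beyond the hop limit
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def relation_vector(graph: Dict[str, List[str]], word: str,
                    node_sets: List[Set[str]], max_hops: int) -> List[int]:
    """For each node set, emit max_hops + 1 indicators: entry k is 1 if the
    word reaches some member of the set in at most k hops."""
    dist = hop_distances(graph, word, max_hops)
    vec: List[int] = []
    for nodes in node_sets:
        reachable = [dist[n] for n in nodes if n in dist]
        nearest = min(reachable) if reachable else None
        vec.extend(1 if nearest is not None and nearest <= k else 0
                   for k in range(max_hops + 1))
    return vec


# Hypothetical toy graph; the node sets follow the slide's example class.
toy_graph = {
    "school": ["education", "institution"],
    "education": ["school"],
    "institution": ["school", "organization"],
    "organization": ["institution"],
}
class_sets = [
    {"educational_institution", "educational", "institution"},  # the_class_nodes
    {"organization", "agent"},                                   # superclass_nodes
    {"place", "people", "ages", "education"},                    # description_nodes
]
print(relation_vector(toy_graph, "school", class_sets, max_hops=2))
```

A word with no path to any of the three sets within the hop limit yields an all-zero vector, which is the signal that it says little about the class.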

Experiments
• Datasets:
– DBpedia ontology: 14 classes
– 20newsgroups: 20 classes

An Experiment for Phase 1
• Compared with DOC – a state-of-the-art open-world text classification method.
• For seen classes, our framework outperformed DOC on both datasets.
• The augmented data clearly improved the accuracy of detecting documents from unseen classes and led to higher overall accuracy in every setting.

An Experiment for Phase 2
• Using $[v_{w_j,c}^i]$ only could not find the correct unseen class, and neither $[v_{w_j}^i; v_{w_j,c}^i]$ nor $[v_c; v_{w_j,c}^i]$ could.
• $[v_{w_j}^i; v_c]$ clearly increased the accuracy of predicting unseen classes.
• $[v_{w_j}^i; v_c; v_{w_j,c}^i]$ achieved the highest accuracy in all settings.

An Experiment for the Whole Framework

Conclusions
• To tackle zero-shot text classification, we proposed a novel CNN-based two-phase framework together with data augmentation and feature augmentation.
• The experiments show that
– data augmentation improved the accuracy in detecting instances from unseen classes
– feature augmentation enabled knowledge transfer from seen to unseen classes
– our work achieved the highest overall accuracy compared with all the baselines and recent approaches in all settings.
• Possible future work:
– multi-label classification with a larger amount of data
– utilising semantic units defined by linguists in the zero-shot scenario

Thank you
Q&A
Jingqing Zhang*, Piyawat Lertvittayakumjorn*1, and Yike Guo
Data Science Institute, Imperial College London, UK
Email 1: pl1515@imperial.ac.uk
