Few-Shot Learning for Text Classification
Master's Thesis by Shaour Haider
First Referee: Prof. Dr. Benno Stein
Second Referee: Prof. Dr. Volker Rodehorst
Overview
• Introduction
• Approaches And Results
• Related Work
• Future Work
Introduction
What is text classification?
• For a given input:
  • a paragraph A
  • a fixed set of classes C = {c1, c2, ..., cn}
• Output: a predicted class c ∈ C
Why text classification?
Introduction
• Sentiment Analysis
• Spam Detection
• Topic Classification
[Images: Sentiment Analysis, Spam Detection, Topic Classification]
Introduction
Situation: limited labeled data.
Few-shot learning aims to learn a classifier from a limited number of labeled examples (fewer than 10 per class).
Example: a 4-way 1-shot task.
[Figure: a 4-way 1-shot task, with a train set containing one labeled paragraph per class and a test set of unseen paragraphs]
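The 4-way 1-shot setup above can be sketched as episode sampling. This is a minimal illustration, not the thesis's actual data pipeline; the class names and paragraphs are invented placeholders.

```python
import random

def sample_episode(dataset, n_way=4, k_shot=1, n_query=1, seed=0):
    """Sample an N-way K-shot episode from a {class: [paragraphs]} dict."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)     # pick N classes
    support, query = [], []
    for c in classes:
        examples = rng.sample(dataset[c], k_shot + n_query)
        support += [(text, c) for text in examples[:k_shot]]   # K shots per class
        query += [(text, c) for text in examples[k_shot:]]     # held-out queries
    return support, query

# Toy placeholder data (4 classes, 2 paragraphs each):
data = {
    "Sports":  ["Football is a team sport.", "Baseball evolved in England."],
    "Music":   ["The symphony has four movements.", "Jazz grew out of blues."],
    "Science": ["Atoms form molecules.", "Cells divide by mitosis."],
    "History": ["Rome fell in 476 AD.", "The printing press changed Europe."],
}
support, query = sample_episode(data, n_way=4, k_shot=1, n_query=1)
print(len(support), len(query))  # 4 support examples, 4 query examples
```

A 4-way 1-shot episode thus contains exactly one labeled example per class in the support (train) set, and the classifier is evaluated on the disjoint query (test) set.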
Introduction
Terminologies: datasets
Target dataset:
• Train set (few-shot training set)
• Test set (testing set)
Base dataset: an additional dataset that is disjoint from the train and test sets of the target dataset.
Approaches And Results
Let's implement: bag of words
Train: target training data (few examples) → feature extraction → classifier → loss & update.
Test: target testing data → feature extraction → classifier (fixed weights) → accuracy on the target assessment.
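The pipeline above can be sketched with scikit-learn. This is a minimal, hedged sketch: the texts, labels, and classifier choice (logistic regression) are illustrative assumptions, not the thesis's exact setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Few-shot target training data (placeholder examples, one per class):
train_texts = ["kicking a ball to score a goal",
               "the symphony orchestra played the concerto",
               "atoms bond to form molecules"]
train_labels = ["Sports", "Music", "Science"]
test_texts = ["the goal was scored with a ball",
              "the orchestra performed a concerto"]

vectorizer = CountVectorizer()                         # feature extraction: word counts
X_train = vectorizer.fit_transform(train_texts)        # vocabulary fixed on train set
clf = LogisticRegression().fit(X_train, train_labels)  # loss & update
X_test = vectorizer.transform(test_texts)              # reuse the fixed vocabulary
pred = clf.predict(X_test)
print(pred)
```

At test time both the vocabulary and the classifier weights are fixed; accuracy is then computed against the target test labels.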
Approaches And Results
Baseline: bag of words
        K=1    K=3    K=9
BOW     0.48   0.58   0.74
Approaches And Results
Problems with the bag of words:
• Overfitting
• Vocabulary mismatch
Example (class: Sports):
"Football is a family of team sports that involve, to varying degrees, kicking a ball to score a goal."
→ [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 1 1 1 1 0 0 2 0 1 0 0 0]
"While football continued to be played in various forms throughout Britain, its public schools (equivalent to private schools in other countries) are widely credited with four key achievements in the creation of modern football codes."
→ [0 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 1 0 0 1 0 0 2 1 1 0 0 0 3 0 0 1 1 0 0 1 1 0 1 1 1 1 2 0 0 0 0 1 1 2 1 0 1 1 1]
"Baseball evolved from older bat-and-ball games already being played in England by the mid-18th century."
→ [1 0 1 1 0 1 1 1 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
Paragraphs about the same topic can end up with almost disjoint sets of nonzero dimensions.
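Vocabulary mismatch can be demonstrated in a few lines: with a single labeled shot, a clearly related test sentence may share no tokens with the training vocabulary at all, so its count vector is entirely zero. The sentences below are taken from the example above.

```python
from sklearn.feature_extraction.text import CountVectorizer

# The only labeled "shot" available for the Sports class:
train = ["Football is a family of team sports"]
vec = CountVectorizer().fit(train)   # vocabulary fixed on this one sentence

# A related test sentence that happens to share no vocabulary with the shot:
test_vector = vec.transform(["Baseball evolved from older bat-and-ball games"])
print(test_vector.nnz)  # number of nonzero features → 0
```

An all-zero feature vector carries no information, so the classifier cannot do better than guessing for such inputs; this is why a richer representation is needed.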
Approaches And Results
Better representations: pre-trained FastText or BERT model.
Train: target training data (few) → feature extraction with the pre-trained model (fixed weights) → classifier → loss & update.
Test: target testing data → feature extraction → classifier (fixed weights) → accuracy on the target assessment.
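The idea behind these representations can be sketched without downloading a real model: a pre-trained embedding maps related words (e.g. "football" and "baseball") to nearby vectors, so averaged word vectors remain informative even when exact tokens don't match. The tiny hand-made embedding table below is a stand-in for FastText/BERT, purely for illustration.

```python
import numpy as np

# Toy "pretrained" word vectors standing in for FastText/BERT embeddings;
# real models place semantically related words close together.
emb = {
    "football": np.array([0.9, 0.1]), "baseball": np.array([0.8, 0.2]),
    "goal":     np.array([0.7, 0.0]), "symphony": np.array([0.1, 0.9]),
    "concerto": np.array([0.0, 0.8]),
}

def featurize(text):
    """Average the vectors of known words (mean pooling)."""
    vecs = [emb[w] for w in text.lower().split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sports = featurize("football goal")
music = featurize("symphony concerto")
query = featurize("baseball")          # token unseen in either "class"
print(cos(query, sports) > cos(query, music))  # True: no vocabulary mismatch
```

Unlike the bag-of-words vector, the query is not all-zero and lands closest to the semantically related class, which is the effect exploited by the FastText and BERT baselines.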
Approaches And Results
Baseline: FastText & BERT (Δ vs. BOW in parentheses)
           K=1            K=3            K=9
BOW        0.48           0.58           0.74
FastText   0.66 (+0.18)   0.78 (+0.20)   0.84 (+0.10)
BERT       0.73 (+0.25)   0.84 (+0.26)   0.89 (+0.15)
Approaches And Results
Can we improve any further?
[Image: Transfer Learning]
Approaches And Results
Approach: transfer learning with bag of words.
Pre-training: base dataset (many examples) → feature extraction → model → loss & update.
Target training: target training data (few) → feature extraction → pre-trained model (fixed weights) → classifier → loss & update.
Test: target testing data → feature extraction → pre-trained model (fixed weights) → classifier (fixed weights) → accuracy on the target assessment.
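The three stages above can be sketched with scikit-learn. This is an illustrative analogue, not the thesis's actual model: TF-IDF plus truncated SVD plays the role of the pre-trained representation learned on the base dataset, and the base corpus and few-shot examples are invented placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

# Base dataset (many auxiliary paragraphs), used only for pre-training:
base_corpus = [
    "football and baseball are team sports played with a ball",
    "the orchestra performed a symphony and a concerto",
    "atoms and molecules are studied in chemistry and physics",
    "goals are scored in football matches every week",
    "musicians rehearse the concerto before the concert",
]

# Pre-training: learn a representation on the base dataset, then freeze it.
tfidf = TfidfVectorizer().fit(base_corpus)
svd = TruncatedSVD(n_components=3, random_state=0).fit(tfidf.transform(base_corpus))

def extract(texts):
    """Fixed-weight feature extraction using the pre-trained representation."""
    return svd.transform(tfidf.transform(texts))

# Target training: only the classifier head is trained on the few shots.
few_texts = ["football is played with a ball", "the symphony was beautiful"]
few_labels = ["Sports", "Music"]
head = LogisticRegression().fit(extract(few_texts), few_labels)
print(head.predict(extract(["a concerto for orchestra"]))[0])
```

The key design point is that the representation's weights stay fixed after pre-training; only the small classifier head is fit on the few target examples, which limits overfitting.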
Approaches And Results
[Image: standard model architecture]
Approaches And Results
Results: transfer learning, standard model (Δ vs. the BOW baseline in parentheses)
                 K=1            K=3            K=9
BOW (standard)   0.49 (+0.01)   0.52 (-0.06)   0.71 (-0.03)
Approaches And Results
Transfer learning with pre-trained FastText & BERT models.
Pre-training: base dataset (many examples) → feature extraction → model → loss & update.
Target training: target training data (few) → feature extraction → pre-trained model (fixed weights) → classifier → loss & update.
Test: target testing data → feature extraction → pre-trained model (fixed weights) → classifier (fixed weights) → accuracy on the target assessment.
Approaches And Results
Results: transfer learning, standard model (Δ vs. the corresponding baseline in parentheses)
           K=1            K=3            K=9
BOW        0.49 (+0.01)   0.52 (-0.06)   0.71 (-0.03)
FastText   0.62 (-0.04)   0.75 (-0.03)   0.81 (-0.03)
BERT       0.73 (0.00)    0.84 (0.00)    0.88 (-0.01)
Approaches And Results
[Image: modified model architecture]
Approaches And Results
Results: transfer learning, modified model (Δ vs. the corresponding baseline in parentheses)
           K=1            K=3            K=9
BOW        0.68 (+0.20)   0.75 (+0.17)   0.84 (+0.10)
FastText   0.69 (+0.03)   0.78 (0.00)    0.83 (-0.01)
BERT       0.73 (0.00)    0.81 (-0.03)   0.87 (-0.02)
Approaches And Results
Complete results
[Chart: accuracy at K=1, K=3, and K=9 for BOW, FastText, and BERT under the baseline, standard transfer learning, and modified transfer learning settings; values as in the preceding tables.]
Approaches And Results
Results summary
• Modified transfer learning with BOW representations improves accuracy by 10-20 percentage points over the BOW baseline.
• Accuracy generally increases with the size of the training task (K).
• Fine-tuning the representations of the advanced pre-trained models, FastText and BERT, yields no real improvement.
• The BOW representation can be improved by pre-training on a Wikipedia section-heading classification task.
Related Work
Few-shot learning approaches:
• Metric Learning
• Meta Learning
Related Work
Metric learning: Siamese networks, Relation Networks.
[Images from "Advances in few-shot learning"]
Related Work
Meta learning: MAML.
[Image: MAML]
Future Work
• Using other few-shot learning approaches such as meta learning and metric learning.
• Increasing the dataset beyond level-2 section headings (this would require more computational resources).
• Using the bert-large model instead of bert-base.
• Finding the peak accuracy score for the BERT model.
• Testing the trained classifier on topic classification data other than Wikipedia.
Thank you
Additional Slides
Related Work: Metric Learning
• Siamese networks: two inputs pass through the same neural network, and a distance metric compares the resulting embeddings.
• Matching Networks: a query-set instance is compared against the embedded support-set instances.
[Diagram: support-set instances and a query instance embedded by a shared network and compared via a distance metric]
Related Work: Metric Learning
• Prototypical Networks & Relation Networks
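The core classification rule of Prototypical Networks can be sketched in a few lines: each class is represented by the mean of its support embeddings (its prototype), and a query is assigned to the nearest prototype. The 2-D vectors below are toy stand-ins for a learned encoder's output; this is an illustration of the related work, not something implemented in the thesis.

```python
import numpy as np

def prototypical_predict(support_x, support_y, query_x):
    """Classify queries by squared Euclidean distance to class prototypes
    (mean support embeddings), as in Prototypical Networks."""
    classes = sorted(set(support_y))
    labels = np.array(support_y)
    protos = np.stack([support_x[labels == c].mean(axis=0) for c in classes])
    # distance from each query to each prototype: shape (n_query, n_classes)
    d = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return [classes[i] for i in d.argmin(axis=1)]

# Toy 2-D "embeddings" for a 2-way 2-shot episode:
support_x = np.array([[0.0, 1.0], [0.1, 0.9],   # class A
                      [1.0, 0.0], [0.9, 0.1]])  # class B
support_y = ["A", "A", "B", "B"]
query_x = np.array([[0.05, 0.95], [0.95, 0.05]])
print(prototypical_predict(support_x, support_y, query_x))  # ['A', 'B']
```

In the full method the encoder producing these embeddings is trained episodically so that prototypes of unseen classes remain discriminative.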
Related Work: Meta Learning
[Diagram: MAML meta-parameters Θ adapting to task-specific weights W]
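MAML's two-level structure (meta-parameters Θ adapted into task-specific weights W by a few gradient steps) can be sketched on a toy problem. The sketch below uses first-order MAML on 1-D linear regression with hand-derived gradients, and for simplicity the same data serves as support and query; all numbers are illustrative assumptions, not results from the thesis.

```python
import numpy as np

# First-order MAML sketch on 1-D linear regression tasks y = a * x.
# Each task has its own slope a; meta-training finds an initialization
# theta from which one inner gradient step specializes to any task.
def grad(theta, x, y):
    """d/dtheta of the mean squared error of the model theta * x."""
    return 2.0 * np.mean((theta * x - y) * x)

def fomaml(tasks, theta=0.0, alpha=0.1, beta=0.05, steps=500):
    for _ in range(steps):
        outer = 0.0
        for x, y in tasks:
            adapted = theta - alpha * grad(theta, x, y)  # inner-loop adaptation (W)
            outer += grad(adapted, x, y)                 # first-order outer gradient
        theta -= beta * outer / len(tasks)               # meta-update (Theta)
    return theta

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 20)
tasks = [(x, 2.0 * x), (x, 4.0 * x)]  # two tasks with slopes 2 and 4
theta = fomaml(tasks)
# For this symmetric pair of tasks, theta converges toward the midpoint 3,
# from which one inner step can move toward either task's true slope.
```

Full MAML differentiates through the inner update (a second-order term); the first-order variant shown here drops that term, which is a common simplification.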
Related Work: Transfer Learning
• Baseline