A Dataset for Troll Classification of TamilMemes
Shardul Suryawanshi, Bharathi Raja Chakravarthi, Pranav Varma, Mihael Arcan, John P. McCrae, Paul Buitelaar
Data Science Institute, National University of Ireland Galway
(shardul.suryawanshi@insight-centre.org)

Outline
● Troll Meme
● Challenges
● Dataset Annotation
● Experimental Setup
● Methodology
● Result
● Conclusion and Future Work

Troll Meme
● A troll meme contains
  ○ offensive text with non-offensive images,
  ○ offensive images with non-offensive text, or
  ○ sarcastically offensive text with non-offensive images
● It provokes, distracts, and has digressive or off-topic content
● It intends to demean or offend an individual or a group
Example (translation): “If you buy one packet of air, then 5 chips free”

Challenges: Context
● The same image can carry different text
  ○ Translation: “Cannot understand what you are saying”
  ○ Translation: “I am confused”

Challenges: Data Imbalance and Low Resource
● After collection, troll memes outnumbered not-troll memes
● Hence, images from the Flickr dataset [1] were added to the not-troll category
● Because the dataset is small, we fine-tuned models initialized with ImageNet weights
(Figure: example image from the Flickr dataset)
[1] https://www.kaggle.com/hsankesara/flickr-image-dataset

Challenges: Emotional Toll on Annotators
● Volunteer annotators were onboarded
● To reduce the burden of annotation, annotators were free to leave at any time

Dataset Annotation
● Among several volunteers, only native Tamil speakers were selected
● Substantial agreement between annotators (Cohen’s kappa = 0.62)
● Data statistics
  ○ Total memes: 2,969
    ■ # troll: 1,951
    ■ # not-troll: 1,018
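For readers unfamiliar with the agreement figure, here is a minimal sketch of how Cohen's kappa is computed for two annotators. The label lists are made-up toy data, not the TamilMemes annotations, and the function name `cohens_kappa` is only illustrative. A value of 0.62 falls in the 0.61 to 0.80 band that Landis and Koch label "substantial".

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the two labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations (NOT the real dataset).
annotator_1 = ["troll"] * 6 + ["not-troll"] * 4
annotator_2 = ["troll"] * 5 + ["not-troll"] * 4 + ["troll"]

if __name__ == "__main__":
    print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because the troll class dominates the collected memes.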

Experimental Setup
● ResNet and MobileNet classifiers trained on
  ○ Imbalanced dataset
    ■ TamilMemes
    ■ TamilMemes + ImageNet*
    ■ TamilMemes + ImageNet* + Flickr30k
  ○ Balanced dataset
    ■ TamilMemes + ImageNet* + Flickr1k
(* model initialized with weights pre-trained on ImageNet)
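A rough sketch of why the Flickr1k variant counts as balanced while the others do not. It uses the class counts from the Dataset Annotation slide and ASSUMES that "Flickr1k" and "Flickr30k" mean 1,000 and 30,000 Flickr images added to the not-troll class; the slides do not state these counts, and the figures ignore any train/test split.

```python
# Class counts reported on the Dataset Annotation slide.
TROLL = 1951
NOT_TROLL = 1018

# ASSUMPTION: number of Flickr images added to the not-troll class per setup.
FLICKR_EXTRA = {
    "TamilMemes": 0,
    "TamilMemes + Flickr1k": 1_000,
    "TamilMemes + Flickr30k": 30_000,
}


def troll_share(troll, not_troll):
    """Fraction of all images that belong to the troll class."""
    return troll / (troll + not_troll)


def composition():
    """Troll-class share for each (assumed) dataset configuration."""
    return {
        name: troll_share(TROLL, NOT_TROLL + extra)
        for name, extra in FLICKR_EXTRA.items()
    }


if __name__ == "__main__":
    for name, share in composition().items():
        print(f"{name}: troll share = {share:.3f}")
```

Under these assumptions the troll share is roughly two thirds for TamilMemes alone, close to one half with Flickr1k, and a small minority with Flickr30k.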

Methodology
● Benchmark results obtained with convolutional neural networks (CNNs) for image classification

Result: ResNet [2] variation

               TamilMemes                      TamilMemes + ImageNet
               Precision  Recall  F1-score     Precision  Recall  F1-score
troll          0.37       0.33    0.35         0.36       0.35    0.35
not-troll      0.68       0.71    0.70         0.68       0.69    0.68
macro-avg      0.52       0.52    0.52         0.52       0.52    0.52
weighted-avg   0.58       0.58    0.58         0.57       0.57    0.57

               TamilMemes + ImageNet + Flickr1k   TamilMemes + ImageNet + Flickr30k
               Precision  Recall  F1-score        Precision  Recall  F1-score
troll          0.30       0.34    0.32            0.36       0.35    0.35
not-troll      0.64       0.59    0.62            0.68       0.69    0.68
macro-avg      0.47       0.47    0.47            0.52       0.52    0.52
weighted-avg   0.53       0.51    0.52            0.52       0.52    0.52

[2] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Result: MobileNet [3] variation

               TamilMemes                      TamilMemes + ImageNet
               Precision  Recall  F1-score     Precision  Recall  F1-score
troll          0.28       0.27    0.28         0.34       0.43    0.38
not-troll      0.64       0.66    0.65         0.67       0.58    0.62
macro-avg      0.46       0.46    0.46         0.50       0.51    0.50
weighted-avg   0.52       0.53    0.52         0.56       0.53    0.54

               TamilMemes + ImageNet + Flickr1k   TamilMemes + ImageNet + Flickr30k
               Precision  Recall  F1-score        Precision  Recall  F1-score
troll          0.33       0.55    0.41            0.31       0.34    0.33
not-troll      0.66       0.45    0.53            0.65       0.62    0.64
macro-avg      0.50       0.50    0.47            0.48       0.48    0.48
weighted-avg   0.55       0.48    0.49            0.54       0.53    0.53

[3] Howard, Andrew G., et al. "MobileNets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
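The F1 and averaged rows in the result tables follow standard definitions, sketched below in plain Python. The example re-derives two entries of the MobileNet "TamilMemes + ImageNet + Flickr1k" column from the reported values. Because the slides do not give per-class test-set sizes, the supports passed to `weighted_average` are hypothetical, and all helper names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean over classes: every class counts equally."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean over classes weighted by the number of test items per class."""
    total = sum(supports)
    return sum(v * s for v, s in zip(values, supports)) / total


if __name__ == "__main__":
    # Troll class, MobileNet, TamilMemes + ImageNet + Flickr1k (from the table).
    print(f"troll F1 = {f1_score(0.33, 0.55):.2f}")
    # Macro-averaged F1 from the two reported per-class F1 scores.
    print(f"macro F1 = {macro_average([0.41, 0.53]):.2f}")
    # HYPOTHETICAL supports: the slides do not report test-set class sizes.
    print(f"weighted F1 = {weighted_average([0.41, 0.53], [100, 200]):.2f}")
```

The macro average treats troll and not-troll equally, so it is the more informative summary when the classes are imbalanced; the weighted average is pulled toward whichever class has more test items.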

Overall Results
● Macro-averaged F1-score, with or without data imbalance, ranged from 0.47 to 0.58
● Overall, the precision for troll class identification lies between 0.28 and 0.37
● ResNet is not hampered by imbalanced settings
● MobileNet shows poor performance in imbalanced settings

Conclusion and Future Work
• An image classifier alone does not give significant results
• The text embedded in a meme gives it its meaning
• This text is code-mixed with English
• It is challenging to train a classifier on the image alone
• The same meme image can be used in different contexts
• We plan to use OCR to capture the textual data and treat this problem in a multimodal way

Thank you! Questions?
