Deep Text Mining of Instagram Data Without Strong Supervision
WI 2018 Santiago, International Conference on Web Intelligence
Kim Hammar, Shatha Jaradat, Nima Dokoohaki, and Mihhail Matskin
KTH Royal Institute of Technology, kimham@kth.se
December 4, 2018
Key enabler for Deep Learning: Data growth
[Figure: Annual size of the global datasphere in zettabytes, 2009-2026 (projected). Source: IDC.]
But what about Labeled Data?
Supervised learning: iteratively minimize the loss function L(ŷ, y) between the prediction ŷ and the ground truth y.
[Figure: a small feed-forward network with inputs x, biases b, and output ŷ.]
Labeled training data is still a bottleneck.
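For concreteness, a minimal sketch (not from the paper) of what "iteratively minimize the loss" means: logistic regression fit by gradient descent on toy labeled data. The expensive ingredient is exactly the label vector y.

```python
# Minimal sketch: supervised learning as iterative minimization of L(y_hat, y).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights w by gradient descent on the logistic loss L(y_hat, y)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        y_hat = sigmoid(X @ w)             # prediction
        grad = X.T @ (y_hat - y) / len(y)  # gradient of the loss
        w -= lr * grad                     # descend
    return w

# Toy labeled data: the labels y are the bottleneck in practice.
X = np.array([[1.0, 0.2], [0.9, 0.1], [0.1, 0.9], [0.2, 1.0]])
y = np.array([0, 0, 1, 1])
w = train_logistic(X, y)
print(sigmoid(X @ w))  # predictions approach the ground truth
```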
Research Problem: Clothing Prediction on Instagram
[Figure: an Instagram post is fed to an image model and a text model, which jointly predict the clothing items present, e.g., dress = 0, coat = 1, ..., skirt = 0.]
This Paper: Text Classification Without Labeled Data
[Figure: pipeline from text mining to analytics: word embeddings feed a neural network, whose predictions drive analytics such as monthly mentions of brand "foo" from 04.2017 to 03.2018.]
Applications: trend detection, user recommendations.
Example Instagram Post
[Figure: screenshot of an example Instagram post.]
Challenge: Noisy Text and No Labels
A case study of a corpus with 143 fashion accounts, 200K posts, and 9M comments.

Challenge 1: Noisy Text with a Long-Tail Distribution
[Figure: log-log plot of the frequency of text per post (comments and words); the heads of the distributions are posts with 0 comments and posts with 0 words (comments + caption + tags).]

Text              Fraction of corpus   Average per post
Emojis                  0.15                 48.63
Hashtags                0.03                  9.14
User-handles            0.06                 18.62
Google-OOV words        0.46                145.02
Aspell-OOV words        0.47                147.61

Challenge 2: Lack of Labeled Training Data
Raw Instagram text is abundant; human annotations are expensive and unavailable.
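As an illustration of how such per-post statistics might be computed, a rough sketch using simple regular expressions (the paper's exact tooling, e.g., the Google and Aspell out-of-vocabulary checks, is not reproduced here):

```python
# Rough sketch: count hashtags, user-handles, and emojis in a post's text.
import re

HASHTAG = re.compile(r"#\w+")
HANDLE = re.compile(r"@\w+")
# Crude emoji heuristic: code points in the main emoji blocks.
EMOJI = re.compile(r"[\U0001F000-\U0001FAFF]")

def post_stats(text):
    return {
        "hashtags": len(HASHTAG.findall(text)),
        "user_handles": len(HANDLE.findall(text)),
        "emojis": len(EMOJI.findall(text)),
        "tokens": len(text.split()),
    }

print(post_stats("I love the bag! Is it Gucci? #goals @username"))
```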
Alternative Sources of Supervision That Are Cheap but Weak
Strong supervision: manual annotation by an expert.
Weak supervision: a signal that does not have full coverage or perfect accuracy.
Sources of weak supervision: domain heuristics, databases, APIs, crowdworkers; their outputs are fed to a combiner.
Weak Supervision in the Fashion Domain
Open APIs.
Pre-trained clothing classification models: DeepDetect [1].
A text mining system based on a fashion ontology and word embeddings:
[Figure: for each Instagram post p ∈ P, terms t from the caption, comments, user-tags, and hashtags are matched against the ontology O (items, brands, materials, patterns, styles; backed by ProBase) using edit distance and word embeddings V. A linear combination of tfidf(w_i, p, P) and embedding term-scores produces word rankings, yielding ranked noisy labels, e.g., Items: ⟨(bag, 0.63), (jeans, 0.3), (top, 0.1)⟩; Brands: ⟨(Gucci, 0.8), (Zalando, 0.3)⟩; Material: ⟨(Denim, 1.0)⟩. Example caption: "Happy Monday! Here is my outfit of the day #streetstyle #me #canada #goals #chic #denim"; example comments: "I love the bag! Is it Gucci? #goals @username", "I #want the #baaag", "Wow! The #jeans", "You are suclh an inspirationn, can you follow me back?"]

[1] https://github.com/jolibrain/deepdetect
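A rough sketch of the ranking idea, not the paper's exact system: score each ontology term for a post by a linear combination of tf-idf and embedding similarity. The ontology subset, the toy embedding vectors, and the weight alpha are all hypothetical.

```python
# Sketch: rank ontology items by alpha*tfidf + (1-alpha)*embedding similarity.
import math
from collections import Counter

ONTOLOGY_ITEMS = ["bag", "jeans", "top", "dress", "coat"]  # hypothetical subset

def tfidf(term, post_tokens, corpus):
    tf = Counter(post_tokens)[term] / max(len(post_tokens), 1)
    df = sum(1 for doc in corpus if term in doc)
    return tf * math.log(len(corpus) / (1 + df))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_items(post_tokens, corpus, vectors, alpha=0.5):
    """Score each ontology item; higher means a stronger noisy label."""
    scores = {}
    for item in ONTOLOGY_ITEMS:
        sim = max((cosine(vectors[w], vectors[item])
                   for w in post_tokens if w in vectors and item in vectors),
                  default=0.0)
        scores[item] = alpha * tfidf(item, post_tokens, corpus) + (1 - alpha) * sim
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy vectors and corpus, purely illustrative.
vectors = {"bag": [1.0, 0.1], "denim": [0.1, 1.0], "jeans": [0.2, 1.0],
           "top": [0.5, 0.5], "dress": [0.4, 0.3], "coat": [0.3, 0.4]}
corpus = [["love", "the", "bag"], ["denim", "jeans", "outfit"]]
print(rank_items(["love", "the", "bag", "denim"], corpus, vectors))
```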
How To Combine Several Sources Of Weak Supervision?
Simplest way to combine many weak signals: majority vote.
Recent research on combining weak signals: data programming [2].

[2] Alexander J. Ratner et al. "Data Programming: Creating Large Training Sets, Quickly". In: Advances in Neural Information Processing Systems 29, ed. by D. D. Lee et al. Curran Associates, Inc., 2016, pp. 3567-3575. URL: http://papers.nips.cc/paper/6523-data-programming-creating-large-training-sets-quickly.pdf
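A minimal sketch of the majority-vote baseline over a labeling matrix; the {-1, 0, +1} encoding follows the data-programming convention, where 0 means the function abstains:

```python
# Majority vote over L with shape (n_functions, n_items), entries in {-1, 0, +1}.
import numpy as np

def majority_vote(L):
    votes = L.sum(axis=0)  # net vote per item; abstentions contribute nothing
    return np.sign(votes)  # -1, 0 (tie or all abstain), or +1

L = np.array([[ 1, -1,  0],
              [ 1,  1, -1],
              [-1,  1, -1]])
print(majority_vote(L))    # -> [ 1  1 -1]
```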
Model Weak Supervision With a Generative Model
[Figure: labeling functions λ_1, ..., λ_n applied to unlabeled data produce a matrix of weak labels, which the generative model π_{α,β}(Λ, Y) combines into labels w_1, ..., w_n.]
Model each source of weak supervision as a labeling function λ_i: λ_i(unlabeled data) → label.
Learn a generative model π_{α,β}(Λ, Y) over the labeling process:
- From conflicts between labeling functions, estimate each function's accuracy α_i.
- From each function's empirical coverage, estimate its coverage β_i.
Given α and β, the model combines the weak labels into a single probabilistic label:
- High-accuracy functions get more weight.
- Much disagreement → low-probability label.
- Full agreement → high-probability label.
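A hedged sketch of the combination step, illustrating the weighting idea rather than the paper's exact model: given the vote matrix and accuracies α (as the generative model would estimate them), combine votes into a probabilistic label under a conditional-independence assumption.

```python
# Naive-Bayes-style weighted vote. L: (n_functions, n_items) with entries in
# {-1, 0, +1} (0 = abstain); alpha: estimated accuracy per labeling function.
import numpy as np

def combine(L, alpha, prior=0.5):
    """P(y = +1 | L): each vote is weighted by its function's log-odds."""
    w = np.log(alpha / (1 - alpha))  # high accuracy => large weight
    log_odds = np.log(prior / (1 - prior)) + (L.T * w).sum(axis=1)
    return 1.0 / (1.0 + np.exp(-log_odds))

L = np.array([[ 1,  1],    # function 1 votes on two items
              [-1,  1],    # function 2
              [-1,  1]])   # function 3
alpha = np.array([0.80, 0.62, 0.62])  # hypothetical estimated accuracies
print(combine(L, alpha))   # item 1: ~0.60 (disagreement); item 2: ~0.91 (agreement)
```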
Data Programming Intuition
Example: low-accuracy labeling functions vote "it is not a coat", while a high-accuracy labeling function votes "it is a coat".
Probabilistic label: 0.6 probability that it is a coat.
Majority vote: 1.0 probability that it is not a coat.
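To make the contrast concrete, the same arithmetic worked by hand; the accuracies 0.80 and 0.62 are hypothetical values chosen to reproduce the 0.6 on the slide, assuming the naive-Bayes combination sketched above:

```python
# One function with accuracy 0.80 votes "coat" (+1); two functions with
# accuracy 0.62 vote "not a coat" (-1). Weighted log-odds, then sigmoid.
import math

log_odds = math.log(0.80 / 0.20) - 2 * math.log(0.62 / 0.38)
p_coat = 1.0 / (1.0 + math.exp(-log_odds))
print(round(p_coat, 2))  # ~0.60: a hedged label, unlike the majority vote's 1.0
```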
Extension of Data Programming to Multi-Label Classification
Problem: data programming is only defined for binary classification in the original paper, where a labeling function outputs λ_i → k_i ∈ {-1, 0, 1}.
Multi-class setting: model the labeling function as λ_i → k_i ∈ {0, ..., N}.
Idea 1 for multi-label: model the labeling function as λ_i → k̄_i = (v_0, ..., v_n) with v_j ∈ {-1, 0, 1}.
Idea 2 for multi-label: learn a separate generative model for each class, and let each labeling function give a binary output per class: λ_{i,j} → k_{i,j} ∈ {-1, 0, 1} (see the sketch below).
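A minimal sketch of Idea 2, assuming the naive-Bayes-style combiner from the earlier sketch and hypothetical per-class accuracies; each class gets its own independent binary combination, so the output is multi-label:

```python
# One binary combiner per class; classes are combined independently.
import numpy as np

def combine(L, alpha):
    w = np.log(alpha / (1 - alpha))
    return 1.0 / (1.0 + np.exp(-(L.T * w).sum(axis=1)))

def multi_label(L_per_class, alpha_per_class):
    """L_per_class[c]: (n_functions, n_items) vote matrix for class c."""
    return {c: combine(L, alpha_per_class[c]) for c, L in L_per_class.items()}

# Hypothetical per-class votes and accuracies for two items.
L_per_class = {"coat":  np.array([[1, -1], [1, 1]]),
               "jeans": np.array([[-1, 1], [0, 1]])}
alpha_per_class = {"coat":  np.array([0.9, 0.7]),
                   "jeans": np.array([0.6, 0.8])}
print(multi_label(L_per_class, alpha_per_class))
```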
Trained Generative Models: Labeling Functions' Accuracy Differs Between Classes
[Figure: predicted accuracy (roughly 0.4 to 1.0) of each labeling function (Clarifai, Deepomatic, DeepDetect, Google Cloud Vision, SemCluster, KeywordSyntactic, KeywordSemantic) per class (accessories, bags, blouses, coats, dresses, jackets, jeans, cardigans, shoes, skirts, tights, tops, trousers). Multiple generative models can capture a different accuracy for labeling functions for different classes.]
Putting Everything Together
1. Apply weak supervision to unlabeled data (open APIs, pre-trained models, domain heuristics, etc.).
2. Combine the labels using majority voting or generative modelling (data programming).
3. Use the combined labels to train a discriminative model with supervised machine learning (sketched below).
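The three steps in one hedged end-to-end sketch; the votes, accuracies, and features are toy values, and sklearn is used only for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Step 1: weak supervision applied to unlabeled data -> vote matrix.
L = np.array([[ 1,  1, -1, -1],
              [ 1, -1, -1,  1],
              [ 1,  1, -1,  0]])
alpha = np.array([0.8, 0.6, 0.7])  # e.g., estimated by the generative model

# Step 2: combine votes into probabilistic labels (naive-Bayes weighted vote).
w = np.log(alpha / (1 - alpha))
p = 1.0 / (1.0 + np.exp(-(L.T * w).sum(axis=1)))

# Step 3: train a discriminative model; label confidence becomes a sample weight.
X = np.array([[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.4, 0.6]])  # toy features
y = (p >= 0.5).astype(int)
clf = LogisticRegression().fit(X, y, sample_weight=np.abs(p - 0.5) * 2)
print(clf.predict_proba(X)[:, 1])
```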