Sharing is Caring in the Land of The Long Tail


  1. Sharing is Caring in the Land of The Long Tail Samy Bengio

  2. Real life setting “Real problems rarely come packaged as 1M images uniformly belonging to a set of 1000 classes…”

  3. The long tail • A well-known phenomenon where a small number of generic objects/entities/words appear very often and most others appear more rarely. • Also known as Zipf's law, a power law, or a Pareto distribution. • The web is littered with this kind of distribution: • the frequency of each unique query on search engines, • the occurrences of each unique word in text documents, • etc.

  4. Example of a long tail [Figure: frequency of each word in Wikipedia, plotted on a log scale from 10 to 1e+09, ranked from very frequent words such as “the” down to rare ones such as “anyways”, “trickiest”, and “h-plane”.]
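To make the shape of such a distribution concrete, here is a minimal sketch (not from the talk) that ranks the words of a local text file by frequency; the file name corpus.txt is a placeholder for any large corpus.

```python
from collections import Counter

# Hypothetical local corpus file standing in for Wikipedia.
words = open("corpus.txt", encoding="utf-8").read().lower().split()
ranked = Counter(words).most_common()

# On real text, log(frequency) falls roughly linearly with log(rank):
# the signature of a Zipf / power-law distribution.
for rank, (word, freq) in enumerate(ranked[:20], start=1):
    print(f"{rank:>4}  {freq:>10}  {word}")
```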

  5. Representation sharing • How do we design a classifier or a ranker when the data follows a long-tail distribution? • If we train one model per class, poor classes are hard to train well. • How come we humans are able to recognize objects we have seen only once, or even never? • Most likely answer: representation sharing: all class models share/learn a joint representation. • Poor classes can then benefit from knowledge learned from semantically similar but richer classes. • Extreme case: the zero-shot setting!

  6. Outline In this talk, I will cover the following ideas: • Wsabie: a joint embedding space of images and labels • The many facets of text embeddings • The zero-shot setting through embeddings • Incorporating Knowledge Graph constraints • Using a language model I will NOT cover the following important issues: • Prediction-time issues for extreme classification • Memory issues

  7. Wsabie Learn to embed images and labels jointly, optimizing for the top-ranked items. [Figure: labels such as Obama, Eiffel Tower, Shark, Dolphin, Lion mapped together with images into a 100-dimensional embedding space.] Wsabie: J. Weston et al., ECML 2010, IJCAI 2011

  8. Wsabie: summary sim(i, x) = ⟨W_i, V_x⟩, where W embeds label i (a one-hot code) and V embeds image x (real-valued features). Triplet loss: sim(x, dolphin) > sim(x, obama) + 1 for an image x of a dolphin. Trained by stochastic gradient descent and smart sampling of negative examples.
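As a rough illustration of this training scheme, here is a toy sketch (not the authors' code) of one stochastic gradient step on the margin-based triplet loss with uniform negative sampling; all dimensions are made up, and the WARP rank weighting of the actual Wsabie objective is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, d_emb, n_labels = 512, 100, 10_000   # toy sizes, not the paper's

V = rng.normal(scale=0.01, size=(d_emb, d_img))     # image feature projection
W = rng.normal(scale=0.01, size=(n_labels, d_emb))  # label embeddings

def triplet_step(x, pos, lr=0.01, margin=1.0, max_tries=100):
    """One SGD step: sample negative labels until one violates the margin,
    then push the true label above it."""
    x_emb = V @ x
    for _ in range(max_tries):
        neg = int(rng.integers(n_labels))
        if neg == pos:
            continue
        loss = margin - W[pos] @ x_emb + W[neg] @ x_emb
        if loss > 0:
            grad_V = np.outer(W[neg] - W[pos], x)  # d(loss)/dV
            W[pos] += lr * x_emb                   # descend on W[pos]
            W[neg] -= lr * x_emb                   # descend on W[neg]
            V -= lr * grad_V
            return loss
    return 0.0  # no violating negative found: zero loss for this example

x = rng.normal(size=d_img)  # stand-in for an image feature vector
print(triplet_step(x, pos=42))
```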

  9. Wsabie: experiments - results

                              ImageNet 2010          Web
      Method                  prec@1   prec@10       prec@1   prec@10
      approx kNN              1.55%    0.41%         0.30%    0.34%
      One-vs-Rest             2.27%    1.02%         0.52%    0.29%
      Wsabie                  4.03%    1.48%         1.03%    0.44%
      Ensemble of 10 Wsabies  10.03%   3.02%         -        -

      ImageNet 2010: 16,000 labels and 4M images. Web: 109,000 labels and 16M images.

  10. Wsabie: embeddings. Nearest neighbors in label space:
      barack obama: barak obama, obama, barack, barrack obama, bow wow
      david beckham: beckham, david beckam, alessandro del piero, del piero
      santa: santa claus, papa noel, pere noel, santa clause, joyeux noel
      dolphin: delphin, dauphin, whale, delfin, delfini, baleine, blue whale
      cows: cattle, shire, dairy cows, kuh, horse, cow, shire horse, kone
      rose: rosen, hibiscus, rose flower, rosa, roze, pink rose, red rose
      eiffel tower: eiffel, tour eiffel, la tour eiffel, big ben, paris, blue mosque
      ipod: i pod, ipod nano, apple ipod, ipod apple, new ipod
      f18: f 18, eurofighter, f14, fighter jet, tomcat, mig 21, f 16

  11. Wsabie: annotations. Top predicted labels for four test images:
      (dolphin) delfini, orca, dolphin, mar, delfin, dauphin, whale, cancun, killer whale, sea world
      (shark) blue whale, whale shark, great white shark, underwater, white shark, shark, manta ray, dolphin, requin, blue shark, diving
      (obama) barrack obama, barak obama, barack hussein obama, barack obama, james marsden, jay z, obama, nelly, falco, barack
      (eiffel tower) eiffel, paris by night, la tour eiffel, tour eiffel, eiffel tower, las vegas strip, eifel, tokyo tower, eifel tower

  12. “Why not an embedding of text only?”

  13. Skip-Gram (Word2Vec) Learn dense embedding vectors from an unannotated text corpus, e.g. Wikipedia. [Figure: the embedding of “tiger shark”, learned from contexts such as “an exceptionally large male tiger shark can grow up to”, shown in the same space as other vocabulary items such as wing chair, Obama, tuna, and bull shark.] http://code.google.com/p/word2vec (Tomas Mikolov, Kai Chen, Greg Corrado, Jeff Dean, ICLR 2013)
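For reference, a skip-gram model of this kind can be trained in a few lines with the gensim library; this is a minimal sketch assuming gensim ≥ 4 is installed, and the two-sentence corpus is only a stand-in for Wikipedia.

```python
from gensim.models import Word2Vec

# Tiny stand-in corpus; the talk trains on all of Wikipedia.
sentences = [
    "an exceptionally large male tiger shark can grow up to".split(),
    "the bull shark is found in warm coastal waters".split(),
]

# sg=1 selects the skip-gram objective from the slide.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

print(model.wv.most_similar("shark", topn=3))
```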

  14. Skip-Gram on Wikipedia [Figure: t-SNE visualization of 155K ImageNet labels embedded by a skip-gram model trained on Wikipedia. Shark terms cluster together (tiger shark, bull shark, blacktip shark, oceanic whitetip shark, sandbar shark, dusky shark, blue shark, requiem shark, great white shark, lemon shark), as do car terms (car, cars, muscle car, sports car, compact car, autocar, automobile, pickup truck, racing car, passenger car, dealership); broader regions cover reptiles, birds, insects, food, musical instruments, clothing, dogs, aquatic life, animals, and transportation.]

  15. Embeddings are powerful E(Rome) - E(Italy) + E(Germany) ≈ E(Berlin) E(hotter) - E(hot) + E(big) ≈ E(bigger) [Figure: the Italy→Rome and Germany→Berlin vectors are roughly parallel, as are hot→hotter and big→bigger.]
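Such analogies reduce to vector arithmetic plus a nearest-neighbor search; a small sketch, assuming E is a dict mapping words to numpy vectors:

```python
import numpy as np

def analogy(E, a, b, c, topn=1):
    """'a is to b as c is to ?': words whose embeddings are closest to
    E[b] - E[a] + E[c] under cosine similarity."""
    target = E[b] - E[a] + E[c]
    target = target / np.linalg.norm(target)
    scored = []
    for w, v in E.items():
        if w in (a, b, c):
            continue  # exclude the query words themselves
        scored.append((float(v @ target) / np.linalg.norm(v), w))
    return sorted(scored, reverse=True)[:topn]

# e.g. analogy(E, "Italy", "Rome", "Germany") should rank "Berlin" first
# when E holds good embeddings.
```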

  16. Let’s go back to images!

  17. Deep convolutional models for images [Figure: a deep network from the input through layers 1 to 7.] But what about the long tail of classes? What about using our semantic embeddings for that?

  18. ConSE: Convex Combination of Semantic Embeddings [Norouzi et al., ICLR 2014] [Figure: for an image x, a trained classifier outputs p(Lion | x), p(Apple | x), p(Orange | x), p(Tiger | x), p(Bear | x).]

  19. ConSE: Convex Combination of Semantic Embeddings (from Skip-Gram, for instance). Let s(y) be the embedding position of label y. Then f(x) = Σ_i p(y_i | x) s(y_i), e.g. f(x) = p(Lion | x) s(Lion) + p(Apple | x) s(Apple) + p(Orange | x) s(Orange) + p(Tiger | x) s(Tiger) + p(Bear | x) s(Bear). Do a nearest-neighbor search around f(x) to find the corresponding label.

  20. ConSE(T): Convex Combination of Semantic Embeddings In practice, average only the top few labels: top(T) = { i | p(y_i | x) is among the top T probabilities }, and f(x) = (1/Z) Σ_{i ∈ top(T)} p(y_i | x) s(y_i), where Z normalizes so the weights sum to 1.
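A compact sketch of this predictor (the names and inputs are assumptions, not the paper's code):

```python
import numpy as np

def conse_predict(probs, train_labels, S, candidates, T=10):
    """probs: classifier probabilities over the known training labels;
    train_labels: their names; S: {label: embedding vector}; candidates:
    labels to choose among (can be unseen at training time)."""
    top = np.argsort(probs)[::-1][:T]
    Z = probs[top].sum()
    # Convex combination of the top-T label embeddings.
    f = sum(probs[i] * S[train_labels[i]] for i in top) / Z
    f = f / np.linalg.norm(f)
    # Nearest-neighbor search in embedding space (cosine similarity).
    return max(candidates,
               key=lambda y: float(S[y] @ f) / np.linalg.norm(S[y]))
```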

  21. ConSE(T): experiments on ImageNet • Model trained with 1.2M ILSVRC 2012 images from 1,000 classes. • Evaluated on images from the same classes. • Results are measured as hit@k. [Figure: the ImageNet hierarchy, showing the training classes and the classes 2 hops and 3 hops away from them.]
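For clarity, hit@k on a single example is simply:

```python
import numpy as np

def hit_at_k(scores, true_label, k):
    """1 if the true label is among the k highest-scoring labels, else 0;
    averaging this over a test set gives hit@k."""
    return int(true_label in np.argsort(scores)[::-1][:k])
```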

  22. ConSE(T): experiments

  23. Knowledge Graph

  24. Multiclass classifiers [Figure: a GoogLeNet model topped by either a softmax output layer or independent logistic outputs.]

  25. Object labels have rich relations • Exclusion: an object cannot be both a dog and a cat. • Hierarchical: a corgi or a puppy is also a dog. • Overlap: a corgi may or may not be a puppy.

  26. Visual Model + Knowledge Graph [Deng et al., ECCV 2014] [Figure: visual-model scores (Dog 0.9, Corgi 0.8, Puppy 0.9, Cat 0.1) are combined with a knowledge graph through joint inference over a Hierarchy and Exclusion (HEX) graph, with hierarchical edges from dog to corgi and puppy and an exclusion edge between dog and cat.]

  27. HEX classification model Input scores x ∈ R^n, binary label vector y ∈ {0,1}^n.

      Pr(y | x) = (1/Z(x)) ∏_i φ_i(x_i, y_i) ∏_{i,j} ψ_{i,j}(y_i, y_j)

      Unary potentials (same as logistic regression):
      φ_i(x_i, y_i) = sigmoid(x_i) if y_i = 1, and 1 − sigmoid(x_i) if y_i = 0.

      Pairwise potentials set illegal configurations to zero:
      ψ_{i,j}(y_i, y_j) = 0 if (y_i, y_j) violates a constraint, and 1 otherwise.

      So all illegal configurations have probability zero.
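To make the factorization concrete, here is a brute-force sketch over a three-label toy graph. Note this enumerates all 2^n configurations, which is only viable for tiny n; the paper performs efficient exact inference on the HEX graph structure instead.

```python
import numpy as np
from itertools import product

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hex_distribution(x, legal):
    """Pr(y | x) over binary label vectors, given input scores x and a
    predicate legal(y) encoding the hierarchy/exclusion constraints."""
    n = len(x)
    configs, weights = [], []
    for y in product([0, 1], repeat=n):
        if not legal(y):
            continue  # pairwise potential is zero: probability zero
        # Unary potentials: the same per-label form as logistic regression.
        w = 1.0
        for xi, yi in zip(x, y):
            w *= sigmoid(xi) if yi else 1.0 - sigmoid(xi)
        configs.append(y)
        weights.append(w)
    Z = sum(weights)  # the partition function Z(x)
    return {y: w / Z for y, w in zip(configs, weights)}

# Toy labels: (dog, cat, corgi). Hierarchy edge: corgi implies dog.
# Exclusion edge: dog and cat cannot both hold.
def legal(y):
    dog, cat, corgi = y
    if corgi and not dog:
        return False
    if dog and cat:
        return False
    return True

for y, p in hex_distribution(np.array([2.0, -1.0, 1.5]), legal).items():
    print(y, round(p, 3))
```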

  28. Exp: Learning with weak labels • ILSVRC 2012: “relabel” or “weaken” a portion of the fine-grained leaf labels (e.g. corgi, husky) to basic-level labels (e.g. dog). • Evaluate on fine-grained recognition. [Figure: training uses the original ILSVRC 2012 hierarchy (animal → dog → corgi, husky) with “weakened” labels; testing uses the leaf labels.]

  29. Exp: Learning with weak labels • ILSVRC 2012: “relabel” or “weaken” a portion of the fine-grained leaf labels to basic-level labels. • Evaluate on fine-grained recognition. • Consistently outperforms baselines. [Results table: top-1 accuracy (top-5 accuracy).]

  30. What about textual descriptions? • We have considered the long tail of objects. • What about more complex descriptions, involving multiple words, or captions? • We can use language models to help.

  31. Neural Image Caption Generator [Vinyals et al., CVPR 2015] [Figure: a deep CNN (vision) feeds a generating RNN (language), producing captions such as “Two pizzas sitting on top of a stove top oven.” / “A pizza sitting on top of a pan on top of a stove.” and “A group of people shopping at an outdoor market. There are many vegetables at the fruit stand.”]

  32. NIC: objective • Let I be an image (pixels). • Let S = (S_0, ..., S_N) be the corresponding sentence (a sequence of words). • Log-likelihood of producing the right sentence given the image:

      log p(S | I) = Σ_{t=0}^{N} log p(S_t | I, S_0, ..., S_{t−1})

      • We maximize this likelihood over all training pairs:

      θ* = argmax_θ Σ_{(I,S)} log p(S | I; θ)
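In code, the inner sum is just the log-probability the decoder assigns to each gold word in turn; a sketch with assumed inputs from a hypothetical trained model:

```python
import numpy as np

def caption_log_likelihood(step_probs, word_ids):
    """log p(S | I) = sum_t log p(S_t | I, S_0..S_{t-1}).
    step_probs[t] is the decoder's distribution over the vocabulary after
    seeing the image and the first t words; word_ids are the gold word ids,
    ending in <end>. Training maximizes this over all (image, sentence) pairs."""
    t = np.arange(len(word_ids))
    return float(np.log(step_probs[t, word_ids]).sum())
```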

  33. NIC: model [Figure: a convolutional neural net encodes the image; a recurrent neural net then generates the caption, emitting P(word 1), P(word 2), ..., P(<end>), with each word embedded and fed back as the next input.]

  34. Examples [Figure: generated captions grouped by human rating.]
      Describes without errors: “A person riding a motorcycle on a dirt road.” / “A refrigerator filled with lots of food and drinks.” / “A herd of elephants walking across a dry grass field.”
      Describes with minor errors: “A skateboarder does a trick on a ramp.” / “A group of young people playing a game of frisbee.” / “A close up of a cat laying on a couch.”
      Somewhat related to the image: “A dog is jumping to catch a frisbee.” / “Two hockey players are fighting over the puck.” / “A yellow school bus parked in a parking lot.”
      Unrelated to the image: “Two dogs play in the grass.” / “A little girl in a pink hat is blowing bubbles.” / “A red motorcycle parked on the side of the road.”

  35. It doesn’t always work… Human: A blue and black dress ... No! I see white and gold! Our model: A close up of a vase with flowers.

  36. Scheduled Sampling [NIPS 2015]

  37. Scheduled Sampling
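The slides present this in figures; in essence (following the NIPS 2015 paper, not code from the talk), at each decoding step during training one flips a coin to decide whether the previous input is the gold word or the model's own prediction, with the teacher-forcing probability decaying over training. A minimal sketch:

```python
import math
import random

def previous_word(gold, predicted, step, k=100.0):
    """Scheduled sampling: with probability eps feed the gold previous word
    (teacher forcing), otherwise feed the model's own prediction. eps decays
    via an inverse-sigmoid schedule; k is a tunable constant, not a value
    from the talk."""
    eps = k / (k + math.exp(step / k))  # ~1 early in training, -> 0 later
    return gold if random.random() < eps else predicted
```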
